Having a deprecated API just randomly return failures is an awful idea!
Better to give an actual timeline (future version & date) for when deprecated functionality / functions will be removed, and in the meantime, if the language supports it, mark those functions as deprecated (e.g. C++ [[deprecated]] attribute) so that developers see compilation warnings if they failed to read the release notes.
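Something like this, for example (the function names, version number and URL here are invented, just to show the shape of it):

    // Hypothetical example - names, version and URL are made up.
    // [[deprecated("...")]] is standard C++14; every call site of connect() gets a warning.
    [[deprecated("connect() will be removed in v5.0 (June 2026); use connect_v2(). "
                 "Migration guide: https://example.com/deprecations/connect")]]
    void connect() {}

    void connect_v2() {}

    int main() {
        connect();     // compiler warns here, printing the message above
        connect_v2();  // no warning
    }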
Yep. I'll admit I've acted faster on hard-set dates than on some vague "in the future" message. I've also seen some tools get really noisy about deprecation, with warnings spanning many lines AND repeating. Please don't log the same message over and over for each instance. Color it or add emoji if you must to grab attention, but once is enough. It's annoying when you can't do anything about it at the time and have to sift through the extra noise while hunting down another issue in the CI log. Add a link that goes over that specific deprecation in more detail and how to migrate.
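And the "once is enough" part is cheap for a tool to do - a sketch of the kind of dedup I mean (my own names, not any particular tool's code):

    #include <iostream>
    #include <string>
    #include <unordered_set>

    // Hypothetical helper: print each distinct deprecation message once per run,
    // no matter how many call sites trigger it.
    void warn_deprecated_once(const std::string& message) {
        static std::unordered_set<std::string> already_warned;
        if (already_warned.insert(message).second)
            std::cerr << "DEPRECATED: " << message << "\n";
    }

    int main() {
        for (int i = 0; i < 1000; ++i)
            warn_deprecated_once("foo() is deprecated - see the migration guide");
        // Only the first call logs; the other 999 are silent.
    }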
Yes, but animal/human brains (cortex) appear to have evolved to be prediction machines, originally mostly predicting evolving sensory inputs (how external objects behave), and predicting real-world responses to the animal's actions.
Language seems to be taking advantage of this pre-existing predictive architecture, and would have again learnt by predicting sensory inputs (heard language), which as we have seen is enough to induce ability to generate it too.
Yes, but at least now we're comparing artificial to real neural networks, so the way it works at least has a chance of being similar.
I do think that a transformer, a somewhat generic hierarchical/parallel predictive architecture, learning from prediction failure, has to be at least somewhat similar to how we learn language, as opposed to a specialized Chomskyan "language organ".
The main difference is perhaps that the LLM is only predicting based on the preceding sequence, while our brain is driving language generation by a combination of sequence prediction and the thoughts being expressed. You can think of the thoughts as a bias on the language generation process, a bit like language being a bias to a diffusion-based image generator.
What would be cool would be if we could do some "mechanistic interpretability" work on the brain's language generation circuits, and perhaps discover something similar to induction heads.
> Yes, but at least now we're comparing artificial to real neural networks, so the way it works at least has a chance of being similar.
Indeed, and I wasn't even saying it's wrong, it may be pretty close.
> What would be cool would be if we could do some "mechanistic interpretability" work on the brain's language generation circuits, and perhaps discover something similar to induction heads.
Yeah, I wouldn't be surprised. And maybe the more we find out about the brain, it could lead to some new insights about how to improve AI. So we'd sort of converge from both sides.
>Yes, but at least now we're comparing artificial to real neural networks
Given that the only similarity between the two is the "network" structure, I'd say that point is pretty weak. The name "artificial neural network" is just a historical artifact and an abstraction totally disconnected from the real thing.
Sure, but ANNs are at least connectionist, learning connections/strengths and representations, etc - close enough at that level of abstraction that I think ANNs can suggest how the brain may be learning certain things.
I think it depends on what they use it for. For fantasy stuff like cartoons, aliens and (not fantasy) dinosaurs it may be ok, and I guess they could train on old hand-animated cartoons to retain that charm (and cartoon tropes like running in place but not moving) if they wanted to. If they use it to generate photo-realistic humans then it's going to be uncanny valley and just feel fake.
It would be interesting to see a best effort at an AI dinosaur walking - a sauropod using the trained motion of an elephant perhaps, which may well be more animal-like than CGI attempts to do the same.
You can count me as an AGI sceptic to the extent that I don't think LLMs are the approach that is going to get us there, but I'm equally confident that we will get there, and that predictive neural nets are the core of the right approach.
The article is a bit rambling, but the main claims seem to be:
1) Computers can't emulate brains due to architecture (locality, caching, etc) and power consumption
2) GPUs are maxxing out in terms of performance (and implicitly AGI has to use GPUs)
3) Scaling is not enough, since due to 2) scaling is close to maxxing out
4) AGI won't happen because he defines AGI as requiring robotics, and sees the scaling of robotic experience as a limiting factor
5) Superintelligence (which he associates with self-improving AGI) won't happen because it'll again require more compute
It's a strange set of arguments, most of which don't hold up, and it both misses what is actually wrong with the current approach and fails to conceive of what different approach will get us to AGI.
1) Brains don't have some exotic architecture that somehow gives them an advantage over computers in terms of locality, etc. The cortex is in fact basically a 2-D structure - a sheet of cortical columns, with a combination of local and long distance connections.
Where brains are different from a von Neumann architecture is that compute & memory are one and the same, but if we're comparing communication speed between different cortical areas versus between TPU/etc chips, then the speed advantage goes to the computer.
2) Even if AGI had to use matmul and systolic arrays, and GPUs are maxxing out in terms of FLOPs, we could still scale compute, if needed, just by having more GPUs and faster and/or wider interconnect.
3) As above, it seems we can scale compute just by adding more GPUs and faster interconnect if needed, but in any case I don't think inability to scale is why AGI isn't about to emerge from LLMs.
4) Robotics and AGI are two separate things. A person lying in a hospital bed still has a brain and human-level AGI. Robots will eventually learn individually on-the-job, just as non-embodied AGI instances will, so size of pre-training datasets/experience will become irrelevant.
5) You need to define intelligence before supposing what super-human intelligence is and how it may come about, but Dettmers just talks about superintelligence in hand-wavy fashion as something that AGI may design, and assumes that whatever it is will require more compute than AGI. In reality intelligence is prediction, limited in domain by your predictive inputs and in quality/degree by the sophistication of your predictive algorithms, neither of which necessarily needs more compute.
What is REALLY wrong with the GPT LLM approach, and why it can't just be scaled to achieve AGI, is that it is missing key architectural and algorithmic components (such as incremental learning, and a half dozen others), and perhaps more fundamentally that auto-regressive self-prediction is just the wrong approach. AGI needs to learn to act and predict the consequences of its own actions - it needs to predict external inputs, not its own generated sequences.
> The process of breaking a complex problem down into the right primitives requires great understanding of the original problem in the first place.
Yes, but with experience that just becomes a matter of recognizing problem and design patterns. When you see a parsing problem, you know that the simplest/best design pattern is just to define a Token class representing the units of the language (keywords, operators, etc), write a NextToken() function to parse characters to tokens, then write a recursive descent parser using that.
Any language may have its own gotchas and edge cases, but knowing that recursive descent is pretty much always going to be a viable design pattern (for any language you are likely to care about), you can tackle those when you come to them.
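To make the pattern concrete, here's a bare-bones sketch for a trivial grammar (just numbers, + and *) - the Token/NextToken()/recursive parse functions are the point, everything else is illustrative and skips error handling:

    #include <cctype>
    #include <iostream>
    #include <string>

    // Bare-bones version of the pattern: a Token type, a NextToken() lexer,
    // and recursive parse functions - here evaluating as it parses.
    struct Token { enum Kind { Number, Plus, Star, End } kind; double value; };

    struct Parser {
        std::string src;
        size_t pos = 0;
        Token tok;

        Token NextToken() {
            while (pos < src.size() && std::isspace((unsigned char)src[pos])) ++pos;
            if (pos >= src.size()) return {Token::End};
            char c = src[pos];
            if (std::isdigit((unsigned char)c)) {
                size_t len;
                double v = std::stod(src.substr(pos), &len);
                pos += len;
                return {Token::Number, v};
            }
            ++pos;
            if (c == '+') return {Token::Plus};
            if (c == '*') return {Token::Star};
            return {Token::End};   // a real parser would report an error here
        }

        double ParsePrimary() {    // number (no error handling in this sketch)
            double v = tok.value;
            tok = NextToken();
            return v;
        }
        double ParseTerm() {       // primary ('*' primary)*
            double v = ParsePrimary();
            while (tok.kind == Token::Star) { tok = NextToken(); v *= ParsePrimary(); }
            return v;
        }
        double ParseExpr() {       // term ('+' term)*
            double v = ParseTerm();
            while (tok.kind == Token::Plus) { tok = NextToken(); v += ParseTerm(); }
            return v;
        }
        double Parse(const std::string& s) { src = s; pos = 0; tok = NextToken(); return ParseExpr(); }
    };

    int main() {
        Parser p;
        std::cout << p.Parse("2 + 3 * 4") << "\n";   // prints 14
    }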
That's a good point - recursive descent as a general lesson in program design, in addition to being a good way to write a parser.
Table driven parsers (using yacc/etc) used to be emphasized in old compiler writing books such as Aho & Ullman's famous "dragon (front cover) book". I'm not sure why - maybe part efficiency for the slower computers of the day, and part because in the infancy of computing a more theoretical/algorithmic approach seemed more sophisticated and preferable (the canonical table driven parser building algorithm was one of Knuth's).
Nowadays it seems that recursive descent is the preferred approach for compilers because it's ultimately more practical and flexible. Table driven can still be a good option for small DSLs and simple parsing tasks, but recursive descent is so easy that it's hard to justify anything else, and LLM code generation now makes that truer than ever!
There is a huge difference in complexity between building a full-blown commercial quality optimizing compiler and a toy one built as a learning exercise. Using something like LLVM as a starting point for a learning exercise doesn't seem very useful (unless your goal is to build real compilers) since it's doing all the heavy lifting for you.
I guess you can argue about how much can be cut out of a toy compiler for it still to be a useful learning exercise in both compilers and tackling complex problems, but I don't see any harm in going straight from parsing to code generation, cutting out AST building and of course any IR and optimization. The problems this direct approach causes for code generation and optimization can be a lesson in why a non-toy compiler uses them!
A fun approach I used at work once, wanting to support a pretty major C subset as the language supported by a programmable regression test tool, was even simpler ... Rather than having the recursive descent parser generate code, I just had it generate executable data structures - subclasses of Statement and Expression base classes, with virtual Execute() and Value() methods respectively, so that the parsed program could be run by calling program->Execute() on the top level object. The recursive descent functions just returned these statement or expression values directly. To give a flavor of it, the ForLoopStatement subclass held the initialization, test and increment expression class pointers, and then the ForLoopStatement::Execute() method could just call testExpression->Value() etc.
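A stripped-down sketch of the shape of it (the Statement/Expression/ForLoopStatement names and Execute()/Value() methods follow what I described above; the int-only values and pointer-to-storage variables are simplifications for the sketch, not how the real tool handled symbols):

    #include <iostream>
    #include <memory>

    // The "executable data structures" idea: the recursive descent parser returns
    // objects like these instead of generating code, and running the parsed program
    // is just Execute() on the top-level object.

    struct Expression {
        virtual ~Expression() = default;
        virtual int Value() = 0;     // evaluate and return the result
    };

    struct Statement {
        virtual ~Statement() = default;
        virtual void Execute() = 0;  // run the statement for its effect
    };

    struct ConstantExpression : Expression {
        int v;
        explicit ConstantExpression(int v) : v(v) {}
        int Value() override { return v; }
    };

    // Variable reference pointing straight at its storage slot (no symbol table here).
    struct VarExpression : Expression {
        int* slot;
        explicit VarExpression(int* s) : slot(s) {}
        int Value() override { return *slot; }
    };

    struct AssignExpression : Expression {
        int* slot;
        std::unique_ptr<Expression> rhs;
        AssignExpression(int* s, std::unique_ptr<Expression> r) : slot(s), rhs(std::move(r)) {}
        int Value() override { return *slot = rhs->Value(); }
    };

    struct LessThanExpression : Expression {
        std::unique_ptr<Expression> lhs, rhs;
        LessThanExpression(std::unique_ptr<Expression> l, std::unique_ptr<Expression> r)
            : lhs(std::move(l)), rhs(std::move(r)) {}
        int Value() override { return lhs->Value() < rhs->Value(); }
    };

    // C-style ++x, modelled as an expression with a side effect.
    struct PreIncrementExpression : Expression {
        int* slot;
        explicit PreIncrementExpression(int* s) : slot(s) {}
        int Value() override { return ++*slot; }
    };

    // Stand-in for whatever statements the real tool had.
    struct PrintStatement : Statement {
        std::unique_ptr<Expression> expr;
        explicit PrintStatement(std::unique_ptr<Expression> e) : expr(std::move(e)) {}
        void Execute() override { std::cout << expr->Value() << "\n"; }
    };

    // As described above: holds the init/test/increment expressions plus a body.
    struct ForLoopStatement : Statement {
        std::unique_ptr<Expression> init, test, increment;
        std::unique_ptr<Statement> body;
        ForLoopStatement(std::unique_ptr<Expression> i, std::unique_ptr<Expression> t,
                         std::unique_ptr<Expression> inc, std::unique_ptr<Statement> b)
            : init(std::move(i)), test(std::move(t)), increment(std::move(inc)), body(std::move(b)) {}
        void Execute() override {
            for (init->Value(); test->Value(); increment->Value())
                body->Execute();
        }
    };

    int main() {
        // Hand-built equivalent of: for (i = 0; i < 3; ++i) print(i);
        // In the real tool the recursive descent parser would build this tree from source.
        int i = 0;
        auto program = std::make_unique<ForLoopStatement>(
            std::make_unique<AssignExpression>(&i, std::make_unique<ConstantExpression>(0)),
            std::make_unique<LessThanExpression>(std::make_unique<VarExpression>(&i),
                                                 std::make_unique<ConstantExpression>(3)),
            std::make_unique<PreIncrementExpression>(&i),
            std::make_unique<PrintStatement>(std::make_unique<VarExpression>(&i)));
        program->Execute();   // prints 0, 1, 2 on separate lines
    }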