
> Is this based on analogising LLMs to animal mental capacities, or based on a scientific study of these capacities? ie., is this confirmation bias, or science?

- On how embeddings work;

- On the observation that in a very high-dimensional space you can encode a lot of information in the relative arrangement of things;

- On the observation that the end result (LLMs) is too good at talking and responding like people in a nuanced way for this to be uncorrelated;

- On noticing similarities between embeddings in high-dimensional spaces and what we arrive at when we try to express what we mean by "concept", "understanding" and "meaning", or even how we learn languages and acquire knowledge - there's a strong undertone of defining things in terms of similarity to other things, which are themselves defined the same way (recursively). Naively, it sounds like infinite regress, but it's exactly what embeddings are about (there's a small code sketch right after this list that tries to make this concrete).

- On the observation that the goal function for language model training is, effectively, "produce output that makes sense to humans", in the fully general meaning of that statement. Given constraints on size and compute, this is pressuring the model to develop structures that are at least functionally equivalent to our own thinking process; even if we're not there yet, we're definitely pushing the models in that direction.

- On the observation that most of the failure modes of LLMs also happen to humans, up to and including "hallucinations" - but they mostly happen at the "inner monologue" / "train of thought" level, and we do extra things (like explicit "system 2" reasoning, or tools) to fix them before we write, speak or act.

- And finally, on the fact that researchers have been dissecting and studying the inner workings of LLMs, and have managed to find direct evidence of them encoding concepts and using them in reasoning; see e.g. the couple of major Anthropic studies, in which they demonstrated the ability to identify concrete concepts, follow their "activations" during the inference process, and even control the inference outcome by actively suppressing or amplifying those activations (a crude sketch of that general steering idea follows a few paragraphs below); the results are basically what you'd expect if you believed the "concepts" inside LLMs were indeed concepts as we understand them.

- Plus a bunch of other related observations and introspections, including but not limited to paying close attention to how my own kids (currently 6yo, 4yo and 1.5yo) develop their cognitive skills, and what their failure modes are. I used to joke that GPT-4 is effectively a 4yo that memorized half the Internet, after I noticed that stories produced by LLMs of that time and those of my own kid followed eerily similar patterns, up to and including what happens when the beginning falls out of the context window. I estimated that at 4yo, my eldest daughter had a context window about 30 seconds long, and I could see it grow with each passing week :).
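
To make the embeddings point above a bit more concrete, here's a minimal sketch. The sentence-transformers package and the "all-MiniLM-L6-v2" model are my arbitrary choices (any embedding model would do); the point is just that a term's "meaning" in this representation is nothing but its pattern of similarities to every other term - the recursive "defined by similarity to other things" idea from the list:

  # Minimal sketch, assuming the sentence-transformers package and the
  # "all-MiniLM-L6-v2" model (arbitrary choices; any embedding model works).
  import numpy as np
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer("all-MiniLM-L6-v2")   # 384-dimensional embeddings

  terms = ["dog", "puppy", "wolf", "carburetor", "justice"]
  vectors = model.encode(terms)                     # shape: (5, 384)

  def cosine(a, b):
      # Cosine similarity: compares directions in the space, ignoring magnitude.
      return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

  # Each term's only "definition" here is how it sits relative to the others.
  for i, t in enumerate(terms):
      sims = sorted(((cosine(vectors[i], vectors[j]), u)
                     for j, u in enumerate(terms) if j != i), reverse=True)
      print(t, "->", [(u, round(s, 2)) for s, u in sims])

You should see "dog" land much closer to "puppy" and "wolf" than to "carburetor"; nothing symbolic is stored anywhere, just relative positions in a high-dimensional space.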

That, in a gist, is what adds up to my current perspective on LLMs. It might not be hard science, but I find a lot of things pointing in the direction of us narrowing down on the core functionality that also exists in our brains (though not the whole thing, obviously) - and very little that points otherwise.

(I actively worry that my mental model might be too "wishy-washy", letting me interpret anything in a way that fits it. So far, I haven't noticed any warning signs, but I did notice that none of the quirks or failure modes feel surprising.)
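
Coming back to the Anthropic interpretability point from the list: below is a crude sketch of the general "locate a concept as a direction in activation space, then amplify or suppress it" idea. To be clear about what's mine here: Anthropic's studies identify features with sparse autoencoders inside Claude; this snippet is the much simpler contrastive "steering vector" variant of the same idea, and GPT-2 small, the layer index and the scale are arbitrary choices so it runs anywhere:

  # Rough sketch of activation steering on GPT-2 small -- not Anthropic's
  # method, just the same underlying idea: treat a "concept" as a direction
  # in activation space and push the model along it during inference.
  import torch
  from transformers import GPT2LMHeadModel, GPT2Tokenizer

  tok = GPT2Tokenizer.from_pretrained("gpt2")
  model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
  LAYER, SCALE = 6, 8.0   # arbitrary illustration values

  def mean_hidden(text):
      # Mean hidden state after block LAYER -- a crude stand-in for "the concept".
      ids = tok(text, return_tensors="pt")
      with torch.no_grad():
          out = model(**ids, output_hidden_states=True)
      return out.hidden_states[LAYER + 1].mean(dim=1).squeeze(0)

  # Direction pointing from a neutral sentence towards the target concept.
  direction = mean_hidden("The ocean, the sea, waves and salt water.") \
            - mean_hidden("The weather today is completely ordinary.")
  direction = direction / direction.norm()

  def steer(module, inputs, output):
      # The block returns its hidden states first (possibly inside a tuple);
      # add the concept direction to the residual stream at every position.
      if isinstance(output, tuple):
          return (output[0] + SCALE * direction,) + output[1:]
      return output + SCALE * direction

  prompt = tok("My favourite thing to think about is", return_tensors="pt")
  handle = model.transformer.h[LAYER].register_forward_hook(steer)
  try:
      steered = model.generate(**prompt, max_new_tokens=30, do_sample=False)
  finally:
      handle.remove()
  print(tok.decode(steered[0]))   # completion gets nudged towards the concept

Cranking SCALE up (or flipping its sign) biases completions towards (or away from) the injected concept regardless of the prompt, which is the same qualitative effect those studies demonstrate at a much finer grain.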

--

I'm not sure if I got your videogame analogy the way you intended, but FWIW, we also learn and experience lots of things indirectly; the whole point of language and communication is to transfer understanding this way - and a lot of information is embodied in the larger patterns and structures of what we say (or don't say) and how we say it. LLM training data is not random; it's highly correlated with human experience, so the information needed for a general understanding of how we think and perceive the world is encoded there implicitly, and at least in theory the training process will pick up on it.

--

I don't have a firm opinion on some of the specifics you mention, just a couple of general heuristics/insights that tell me it could be possible we've narrowed down on the actual thing our own minds are doing:

1. We don't know what drives our own mental processes either. It might be that we discover LLMs are "cheating", but we might also discover they're converging on the same mechanisms/structures our own minds use. I don't have any strong reason to assume the former over the latter, because we're not designing LLMs to cheat.

2. Human brains are evolved, not designed. They're also the dumbest possible design evolution could arrive at - we're the first species to cross the threshold after which our knowledge-based technological evolution outpaced natural evolution by orders of magnitude. Everything we've achieved to date, we did with a brain that was nature's first prototype that worked.

3. Given the way evolution works - small, random, greedy increments that have to be incrementally useful at every step - it stands to reason that whatever the fundamental workings of a mind are, they must not be that complicated, and they can be built up incrementally through greedy optimization. Humans are living proof of that.

4. (most speculative) It's unlikely that there are multiple alternative implementations of thinking minds that are very different from each other, yet all equally easy to reach through a random walk, and that evolution just picked one of those and ran with it. It's more likely that, when we get to that point (we might already be there), we'll find the same computational design nature did. But even if not, diffing our solution against nature's will tell us much about ourselves.



> On the observation that most of the failure modes of LLMs also happen to humans

That's assuming that LLMs operate according to how we read their text. What you're doing is reading LLM chain-of-thought as if it were said by a human, and imputing the capacities that would be implied if a human had said it. But this is almost certainly not how LLMs work.

LLMs are replaying "linguistic behaviour", which we take, often accurately, to be dispositive of mental states in people. They are not evidence of mental capacities and states in LLMs, for seemingly obvious reasons. When a person says, "I am hungry", it is, in veridical cases, caused by their hunger. When an LLM says it, the cause is something like, "responding appropriately, according to a history of appropriate use of such words, on the occasion of a prompt which would, in ordinary historical cases, give this response".

The reason an LLM generates a text prima facie never involves any of the associated capacities which would have been required for that text to have been written in the first place. Overcoming this leap of logic requires vastly more than "it seems to me".

> On how embeddings work

The space of necessary capacities is not exhausted by "embedding", by which you mean a (weakly) continuous mapping of historical exemplars into a space. E.g., logical relationships, composition, recursion, etc. are not mental capacities which can be implemented this way.

> We don't know what drives our own mental processes either.

Sure we do. At the level of enumerating mental capacities, their operation and so on, we can give very exhaustive lists. We do not know how even the most basic of these is implemented biologically, save, I believe, that we can say quite a lot about how properties of complex biological systems generically enable this.

But we have a lot of extremely carefully designed experiments to show the existence of relevant capacities in other animals. None of these experiments can be used on an LLM because, by design, any experiment we would run would immediately reveal the facade: any measurement of the GPU running the LLM, and of its environmental behaviour, shows a total empirical lack of anything which could be experimentally measured.

We are, by the charlatan's design, only supposed to use token-in/token-out as "measurement". But this isn't a valid measure, because LLMs are constructed on historical cases of linguistic behaviour in people. We know, prior to any experiment, that the one thing designed to be a false measure is the linguistic behaviour of the LLM.

It's as if we had constructed a digital thermometer to always replay historical temperature readings -- we know, by design, that these "readings" are therefore never indicative of any actual capacity of the device to measure temperature.



