
It depends on exactly what you mean by 'planning ahead', but I think the fact that features for candidate rhyming words appear before the model reaches the position where it has to produce the rhyme is good evidence the model is planning at least a little bit ahead: the model's activations are not all just related to the next token.

(And I think it's relatively obvious that models do this to some degree: it's very hard to write any language at all without 'thinking ahead' at least a little, given how human language is structured. If models only considered the next token in isolation, they would paint themselves into a corner within a single sentence. Early LLMs like GPT-2 were still pretty bad at this: they were plausible over short windows, but there was no consistency across a longer piece of text. Whether this amounts to some high-level, abstracted 'train of thought', and how coherent it is across its different forms, is a different question. Indeed, from the section on jailbreaking it looks like the model is often caught out by conflicting goals from different parts of the network that aren't resolved in any logical fashion.)
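
For what it's worth, you can poke at this crudely yourself. Below is a minimal "logit lens"-style sketch (not the cross-layer transcoder method from the paper): it assumes the HuggingFace transformers library and GPT-2, takes the hidden state at the end of the first line of a couplet, projects it through the unembedding, and checks how much probability a plausible rhyme word already gets, several tokens before it could actually be emitted. The prompt, the layer choice, and the candidate word are all illustrative, not taken from the paper.

    # Crude probe: does an intermediate hidden state at the newline already
    # favour a word that would only be emitted at the end of the next line?
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2", output_hidden_states=True)
    model.eval()

    prompt = "He saw a carrot and had to grab it,\n"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        out = model(**inputs)

    # "Logit lens": project an intermediate layer's hidden state at the last
    # position (the newline) through the final layer norm and unembedding.
    layer = 8  # hypothetical choice of intermediate layer
    hidden = out.hidden_states[layer][0, -1]   # (d_model,)
    hidden = model.transformer.ln_f(hidden)
    logits = model.lm_head(hidden)             # (vocab,)
    probs = torch.softmax(logits, dim=-1)

    candidate = tokenizer.encode(" rabbit")[0]
    rank = int((probs > probs[candidate]).sum())
    print(f"p(' rabbit') = {probs[candidate].item():.2e}, rank = {rank}")

This is a much blunter instrument than the feature-level analysis in the paper, but it gives a feel for the question: whether information about plausible future tokens is linearly readable from activations well before those tokens are generated.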



Modern transformer-based language models fundamentally lack structures and functions for "thinking ahead," and I don't believe LLMs have emergently developed human-like thinking abilities. This phenomenon appears because language-model performance has improved: to generate coherent longer sentences, the probabilities of likely future output tokens end up being reflected in the distribution over the next token. Humans have similar experiences; everyone has thought about what to say next while speaking. In a language model, however, this happens mechanically and statistically. What I'm trying to say is that while the phenomenon may look similar to human thought processes and mechanisms, I'm wary of the anthropomorphic error of assuming the machine has consciousness or thoughts.
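
To make that point concrete with a toy example: if a model approximates the distribution over whole sequences well, its next-token probabilities are simply marginals of that distribution, so continuations that lead nowhere are already down-weighted at the first step, without any separate planning machinery. The sequences and probabilities below are made up purely for illustration.

    # Toy illustration: the next-token distribution is the marginal of a
    # distribution over complete continuations, so future structure shows up
    # in it "for free". All numbers here are invented for the example.
    from collections import defaultdict

    sequence_probs = {
        ("grab", "it,"): 0.30,
        ("grab", "the"): 0.10,
        ("eat", "it,"): 0.25,
        ("eat", "them"): 0.20,
        ("see", "nothing"): 0.15,  # a continuation that breaks the rhyme
    }

    # Marginalise over the second token to get the next-token distribution.
    next_token = defaultdict(float)
    for (first, _second), p in sequence_probs.items():
        next_token[first] += p

    for tok, p in sorted(next_token.items(), key=lambda kv: -kv[1]):
        print(f"P({tok!r} | prompt) = {p:.2f}")
    # "grab" and "eat" come out ahead purely because more probable
    # completions pass through them; no explicit planning module is needed.

Whether that counts as "planning" or merely as statistics encoding the same constraints is, I think, exactly the disagreement here.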




