> it commits to that word, without knowing what the next word is going to be
Sounds like you may not have read the article, because it's exploring exactly that relationship and how LLMs will often have a 'target word' in mind that they're working toward.
Further, that's partially the point of thinking models: giving LLMs space to output tokens that they don't have to commit to in the final answer.
That makes no difference. At some point it decides that it has predicted the word, and outputs it, and then it will not backtrack over it. Internally it may have predicted some other words and backtracked over those. But the fact is, it accepts a word without being sure what the next one will be, or the one after that, and so on.
Externally, it manifests the generation of words one by one, with lengthy computation in between.
It isn't ruminating over, say, a five word sequence and then outputting five words together at once when that is settled.
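Mechanically, vanilla decoding really is just that loop: one forward pass, one token appended, never removed. Here's a minimal sketch, using greedy decoding and GPT-2 via the Hugging Face transformers library purely as stand-ins (the prompt and the token count are arbitrary):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # GPT-2 is used here only as a small stand-in for "an LLM".
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("The capital of France is", return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(5):
            logits = model(ids).logits          # one forward pass over the whole prefix
            next_id = logits[0, -1].argmax()    # pick the single most likely next token
            print(tok.decode(next_id.item()))   # the token is emitted here...
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # ...and appended for good;
                                                               # the loop never removes it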
> It isn't ruminating over, say, a five word sequence and then outputting five words together at once when that is settled.
True, and it's a good intuition that some words are much more complicated to generate than others and should require more computation. For example, if the user asks a yes/no question, ideally the answer should start with "Yes" or with "No", followed by some justification. To compute that first token, the model can only do a single forward pass and must decide the path to take.
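To make that concrete: in the plain setup, the Yes/No decision is entirely fixed by the distribution that comes out of that one forward pass over the prompt. A sketch, again with GPT-2 via transformers as a stand-in and a made-up prompt:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    # Hypothetical yes/no prompt; the phrasing is invented for the example.
    prompt = "Question: Is 91 a prime number? Answer:"
    ids = tok(prompt, return_tensors="pt").input_ids

    with torch.no_grad():
        dist = torch.softmax(model(ids).logits[0, -1], dim=-1)  # one forward pass

    yes_id = tok.encode(" Yes")[0]
    no_id = tok.encode(" No")[0]
    print("P(' Yes') =", dist[yes_id].item(), " P(' No') =", dist[no_id].item())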
But this is precisely why chain-of-thought was invented, and later "reasoning" models. These take it "step by step" and generate a sort of stream-of-consciousness monologue where each word follows more smoothly from the previous ones, rather than abruptly pinning down a Yes or a No right away.
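That is the whole trick, computationally: every intermediate token is one extra forward pass whose result the final answer token gets to condition on. A sketch of the prompted version (again GPT-2 via transformers as a stand-in; a base model like this will ramble rather than genuinely reason):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    # Illustrative chain-of-thought prompt; the wording is made up and is not
    # any model's official reasoning format.
    prompt = (
        "Question: Is 91 a prime number?\n"
        "Let's think step by step before answering Yes or No.\n"
    )
    ids = tok(prompt, return_tensors="pt").input_ids

    # Each intermediate token generated here costs one more forward pass,
    # i.e. extra computation spent before the final Yes/No is committed.
    out = model.generate(ids, max_new_tokens=60, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    print(tok.decode(out[0][ids.shape[1]:]))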
LLMs are an extremely well researched space where armies of researchers, engineers, grad and undergrad students, enthusiasts, and everyone in between have been coming up with all manner of ideas. It is highly unlikely that you can easily point to some obvious thing they missed.