We know quite well how it does it. It's applying extrapolation to its lossily compressed representation. It's not magic, and the HN crowd of technically proficient folks, especially, should stop treating it as such.
That is not a useful explanation. "Applying extrapolation to its lossily compressed representation" is pretty much the definition of understanding something. The details and interpretation of the representation are what is interesting and unknown.
We can use n-gram frequency data from a text to generate sentences, and some of them will be pretty good and will fool a few people into believing that some solid language processing is going on.
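For concreteness, here is a minimal sketch of that kind of generator, counting bigram frequencies over a tiny made-up corpus and sampling each next word in proportion to how often it followed the previous one (the corpus and names are just for illustration):

    import random
    from collections import defaultdict, Counter

    # Tiny made-up corpus; a real one would be much larger.
    corpus = "the cat sat on the mat and the dog sat on the rug".split()

    # Count how often each word follows each other word (bigram counts).
    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    def generate(start, length=8):
        word, out = start, [start]
        for _ in range(length):
            options = follows.get(word)
            if not options:
                break
            # Sample the next word weighted by observed frequency.
            word = random.choices(list(options), weights=list(options.values()))[0]
            out.append(word)
        return " ".join(out)

    print(generate("the"))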
LLM AI is different in that it does produce helpful results, not only entertaining prose.
It is practical for users today to replace most uses of web search with a query to an LLM.
The way the token prediction operates, it surfaces facts and renders them in grammatically correct language.
Which is amazing given that, when the thing is generating a response that will be, say, 500 tokens long and has produced 200 of them, it has no idea what the remaining 300 will be. Yet it has committed to those 200, and often the whole thing will make sense once the remaining 300 arrive.
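Roughly, the generation loop works like the toy sketch below (the dummy scoring function is made up and no real model or API is implied): it conditions only on the tokens produced so far, samples one next token, and never goes back to revise earlier output.

    import random

    def dummy_model(tokens):
        # Stand-in for a real next-token predictor: it scores a fixed toy
        # vocabulary based only on the prefix seen so far.
        vocab = ["often", "the", "whole", "thing", "makes", "sense", "."]
        return {w: 1.0 + ((len(tokens) + i) % 3) for i, w in enumerate(vocab)}

    def generate(model, prompt, max_new_tokens=10):
        tokens = list(prompt)
        for _ in range(max_new_tokens):
            scores = model(tokens)                  # scores for the next token only
            words, weights = zip(*scores.items())
            tokens.append(random.choices(words, weights=weights)[0])  # committed for good
        return tokens

    print(" ".join(generate(dummy_model, ["when"])))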
The research posted demonstrates the opposite of that within the scope of sequence lengths they studied. The model has future tokens strongly represented well in advance.