
No, it probably exists in the raw LLM and gets both significantly strengthened and has its range extended, such that it dominates the model's behavior, making it several orders of magnitude more reliable in common usage. Kind of like how "reasoning" exists in a weak, short-range way in non-reasoning models: with RL that encourages reasoning, that machinery gets brought to the forefront and becomes more complex and capable.


So why did you feel the need to post that next-token prediction is not the reason this behavior emerges?



