
This is simply not true.

Modern LLMs are trained with reinforcement learning: the model attempts to solve a coding problem and receives a reward if the solution succeeds.
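
To make that concrete, here is a minimal toy sketch of the kind of verifiable reward described (an illustration, not any lab's actual pipeline): the model's candidate solution is executed against a test suite, and the binary pass/fail outcome becomes the reward.

    import subprocess
    import sys
    import tempfile

    def reward(candidate_code: str, test_code: str) -> float:
        """Run the candidate together with its tests; 1.0 on pass, 0.0 on fail."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(candidate_code + "\n\n" + test_code)
            path = f.name
        try:
            result = subprocess.run([sys.executable, path],
                                    capture_output=True, timeout=10)
            return 1.0 if result.returncode == 0 else 0.0
        except subprocess.TimeoutExpired:
            return 0.0

In this picture the policy gradient pushes up the probability of token sequences that earned reward 1.0, so the model can in principle discover solutions that never appeared in any human-written corpus.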

The data processing inequality (from your link) isn't relevant here: the model is learning from the reinforcement signal, not from human-written code.
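
(For reference, the inequality says that for a Markov chain X → Y → Z, I(X; Z) ≤ I(X; Y): no processing of Y can add information about X. With RL the reward is a fresh signal from the environment, not a function of the human-written corpus alone, so the Markov-chain premise doesn't hold.)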

Ok, then we can leave the training data out of the input; everybody's happy.


