Modern LLMs are trained by reinforcement learning where they try to solve a coding problem and receive a reward if it succeeds.
Data Processing Inequalities (from your link) aren't relevant: the model is learning from the reinforcement signal, not from human-written code.
reply
Modern LLMs are trained by reinforcement learning where they try to solve a coding problem and receive a reward if it succeeds.
Data Processing Inequalities (from your link) aren't relevant: the model is learning from the reinforcement signal, not from human-written code.