I've heard this a few times about Claude. I have no way to know for sure, but I'm guessing the cause is as simple as their reward model. Most likely they trained it to generate code along with tests and rewarded it whenever those tests passed.

It isn't hard to see why someone rewarded this way might want to game the system.

I'm sure humans would never do the same thing, of course. /s
