
As soon as you publish a benchmark like this, it becomes worthless because it can be included in the training corpus.


While I agree with you in principle, give Claude 4 a try on something like https://open.kattis.com/problems/low. I would expect this problem, as well as solutions found on GitHub, to have been included in the training material. I've tried providing the problem description and asking Claude Sonnet 4 to solve it, and so far it hasn't been successful.



