
If you believe another thread, the benchmarks compare Gemini-3 (probably with thinking) to GPT-5.1 without thinking.

The person also claims that with thinking on the gap narrows considerably.

We'll probably have third-party benchmarks in a couple of days.



It is easy to show that the numbers are for GPT-5.1 with thinking set to high.

Just go to the leaderboard website and see for yourself: https://arcprize.org/leaderboard



