
Tops our benchmark in an unprecedented way.

https://help.kagi.com/kagi/ai/llm-benchmark.html

High quality, to the point. Bit on the slow side. Indeed a very strong model.

Google is back in the game big time.



It should be in the "reasoning" category, right? (still topping the charts there)


Remarkable how few tokens it needed to get a much better score than other reasoning models. Any chance of contamination?


It makes me wonder how the token counting was implemented, and whether it missed the reasoning tokens (which are not sent in the API response).


Valid concern; most likely the thinking tokens were not counted due to API reporting changes.


That is some wide gap!



