Tops our benchmark in an unprecedented way. https://help.kagi.com/kagi/ai/llm-be...

aoeusnth1 · 2025-03-26T00:12:33 1742947953

It should be in the "reasoning" category, right? (still topping the charts there)

causal · 2025-03-26T15:53:29 1743004409

Remarkable how few tokens it needed to get a much better score than other reasoning models. Any chance of contamination?

85392_school · 2025-03-26T16:56:23 1743008183

It makes me wonder how the token counting was implemented and if it missed the (not sent in API) reasoning.

freediver · 2025-03-26T18:03:25 1743012205

Valid concern; most likely the thinking tokens were not counted due to API reporting changes.

utopcell · 2025-03-26T01:55:06 1742954106

That is some wide gap!