
Remarkable how few tokens it needed to get a much better score than other reasoning models. Any chance of contamination?


It makes me wonder how the token counting was implemented and whether it missed the reasoning tokens that aren't returned via the API.


Valid concern; most likely the thinking tokens weren't counted, due to changes in how the API reports them.
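
To make the concern concrete, here's a minimal sketch of how a benchmark harness can undercount: tokenizing only the visible answer text misses hidden reasoning tokens that some providers report separately in the usage payload. This assumes an OpenAI-style "usage" object; the nested "completion_tokens_details.reasoning_tokens" field name is illustrative and may differ or be absent for other providers.

    # Assumption: an OpenAI-style usage payload; field names are illustrative.
    import tiktoken

    def tokens_in_visible_text(text: str) -> int:
        """Naive count: tokenize only the text the API actually returned."""
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))

    def tokens_reported_by_api(usage: dict) -> int:
        """Provider-reported count, including hidden reasoning tokens when
        the provider breaks them out separately."""
        visible = usage.get("completion_tokens", 0)
        details = usage.get("completion_tokens_details") or {}
        reasoning = details.get("reasoning_tokens", 0)
        # If the provider already folds reasoning into completion_tokens,
        # adding them again would double-count -- check the model's docs.
        return visible + reasoning

    usage = {
        "completion_tokens": 150,                                 # visible answer
        "completion_tokens_details": {"reasoning_tokens": 900},   # hidden thinking
    }
    answer_text = "The final answer is 42."
    print(tokens_in_visible_text(answer_text))  # a handful of tokens: only what was returned
    print(tokens_reported_by_api(usage))        # 1050: includes the hidden reasoning

If a leaderboard only ran the first function over the returned text, a model doing lots of hidden reasoning would look far more token-efficient than it really is.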



