
FTA: Both models make clear mistakes, but GPT‑5.2 shows better comprehension of the image.

You can find it right next to the image you are talking about.





To be fair to OP, I only added this to our blog after their comment, in response to the valid criticism that our text didn't make clear how bad GPT-5.2's labels are.

LLMs have always been very subhuman at vision, and GPT-5.2 continues in this tradition, but it's still a big step up over GPT-5.1.

One way to get a sense of how bad LLMs are at vision is to watch them play Pokemon, e.g.: https://www.lesswrong.com/posts/u6Lacc7wx4yYkBQ3r/insights-i...

They still very much struggle with basic vision tasks that adults, kids, and even animals can handle with little trouble.


'Commented after article was already edited in response to HN feedback' award


