
I just tried your trademark benchmark on the new 4o Image Output, though it's not the same test:

https://imgur.com/a/xuPn8Yq



And the same thing with Gemini 2.0 Flash native image output:

https://imgur.com/a/V4YAkX5

It's sort of irrelevant, though, as the test is about SVGs.


Was that an actual SVG?


No, that's GPT-4o native image output.


I wonder how far away we are from models that, given this prompt, generate that image as the first step of their chain of thought and then use it as a reference to generate SVG code.

It could be useful for much more than just silly benchmarks: there's a reason physics students are taught to draw a diagram before attempting a problem.


Someone managed to get ChatGPT to render the image using GPT-4o, then save that image to a Code Interpreter container and run Python code with OpenCV to trace the edges and produce an SVG: https://bsky.app/profile/btucker.net/post/3lla7extk5c2u
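
For anyone curious what the tracing step might look like, here is a rough sketch. It is not the code from the linked post, which I haven't reproduced; it assumes the image is already saved to disk and that OpenCV (`cv2`) is installed. The function names `contours_to_svg` and `trace_image`, and the Canny thresholds, are my own placeholders.

```python
from typing import List, Sequence, Tuple


def contours_to_svg(contours: Sequence[Sequence[Tuple[int, int]]],
                    width: int, height: int) -> str:
    """Serialize each polygon (a list of (x, y) points) as one closed SVG path."""
    paths: List[str] = []
    for pts in contours:
        if len(pts) < 2:
            continue  # a single point cannot form a visible stroke
        d = "M " + " L ".join(f"{x} {y}" for x, y in pts) + " Z"
        paths.append(f'  <path d="{d}" fill="none" stroke="black"/>')
    body = "\n".join(paths)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
            f'height="{height}" viewBox="0 0 {width} {height}">\n{body}\n</svg>')


def trace_image(path: str, low: int = 100, high: int = 200) -> str:
    """Edge-detect an image file and return the outlines as an SVG string.

    Assumption: the thresholds are arbitrary starting values, not tuned.
    """
    import cv2  # imported lazily so contours_to_svg works without OpenCV

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    edges = cv2.Canny(gray, low, high)
    # [-2] picks the contour list in both OpenCV 3.x and 4.x return layouts.
    found = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
    height, width = gray.shape
    polys = [[(int(x), int(y)) for x, y in c.reshape(-1, 2)] for c in found]
    return contours_to_svg(polys, width, height)
```

Calling `trace_image("pelican.png")` (hypothetical filename) would return an SVG string of black outlines. This only traces edges, so the result is line art rather than a filled, colored drawing.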


Does this match the rules of your test, or is it cheating? :)



