
It's actually quite fascinating if you watch it for 5 minutes. Some models are bad overall, but others nail it one minute and butcher it the next.

It's perhaps the best example I have seen of model drift driven by just small, seemingly unimportant changes to the prompt.



> model drift driven by just small, seemingly unimportant changes to the prompt

What changes to the prompt are you referring to?

According to the comment on the site, the prompt is the following:

Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting.

The prompt doesn't seem to change.


The time given to the model. So the difference between two generations is just something trivially different, like "12:35" vs. "12:36".
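To make that concrete, here is a minimal Python sketch that fills the quoted prompt template for two consecutive minutes and lists the words that differ. It assumes the site substitutes `${time}` with a plain `HH:MM` string like "12:35"; the real format isn't stated in the thread, and `render_prompt` and `differing_words` are hypothetical helpers.

```python
from string import Template

# Prompt text as quoted in the thread; ${time} is its only placeholder.
PROMPT = Template(
    "Create HTML/CSS of an analog clock showing ${time}. "
    "Include numbers (or numerals) if you wish, and have a CSS animated second hand. "
    "Make it responsive and use a white background. "
    "Return ONLY the HTML/CSS code with no markdown formatting."
)


def render_prompt(time_str: str) -> str:
    """Hypothetical helper: fill the template with an HH:MM string (assumed format)."""
    return PROMPT.substitute(time=time_str)


def differing_words(a: str, b: str) -> list[tuple[str, str]]:
    """Return whitespace-separated word pairs that differ between two prompts.

    Assumes both prompts come from the same template, so word counts match.
    """
    return [(x, y) for x, y in zip(a.split(), b.split()) if x != y]


p1 = render_prompt("12:35")
p2 = render_prompt("12:36")
print(differing_words(p1, p2))  # [('12:35.', '12:36.')]
```

Under that assumption, the only token that changes between consecutive generations is the time itself (with its trailing period).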


Presumably the time is replaced with the actual current time at each generation. I wonder if they are actually generated every minute, or if all 6480 combinations (720 minutes on a 12-hour dial × 9 LLMs) were generated up front and are just shown on a schedule.
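The count is easy to check. A minimal sketch that enumerates every time a 12-hour analog face can show, assuming times are formatted as `H:MM` from 1:00 to 12:59 (the site's actual format is unknown) and taking the nine-model figure from the comment:

```python
# Every minute a 12-hour analog dial can show, formatted H:MM (assumed format).
times = [f"{hour}:{minute:02d}" for hour in range(1, 13) for minute in range(60)]

NUM_MODELS = 9  # model count as stated in the comment

print(len(times))               # 720
print(len(times) * NUM_MODELS)  # 6480
```

A full day has 1440 minutes, so pre-generating per 24-hour time would double this to 12,960 pages; since an analog face can't distinguish AM from PM, 720 distinct times per model is enough.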


It is really interesting to watch them for a while. QWEN keeps outputting some really abstract interpretations of a clock, KIMI is consistently very good, and GPT5's results line up exactly with my experience of its code output (overly complex and never working correctly).


We can't know how much of this is due to the prompt and how much is just stochastic randomness in that model's behavior on that prompt, right? Even given identical prompts, even at temperature 0, models don't always behave identically... at least, as far as I know. Some of the reasons why are, I think, still a research question, but I think it's a fact nonetheless.
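One widely discussed contributor to temperature-0 non-determinism is that floating-point addition is not associative: if a server sums the same numbers in a different order (for example, because batch size or kernel selection changed), a score can shift by one rounding step and flip a near-tie under greedy decoding. Below is a toy Python illustration of that arithmetic effect, not a model of any real inference stack; `sum_in_order`, `greedy_pick`, and the scores are made up for illustration.

```python
def sum_in_order(values: list[float]) -> float:
    """Plain left-to-right summation, so the order of `values` affects rounding."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits: dict[str, float]) -> str:
    """Hypothetical greedy (temperature 0) pick: the highest score wins.

    On an exact tie, max() returns the first key encountered.
    """
    return max(logits, key=logits.get)


parts = [0.1, 0.2, 0.3]
score_a = sum_in_order(parts)                  # 0.6000000000000001
score_b = sum_in_order(list(reversed(parts)))  # 0.6

# Same inputs, different summation order, different "argmax".
print(greedy_pick({"other": 0.6, "target": score_a}))  # target
print(greedy_pick({"other": 0.6, "target": score_b}))  # other
```

Real models sum thousands of terms per logit, so these one-ulp differences are usually harmless, but when two candidate tokens are nearly tied they can change which one is chosen, and the rest of the generation diverges from there.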


Kimi seems to be the only reliable one, which is a bit surprising, and GPT 4o is consistently better than GPT 5, which, on the other hand, is unfortunately not surprising at all.



