I had no trouble getting it to generate an image of a five-legged dog on the first try, but I really was surprised at how badly it failed at telling me the number of legs when I showed it that image in a new context. It wrote a long defense of its reasoning and, when pressed, made up demonstrably false excuses for why it might be getting the wrong answer while still maintaining the wrong answer.
Well, I tried a variation of a prompt I was messing with in Flash 2.5 the other day in a thread about AI-coded analog clock faces. Gemini Pro 3 Preview gave me a result far beyond what I saw with Flash 2.5, and got it right in a single shot.[0] I can't say I'm not impressed, even though it's a pretty constrained example.
> Please generate an analog clock widget, synchronized to actual system time, with hands that update in real time and a second hand that ticks at least once per second. Make sure all the hour markings are visible and put some effort into making a modern, stylish clock face. Please pay attention to the correct alignment of the numbers, hour markings, and hands on the face.
In its defence, the code actually specifically calls that edge case out and justifies it:
// Calculate rotations
// We use a cumulative calculation logic mentally, but here simple degrees work because of the transition reset trick or specific animation style.
// To prevent the "spin back" glitch at 360->0, we can use a simple tick without transition for the wrap-around,
// but for simplicity in this specific React rendering, we will stick to standard 0-360 degrees.
// A robust way to handle the spin-back on the second hand is to accumulate degrees, but standard clock widgets often reset.
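For what it's worth, the accumulation approach the comment waves at is only a few extra lines. A rough sketch of my own (not the generated code; it assumes a `.second-hand` element with a CSS transition on `transform`):
```
// Keep a monotonically increasing angle so the 59 -> 0 wrap is +6deg forward,
// never a -354deg "spin back" through the transition.
const secondHand = document.querySelector(".second-hand"); // assumed element
let lastSecond = new Date().getSeconds();
let accumulatedDeg = lastSecond * 6; // 6 degrees per second on a clock face

function tick() {
  const second = new Date().getSeconds();
  if (second !== lastSecond) {
    accumulatedDeg += ((second - lastSecond + 60) % 60) * 6;
    lastSecond = second;
    secondHand.style.transform = `rotate(${accumulatedDeg}deg)`;
  }
  requestAnimationFrame(tick);
}
tick();
```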
Station clocks in Switzerland receive a signal from a master clock each minute that advances the minute hand; the second hand moves completely independently of the minute hand. This allows them to sync to the minute.
> The station clocks in Switzerland are synchronised by receiving an electrical impulse from a central master clock at each full minute, advancing the minute hand by one minute. The second hand is driven by an electrical motor independent of the master clock. It takes only about 58.5 seconds to circle the face; then the hand pauses briefly at the top of the clock. It starts a new rotation as soon as it receives the next minute impulse from the master clock.[3] This movement is emulated in some of the licensed timepieces made by Mondaine.
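That "stop-to-go" movement is easy to emulate, incidentally. A hypothetical sketch (again assuming a `.second-hand` element) that sweeps the full face in 58.5 s and parks at 12 until the minute rolls over:
```
const SWEEP_MS = 58500; // full revolution in ~58.5 s, per the quote above

function update(hand) {
  const msIntoMinute = Date.now() % 60000;
  // Sweep for the first 58.5 s, then hold at the top until the "minute impulse".
  const deg = Math.min(msIntoMinute / SWEEP_MS, 1) * 360;
  hand.style.transform = `rotate(${deg}deg)`;
  requestAnimationFrame(() => update(hand));
}
update(document.querySelector(".second-hand"));
```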
The video shows closer to 2 seconds for it to finally throw itself over in what could only be described as a "Thunk". I figured it would be a little smoother.
In defense of 2.5 (Pro, at least), it was able to generate a metric UNIX clock for me as a webpage, which I was amused by. It uses kiloseconds/megaseconds/etc.; there are 86.4 ks/day. The "seconds" hand goes around every 1000 seconds, which ticks over the "hour" hand. Instead of saying 4am, you'd say it's 14.
As a calendar or "date" system, we start at UNIX time's creation, so it's currently 1.76 gigaseconds AUNIX. You might use megaseconds as the "week" and gigaseconds more like an era, e.g. Queen Elizabeth III's reign persisting through the entire fourth gigasecond and into the fifth. The clock also displays teraseconds, though this is just a little purple speck atm. Of course, this can work off-Earth, where you would simply use 88.775 ks as the "day"; the "dates" a Martian and an Earthling share with each other would be interchangeable.
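The arithmetic is pleasantly trivial; a hypothetical back-of-the-envelope version of the display logic (not the generated page itself):
```
const s = Date.now() / 1000;                      // seconds since the UNIX epoch
console.log((s / 1e9).toFixed(3) + " Gs AUNIX");  // ~1.76 Gs at the time of writing
console.log(((s % 86400) / 1000).toFixed(2) + " ks into the 86.4 ks (UTC) day");
// 4am is 14.4 ks -- the "you'd say it's 14" from above.
const secondsHandDeg = ((s % 1000) / 1000) * 360; // "seconds" hand: one lap per kilosecond
```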
I can't seem to get anyone interested in this very serious venture, though... I guess I'll have to wait until the 50th or so iteration of Figure, whenever it becomes useful, to be able to build a 20-foot-tall physical metric UNIX clock in my front yard.
I made a few improvements... which all worked on the first try... except the ticking sound, which worked on the second try (the first try was too much like a "blip")
"Allow access to Google Drive to load this Prompt."
.... why? For what possible reason? No, I'm not going to give access to my privately stored file share in order to view a prompt someone has shared. Come on, Google.
Well, you also have to allow it to train on your data. Although this is not explicitly about your Google Drive data, and probably requires you to submit a prompt yourself, the barriers here are way too weak/fuzzy for me to consider granting access via any account with private info.
I'm assuming it's because AI Studio's persisted prompts, shared ones included, are stored in Drive, and prompt sharing is implemented on top of Drive file sharing, so if AI Studio doesn't have access to Drive, it doesn't have access to the shared prompt.
Because most likely (at least according to Hanlon's razor) they somehow decided that using Google Drive as the only persistent storage backing AI Studio was a reasonable UX decision.
It probably makes some sense internally in big tech corporation logic (no new data storage agreements on top of the ones the user has already agreed to when signing up for Drive etc.), but as a user, I find it incredibly strange too – especially since the text chats are in some proprietary format I can't easily open on my local GDrive replica, but the images generated or uploaded just look like regular JPEGs and PNGs.
That is not the same prompt the other person was using. In particular, this one doesn't provide the time to set the clock to, which makes the challenge a lot simpler. This one also includes JavaScript.
The prompt the other person was using is:
```
Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting.
```
This isn't using the same prompt or stack as the page from that post the other day; on AI Studio it builds a web app across a few different files. It's still fairly concise, but I don't think it's exceptionally so.
> Please create a highly unusual 13-hour analog clock widget, synchronized to system time, with fully animated hands that move in real time, and not 12 but 13 hour markings - each will be spaced at not 5-minute intervals, but at 4-minute-37-second intervals. This makes room for all 13 hour markings. Please pay attention to the correct alignment of the 13 numbers and the 13 hour marks, as well as the alignment of the hands on the face.
This gave me a correct clock face on Gemini, after the model spent a lot of time thinking (and kind of thrashing in a loop for a while). The functionality isn't quite right, not that it entirely makes sense in the first place, but the face - at least in terms of the hour marks - looks OK to me.[0]
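The spacing in the prompt checks out, for what it's worth: 60 min ÷ 13 ≈ 4 min 37 s between marks, i.e. 360° ÷ 13 ≈ 27.7°. Placing the markings is just a bit of trig; a hypothetical sketch, assuming an absolutely positioned 300×300 `.face` div:
```
const face = document.querySelector(".face"); // assumed 300x300, position: relative
const R = 130, CX = 150, CY = 150;

for (let i = 1; i <= 13; i++) {
  const angle = (i / 13) * 2 * Math.PI - Math.PI / 2; // i = 13 lands at the top
  const mark = document.createElement("span");
  mark.textContent = i;
  mark.style.position = "absolute";
  mark.style.left = `${CX + R * Math.cos(angle)}px`;
  mark.style.top = `${CY + R * Math.sin(angle)}px`;
  face.appendChild(mark);
}
```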
Even Gemini Flash did really well for me[0] using two prompts - the initial query and one to fix the only error I could identify.
> Please generate an analog clock widget, synchronized to actual system time, with hands that update in real time and a second hand that ticks at least once per second. Make sure all the hour markings are visible and put some effort into making a modern, stylish clock face.
Followed by:
> Currently the hands are working perfectly, but they're translated incorrectly, making them uncentered. Can you ensure that each one is translated to the correct position on the clock face?
Gemini has been doing this to me at the end of basically every single response for the past few weeks now, and it often seems to result in the subsequent responses getting off track and lower in quality as all these extra tangents start polluting the context. Not to mention how distracting it is, since it throws off the reply I was already halfway through composing by the time I read it.
In similar "news" Re: tangents, I've noticed Claude now suddenly starting to give up on problem analyses. After like three rounds of it trying to figure something out that I told it is a requirement, it suggests that we don't need to do that as it's a nice-to-have only anyway. If in auto-accept mode it'll start doing the other thing even. I have to catch it and tell it to not effing give up so easily and to stop giving me BS excuses turning hard requirements (literally the reason we're doing what we're doing) into a nice-to-have it can skip.
I guess they recently re-trained on too many "perpetual junior senior dev" stuff.
Add "Complete this request as a single task and do not ask any follow-up questions." Or some variation of that. They keep screwing with default behavior, but you can explicitly direct the LLM to override it.
This is why I wish chat UIs had separate categories of chats (like a few generic system prompts) that let you do more back-and-forth-style discussions, or more "answers only" without adding any extra noise, or even an "exploration"/"tangent" slider.
The fact that system prompts / custom instructions have to be typed in manually in every major LLM chat UI is a missed opportunity IMO
I think AI should present those continuation prompts as dynamic buttons, like "Summarize", "Yes, explain more" etc. based on the AI's last message, like the NPC conversation dialogs in some RPGs
You can if you script the request yourself, or you could have a front end that lets you cut those paragraphs out of the conversation. I only say that because yesterday I followed this guide: https://fly.io/blog/everyone-write-an-agent/ except I had to figure out how to do it with the Gemini API instead. The context is always just (essentially) a list of strings (or "parts", anyway; it doesn't have to be strings) that you pass back to the model, so you can make the context whatever you like. It shouldn't be too hard to make a frontend that lets you edit the context, and it's fairly easy to mock up if you just put the request in a script that you add to.
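For illustration, the whole loop fits in a handful of lines against the public REST endpoint (a sketch; the model name and API version are whatever you have access to):
```
// The "context" is just this array; splice out whatever you don't want resent.
const history = [{ role: "user", parts: [{ text: "Hello there" }] }];

const res = await fetch(
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent" +
    `?key=${process.env.GEMINI_API_KEY}`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ contents: history }),
  }
);
const data = await res.json();
const reply = data.candidates[0].content.parts[0].text;
history.push({ role: "model", parts: [{ text: reply }] }); // or edit/drop it first
```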
I've definitely noticed Gemini's tendency to return the image basically unchanged, but not noticed it being worse or better for images of people. When I tested by having it change aspects of a photo of me, I found it was far more likely to cooperate when I'd specify, for instance, "change the hair from long to short" rather than "Make the hair short" (the latter routinely failed completely).
It also helped to specify which other parts should not be changed, otherwise it was rather unpredictable about whether it would randomly change other aspects.
There are a lot of caveats, but in general the "spend 2 days" thing is a lot less true now IMHO, thanks to LLMs that can write mostly correct elisp from basic specifications. YMMV, of course. I have found this can also open up to being a lot more than "maybe 5% more efficient" for niche applications. It's the closest environment I've used to one where the friction between "I wish my editor could do <x>" and actually having the feature almost disappears.
It is, though? I also had the same map come up about five times, showing a picture of the Parthenon and the Athenian acropolis, but it consistently insisted that these were of Orchomenos, which also had an acropolis but AFAIK is not the same; they are about 80 miles apart.
These were extremely helpful to read for insights on how to go back and retry with different prompts, IMHO. I find losing them to be a significant step back in usability, although I can understand the argument that they weren't directly useful on their own outside of that use case.
My thought is, for carrying a laptop, I'd rather have a backpack or messenger bag made of paper than a fabric grocery bag. The primary reasons I wouldn't carry a laptop in a grocery bag don't include the fact it's made of paper or the sounds of it rustling (though, it is nice to not have to worry about this one ripping).