I've noticed that image models are particularly bad at modifying popular concepts in novel ways (way worse "generalization" than what I observe in language models).
This is it. They’re language models which predict next tokens probabilistically and a sampler picks one according to the desired "temperature". Any generalization outside their data set is an artifact of random sampling: happenstance and circumstance, not genuine substance.
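Roughly what I mean by "temperature": the sampler rescales the model's scores before picking a token. A toy sketch of the idea, not any real model's actual code:

    import numpy as np

    def sample_next_token(logits, temperature=1.0):
        # Scale logits by temperature: low T -> near-greedy, high T -> more random.
        scaled = np.array(logits, dtype=float) / max(temperature, 1e-8)
        # Softmax to turn the scores into a probability distribution over the vocabulary.
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        # Draw one token id according to that distribution.
        return np.random.choice(len(probs), p=probs)

    # sample_next_token([2.0, 1.0, 0.1], temperature=0.7) -> usually 0, sometimes 1 or 2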
However: do humans have that genuine substance? Is human invention and ingenuity more than trial and error, more than adaptation and application of existing knowledge? Can humans generalize outside their data set?
A yes-answer here implies belief in some sort of gnostic method of knowledge acquisition. Certainly that comes with a high burden of proof!
Yes. Humans can perform abduction, extrapolating from given information to new information. LLMs cannot; they can only interpolate new data based on existing data.
The proof is that humans do it all the time and that you do it inside your head as well. People need to stop with this absurd level of rampant skepticism that makes them doubt their own basic functions.
The concept is too nebulous to "prove", but the fact that I'm operating a machine (relatively) skillfully to write to you shows we are in fact able to generalise. This wasn't planned; we came up with it. Same with cars etc. We're quite good at the whole "tool use" thing.
Yes, but they are reasoning within their dataset, which will contain multiple examples of html+css clocks.
They are just struggling to produce good results because they don't have great spatial reasoning skills; they are, after all, language models.
Their output normally has all the elements, just not in the right place/shape/orientation.
They definitely don't completely fail to generalise. You can easily prove that by asking them something completely novel.
Do you mean that LLMs might display a similar tendency to modify popular concepts? If so, that could well be the case and would be fairly easy to test.
Something like "tell me the lord's prayer but it's our mother instead of our father", or maybe "write a haiku but with 5 syllables on every line"?
Let me try those ... nah ChatGPT nailed them both. Feels like it's particular to image generation.
Like, the response to "... The surgeon (who is male and is the boy's father) says: I can't operate on this boy! He's my son! How is this possible?" used to be "The surgeon is the boy's mother".
The response to "... At each door is a guard, each of which always lies. What question should I ask to decide which door to choose?" would be an explanation of how asking one guard what the other guard would say tells you the opposite of which door you should go through.
Also, they're fundamentally bad at math. They can draw a clock because they've seen clocks, but going further requires some calculations they can't do.
For example, try asking Nano Banana to do something simpler, like "draw a picture of 13 circles." It likely will not work.
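To be concrete about the kind of calculation I mean, here's the sort of arithmetic a correct rendering needs (just an illustration of the math, nothing to do with how the models work internally):

    import math

    def clock_hand_angles(hour, minute):
        # Degrees clockwise from 12 o'clock.
        minute_angle = minute * 6.0                     # 360 / 60
        hour_angle = (hour % 12) * 30.0 + minute * 0.5  # 360 / 12, plus drift
        return hour_angle, minute_angle

    def circle_positions(n, radius=100.0):
        # Centres of n circles spaced evenly around a ring.
        return [(radius * math.cos(2 * math.pi * i / n),
                 radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

    print(clock_hand_angles(10, 10))    # (305.0, 60.0) -- the classic watch-ad pose
    print(len(circle_positions(13)))    # 13, by construction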