
Is this really a surprise? I'd hazard a guess that the ability to program, and beyond that to create new programming languages, requires more than probabilistic text prediction. LLMs work for programming languages with a large enough existing corpus that the model can ape a programmer, having seen enough similar text. A real programmer can take the concepts of one programming language and express them in another without first digesting gigabytes of raw text.

There may be emergent abilities that arise in these models purely due to how much information they contain, but I'm unconvinced that their architecture allows them to crystallize actual understanding. For example, I'm sceptical that there's a region of the LLM's weights that encodes the logic of arithmetic, so that the model actually computes sums rather than just predicting, probabilistically, that the text `1+1=` tends to be followed by the character `2`.
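To make the distinction concrete, here's a deliberately crude sketch in Python. The frequency-table "predictor" is a caricature (real transformers generalize far more than a lookup table does), but it illustrates the difference between having memorized which completion follows a prompt and having a rule that works for operands never seen in training:

```python
from collections import Counter, defaultdict

# Toy "training corpus": the predictor only ever sees these strings.
corpus = ["1+1=2", "2+2=4", "1+2=3", "2+1=3"]

# Count which completion followed each prompt in the corpus.
completions = defaultdict(Counter)
for line in corpus:
    prompt, answer = line.split("=")
    completions[prompt + "="][answer] += 1

def predict(prompt: str) -> str:
    """Return the most frequent completion seen after `prompt` in training."""
    seen = completions.get(prompt)
    if seen:
        return seen.most_common(1)[0][0]
    return "?"  # never seen this prompt: no underlying rule to fall back on

def compute(prompt: str) -> str:
    """Actually evaluate the arithmetic encoded in `prompt`."""
    a, b = prompt.rstrip("=").split("+")
    return str(int(a) + int(b))

print(predict("1+1="))    # "2"  -- memorized from the corpus
print(predict("17+25="))  # "?"  -- out of distribution, lookup fails
print(compute("17+25="))  # "42" -- the rule generalizes to any operands
```

The open question is which of these two regimes an LLM's weights are closer to, and whether scale alone can move a model from the first to the second.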


