"ab" is 4 bytes, while a 64 bit pointer to an interned string is 8 bytes. It would seem that the savings of CSV would be better analogized to static typing - the type is defined once, then each record only contains data.
I had the same intuition as the original comment. But no, the relative sizes of data formats aren't that straightforward. One could intern symbols to say 16 bit values, or one could infer structural types and compress the data that way. But those are both creating additional assumptions and processing that likely aren't done by commonly available tools.
You still have to pay for a 64 bit pointer to an un-interned string though. It's the fact that the keys are repeated for each record that overwhelms the size of the keys or pointers themselves, when you have millions of records. If you are using fixed structures, then the keys are implicit and don't need to be repeated or even represented as pointers, just known fixed offsets into the structures.
I wrote this about "Representing and Editing JSON with Spreadsheets":
I had the same intuition as the original comment. But no, the relative sizes of data formats aren't that straightforward. One could intern symbols to say 16 bit values, or one could infer structural types and compress the data that way. But those are both creating additional assumptions and processing that likely aren't done by commonly available tools.