
I don't have any exact references, but multiple finetuning datasets have used curated GPT-3/4 conversations as training data. It's less that they're outright superior to human data, and more that they're less bad and more abundantly available.

> Like, how could this even theoretically work?

I'm not really an expert on it either, but my understanding is that it works the same way curating human data works. You sift through the output, discard the garbage (nonsensical, impolite, or incoherent AI responses), and include only the exemplary conversations in your training set.
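To make that concrete, here's a minimal sketch of the curation idea. Everything here is hypothetical: `quality_score` stands in for whatever filter a real pipeline would use (a trained classifier, heuristics, or human review), and the toy heuristics are just for illustration.

```python
# Sketch of curating synthetic conversations for a finetuning set.
# quality_score is a hypothetical stand-in for a real quality filter
# (e.g. a trained reward model, heuristic rules, or human review).

def quality_score(conversation: str) -> float:
    """Toy heuristic: reject very short or all-caps (shouty) text,
    otherwise score by length up to a cap."""
    if len(conversation) < 20:
        return 0.0
    if conversation.isupper():
        return 0.0
    return min(1.0, len(conversation) / 200)

def curate(conversations: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only generations whose score clears the threshold."""
    return [c for c in conversations if quality_score(c) >= threshold]

samples = [
    "BUY NOW!!!",   # rejected: too short and shouty
    "hi",           # rejected: too short
    "User: How do transformers handle long context?\n"
    "Assistant: They attend over all tokens, so cost grows "
    "quadratically with sequence length; various sparse and "
    "linear-attention variants trade accuracy for efficiency.",
]
kept = curate(samples)
print(len(kept))  # only the substantive conversation survives
```

The point isn't the specific heuristics; it's that generation is cheap, so you can afford to throw most of it away and keep only the top slice.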

It feels kinda like the "monkeys on typewriters writing Shakespeare" parable: if you have enough well-trained AIs generate enough conversations, eventually some fraction will be indistinguishable from human data and usable for training.


