I did a back-of-the-envelope calculation: OpenAI has roughly 100M monthly active users; assume 10K tokens of usage per user per month ($20 would pay for about 600K tokens on the API), and they generate about 1T tokens of chat logs per month.
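A minimal sketch of that arithmetic (the user count and per-user token figure are the assumptions above, not reported numbers):

```python
# Back-of-the-envelope: chat tokens generated across the user base per month.
monthly_active_users = 100e6        # assumed ~100M MAU
tokens_per_user_per_month = 10e3    # assumed ~10K tokens per user per month

monthly_tokens = monthly_active_users * tokens_per_user_per_month
print(f"{monthly_tokens:.2e} tokens/month")  # 1.00e+12, i.e. ~1T tokens/month
```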
This dataset would be focused on human interests (in-domain for users) and contain AI errors (in-domain for the model). It's LLM output with humans in the loop and tools - code execution, search, APIs. So it is a good basis for the next dataset. I think OpenAI has amassed about as much chat-log text as the organic data reportedly used for GPT-4, which was rumoured to be 13T tokens.
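If the monthly rate above holds, a year of chat logs lands in the same ballpark as that rumoured corpus (a rough sketch; the 13T figure is just the rumour cited above):

```python
# Rough comparison: a year of chat logs vs. the rumoured GPT-4 pretraining data.
monthly_tokens = 1e12        # ~1T tokens/month from the estimate above
rumored_gpt4_tokens = 13e12  # rumoured ~13T organic tokens used for GPT-4

yearly_tokens = 12 * monthly_tokens
print(f"~{yearly_tokens / 1e12:.0f}T tokens/year "
      f"({yearly_tokens / rumored_gpt4_tokens:.0%} of the rumoured GPT-4 corpus)")
# ~12T tokens/year (92% of the rumoured GPT-4 corpus)
```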
It's surprising how much synthetic data can be generated per year. And OpenAI can do this with humans in the loop essentially for free, since the paying users subsidize everyone else. We then benefit 6-12 months later, when open-source models trained on data exfiltrated from OpenAI's models catch up.