
> training off of data generated by another AI is generally a bad idea

Ah. So if I understand this... once the internet becomes completely overrun with AI-generated articles of no particular substance or importance, we should not bulk-scrape that internet again to train the subsequent generation of models.

I look forward to that day.



That's already happened. It's well established now that the internet is tainted. Essentially since ChatGPT's public release, a not-insignificant amount of internet content has not been written by humans.


Yes, this is a real and serious concern that AI researchers have.



