Hacker News

People say this, but when it comes to AI models, the training data is not owned by these companies/groups, so it cannot be "open sourced" in any sense. And the training code mostly just accesses that training data, which cannot be open sourced, so it cannot be shared either. So the fully open source model you wish for can only provide subpar results.


They could easily list the data used, though. These datasets are mostly known and floating around. When they are constructed, instructions for replication could be provided too.


They could, but even if they give this list, the detractors will still say it is not open source.


Yes, and as a bonus they may get sued, which in the long term makes free/offline models unviable.

It would be so much better if all models were trained with LibGen.


Isn't this the same situation that any codebase faces when one thinks about open sourcing it? I can't legally open source the code I don't own.



