
The training data needs to be public and freely licensed too, otherwise IMO it's not an open source model.


Is that really necessary if the resulting model was actually available and comprehensible?

Personally, I can't say I care as much about what the training set is. I want to know what's actually in the model and what's used at runtime/interpretation.


Yes. Without the training data, you can't know what kind of poisoning was done to the initial data set, you can't review the data or any human inputs, and you can't retrain from scratch. Those are all things the model author can do, and downstream folks/companies/governments should be able to do them too. Otherwise it isn't open source.



