The source data for the training needs to be public and freely licensed too, oth...

_heimdall · on Dec 22, 2024

Is that really necessary if the resulting model was actually available and comprehensible?

Personally, I can't say I care as much about what the training set is; I want to know what's actually in the model and used at runtime/interpretation.

pabs3 · on Dec 23, 2024

Yes. Without it, you can't know what kind of poisoning was done in the initial training data set, you can't review the data, you can't review any human inputs, and you can't retrain from scratch. The model author can do all of those things, so downstream folks/companies/governments should be able to do them too. Otherwise it isn't open source.