
During pre-training, the model learns next-token prediction, which is naturally additive: each step only appends one token to the sequence. Even if you added DEL as a token, it would still be quite hard to restructure the training data so that it could be used in a next-token prediction task. Hope that helps.
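To make that concrete, here is a rough sketch of how I picture it. The `<DEL>` token, its "remove the previous token" meaning, and the helper names are all made up for illustration; no real tokenizer or training pipeline is assumed to work this way.

```python
# Hypothetical delete token; an assumption for illustration only.
DEL = "<DEL>"


def next_token_pairs(tokens):
    """Build (context, target) pairs the way next-token prediction sees them."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def render(tokens):
    """Apply DEL as 'drop the previous visible token' and return what remains."""
    visible = []
    for tok in tokens:
        if tok == DEL:
            if visible:
                visible.pop()
        else:
            visible.append(tok)
    return visible


# Ordinary text: every context is just the previous context plus one token.
plain = ["the", "cat", "sat"]
pairs_plain = next_token_pairs(plain)

# An edit history that ends up as the same visible text.
edited = ["the", "dog", DEL, "cat", "sat"]
pairs_edited = next_token_pairs(edited)

for context, target in pairs_edited:
    print(context, "->", target)
print("visible text:", render(edited))
```

Both sequences render to the same text, but only the second one ever has DEL as a target. Scraped text usually contains just the final version, so as far as I can tell you would have to invent the edit history yourself to get training pairs like the second set.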



