
You might enjoy this read, an up-to-date document from this year that lays out what the state of the art was 20 years ago:

https://web.stanford.edu/~jurafsky/slp3/3.pdf

Essentially, you just count every n-gram that actually occurs in the corpus, then "fill in the blanks" for all the zero counts with some simple smoothing rules that shift a bit of probability mass onto n-grams you never saw.
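To make that concrete, here's a minimal Python sketch of a bigram model with add-one (Laplace) smoothing, which is about the crudest of those "simple rules". Assumptions: whitespace tokenization, bigrams only, `<s>`/`</s>` sentence-boundary markers, and `train_bigram_laplace` is just a hypothetical name, not anything from the linked chapter.

```python
from collections import Counter


def train_bigram_laplace(sentences):
    """Count bigrams in the corpus and return P(word | prev) with
    add-one (Laplace) smoothing. Hypothetical helper for illustration."""
    bigrams = Counter()   # count of each (prev, word) pair actually seen
    contexts = Counter()  # how often each token appears as a 'prev'
    vocab = set()
    for sent in sentences:
        # Assumption: whitespace tokenization plus boundary markers.
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    vocab_size = len(vocab)

    def prob(prev, word):
        # Add 1 to every bigram count so unseen pairs are no longer 0,
        # and add V to the denominator so each row still sums to 1.
        return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)

    return prob


prob = train_bigram_laplace(["i am sam", "sam i am"])
print(prob("i", "am"))   # seen twice: 3/7, about 0.4286
print(prob("i", "sam"))  # never seen, but still nonzero: 1/7, about 0.1429
```

Add-one is known to move too much mass onto unseen events, which is why fancier schemes like interpolation or Kneser-Ney usually do better, but the basic "count, then patch the zeros" shape is the same.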



