Hacker News

How is that? I don't know about Chinese, but surely Japanese has much better entropy in bytes? As would other languages with more expressive character sets.


That's the issue. English can be represented with 7 bits per character (plain ASCII). Good luck doing that for any logographic language.
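To put rough numbers on that, here is a minimal sketch that compares bytes per character under UTF-8. The helper name `utf8_bytes_per_char` and the sample words are my own picks for illustration, and UTF-8 is an assumption on my part since nobody in the thread named an encoding.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes used per character of text."""
    return len(text.encode("utf-8")) / len(text)


def fits_in_7_bits(text: str) -> bool:
    """True if every character is in the ASCII range (code point < 128)."""
    return all(ord(ch) < 128 for ch in text)


english = "cat"   # three ASCII letters
japanese = "猫"    # one kanji with the same meaning

for label, word in (("english", english), ("japanese", japanese)):
    total = len(word.encode("utf-8"))
    print(label, len(word), "chars,", total, "bytes,",
          utf8_bytes_per_char(word), "bytes/char,",
          "7-bit clean" if fits_in_7_bits(word) else "needs more than 7 bits")
```

Both words come out at 3 bytes total here, so the per-character cost and the per-word cost can point in different directions depending on the text.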

And that doesn't even take into account that English (and a lot of alphabet-based languages) uses spaces to mark where words begin and end. In Japanese, you can have a word that consists of a kanji plus a few hiragana characters as a grammatical marker, but there's no space between that word and the next. How do you decide where to insert a line break?
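A quick sketch of why the naive approach falls over. The function `space_break_points` and both sample sentences are made up for illustration; this is only the "break at whitespace" rule, not how a real layout engine does it.

```python
def space_break_points(text: str) -> list[int]:
    """Indices where a naive wrapper could break: whitespace positions only."""
    return [i for i, ch in enumerate(text) if ch.isspace()]


english = "I eat sushi"
japanese = "私は寿司を食べる"  # roughly the same sentence, written without spaces

# English yields a break candidate between every pair of words.
print(english.split(), space_break_points(english))

# Japanese comes back as a single unbreakable chunk with no candidates.
print(japanese.split(), space_break_points(japanese))
```

Real text layout generally needs something smarter than this, such as the Unicode line breaking rules (UAX #14) or dictionary-based word segmentation.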



