How is that? I don't know about Chinese, but surely Japanese has much better entropy in bytes? As would other languages with more expressive character sets.
That's the issue. English text fits in 7 bits per character (plain ASCII). Good luck doing that for any logographic language.
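To make the size difference concrete, here's a rough sketch that counts UTF-8 bytes per character. The sample strings and the helper names (`utf8_bytes_per_char`, `fits_in_7_bits`) are just made up for illustration, and UTF-8 is assumed as the encoding; other encodings would give different numbers.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes used per character of text."""
    return len(text.encode("utf-8")) / len(text)


def fits_in_7_bits(text: str) -> bool:
    """True if every character has a code point below 128 (plain ASCII)."""
    return all(ord(ch) < 128 for ch in text)


english = "cat"      # illustrative sample
japanese = "日本語"   # illustrative sample ("Japanese language")

print(utf8_bytes_per_char(english), fits_in_7_bits(english))
print(utf8_bytes_per_char(japanese), fits_in_7_bits(japanese))
```

Each kanji here takes three bytes in UTF-8, while each English letter takes one and would fit in 7 bits.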
And that doesn't even take into account that English (and a lot of alphabet-based languages) uses spaces to mark where words begin and end. In Japanese, you can have a word that consists of a kanji plus a few hiragana characters as a grammatical marker, but there's no space between that word and the next. How do you decide where to insert a line break?
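A quick way to see the line-break problem, as a sketch only: look for whitespace as candidate break points. The function name `space_break_points` and both sample sentences are my own assumptions for illustration, and this is the naive approach, not how a real layout engine does it.

```python
def space_break_points(text: str) -> list[int]:
    """Indices of whitespace characters, i.e. naive line-break candidates."""
    return [i for i, ch in enumerate(text) if ch.isspace()]


english = "I ate sushi"        # illustrative sample
japanese = "寿司を食べました"    # illustrative sample ("[I] ate sushi")

print(space_break_points(english))   # spaces between the words
print(space_break_points(japanese))  # nothing to break on
print(len(english.split()), len(japanese.split()))
```

The English sentence gives two break candidates, while the Japanese one gives none, so a space-based wrapper treats the whole sentence as a single unbreakable word.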