-
I could split every two characters, which would be a bit more efficient, but it's just added complexity and it sounds like in some cases it might mess things up?
Hi, please DO NOT add spaces or split anything if you are unsure!
Here is an example (though maybe only those who understand Chinese can follow it):
https://www.ptt.cc/bbs/Learn_Buddha/M.1315569690.A.648.html
This is a joke from a famous Chinese movie.
Original, with NO spaces:
本人林大福將大樹街石屋租于恩人黃老十一家
未能報恩萬一不交租亦可收回黃公年租銀兩三
十萬不能轉租別人立此爲據本人兒孫不得有違
The good guy reads it as:
本人林大福_將大樹街石屋租于恩人黃老十一家_未能報恩萬一不交租亦可
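收回黃公年租銀兩三十_萬不能轉租別人_立此爲據本人兒孫不得有違
(Roughly: "I, 林大福, rent the stone house on 大樹街 to my benefactor, the family of 黃老十一. As I have not been able to repay his kindness, it is fine even if he pays no rent. I collect from Mr. 黃 a yearly rent of thirty taels of silver. He must never sublet it to others. This is written as proof; my descendants must not go against it.")
The bad guy reads it as: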
本人林大福_將大樹街石屋租于恩人_黃老十一家未能報恩萬一不交租
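亦可收回黃公年租銀兩三十萬_不能轉租別人_立此爲據本人兒孫不得有違
(Roughly: "I, 林大福, rent the stone house on 大樹街 to a benefactor. The family of 黃老十一 has not repaid my kindness; should they fail to pay rent, I may also reclaim Mr. 黃's yearly rent of 300,000 taels of silver. It may not be sublet to others. This is written as proof; my descendants must not go against it.")
That is the complete opposite meaning!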
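A small Python sketch (not from the thread) makes the point concrete: both readings contain exactly the same 60 characters as the original and differ only in where the breaks fall. It assumes that both `_` and the line break inside each reading mark a break; `BREAKS`, `strip_breaks` and `break_offsets` are hypothetical names used only for this illustration.

```python
# Break markers used in the two readings. Assumption: the line break inside
# each reading is also meant as a break, just like "_".
BREAKS = {"_", "\n"}

original = (
    "本人林大福將大樹街石屋租于恩人黃老十一家"
    "未能報恩萬一不交租亦可收回黃公年租銀兩三"
    "十萬不能轉租別人立此爲據本人兒孫不得有違"
)
good_reading = (
    "本人林大福_將大樹街石屋租于恩人黃老十一家_未能報恩萬一不交租亦可\n"
    "收回黃公年租銀兩三十_萬不能轉租別人_立此爲據本人兒孫不得有違"
)
bad_reading = (
    "本人林大福_將大樹街石屋租于恩人_黃老十一家未能報恩萬一不交租\n"
    "亦可收回黃公年租銀兩三十萬_不能轉租別人_立此爲據本人兒孫不得有違"
)


def strip_breaks(reading: str) -> str:
    """Remove the break markers, leaving only the original characters."""
    return "".join(ch for ch in reading if ch not in BREAKS)


def break_offsets(reading: str) -> list[int]:
    """Offsets into the unbroken text at which this reading inserts a break."""
    offsets: list[int] = []
    count = 0  # characters of the original text seen so far
    for ch in reading:
        if ch in BREAKS:
            offsets.append(count)
        else:
            count += 1
    return offsets


print(strip_breaks(good_reading) == original)  # True
print(strip_breaks(bad_reading) == original)   # True
print(break_offsets(good_reading))  # [5, 20, 31, 41, 48]
print(break_offsets(bad_reading))   # [5, 15, 29, 42, 48]
```

Three of the five break positions move, and the 萬 at index 41 switches from "must never" (萬不能) to part of "300,000" (三十萬). The characters themselves never change, so anything that inserts spaces at the wrong place changes the meaning.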
Thank you.
-
I could split every two characters, which would be a bit more efficient, but it's just added complexity and it sounds like in some cases it might mess things up?
What do you guys mean by "every two characters"?
One Chinese "word" (字) is one Chinese character. In the past (in legacy encodings such as Big5 or GB2312), it took up twice as many bytes as an ASCII character.
E.g. in the bytes you need to store "AB", you can store only one Chinese character, "中". Thanks!
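For reference, here is a quick Python check of the byte counts described above. Big5 and GB2312 are legacy two-byte encodings for Chinese; UTF-8 is added for comparison. Python's `len` on a `str` counts characters (code points), not bytes, so in that sense "two characters" means two visible characters regardless of encoding. `byte_counts` is a hypothetical helper.

```python
def byte_counts(text: str) -> dict[str, int]:
    """Character count plus the encoded size of `text` in a few encodings."""
    return {
        "chars": len(text),                    # code points, not bytes
        "big5": len(text.encode("big5")),      # legacy Traditional Chinese encoding
        "gb2312": len(text.encode("gb2312")),  # legacy Simplified Chinese encoding
        "utf-8": len(text.encode("utf-8")),
    }


print(byte_counts("AB"))  # {'chars': 2, 'big5': 2, 'gb2312': 2, 'utf-8': 2}
print(byte_counts("中"))  # {'chars': 1, 'big5': 2, 'gb2312': 2, 'utf-8': 3}
```

So "AB" and "中" both take 2 bytes in Big5 or GB2312, while in UTF-8 "中" takes 3.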
-
Thanks - well, that's promising. I'd propose: detect Chinese characters (U+4E00..U+9FFF? This includes Japanese too), then encode each character as its own image. This will use a bit more memory, but it lets the text wrap at any character when a new line is needed (otherwise they will still be placed inline).
I could split every two characters, which would be a bit more efficient, but it's just added complexity and it sounds like in some cases it might mess things up?
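Here is a minimal Python sketch of the splitting step proposed above. It assumes the split happens before each unit is rendered as an image, and the rendering itself is not shown. It treats only U+4E00..U+9FFF (the CJK Unified Ideographs block) as breakable. That block covers kanji used in Japanese but not kana, which live in separate blocks. `split_for_wrapping` and its `group_size` parameter are hypothetical names. `group_size=1` is the one-character-per-image proposal, and `group_size=2` is the "every two characters" variant.

```python
def is_cjk_ideograph(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def split_for_wrapping(text: str, group_size: int = 1) -> list[str]:
    """Split text into wrap units without adding or removing any characters.

    Each group of up to `group_size` consecutive CJK ideographs becomes its
    own unit; runs of other characters (Latin words, spaces, punctuation)
    stay together as a single unit.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    units: list[str] = []
    other = ""      # pending run of non-CJK characters
    cjk_group = ""  # pending group of CJK ideographs
    for ch in text:
        if is_cjk_ideograph(ch):
            if other:
                units.append(other)
                other = ""
            cjk_group += ch
            if len(cjk_group) == group_size:
                units.append(cjk_group)
                cjk_group = ""
        else:
            if cjk_group:
                units.append(cjk_group)
                cjk_group = ""
            other += ch
    # At most one of these is non-empty once the loop ends.
    units.extend(u for u in (other, cjk_group) if u)
    return units


print(split_for_wrapping("Hi 中文字!"))                 # ['Hi ', '中', '文', '字', '!']
print(split_for_wrapping("本人林大福", group_size=2))  # ['本人', '林大', '福']
```

Pairing produces fewer images. However, a pair can straddle a word or name boundary, as with 林大|福 above. Because `"".join(units)` always gives back the original text, neither variant inserts spaces.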