[Solved] message was cropped and render un-usable, sometimes, for chinese fonts.


ccchan

PS: so GPT-4 says Traditional Chinese and Simplified Chinese share a common set of characters, and each also has its own unique characters. All three parts fall within the first block of roughly 20,000 characters.

GPT-4 record:

You:

I don't understand. For the English "old", tell me the codes for the Chinese characters 舊 and 旧 then.

ChatGPT:

The Chinese characters "舊" (Traditional) and "旧" (Simplified), both meaning "old" in English, have different Unicode code points. Here are their respective codes:

舊 (Traditional Chinese): The Unicode code for this character is U+820A.

旧 (Simplified Chinese): The Unicode code for this character is U+65E7.

These codes are hexadecimal representations of their respective positions in the Unicode character set. You can use these codes in various digital systems to represent these specific characters.
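The two code points quoted above are easy to check yourself. Here is a minimal Python sketch (not part of the GPT-4 record; the helper name `code_point` is made up for illustration) that prints the code point of each character in the same U+XXXX notation:

```python
def code_point(ch: str) -> str:
    """Return the Unicode code point of a single character as 'U+XXXX'."""
    return f"U+{ord(ch):04X}"


# The Traditional and Simplified forms of "old" from the reply above.
for ch in ("舊", "旧"):
    print(ch, code_point(ch))
```

`ord()` returns the code point as an integer, and the `04X` format turns it into at least four uppercase hex digits.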

You:

So the first block of about 20,000 characters you mentioned includes both the Traditional and Simplified Chinese characters in daily use?

ChatGPT:

Yes, the first block, "CJK Unified Ideographs" (U+4E00 to U+9FFF), includes both Traditional and Simplified Chinese characters commonly used in daily life. This block is designed to encompass a wide range of frequently used Chinese characters, irrespective of whether they are in Traditional or Simplified form.

The Unicode system unifies characters that are the same abstract character across Chinese, Japanese, and Korean usage, so a character that is written the same way in Traditional and Simplified Chinese is encoded only once in this block. Small regional differences in how such a shared character is drawn are a matter of font rendering rather than separate encoding.

For specific characters that have different forms in Traditional and Simplified Chinese, like "舊" (U+820A) and "旧" (U+65E7), they are encoded separately, as they have distinct graphical representations. Such characters may appear in the same block (as is the case with these examples) or in different blocks, depending on their Unicode allocation.
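The block range given above can be checked with a simple comparison. This is a rough Python sketch, not part of the GPT-4 record; the names `CJK_START`, `CJK_END`, `in_cjk_unified`, and `block_size` are invented for illustration, and only the range U+4E00 to U+9FFF comes from the reply:

```python
# Range of the "CJK Unified Ideographs" block as quoted in the reply above.
CJK_START = 0x4E00
CJK_END = 0x9FFF


def in_cjk_unified(ch: str) -> bool:
    """True if the single character ch falls inside U+4E00..U+9FFF."""
    return CJK_START <= ord(ch) <= CJK_END


# Number of code points in the range (both ends inclusive).
block_size = CJK_END - CJK_START + 1

for ch in ("舊", "旧", "A"):
    print(ch, in_cjk_unified(ch))
print("code points in block:", block_size)
```

The range holds 20,992 code points, which is where the "first block of 20000" figure comes from. Both 舊 and 旧 fall inside it, while a Latin letter such as "A" does not.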

You:

Will some Traditional and Simplified Chinese characters have the same code, then?

ChatGPT:

Yes, many Traditional and Simplified Chinese characters share the same Unicode code point. This is because Unicode is designed to unify characters that are essentially the same, regardless of regional script variations. When a character is written the same way in both scripts, there is nothing to distinguish at the encoding level, and any remaining difference is a matter of how the character is rendered (i.e., displayed or printed).

For example, a character that has the same form in both Traditional and Simplified Chinese scripts will have only one Unicode code point. It's the font and language settings of the user's system that determine how the character is displayed.

In cases where the Traditional and Simplified forms of a character are graphically different, they are assigned separate Unicode code points, as seen with "舊" (U+820A) and "旧" (U+65E7).

Therefore, while there are distinct Traditional and Simplified forms for some characters, many others are encoded just once in Unicode, covering both scripts. This approach reduces redundancy and simplifies text processing across different languages that use Chinese characters.

You:

Show me one that is the same in Traditional and Simplified Chinese.

ChatGPT:

A common example of a character that is the same in both Traditional and Simplified Chinese is the character for "water," which is written as 水 in both scripts. In Unicode, this character has the code point U+6C34.

This character exemplifies many instances where the Traditional and Simplified forms of a Chinese character are identical, and therefore, they share the same Unicode encoding. The character 水 (U+6C34) is used in a wide range of contexts and is a fundamental character in both Traditional and Simplified Chinese writing systems.
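One practical consequence of the split described above: converting between the two scripts only needs a lookup for the characters that differ, and shared characters such as 水 pass through untouched. A minimal Python sketch follows (not part of the GPT-4 record). The one-entry table `TRAD_TO_SIMP` and the function `to_simplified` are hypothetical and only cover the example from this thread; a real conversion table would be far larger.

```python
# Hypothetical one-entry table: only the pair discussed in this thread.
TRAD_TO_SIMP = str.maketrans({"舊": "旧"})


def to_simplified(text: str) -> str:
    """Replace characters found in the table; leave everything else as-is."""
    return text.translate(TRAD_TO_SIMP)


print(to_simplified("舊水"))    # 舊 is mapped, 水 is shared and stays put
print(f"U+{ord('水'):04X}")     # one code point serves both scripts
```

Characters missing from the table are left unchanged by `str.translate`, which is exactly the behaviour wanted for characters that are identical in both scripts.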
