Avatar for houshou_m

houshou_m

Member since Oct 2023 • Last active Nov 2023
  • 0 conversations
  • 22 comments

Most recent activity

  • in Bangle.js

    I updated to 2v19.75 after seeing this thread since I'd like to wear my watch on my right hand and then noticed that my messages are no longer being retained on the watch. The notification that I got a message will appear, and then the message will disappear as if I had dismissed it. This happens with all messages, and they don't show up in the messaging app either. I updated to .79 today, but the issue persists. Is this problem unique to me?

  • in Bangle.js

    I think it would be a good idea to edit the title of this thread to include "[Solved]" in it without the quotes so that others can see this issue has been taken care of.

  • in Bangle.js

    I'm pleased to say that I can also report that the issue seems to be resolved now. No problems with either the initial notification or opening the message in the messaging app.

  • in Bangle.js

    This site offers a download of a text file containing every character in the Big5 encoding scheme for Traditional Chinese characters, sorted by frequency of use. Although it's from the 90s, the corpus it pulls from is huge, having over a million tokens. It was compiled by an academic source.

    For Simplified Chinese, one can consult the PRC's Table of General Standard Chinese Characters (Tongyong Guifan Hanzi Biao 通用规范汉字表). It consists of over 8000 characters and is divided into three tiers from most to least frequent, per the government's analysis. You can download a text file of it here at Wikisource. Once downloaded, you will see that each tier is enclosed in curly brackets.

    Per the document, tiers one and two consist of 3500 and 3000 characters respectively and meet the needs of the education and publishing sectors. Tier three consists of an additional 1605 characters and includes characters you'll see in names, technical jargon, and idioms from classical literature (i.e. archaic words). The characters within each tier are not sorted by frequency (the first person pronoun wo 我 appears in spot 761, for example), so this source is better understood as providing batches of characters one is likely to encounter.

    Between the two of them, the Traditional Chinese source has the better frequency sorting, but in the grand scheme of things, both should fulfill the need of knowing which characters are essential. Now, there's no definite answer to the question of how many characters you should take from each. Due to the poor organization of the PRC's character list, I wouldn't take fewer than 3500 characters from it so that the essentials can be covered. In the Traditional character list, that would cover characters up to the 99.68th percentile in usage.

    EDIT: I have talked it over with a colleague, and I think using 2500-3000 of the most common characters would be sufficient if we're hurting for space. This site sorts Simplified characters by frequency, but they don't offer a text file of it. I don't really know what you'd need to make the locales, so do let me know. Depending on how you go about things, you don't need a separate list for Traditional characters, for example.

    EDIT 2: This would make an excellent reference for characters to have. Use Unicode, of course, though.
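
If the percentile figure above is read as cumulative coverage (the share of all character occurrences in a corpus that the top N characters account for), the trade-off between list size and coverage can be sketched as follows. This is only an illustration: the function names and the toy counts are hypothetical, and the real frequency files mentioned above would have to be parsed into a list of counts first.

```python
from itertools import accumulate


def coverage(counts_desc, top_n):
    """Share of all occurrences covered by the top_n most frequent characters.

    counts_desc is a list of occurrence counts sorted from most to least
    frequent (an assumed input format, not the format of any real file).
    """
    total = sum(counts_desc)
    if total == 0:
        return 0.0
    return sum(counts_desc[:top_n]) / total


def smallest_n_for(counts_desc, target):
    """Smallest number of characters whose cumulative share reaches target."""
    total = sum(counts_desc)
    for n, running in enumerate(accumulate(counts_desc), start=1):
        if running / total >= target:
            return n
    return len(counts_desc)


# Toy counts for five made-up characters, NOT real corpus data.
toy_counts = [50, 30, 10, 6, 4]
top_two_share = coverage(toy_counts, 2)
needed_for_95 = smallest_n_for(toy_counts, 0.95)
```

With real counts, `smallest_n_for` would give a rough lower bound on how many characters a font needs in order to hit a chosen coverage target.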

  • in Bangle.js

    There's a new cutting edge update --- 2v19.67. This update might resolve the issue of Chinese text displaying improperly within the messaging app itself, which is where you end up if you tap on the message when it is displayed. This is a separate issue to the matter of text not displaying properly when first shown on the watch. I have been working with 2v19.60 and can confirm that Chinese text now displays properly outside of the app --- i.e. when one views the message without interacting with it.

    @Gordon I will get back to you regarding a list of common Chinese characters. I am in fact aware of one that isn't sourced from an unthinking algorithm, but I would like to see if there are alternatives.

  • in Bangle.js

    Gordon, I sent you a direct message with my watch's log and a description of the testing I did. Please let me know if there is anything else I can do.

    @ccchan You can get your friends to send you quotes from Mencius if you just ask. I'm sure they'll do it for free. Mine did :)

  • in Bangle.js

    Yes, for all intents and purposes, those characters are equivalent. Chinese also has an optional type of comma used for enumerating items (as in, "milk, butter, eggs"). This is also the standard comma used in Japanese.

    I did some tests with the nightly build of Bangle.js Gadgetbridge (commit 1aadc04fd) today, and unfortunately nothing seems to have changed. Chinese text was still running off the screen, and no line wrapping was being done. Also, I don't know if I was just experiencing some exceptionally well-timed bad luck, but I was not always being notified on my watch when a message containing Chinese was sent to me --- this despite the fact that I never encounter this issue with messages using ASCII characters in the body. The messages would simply be completely absent from my watch, even when there was a lull of 10 seconds between the messages. I do not have Gadgetbridge set to limit messages if multiple messages come in too quickly, and I of course had the screen of my phone off while performing these tests, which I did using two different instant messaging platforms.

    I also tested to see how Japanese would be handled. Although I received the Japanese messages on my watch, the device strangely would not buzz to alert me to them. Furthermore, the messages were all blank. Perhaps these are two related issues? These issues occurred with messages that had both English and Japanese, as well as only Japanese.

    I took pictures to show the issues I have described and have attached them to this message. I hope this helps. The Chinese message sent in full is supposed to be: 孟子見梁惠王,王立於沼上,顧鴻鴈麋鹿,曰:「賢者亦樂此乎?」

  • in Bangle.js

    @Gordon I'm happy to contribute to the development of this device. As for your suggested change, I think that's a fantastic start to tackling this matter. It really might be enough! I will test this out for a few days before reporting back to you. I will also see how it fares with Japanese.

    Additionally, I want to clarify my suggestion in the space below, since it seems that I was not able to communicate what I meant well. I apologize for that. I was writing late at night and wasn't as careful about my words as I should have been.

    I am assuming that there is a character limit to each line of text which the device observes with the default messaging app. If that is the case, then should the punctuation fix not be enough, it would be ideal to be able to wrap the lines in such a way as to preserve a number of characters which is divisible by two. So if there is a limit of, say, 13 characters per line, we wrap at the 12th. Would that be possible?

    To give an example of what I mean using ccchan's original message, let's say hypothetically that the Bangle.js does have a 13 character limit with the default messaging app. The message should be split up like so:

    【知乎】你的验证码是 6 (12)
    98449,此验证码用于 (12)
    登录知乎或重置密码。10 (13)
    分钟内有效。 (6)

    Each line of text is wrapped perfectly in this example. No line break splits a two-syllable Chinese word across two lines. I had the second to last line be 13 characters on the assumption that Gadgetbridge would be able to handle whitespace normally, even within a message consisting of both full- and half-width characters.

    @ccchan I hope the above explanation clarifies what I mean. We are only concerned with the number of characters per line. There is no need for any kind of advanced programming to deal with splitting lines of Chinese text this way. Gadgetbridge only needs to be able to count characters and be aware of character limits. Again, I apologize for not being clear.
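
The counting rule described above can be sketched in a few lines. This is a minimal illustration in Python, not Gadgetbridge or Bangle.js code: `wrap_even` is a hypothetical name, the 13-character limit is the hypothetical one from the example, and the message string is reassembled from the wrapped lines shown above, so its exact spacing is an assumption.

```python
def wrap_even(text: str, limit: int) -> list[str]:
    """Cut text into lines of the largest even length not exceeding limit.

    Only characters are counted; no word segmentation is attempted.
    """
    if limit < 2:
        raise ValueError("limit must be at least 2")
    width = limit - (limit % 2)  # e.g. 13 -> 12, 11 -> 10
    return [text[i:i + width] for i in range(0, len(text), width)]


# Message reassembled from the example lines above (spacing is assumed).
message = "【知乎】你的验证码是 698449,此验证码用于登录知乎或重置密码。10分钟内有效。"
lines = wrap_even(message, 13)
```

Python's `len` and slicing work on code points, so full-width and half-width characters each count as one here. A real implementation on the watch would probably need to measure rendered pixel width instead, since half-width characters take up less room on screen.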

  • in Bangle.js

    While in Classical Chinese it was certainly the case that one character most often wrote one word, in modern Mandarin, it is more often the case that a word consists of two syllables, represented by two characters. This change can actually start to be seen as early as Xunzi, who uses a noticeably larger number of disyllabic words than Mencius. Let's look again at the example sentence from the first article of the Universal Declaration of Human Rights, this time adding spaces between words*:

    人人 生 而 自由﹐在 尊嚴 和 權利 上 一律 平等。他們 賦有 理性 和 良心﹐並 應 以 兄弟 關係 的 精神 互相 對待。

    Sum of monosyllabic words (one graph, one word): 10
    Sum of disyllabic words (two graphs, one word): 15

    *Note that for the purposes of this conversation, a word should be understood as something one would encounter in daily life and also something that one would find in a dictionary of modern Mandarin.
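
The two sums above can be checked mechanically from the spaced sentence. The following is a small sketch; `word_lengths` is a hypothetical helper, and treating ﹐ and 。 as word separators is an assumption that happens to suit this particular sentence.

```python
from collections import Counter

# The sentence exactly as segmented above.
SENTENCE = "人人 生 而 自由﹐在 尊嚴 和 權利 上 一律 平等。他們 賦有 理性 和 良心﹐並 應 以 兄弟 關係 的 精神 互相 對待。"


def word_lengths(segmented: str, punctuation: str = "﹐。") -> Counter:
    """Count words by their length in characters.

    Punctuation marks are treated as separators, just like spaces.
    """
    for mark in punctuation:
        segmented = segmented.replace(mark, " ")
    return Counter(len(word) for word in segmented.split())


counts = word_lengths(SENTENCE)
```

Here `counts[1]` is the number of monosyllabic words and `counts[2]` the number of disyllabic ones.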

  • in Bangle.js

    Hello, Gordon,

    As with all languages, it's certainly the case that if you unnaturally break things up, there's the chance of introducing ambiguity. However, it's also the case that context will resolve the matter most of the time. And in the worst case scenario, it's not as if the watch is the only means by which we can read the messages. We can think about the well known joke in English about the importance of commas for a comparable example: "Let's eat grandpa!"

    Unfortunately, although I am familiar with the linguistic and orthographic side of this problem, I am not familiar with the coding side. But a method that could theoretically be used to handle run-on English sentences such as "Thequickbrownfoxjumpsoverthelazydog" would be our best starting point.

    One solution I can imagine would be to cut the line as it approaches the side margin of the screen, at a point where the preceding characters form a block divisible by two, and then start a new line. The software could go on like this until it both reaches the bottom margin and cannot reasonably shrink the bitmaps down any further, at which point it must necessarily elide any remaining text.

    So applying this proposal for how to cut up lines to article 1 of the Universal Declaration of Human Rights, we would get the following if we assume a character limit of 11 per line (characters bolded to show where words get split up):

    人人生而自由﹐在尊嚴 (10 characters)
    和權利上一律平等。他 (10 characters)
    們賦有理性和良心﹐並 (10 characters)
    應以兄弟關係的精神互 (10 characters)
    相對待。 (4 characters)

    As you can see from the above, only two words get split up (thanks in part to punctuation in Mandarin being full-width, hence taking up the same space as a "normal" character). These splits do not result in any ambiguity in this example either.
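
The number of split words can also be checked by counting. The sketch below assumes the word segmentation used elsewhere in this discussion for the same sentence, with punctuation treated as separate one-character tokens. `split_words` is a hypothetical helper, and the limit of 11 is the assumed one from the example.

```python
# Article 1, segmented into words; punctuation marks are separate tokens.
TOKENS = [
    "人人", "生", "而", "自由", "﹐", "在", "尊嚴", "和", "權利", "上",
    "一律", "平等", "。", "他們", "賦有", "理性", "和", "良心", "﹐", "並",
    "應", "以", "兄弟", "關係", "的", "精神", "互相", "對待", "。",
]


def split_words(tokens, limit):
    """Return the tokens that straddle a line break.

    Lines are cut at the largest even length not exceeding limit.
    """
    width = limit - (limit % 2)  # 11 -> 10
    broken = []
    position = 0  # index of the current token's first character
    for token in tokens:
        first_line = position // width
        last_line = (position + len(token) - 1) // width
        if first_line != last_line:
            broken.append(token)
        position += len(token)
    return broken


broken = split_words(TOKENS, 11)
```

For this sentence and limit, `broken` contains only 他們 and 互相, the two words that end up on either side of a line break.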
