• This site offers a download of a text file containing every character in the Big5 encoding scheme for Traditional Chinese, sorted by frequency of use. Although it dates from the 1990s, the corpus it draws on is huge, with over a million tokens, and it was compiled by an academic source.
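    Because the file is Big5-encoded, it has to be decoded before the characters can be used anywhere that expects Unicode. Below is a minimal Python sketch. It assumes the file lists one entry per line, with the character being the first CJK ideograph on that line. That column layout is an assumption, so check it against the actual download. The function name `load_big5_frequency_list` is hypothetical.

```python
import re

# One ideograph from the CJK Unified Ideographs block (U+4E00..U+9FFF),
# where nearly all Big5 Chinese characters land after decoding.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")


def load_big5_frequency_list(raw: bytes) -> list[str]:
    """Decode a Big5-encoded frequency list and return its characters in file order.

    Hypothetical layout: one entry per line, with the character being the first
    CJK ideograph on the line. Rank numbers, counts, and other columns are ignored.
    """
    # Undecodable bytes become U+FFFD, which the CJK pattern never matches.
    text = raw.decode("big5", errors="replace")
    chars: list[str] = []
    seen: set[str] = set()
    for line in text.splitlines():
        match = CJK_CHAR.search(line)
        if match and match.group() not in seen:
            seen.add(match.group())
            chars.append(match.group())
    return chars


# Stand-in for the downloaded file's bytes (read the real file with open(path, "rb")).
sample = "1\t的\n2\t是\n3\t不\n".encode("big5")
print(load_big5_frequency_list(sample))
```

    Reading the file as bytes and decoding explicitly avoids relying on the system's default encoding, which is rarely Big5.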

    For Simplified Chinese, one can consult the PRC's Table of General Standard Chinese Characters (Tongyong Guifan Hanzi Biao 通用规范汉字表). It contains over 8000 characters, divided into three tiers from most to least frequent according to the government's analysis. You can download a text file of it here at Wikisource. Once downloaded, you will see that each tier is enclosed in curly brackets.
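    Since each tier is wrapped in curly brackets, the tiers can be pulled out with a regular expression. Below is a minimal sketch. It assumes every `{...}` block is one tier and that anything inside a block other than a CJK ideograph (serial numbers, whitespace) can be ignored. The sample text and the name `parse_tiers` are hypothetical.

```python
import re

# Assumed layout: each tier's characters sit between a "{" and the next "}".
TIER_BLOCK = re.compile(r"\{([^{}]*)\}")
# CJK ideographs: Extension A, the unified block, and the supplementary
# ideographic planes, since some rarer tier-three characters may lie outside the BMP.
CJK_CHAR = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff\U00020000-\U0003ffff]")


def parse_tiers(text: str) -> list[list[str]]:
    """Return one list of characters per {...} block, in the order the blocks appear.

    Anything outside the braces (headings, notes) and any non-ideograph inside
    them (serial numbers, whitespace, punctuation) is ignored.
    """
    return [CJK_CHAR.findall(block) for block in TIER_BLOCK.findall(text)]


sample = "一级字表 {0001 一 0002 乙} 二级字表 {3501 丁} 三级字表 {6501 \U00020000}"
tiers = parse_tiers(sample)
print([len(tier) for tier in tiers])
```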

    According to the document, tiers one and two consist of 3500 and 3000 characters respectively and meet the needs of education and publishing. Tier three adds another 1605 characters, including ones you'll see in names, technical jargon, and idioms from classical literature (i.e., archaic words). The characters within each tier are not sorted by frequency (the first-person pronoun wo 我 appears in spot 761, for example), so this source provides batches of characters one is likely to encounter rather than a ranking.

    Of the two, the Traditional Chinese source has the better frequency sorting, but in the grand scheme of things, both should be enough to determine which characters are essential. There's no definitive answer to how many characters you should take from each. Because the PRC's list isn't ranked within its tiers, I wouldn't take fewer than 3500 (all of tier one) so that the essentials are covered. In the Traditional list, the top 3500 characters reach a cumulative frequency of 99.68 percent.
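    If the frequency list gives a count per character, you can check how a cutoff like 3500 translates into coverage. Below is a minimal sketch. It assumes the data has been reduced to (character, count) pairs sorted from most to least frequent. The toy counts are made up for illustration and are not taken from the corpus. The function names are hypothetical.

```python
def top_n(ranked: list[tuple[str, int]], n: int) -> list[str]:
    """The n most frequent characters from (character, count) pairs sorted by descending count."""
    return [char for char, _ in ranked[:n]]


def cumulative_coverage(ranked: list[tuple[str, int]], n: int) -> float:
    """Percentage of all counted tokens accounted for by the n most frequent characters."""
    total = sum(count for _, count in ranked)
    if total == 0:
        return 0.0
    covered = sum(count for _, count in ranked[:n])
    return 100.0 * covered / total


# Toy counts for illustration only; real counts would come from the frequency list.
ranked = [("的", 50), ("是", 30), ("不", 15), ("了", 5)]
print(top_n(ranked, 2), cumulative_coverage(ranked, 2))
```

    Trying a few values of `n` against the real counts shows how quickly coverage flattens out, which is the trade-off behind picking a cutoff.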

    EDIT: I have talked it over with a colleague, and I think the 2500 to 3000 most common characters would be sufficient if we're hurting for space. This site sorts Simplified characters by frequency, but it doesn't offer a text file of the list. I don't really know what you'd need to make the locales, so do let me know. Depending on how you go about things, you may not need a separate list for Traditional characters, for example.
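    One way to avoid a separate Traditional list is to merge the Simplified and Traditional selections into a single character set. Below is a minimal sketch of that approach. It assumes both selections are already plain lists of characters, and the sample lists are hypothetical.

```python
from itertools import chain


def merge_charsets(*lists: list[str]) -> list[str]:
    """Combine character lists into one, keeping first-seen order and dropping duplicates."""
    # dict preserves insertion order, so fromkeys() acts as an ordered set.
    return list(dict.fromkeys(chain.from_iterable(lists)))


simplified = ["我", "们", "说"]   # hypothetical top-N Simplified characters
traditional = ["我", "們", "說"]  # hypothetical top-N Traditional characters
print(merge_charsets(simplified, traditional))
```

    Characters such as 我 are identical in both scripts, so the merged set is smaller than the two lists added together.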

    EDIT 2: This would make an excellent reference for which characters to include. Use Unicode, of course, though.
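    Once the list is stored as Unicode, it can also be expressed as code points for tools that expect those instead of characters. Below is a minimal sketch that collapses a character list into sorted ranges, written in the style of CSS `unicode-range` values. Whether your locale or font tooling wants this particular format is an assumption, and the name `to_codepoint_ranges` is hypothetical.

```python
def to_codepoint_ranges(chars: list[str]) -> list[str]:
    """Collapse single characters into sorted Unicode code point ranges such as "U+4E00-4E01"."""
    codepoints = sorted({ord(ch) for ch in chars})
    ranges: list[tuple[int, int]] = []
    for cp in codepoints:
        if ranges and cp == ranges[-1][1] + 1:
            # Extends the current run of consecutive code points.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return [f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}-{hi:04X}" for lo, hi in ranges]


print(to_codepoint_ranges(["我", "一", "丁"]))
```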


