Does the Unicode Consortium intend to make UTF-16 run out of characters?
Question
The current version of UTF-16 can only encode 1,112,064 distinct numbers (code points): the range 0x0-0x10FFFF, minus the 2,048 surrogate code points 0xD800-0xDFFF.
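The 1,112,064 figure follows from how surrogate pairs work. The Python sketch below is not part of the original question, and the helper name `to_surrogate_pair` is hypothetical. It shows that a surrogate pair carries 20 bits of payload on top of the 65,536 code points of the Basic Multilingual Plane, so the largest reachable code point is 0x10FFFF. Subtracting the 2,048 code points reserved for the surrogates themselves gives the count of encodable values.

```python
# Sketch: why UTF-16 tops out at U+10FFFF.
# A surrogate pair carries 10 bits in the high surrogate (0xD800-0xDBFF)
# and 10 bits in the low surrogate (0xDC00-0xDFFF), i.e. 2**20 supplementary
# code points on top of the 2**16 code points of the BMP.

BMP_SIZE = 0x10000            # U+0000..U+FFFF
SUPPLEMENTARY = 2 ** 20       # code points reachable through surrogate pairs
SURROGATES = 0xE000 - 0xD800  # 2,048 code points reserved for the surrogates

MAX_CODE_POINT = BMP_SIZE + SUPPLEMENTARY - 1
SCALAR_VALUES = BMP_SIZE + SUPPLEMENTARY - SURROGATES


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Encode a supplementary code point (U+10000..U+10FFFF) as a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"U+{cp:X} cannot be encoded as a surrogate pair")
    offset = cp - 0x10000            # 20-bit payload
    high = 0xD800 + (offset >> 10)   # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go in the low surrogate
    return high, low


print(hex(MAX_CODE_POINT))                            # 0x10ffff
print(SCALAR_VALUES)                                  # 1112064
print([hex(u) for u in to_surrogate_pair(0x10FFFF)])  # ['0xdbff', '0xdfff']
```

Python's own codec agrees: `chr(0x10FFFF).encode("utf-16-be")` yields `b'\xdb\xff\xdf\xff'`, and `chr(0x110000)` raises `ValueError` because there is no pair left to represent it.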
Does the Unicode Consortium intend to make UTF-16 run out of characters?
That is, will it ever assign a code point above 0x10FFFF?
If not, why would anyone write a UTF-8 parser that accepts 5- or 6-byte sequences, since that only adds unnecessary instructions to the function?
Isn't 1,112,064 enough? Do we actually need more characters? In other words, how quickly are we running out?
Recommended answer
As of 2011, we have consumed 109,449 characters and set aside another 137,468 for application use (6,400 + 131,068 private-use code points), leaving room for over 860,000 unused characters. That is plenty for CJK Extension E (~10,000 characters) and 85 more sets just like it, so in the event of contact with the Ferengi culture, we should be ready.
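The headroom estimate can be reproduced from the answer's own numbers. In this sketch, labeling the 6,400 and 131,068 figures as the BMP and planes 15-16 private-use areas is an annotation added here, and the 10,000-character size of Extension E is the answer's rough figure, not an exact count.

```python
# Figures quoted in the answer (as of 2011).
TOTAL_SCALAR_VALUES = 1_112_064     # U+0000..U+10FFFF minus 2,048 surrogates
ASSIGNED = 109_449                  # characters consumed so far
PRIVATE_USE_BMP = 6_400             # U+E000..U+F8FF
PRIVATE_USE_PLANES_15_16 = 131_068  # 2 * 65,534 (each plane ends in 2 noncharacters)

unused = TOTAL_SCALAR_VALUES - ASSIGNED - PRIVATE_USE_BMP - PRIVATE_USE_PLANES_15_16
print(unused)  # 865147

CJK_EXT_E_SIZE = 10_000  # rough size used in the answer
print(unused // CJK_EXT_E_SIZE)  # 86: Extension E itself plus 85 more sets like it
```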
In November 2003, the IETF restricted UTF-8 to end at U+10FFFF with RFC 3629, to match the constraints of the UTF-16 encoding. A UTF-8 parser should therefore not accept 5- or 6-byte sequences, which would overflow the UTF-16 range, nor 4-byte sequences that decode to values greater than 0x10FFFF.
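To make the RFC 3629 rule concrete, here is a minimal sketch of a strict UTF-8 decoder in Python. The function name `decode_utf8_strict` is hypothetical and this is an illustration, not a production decoder. It rejects every lead byte from 0xF5 upward, which covers the obsolete 5- and 6-byte forms, and it rejects 4-byte sequences that decode above U+10FFFF, along with overlong forms, surrogates, and truncated input.

```python
def decode_utf8_strict(data: bytes) -> list[int]:
    """Decode UTF-8 into code points, following the RFC 3629 limits.

    Rejects 5- and 6-byte forms (lead bytes 0xF8..0xFD), 4-byte forms above
    U+10FFFF, overlong encodings, surrogates, and truncated sequences.
    """
    code_points = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:                # 1 byte: U+0000..U+007F
            length, cp, minimum = 1, lead, 0
        elif 0xC2 <= lead <= 0xDF:     # 2 bytes: U+0080..U+07FF
            length, cp, minimum = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead <= 0xEF:     # 3 bytes: U+0800..U+FFFF
            length, cp, minimum = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead <= 0xF4:     # 4 bytes: U+10000..U+10FFFF
            length, cp, minimum = 4, lead & 0x07, 0x10000
        else:
            # 0x80..0xC1: stray continuation byte or overlong 2-byte lead.
            # 0xF5..0xFF: would start a value above U+10FFFF, including the
            # obsolete 5-byte (0xF8..0xFB) and 6-byte (0xFC..0xFD) forms.
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1 : i + length]:
            if b & 0xC0 != 0x80:       # continuation bytes must be 10xxxxxx
                raise ValueError(f"bad continuation byte 0x{b:02X} at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum:               # e.g. 0xE0 0x80 0x80 encodes U+0000
            raise ValueError(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:     # surrogates are not valid in UTF-8
            raise ValueError(f"surrogate U+{cp:04X} at offset {i}")
        if cp > 0x10FFFF:              # e.g. 0xF4 0x90 0x80 0x80 = U+110000
            raise ValueError(f"U+{cp:X} exceeds U+10FFFF at offset {i}")
        code_points.append(cp)
        i += length
    return code_points
```

Python's built-in codec already enforces the same limits; for example `b"\xf4\x90\x80\x80".decode("utf-8")` raises `UnicodeDecodeError`. The hand-written version only makes the checks visible. Treating every lead byte above 0xF4 as invalid means the parser needs no branches for 5- or 6-byte sequences at all, which answers the concern in the question.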
Please edit this answer to list here any sets that threaten the Unicode code point limit, if they are over 1/3 the size of CJK Extension E (~10,000 characters):
- CJK Extension E (~10,000 characters)
- Ferengi culture characters (~5,000 characters)