Does the Unicode Consortium intend to make UTF-16 run out of characters?


Question

The current version of UTF-16 can encode only 1,112,064 distinct code points: the range 0x0-0x10FFFF, minus the 2,048 surrogate code points.
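Where the 0x10FFFF ceiling and the 1,112,064 figure come from can be checked with a little arithmetic. The following is a minimal sketch, not taken from the original post; the helper name `from_surrogate_pair` is made up for illustration.

```python
def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits; the pair is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The largest pair gives the largest code point UTF-16 can express.
max_code_point = from_surrogate_pair(0xDBFF, 0xDFFF)

# Surrogates themselves are reserved and never encode characters.
surrogate_count = 0xDFFF - 0xD800 + 1
encodable = (max_code_point + 1) - surrogate_count

print(hex(max_code_point), surrogate_count, encodable)
```

Because each surrogate contributes only 10 bits, no pair can reach past 0x10FFFF without changing the encoding itself.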

Does the Unicode Consortium intend to make UTF-16 run out of characters?

That is, will it ever assign a code point greater than 0x10FFFF?

If not, why would anyone write a UTF-8 parser that accepts 5- or 6-byte sequences? Doing so only adds unnecessary instructions to the function.

Isn't 1,112,064 enough? Do we actually need more characters? I mean: how quickly are we running out?

Recommended answer

As of 2011, 109,449 characters have been assigned, and a further 137,468 code points (6,400 + 131,068) are set aside for application (private) use. That leaves room for over 860,000 unused code points: plenty for CJK Extension E (~10,000 characters) and 85 more sets just like it, so that in the event of contact with the Ferengi culture, we should be ready.
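The arithmetic behind the "over 860,000" figure can be reproduced directly. This is a small sketch using only the numbers quoted in this thread (the 2011 counts and the rough 10,000-character size of CJK Extension E); the variable names are illustrative.

```python
# Figures quoted in the question and answer (as of 2011).
ENCODABLE_CODE_POINTS = 1_112_064    # UTF-16 limit, surrogates excluded
ASSIGNED_CHARACTERS = 109_449        # characters already assigned
PRIVATE_USE = 6_400 + 131_068        # set aside for application use

unused = ENCODABLE_CODE_POINTS - ASSIGNED_CHARACTERS - PRIVATE_USE

# Rough size of CJK Extension E, per the answer (an approximation).
CJK_EXTENSION_E = 10_000
sets_that_fit = unused // CJK_EXTENSION_E

print(f"unused code points: {unused:,}")
print(f"sets the size of CJK Extension E that fit: {sets_that_fit}")
```

With these inputs the remainder is 865,147 code points, which is room for 86 sets of that size: Extension E itself plus 85 more.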

In November 2003 the IETF restricted UTF-8 to end at U+10FFFF with RFC 3629, in order to match the constraints of the UTF-16 character encoding. A UTF-8 parser should therefore not accept 5- or 6-byte sequences, which would overflow the UTF-16 range, nor 4-byte sequences that encode values greater than 0x10FFFF.
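To make the restriction concrete, here is a minimal sketch of a decoder for a single UTF-8 sequence that rejects everything RFC 3629 rules out. It is an illustration under stated assumptions, not a reference implementation: the function name `decode_first_code_point` is hypothetical, and real code would normally rely on the language's built-in decoder.

```python
from typing import Tuple


def decode_first_code_point(data: bytes) -> Tuple[int, int]:
    """Return (code_point, byte_length) for the first UTF-8 sequence in data.

    Raises ValueError for input that RFC 3629 does not allow.
    """
    if not data:
        raise ValueError("empty input")
    lead = data[0]
    if lead < 0x80:
        return lead, 1  # single-byte ASCII
    if 0xC2 <= lead <= 0xDF:
        length, value, minimum = 2, lead & 0x1F, 0x80
    elif 0xE0 <= lead <= 0xEF:
        length, value, minimum = 3, lead & 0x0F, 0x800
    elif 0xF0 <= lead <= 0xF4:
        length, value, minimum = 4, lead & 0x07, 0x10000
    else:
        # 0x80-0xBF: stray continuation byte; 0xC0-0xC1: always overlong;
        # 0xF5-0xF7: 4-byte leads that can only exceed 0x10FFFF;
        # 0xF8-0xFD: the old 5- and 6-byte leads; 0xFE-0xFF: never valid.
        raise ValueError(f"invalid lead byte 0x{lead:02X}")
    if len(data) < length:
        raise ValueError("truncated sequence")
    for byte in data[1:length]:
        if byte & 0xC0 != 0x80:
            raise ValueError(f"invalid continuation byte 0x{byte:02X}")
        value = (value << 6) | (byte & 0x3F)
    if value < minimum:
        raise ValueError("overlong encoding")
    if value > 0x10FFFF:
        raise ValueError("code point above U+10FFFF")
    if 0xD800 <= value <= 0xDFFF:
        raise ValueError("surrogate code point")
    return value, length


print(decode_first_code_point(b"\xf4\x8f\xbf\xbf"))  # U+10FFFF, the last valid code point
```

Rejecting the lead bytes 0xF5 and above up front means the 5- and 6-byte cases cost a single range check rather than extra decoding branches, which addresses the concern about unnecessary instructions.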

Please edit this list to add character sets that threaten the Unicode code point limit, provided they are more than 1/3 the size of CJK Extension E (~10,000 characters):

  • CJK Extension E (~10,000 characters)
  • Ferengi culture characters (~5,000 characters)
