Why does Java use UTF-16 for the internal text representation


Question


Java uses UTF-16 for the internal text representation. But why? UTF-8 as it seems to me is more flexible.

From Wiki:



UTF-8 requires either 8, 16, 24 or 32 bits (one to four octets) to encode a Unicode character, UTF-16 requires either 16 or 32 bits to encode a character, and UTF-32 always requires 32 bits to encode a character.
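Those size differences are easy to observe from Java itself. A minimal sketch (the class name `EncodingSizes` and the sample strings are illustrative; `UTF-32BE` is looked up by name because it has no `StandardCharsets` constant, though it ships with standard JDKs):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    public static void main(String[] args) {
        // "A" (U+0041), "é" (U+00E9), "你" (U+4F60), "😀" (U+1F600)
        String[] samples = {"A", "é", "你", "😀"};
        Charset utf32 = Charset.forName("UTF-32BE");
        for (String s : samples) {
            System.out.printf("%s  UTF-8: %d bytes, UTF-16: %d bytes, UTF-32: %d bytes%n",
                    s,
                    s.getBytes(StandardCharsets.UTF_8).length,
                    s.getBytes(StandardCharsets.UTF_16BE).length,  // BE variant: no BOM prepended
                    s.getBytes(utf32).length);
        }
    }
}
```

For example, "A" takes 1/2/4 bytes in the three encodings, while the emoji (a supplementary character) takes 4/4/8 bytes.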


Answer


Java was designed and first implemented back in the days when Unicode was specified to be a set of 16 bit code-points. That is why char is a 16 bit type, and why String is modeled as a sequence of char.


Now, if the Java designers had been able to foresee that Unicode would add extra "code planes", they might¹ have opted for a 32 bit char type.


Java 1.0 came out in January 1996. Unicode 2.0 (which introduced the higher code planes and the surrogate mechanism) was released in July 1996.
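The surrogate mechanism that Unicode 2.0 introduced is how Java still squeezes the higher code planes into its 16-bit chars. A minimal sketch (class name `Surrogates` is illustrative):

```java
public class Surrogates {
    public static void main(String[] args) {
        // U+1F600 lies in a higher code plane, so UTF-16 encodes it
        // as a surrogate pair of two 16-bit code units
        int codePoint = 0x1F600;
        char[] units = Character.toChars(codePoint);
        System.out.printf("high: U+%04X, low: U+%04X%n",
                (int) units[0], (int) units[1]);          // high: U+D83D, low: U+DE00
        System.out.println(Character.isHighSurrogate(units[0]));  // true
        System.out.println(Character.isLowSurrogate(units[1]));   // true

        // codePointAt reassembles the pair into the original code point
        String s = new String(units);
        System.out.printf("U+%X%n", s.codePointAt(0));    // U+1F600
    }
}
```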


Internally, I believe that some versions of Java have used UTF-8 as the representation for strings, at least at some level. However, it is still necessary to map this to the methods specified in the String API because that is what Java applications require. Doing that if the primary internal representation is UTF-8 rather than UTF-16 is going to be inefficient.


And before you suggest that they should "just change the String APIs" ... consider how many trillions of lines of Java code already exist that depend on the current String APIs.


For what it is worth, most if not all programming languages that support Unicode do it via a 16 bit char or wchar type.

¹ ... and possibly not, bearing in mind that memory was much more expensive in those days, and programmers worried much more about such things.
