What's the point of UTF-16?
Question
I've never understood the point of UTF-16 encoding. If you need to be able to treat strings as random access (i.e. a code point is the same as a code unit) then you need UTF-32, since UTF-16 is still variable length. If you don't need this, then UTF-16 seems like a colossal waste of space compared to UTF-8. What are the advantages of UTF-16 over UTF-8 and UTF-32 and why do Windows and Java use it as their native encoding?
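To make the variable-length and space trade-offs concrete, here is a small Java sketch (the sample string is my own choice for illustration) comparing how the same three characters count and encode under the three schemes:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Demo {
    public static void main(String[] args) {
        String s = "a€😀"; // ASCII, BMP, and supplementary-plane characters

        // Java strings are sequences of UTF-16 code units, so length()
        // counts code units, not code points: the emoji needs a surrogate pair.
        System.out.println(s.length());                      // 4 code units
        System.out.println(s.codePointCount(0, s.length())); // 3 code points

        // Encoded sizes in bytes for the same three characters:
        // UTF-8:  1 ('a') + 3 ('€') + 4 ('😀') = 8
        // UTF-16: 2       + 2       + 4        = 8
        // UTF-32: 4       + 4       + 4        = 12
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(s.getBytes(StandardCharsets.UTF_16LE).length);
        System.out.println(s.getBytes(Charset.forName("UTF-32LE")).length);
    }
}
```

For ASCII-heavy text UTF-8 wins clearly; for BMP-heavy text (e.g. CJK) UTF-16 is actually denser than UTF-8, which is part of why it is not a pure waste of space.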
When Windows NT was designed UTF-16 didn't exist (NT 3.51 was born in 1993, while UTF-16 was born in 1996 with the Unicode 2.0 standard); there was instead UCS-2, which, at that time, was enough to hold every character available in Unicode, so the 1 code point = 1 code unit equivalence was actually true - no variable-length logic needed for strings.
They moved to UTF-16 later, to support the whole Unicode character set; however they couldn't move to UTF-8 or to UTF-32, because this would have broken binary compatibility in the API interface (among other things).
As for Java, I'm not really sure; since it was released in ~1995 I suspect that UTF-16 was already in the air (even if it wasn't standardized yet), but I think that compatibility with NT-based operating systems may have played some role in their choice (continuous UTF-8 <-> UTF-16 conversions for every call to Windows APIs can introduce some slowdown).
Edit
Wikipedia explains that Java went the same way: it originally supported UCS-2, but moved to UTF-16 in J2SE 5.0.
So, in general, when you see UTF-16 used in some API/framework it is because it started as UCS-2 (to avoid complications in the string-management algorithms) but moved to UTF-16 to support the code points outside the BMP, while still maintaining the same code unit size.
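The practical consequence of that UCS-2 history shows up whenever code still assumes one `char` per code point. A minimal Java sketch, using the surrogate-aware methods added in the standard `java.lang` API:

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        String s = "𝄞"; // U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

        // Under the old UCS-2 assumption this would be one char;
        // in UTF-16 it is stored as a high/low surrogate pair.
        System.out.println(s.length());                             // 2
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true

        // Post-J2SE 5.0 style: iterate by code point, not by code unit.
        s.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

Code that indexes by `charAt` alone still works for BMP text, which is exactly why the UCS-2-to-UTF-16 transition could be made without breaking the existing API.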