Is there any reason to prefer UTF-16 over UTF-8?
Question
Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16.
However, looking at Java and C#, it seems that strings and chars there default to UTF-16. I thought it might be for historical reasons, or perhaps for performance reasons, but I couldn't find any information.
Does anyone know why these languages chose UTF-16? And is there any valid reason for me to do the same?
Edit: Meanwhile I've also found this answer, which seems relevant and has some interesting links.
Recommended answer
East Asian languages typically require less storage in UTF-16 (2 bytes is enough for 99% of East Asian language characters) than in UTF-8 (where 3 bytes are typically required).
Of course, for Western languages, UTF-8 is usually smaller (1 byte per character instead of 2). For mixed files like HTML (where there's a lot of markup) it's much of a muchness: the ASCII markup is smaller in UTF-8 and the East Asian text is smaller in UTF-16, so the two tend to come out about the same size.
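To make the size comparison concrete, here is a minimal Python sketch. The sample strings and the helper name `encoded_sizes` are my own illustrations, not part of the original answer, and sizes are measured without a byte order mark.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, measured without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical sample strings, chosen only to illustrate the three cases.
samples = {
    "Western text": "Hello, world",
    "East Asian text": "你好世界",
    "HTML with markup": "<p>你好世界你好</p>",
}

for label, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(f"{label}: UTF-8 = {utf8_bytes} bytes, UTF-16 = {utf16_bytes} bytes")

# Western text: UTF-8 = 12 bytes, UTF-16 = 24 bytes
# East Asian text: UTF-8 = 12 bytes, UTF-16 = 8 bytes
# HTML with markup: UTF-8 = 25 bytes, UTF-16 = 26 bytes
```

The exact ratio for a mixed file depends on how much markup surrounds the text, so treat the last line as one data point rather than a rule.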
Processing UTF-16 in user-mode applications is slightly easier than processing UTF-8, because surrogate pairs behave in almost the same way as combining characters. Code that already copes with combining characters can therefore usually treat UTF-16 as a fixed-size encoding.
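As a rough illustration of the surrogate-pair point, the Python sketch below counts UTF-16 code units for a few characters. The helper names `utf16_units` and `to_surrogates` are hypothetical, and the example characters are my own choices. It shows that a character outside the Basic Multilingual Plane and a base letter followed by a combining mark both occupy two 16-bit units, which is the sense in which the two cases resemble each other.

```python
def utf16_units(text):
    """Number of 16-bit code units in the UTF-16 encoding of text."""
    return len(text.encode("utf-16-le")) // 2


def to_surrogates(code_point):
    """Split a code point above U+FFFF into its (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogates")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


bmp_char = "世"              # U+4E16, inside the BMP: one code unit
astral_char = "\U0001F600"   # U+1F600, outside the BMP: a surrogate pair
combining = "e\u0301"        # 'e' followed by COMBINING ACUTE ACCENT

print(utf16_units(bmp_char))     # 1
print(utf16_units(astral_char))  # 2
print(utf16_units(combining))    # 2

high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")   # D83D DE00
```

Code that indexes or splits strings by code unit has to avoid cutting between the two units in either case, so handling one correctly goes a long way toward handling the other.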