Is there any reason to prefer UTF-16 over UTF-8?

Question

Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16.

However, looking at Java and C#, it appears that strings and chars there default to UTF-16. I thought it might be for historical reasons, or perhaps for performance reasons, but I couldn't find any information.

Does anyone know why these languages chose UTF-16? And is there any valid reason for me to do the same?

Edit: Meanwhile I've also found this answer, which seems relevant and has some interesting links.

Recommended answer

East Asian languages typically require less storage in UTF-16 (2 bytes is enough for 99% of East Asian language characters) than in UTF-8 (where 3 bytes are typically required).

Of course, for Western languages, UTF-8 is usually smaller (1 byte instead of 2). For mixed files like HTML (where there's a lot of markup) it's much of a muchness.
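The size trade-off is easy to check for any given text. The following is a minimal sketch, not a benchmark: the sample strings and the helper name `encoded_sizes` are made up for illustration, and the byte counts exclude a byte order mark.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no byte order mark."""
    # "utf-16-le" is used so that Python does not prepend a 2-byte BOM.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical sample strings; substitute your own data.
samples = {
    "English": "Hello, world",
    "Japanese": "こんにちは世界",
    "HTML with Japanese": '<p class="greeting">こんにちは</p>',
}

for label, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{label}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")
```

For mixed content the result depends on the ratio of ASCII markup to non-ASCII text, so it is worth measuring a representative document rather than relying on a rule of thumb.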

Processing UTF-16 in user-mode applications is slightly easier than processing UTF-8, because surrogate pairs behave in almost the same way as combining characters. So UTF-16 can usually be processed as if it were a fixed-size encoding.
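To make the comparison concrete, here is a small sketch that lists the 16-bit code units of a string. The helper names `utf16_code_units` and `is_surrogate` are invented for this example. It shows that a character outside the Basic Multilingual Plane takes two code units (a surrogate pair), much as a base letter plus a combining mark takes two code units for one visible character.

```python
def utf16_code_units(text):
    """Return the list of 16-bit UTF-16 code units for text (no BOM)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_surrogate(unit):
    """True if the code unit is half of a surrogate pair (U+D800..U+DFFF)."""
    return 0xD800 <= unit <= 0xDFFF


bmp_char = "世"           # U+4E16, inside the BMP: one code unit
astral_char = "😀"        # U+1F600, outside the BMP: a surrogate pair
combining_seq = "e\u0301"  # 'e' followed by a combining acute accent

for text in (bmp_char, astral_char, combining_seq):
    units = utf16_code_units(text)
    print([hex(u) for u in units], [is_surrogate(u) for u in units])
```

Code that treats each code unit as one character therefore works for most BMP text, but any operation that splits, reverses, or truncates strings needs to avoid cutting between the two halves of a surrogate pair, just as it needs to avoid separating a combining mark from its base character.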
