Is there a drastic difference between UTF-8 and UTF-16?
Question
I call a web service that gives me back a response XML with UTF-8 encoding. I checked that in Java using the getAllHeaders() method.
In my Java code, I take that response, do some processing on it, and later pass it on to a different service.
I googled a bit and found out that by default the encoding in Java for strings is UTF-16.
In my response XML, one of the elements contains the character É. This character got corrupted in the post-processing request that I make to the other service.
Instead of sending É, it sent some gibberish. Is there really that much difference between these two encodings? And if I want to know what É converts to when going from UTF-8 to UTF-16, how can I find out?
Thanks.
Recommended answer
Both UTF-8 and UTF-16 are variable-length encodings. However, in UTF-8 a character occupies a minimum of 8 bits (one to four bytes), while in UTF-16 a character occupies a minimum of 16 bits (two or four bytes).
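To see concretely what É looks like in each encoding, one option is to print the encoded bytes as hex. The following is a minimal sketch; the class name `EncodingBytes` and the helper `hex` are made up for illustration, and UTF-16BE is chosen only so that no byte-order mark is added to the output.

```java
import java.nio.charset.StandardCharsets;

final class EncodingBytes {
    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00C9"; // the character from the question, U+00C9

        System.out.println(hex(eAcute.getBytes(StandardCharsets.UTF_8)));    // C3 89
        System.out.println(hex(eAcute.getBytes(StandardCharsets.UTF_16BE))); // 00 C9

        // A plain ASCII letter for comparison.
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_8)));       // 41
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16BE)));    // 00 41
    }
}
```

The same character is two different byte sequences in the two encodings, so bytes written in one and read as the other (or as a single-byte charset) will not come out as É.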
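Main UTF-8 pros:
- Basic ASCII characters such as digits and unaccented Latin letters occupy one byte, identical to their US-ASCII representation. This makes every US-ASCII string valid UTF-8, which provides decent backward compatibility in many cases.
- No null bytes (other than for the NUL character itself), which allows the use of null-terminated strings. This also provides a great deal of backward compatibility.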
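Main UTF-8 cons:
- Many common characters have different lengths, which makes indexing by character position and calculating string length considerably slower.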
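The length cost can be illustrated with a small sketch: the number of bytes in a UTF-8 buffer is not the number of characters, so counting characters means scanning every byte. The class and method names here are hypothetical, and the code assumes its input is well-formed UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Length {
    // Counts code points in well-formed UTF-8 by skipping
    // continuation bytes, which all have the bit pattern 10xxxxxx.
    static int codePointCount(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] cafe = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
        System.out.println(cafe.length);          // 5 bytes
        System.out.println(codePointCount(cafe)); // 4 characters
    }
}
```

Finding the n-th character has the same problem: without an index built in advance, it requires walking the bytes from the start.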
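Main UTF-16 pros:
- Most commonly used characters, such as Latin, Cyrillic, Chinese, and Japanese, can be represented with 2 bytes. Unless really exotic characters are needed, this means the 16-bit subset of UTF-16 (the Basic Multilingual Plane) can be used as a fixed-length encoding, which speeds up indexing.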
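In Java this shows up directly in the String API, where `length()` and `charAt()` work in 16-bit UTF-16 code units. A rough sketch follows (the class and method names are invented for illustration): for characters inside the 16-bit subset one unit is one character, while a character outside it takes two units, known as a surrogate pair.

```java
final class Utf16Units {
    // Number of 16-bit UTF-16 code units in the string.
    static int units(String s) {
        return s.length();
    }

    // Number of Unicode code points (characters) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String eAcute = "\u00C9";                              // U+00C9, inside the 16-bit subset
        String emoji = new String(Character.toChars(0x1F600)); // U+1F600, outside it

        System.out.println(units(eAcute) + " " + codePoints(eAcute)); // 1 1
        System.out.println(units(emoji) + " " + codePoints(emoji));   // 2 1
    }
}
```

So treating UTF-16 as fixed-length is safe only as long as the text really stays within that 16-bit subset.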
Main UTF-16 cons:
- Lots of null bytes in US-ASCII strings, which means no null-terminated strings and a lot of wasted memory.
In general, UTF-16 is usually better for in-memory representation, while UTF-8 is extremely good for text files and network protocols.
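Applied to the situation in the question, this split suggests converting at the boundaries: decode the incoming UTF-8 bytes into a String, process the String, and encode back to UTF-8 when sending. The sketch below assumes the corruption came from a charset mismatch at one of those boundaries, which the question does not confirm. The class and method names are hypothetical, and ISO-8859-1 stands in for whatever wrong charset might have been applied.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsetBoundary {
    // Decodes received bytes using an explicitly named charset.
    static String decode(byte[] wire, Charset declared) {
        return new String(wire, declared);
    }

    // Encodes text for the next service, again naming the charset.
    static byte[] encode(String text, Charset target) {
        return text.getBytes(target);
    }

    public static void main(String[] args) {
        byte[] wire = {(byte) 0xC3, (byte) 0x89}; // UTF-8 bytes for U+00C9

        String right = decode(wire, StandardCharsets.UTF_8);
        String wrong = decode(wire, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("\u00C9")); // true
        System.out.println(wrong.length());         // 2, each byte became its own character
        System.out.println(Arrays.equals(wire, encode(right, StandardCharsets.UTF_8))); // true
    }
}
```

Calling `new String(bytes)` or `getBytes()` without a charset argument uses the JVM's default charset, so naming the charset on both sides removes one common way for a character like É to turn into gibberish.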