UTF-8 encoding and decoding issue
Question
I'm having a problem converting text to and from UTF-8 encoding. Here I have a byte array:
byte[] c = new byte[] { 1, 2, 200 };
I'm converting it to a UTF-8 string and back to a byte array:
Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(c));
As I understand it, I should get back an array with 3 bytes, right? But here's what I'm getting:
byte[5] { 1, 2, 239, 191, 189 }
What's the reason for this?
I understand that the combination 239, 191, 189 (0xEF 0xBF 0xBD) is the UTF-8 encoding of the REPLACEMENT CHARACTER (U+FFFD) from the Unicode Specials block.
Also, this is part of a bigger problem.
Recommended answer
Not all byte sequences are valid UTF-8. Your array (1, 2, 200) is not valid UTF-8, which is why the decoder substitutes this special error character for the offending byte.
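The behavior described here can be reproduced with a short sketch. It compares the default, lenient Encoding.UTF8 round trip against a strict UTF8Encoding instance created with throwOnInvalidBytes set to true. The class and method names (Utf8RoundTripDemo, LenientRoundTrip, IsValidUtf8) are made up for illustration and are not part of .NET.

```csharp
using System;
using System.Text;

public static class Utf8RoundTripDemo
{
    // Decodes and re-encodes with the default (lenient) UTF-8 encoding.
    // Invalid input bytes are replaced by U+FFFD during decoding.
    public static byte[] LenientRoundTrip(byte[] input)
    {
        string text = Encoding.UTF8.GetString(input);
        return Encoding.UTF8.GetBytes(text);
    }

    // Returns true if the bytes decode cleanly with a strict UTF-8 decoder.
    public static bool IsValidUtf8(byte[] input)
    {
        // Arguments: encoderShouldEmitUTF8Identifier = false, throwOnInvalidBytes = true.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(input);
            return true;
        }
        catch (ArgumentException)
        {
            // DecoderFallbackException derives from ArgumentException.
            return false;
        }
    }

    public static void Main()
    {
        byte[] c = { 1, 2, 200 };
        // Prints the five bytes reported in the question: 1, 2, 239, 191, 189
        Console.WriteLine(string.Join(", ", LenientRoundTrip(c)));
        Console.WriteLine(IsValidUtf8(c));
    }
}
```

If arbitrary binary data has to survive a trip through a string, checking it with a strict decoder first at least turns silent corruption into a detectable failure.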
MSDN says about Encoding.UTF8:
It returns a UTF8Encoding object that provides a Unicode byte order mark (BOM). To instantiate a UTF8 encoding that doesn't provide a BOM, call any overload of the UTF8Encoding constructor.
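As a rough illustration of what "provides a BOM" means in practice, the sketch below inspects the preamble each encoding object advertises and confirms that GetBytes does not prepend it. The class and method names (Utf8BomDemo, DefaultPreamble, NoBomPreambleLength) are hypothetical. The sketch uses the UTF8Encoding(false) overload because it states the no-BOM intent explicitly.

```csharp
using System;
using System.Text;

public static class Utf8BomDemo
{
    // Preamble (BOM) advertised by the shared Encoding.UTF8 instance.
    public static byte[] DefaultPreamble()
    {
        return Encoding.UTF8.GetPreamble();
    }

    // Preamble length of an instance explicitly constructed without a BOM.
    public static int NoBomPreambleLength()
    {
        return new UTF8Encoding(false).GetPreamble().Length;
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(DefaultPreamble())); // EF-BB-BF
        Console.WriteLine(NoBomPreambleLength());                    // 0

        // GetBytes encodes only the characters; it does not prepend the preamble.
        byte[] encoded = Encoding.UTF8.GetBytes("A");
        Console.WriteLine(encoded.Length);                           // 1
    }
}
```

The preamble only matters to code that asks for it, such as stream writers that emit it at the start of a file, so it plays no role in a GetString/GetBytes round trip.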
1) There is no BOM (https://en.wikipedia.org/wiki/Byte_order_mark) in your example.
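2) 200 (0xC8, binary 11001000) is a leading byte of a two-byte sequence. It must be followed by enough continuation bytes (here, one byte of the form 10xxxxxx), but it is the last byte in the array, so the decoder replaces it with U+FFFD.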
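The point about leading and continuation bytes can be sketched as a classifier that looks only at the high bits of each byte. This is a simplified illustration with hypothetical names (Utf8ByteKindDemo, Classify, RoundTrips): it checks bit patterns only and does not reject overlong or out-of-range sequences as a real decoder does.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf8ByteKindDemo
{
    // Classifies a single byte by its high bits, following the UTF-8 layout.
    public static string Classify(byte b)
    {
        if ((b & 0x80) == 0x00) return "single-byte (ASCII)";      // 0xxxxxxx
        if ((b & 0xC0) == 0x80) return "continuation";             // 10xxxxxx
        if ((b & 0xE0) == 0xC0) return "lead of 2-byte sequence";  // 110xxxxx
        if ((b & 0xF0) == 0xE0) return "lead of 3-byte sequence";  // 1110xxxx
        if ((b & 0xF8) == 0xF0) return "lead of 4-byte sequence";  // 11110xxx
        return "invalid";
    }

    // True if decoding and re-encoding with Encoding.UTF8 returns the same bytes.
    public static bool RoundTrips(byte[] input)
    {
        byte[] output = Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(input));
        return output.SequenceEqual(input);
    }

    public static void Main()
    {
        Console.WriteLine(Classify(200)); // lead of 2-byte sequence
        Console.WriteLine(Classify(128)); // continuation

        // The original array ends on a lead byte with nothing after it.
        Console.WriteLine(RoundTrips(new byte[] { 1, 2, 200 }));      // False
        // Adding one continuation byte completes the sequence (0xC8 0x80 is U+0200).
        Console.WriteLine(RoundTrips(new byte[] { 1, 2, 200, 128 })); // True
    }
}
```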