In C# string/character encoding, what is the difference between GetBytes(), GetString() and Convert()?

Question

We are having trouble getting a Unicode string to convert to a UTF-8 string to send over the wire:

// Start with our unicode string.
string unicode = "Convert: \u10A0";

// Get the UTF-16 (Encoding.Unicode) bytes of the string: two bytes per UTF-16 code unit.
byte[] source = Encoding.Unicode.GetBytes(unicode);

// Convert the UTF-16 bytes to their UTF-8 representation.
byte[] converted = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, source);

// Now that we have converted the bytes, save them to a new string.
string utf8 = Encoding.UTF8.GetString(converted);

// Send the converted string using a Microsoft function.
MicrosoftFunc(utf8);



Although we have converted the string to UTF-8, it's not arriving as UTF-8.

Recommended answer

After a much troubled and confusing morning, we found the answer to this problem.

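The key point we were missing, which was making this very confusing, was that a .NET string is always stored as UTF-16: a sequence of 16-bit code units, where characters outside the Basic Multilingual Plane take two code units (a surrogate pair). A string has no other encoding of its own. This means that when we call GetString() on the UTF-8 bytes, they are decoded straight back into a UTF-16 string holding exactly the same characters we started with, so we are no better off than we were in the first place.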
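A short sketch, assuming .NET 6 or later (top-level statements), that runs the question's round trip and compares the result with the original string. It also shows that string.Length counts 16-bit UTF-16 code units; U+1F600 is just an arbitrary example of a character outside the Basic Multilingual Plane.

```csharp
using System;
using System.Text;

// The string from the question: 10 UTF-16 code units, all in the Basic Multilingual Plane.
string original = "Convert: \u10A0";

// The question's round trip: string -> UTF-16 bytes -> UTF-8 bytes -> string.
byte[] utf16Bytes = Encoding.Unicode.GetBytes(original);
byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);
string roundTripped = Encoding.UTF8.GetString(utf8Bytes);

// GetString() decodes the UTF-8 bytes back into an ordinary UTF-16 string,
// so the result is simply equal to the input: nothing about it "stays" UTF-8.
Console.WriteLine(roundTripped == original);                 // True

// string.Length counts UTF-16 code units, not characters:
// U+1F600 lies outside the BMP and is stored as a surrogate pair.
string emoji = "\U0001F600";
Console.WriteLine(emoji.Length);                             // 2
Console.WriteLine(char.IsSurrogatePair(emoji[0], emoji[1])); // True
Console.WriteLine(Encoding.Unicode.GetBytes(emoji).Length);  // 4
```

The UTF-8 form only exists while you hold the byte array; as soon as it becomes a string again, it is UTF-16.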

When we started getting garbled characters and double-byte data at the other end, we knew something was wrong, but at a glance we couldn't see anything wrong with our code. After learning what we have explained above, we realised that we needed to send a byte array if we wanted to preserve the encoding. Luckily, MicrosoftFunc() had an overload that takes a byte array instead of a string. This meant that we could encode the string in the encoding of our choice and send it off exactly as we expected. The code changed to:

// Convert from a Unicode string to an array of bytes (encoded as UTF8).
byte[] source = Encoding.UTF8.GetBytes(unicode); 

// Send the encoded byte array directly! Do not send as a Unicode string.
MicrosoftFunc(source);
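The signature of MicrosoftFunc's byte-array overload isn't given, so this sketch swaps in a hypothetical local function, Send, that writes the bytes to a MemoryStream standing in for the network connection (.NET 6+ top-level statements assumed). It checks that the receiver gets exactly the UTF-8 bytes, with U+10A0 arriving as E1 82 A0.

```csharp
using System;
using System.IO;
using System.Text;

string unicode = "Convert: \u10A0";

// Encode once, at the boundary: these are the exact bytes that go on the wire.
byte[] payload = Encoding.UTF8.GetBytes(unicode);

// Hypothetical stand-in for MicrosoftFunc(byte[]): writes the bytes unchanged.
// A MemoryStream plays the role of the network connection here.
var wire = new MemoryStream();
void Send(Stream stream, byte[] data) => stream.Write(data, 0, data.Length);

Send(wire, payload);

// The receiver sees exactly the UTF-8 bytes: U+10A0 arrives as E1 82 A0.
byte[] received = wire.ToArray();
Console.WriteLine(BitConverter.ToString(received, received.Length - 3)); // E1-82-A0

// Decoding with the same encoding on the receiving side recovers the text.
string decoded = Encoding.UTF8.GetString(received);
Console.WriteLine(decoded == unicode); // True
```

The design point is that the encoding decision is made exactly once, by whoever turns the string into bytes; both sides only have to agree on which encoding that was.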



Summary:

So in conclusion, from the above we can see that:


  • GetBytes() encodes a string (always UTF-16 in memory) into an array of bytes in the encoding it is called on, so Encoding.UTF8.GetBytes() returns UTF-8 bytes. In effect it converts from UTF-16 to that encoding; no separate Encoding.Convert() call is needed.
  • GetString() does the reverse: it decodes bytes in the encoding it is called on and returns a string object, which is again UTF-16 in memory.
  • Convert() converts a byte array in one encoding into a byte array in another encoding. Strings are not involved, because a string has no encoding other than UTF-16.
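The three methods in the summary can be compared side by side in a small sketch (assuming .NET 6+ top-level statements). For the question's string, UTF-8 needs 12 bytes (9 ASCII bytes plus 3 for U+10A0) while UTF-16 needs 20, and Convert() on the UTF-16 bytes yields the same bytes as calling GetBytes() on the UTF-8 encoding directly.

```csharp
using System;
using System.Linq;
using System.Text;

string text = "Convert: \u10A0";

// GetBytes(): string (UTF-16 in memory) -> bytes in the encoding it is called on.
byte[] utf8 = Encoding.UTF8.GetBytes(text);
byte[] utf16 = Encoding.Unicode.GetBytes(text); // Encoding.Unicode is UTF-16 little-endian

// GetString(): bytes in that encoding -> string (UTF-16 in memory again).
string fromUtf8 = Encoding.UTF8.GetString(utf8);
string fromUtf16 = Encoding.Unicode.GetString(utf16);

// Convert(): bytes in one encoding -> bytes in another; no string is involved.
byte[] converted = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16);

Console.WriteLine($"UTF-8: {utf8.Length} bytes, UTF-16: {utf16.Length} bytes"); // UTF-8: 12 bytes, UTF-16: 20 bytes
Console.WriteLine(fromUtf8 == text && fromUtf16 == text); // True
Console.WriteLine(converted.SequenceEqual(utf8));         // True
```

Because GetBytes() already produces bytes in the target encoding, Convert() is only needed when you start with bytes in one encoding and never had the string.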
