Encode a Chinese string


Question



Hi,

I need to encode a Chinese string using C#. I know that with the ASCIIEncoding class we can encode 8-bit characters (ordinary English text), but I think Chinese takes 2 bytes per character. Please help me.

Thanks in advance.

Recommended answer

This might help to solve your problem:

Converting Chinese character to Unicode


You can read the following article:
Encoding C# strings as Byte[] (Byte Arrays) and back again


First of all, .NET natively supports Unicode in the UTF-16 encoding, which covers all code points: each code point is encoded with either 2 or 4 bytes, so it covers not only the BMP (Basic Multilingual Plane, 0 to 0xFFFF) but all characters above it as well. To the best of my knowledge, the Chinese characters in common use all sit in the BMP; rarer ones (for example CJK Unified Ideographs Extension B, which starts at U+20000) lie above it and take 4 bytes in UTF-16. The same goes for the other UTFs supporting full Unicode: UTF-8 and UTF-32. So you don't need anything special to "encode" Chinese; it is already supported.
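As a rough illustration of the point above (a sketch added for clarity, not part of the original answer): a C# `string` already holds UTF-16 code units, so a BMP character such as U+4E2D occupies exactly one `char`. The class name `Utf16Demo` and the helper `Utf16Hex` are made up for this example.

```csharp
using System;
using System.Text;

public static class Utf16Demo
{
    // Returns the UTF-16 (little-endian, no BOM) bytes of a string as hex.
    public static string Utf16Hex(string text)
    {
        return BitConverter.ToString(Encoding.Unicode.GetBytes(text));
    }

    public static void Main()
    {
        // U+4E2D U+6587 is the two-character word for "Chinese (language)".
        // Escapes are used so the source file's own encoding does not matter.
        string text = "\u4E2D\u6587";

        Console.WriteLine(text.Length);                    // 2 (UTF-16 code units)
        Console.WriteLine(((int)text[0]).ToString("X4"));  // 4E2D
        Console.WriteLine(Utf16Hex(text));                 // 2D-4E-87-65
    }
}
```

Note that `Encoding.Unicode` is .NET's name for little-endian UTF-16, which is why the bytes of each character appear swapped in the hex output.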

You need an encoding only to read or write data to a stream. Prefer UTF-8, which is the de facto standard for most applications, including the Web. Use System.Text.Encoding.UTF8 and System.Text.UTF8Encoding; see http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx.
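A minimal sketch of that advice, assuming the data goes through an in-memory stream (a file or network stream would work the same way). `Utf8StreamDemo`, `WriteUtf8` and `ReadUtf8` are hypothetical names chosen for the example; only the `System.Text` and `System.IO` types are real APIs. The last lines also show what happens with the ASCIIEncoding approach from the question.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8StreamDemo
{
    // Writes text to an in-memory stream as UTF-8 (no byte order mark)
    // and returns the raw bytes that ended up in the stream.
    public static byte[] WriteUtf8(string text)
    {
        using (var memory = new MemoryStream())
        {
            using (var writer = new StreamWriter(memory, new UTF8Encoding(false)))
            {
                writer.Write(text);
            }
            // ToArray still works after the writer has closed the stream.
            return memory.ToArray();
        }
    }

    // Reads UTF-8 bytes back into a string through a StreamReader.
    public static string ReadUtf8(byte[] bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        string text = "\u4E2D\u6587";
        byte[] utf8 = WriteUtf8(text);
        Console.WriteLine(BitConverter.ToString(utf8));   // E4-B8-AD-E6-96-87
        Console.WriteLine(ReadUtf8(utf8) == text);        // True

        // ASCII cannot represent these characters; each becomes '?' (0x3F).
        byte[] ascii = Encoding.ASCII.GetBytes(text);
        Console.WriteLine(BitConverter.ToString(ascii));  // 3F-3F
    }
}
```

Each of these two characters takes 3 bytes in UTF-8, and the round trip is lossless, whereas the ASCII bytes cannot be decoded back to the original text.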

You might need a better understanding of Unicode. It is not a 16-bit code! It standardizes a mapping between characters, understood as cultural entities independent of any concrete glyph, and integer values, understood in their abstract mathematical sense, independent of how the data is represented as bits in a computer. The code points go well above 0xFFFF. On top of this, there are the UTFs, which define how those integers are stored as bytes.
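To make the "not a 16-bit code" point concrete, here is a small sketch comparing a BMP character with one above 0xFFFF. The choice of U+20000 (the first CJK Extension B ideograph) and the names `CodePointDemo` and `CountCodePoints` are assumptions made for the example.

```csharp
using System;
using System.Text;

public static class CodePointDemo
{
    // Counts Unicode code points, treating a valid surrogate pair as one.
    public static int CountCodePoints(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                i++; // skip the low surrogate of the pair
            }
            count++;
        }
        return count;
    }

    public static void Main()
    {
        string bmp = "\u4E2D";                           // U+4E2D, inside the BMP
        string beyond = char.ConvertFromUtf32(0x20000);  // U+20000, above the BMP

        Console.WriteLine(bmp.Length);                            // 1
        Console.WriteLine(beyond.Length);                         // 2 (surrogate pair)
        Console.WriteLine(CountCodePoints(bmp + beyond));         // 2

        // Byte counts of the same code point under the three UTFs.
        Console.WriteLine(Encoding.UTF8.GetByteCount(bmp));       // 3
        Console.WriteLine(Encoding.UTF8.GetByteCount(beyond));    // 4
        Console.WriteLine(Encoding.Unicode.GetByteCount(beyond)); // 4
        Console.WriteLine(Encoding.UTF32.GetByteCount(beyond));   // 4
    }
}
```

The code point is the same abstract integer in every case; only the UTF chosen decides how many bytes it occupies, and `string.Length` counts UTF-16 code units rather than characters.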

See:
http://unicode.org/,
http://unicode.org/faq/utf_bom.html.

—SA

