Which data type should I use to read Chinese and English characters?


Question

Which data type should I use to read Chinese and English characters from a stream?

Should I use Byte or Char?

Recommended answer

Not Byte! How could it be? You should use System.String, which is a Unicode string, so it supports most languages at the same time. When you communicate through the network or any other kind of stream, all text data is converted to and from an array of bytes anyway, but each character takes a different number of bytes (1 to 4 in UTF-8), because Unicode supports code points in the range 0 to 0x10FFFF. The particular representation depends on the UTF used for serialization. Internally, in memory, .NET (and Windows itself) uses UTF-16LE, where each character takes either one 2-byte word or two such words, called a surrogate pair. Surrogate pairs are needed for characters beyond the Basic Multilingual Plane (BMP), which covers the first code points, 0 to 0xFFFF (excluding the special ranges reserved for the surrogates themselves).
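
As a rough illustration of the sizes described above, the following sketch counts UTF-16 code units and UTF-8 bytes for an English letter, a Chinese character inside the BMP, and a character outside the BMP. The class and method names (CodeUnitDemo, Utf16Units, Utf8Bytes) are made up for this example; only the System.String, System.Char and System.Text.Encoding members are real .NET APIs.

```csharp
using System;
using System.Text;

public static class CodeUnitDemo
{
    // Number of UTF-16 code units (System.Char values) held by the string.
    public static int Utf16Units(string s)
    {
        return s.Length;
    }

    // Number of bytes the string occupies once serialized as UTF-8.
    public static int Utf8Bytes(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    public static void Main()
    {
        string english = "A";                               // U+0041
        string chinese = "\u4E2D";                          // U+4E2D, inside the BMP
        string beyondBmp = char.ConvertFromUtf32(0x20000);  // U+20000, outside the BMP

        foreach (string s in new[] { english, chinese, beyondBmp })
        {
            Console.WriteLine("UTF-16 units: " + Utf16Units(s) + ", UTF-8 bytes: " + Utf8Bytes(s));
        }

        // The character outside the BMP is stored as a surrogate pair.
        Console.WriteLine(char.IsSurrogatePair(beyondBmp, 0));
    }
}
```

The three strings should report 1, 1 and 2 UTF-16 units, and 1, 3 and 4 UTF-8 bytes respectively. This is also why neither Byte nor a single Char is a safe unit for "one character".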

All UTFs are equivalent: despite their names showing a number of bits, they all support all code points. In files, the UTF in use is usually detected by the BOM (byte order mark). Please see:

http://en.wikipedia.org/wiki/Unicode/,
http://en.wikipedia.org/wiki/Code_point/,
http://en.wikipedia.org/wiki/Byte_order_mark/,
http://unicode.org/,
http://unicode.org/faq/utf_bom.html.
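
The BOM-based detection mentioned above can be sketched roughly as follows. The names BomDemo, EncodeWithBom and Decode are invented for this example; it assumes the data starts with a BOM and falls back to UTF-8 when none is found.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Encodes the text and prepends the encoding's BOM (its "preamble"),
    // the way a file writer typically would.
    public static byte[] EncodeWithBom(string text, Encoding encoding)
    {
        byte[] bom = encoding.GetPreamble();
        byte[] body = encoding.GetBytes(text);
        byte[] result = new byte[bom.Length + body.Length];
        bom.CopyTo(result, 0);
        body.CopyTo(result, bom.Length);
        return result;
    }

    // Decodes the bytes, letting StreamReader choose the encoding from the BOM.
    // UTF-8 is only the fallback used when no BOM is present.
    public static string Decode(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        string text = "Hello, \u4E16\u754C";

        byte[] utf16 = EncodeWithBom(text, Encoding.Unicode); // UTF-16LE, BOM FF FE
        byte[] utf8 = EncodeWithBom(text, Encoding.UTF8);     // UTF-8, BOM EF BB BF

        // Both byte arrays decode back to the same string.
        Console.WriteLine(Decode(utf16) == text);
        Console.WriteLine(Decode(utf8) == text);
    }
}
```

The same reader handles both inputs because the third constructor argument asks it to detect the encoding from the byte order mark; the BOM itself is not part of the decoded text.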



In memory, you always work with strings. When you need to pass the text over the network or persist it in a file, you choose an encoding, which represents the text as an array of bytes and vice versa. You need to choose only one of the UTFs. Prefer UTF-8. To do it directly, use the class System.Text.Encoding and/or its derived classes for each particular encoding. Please see:
http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx.

You can directly use the methods GetBytes (text to an array of bytes), GetChars (an array of bytes to Unicode characters) and GetString (an array of bytes to a string).

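For example, to get a string from an array of bytes:
byte[] data = //let's say, received from network...

//...
// GetString decodes the bytes directly; new string(System.Text.Encoding.UTF8.GetChars(data)) is equivalent
string value = System.Text.Encoding.UTF8.GetString(data);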
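
Since the question is about reading from a stream, here is a minimal sketch of doing so with System.IO.StreamReader, assuming the sender encoded the text as UTF-8. The names StreamTextDemo and ReadAllText are made up for this example, and a MemoryStream stands in for a network or file stream.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamTextDemo
{
    // Reads the whole stream and decodes its bytes as UTF-8 text.
    // Disposing the reader also closes the underlying stream.
    public static string ReadAllText(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        string original = "English and \u4E2D\u6587";

        // Stand-in for bytes arriving from a network or a file.
        byte[] data = Encoding.UTF8.GetBytes(original);

        using (var stream = new MemoryStream(data))
        {
            string text = ReadAllText(stream);
            Console.WriteLine(text == original);
        }
    }
}
```

A reader is often the safer choice over decoding each received chunk separately, because a multi-byte character can be split across two reads; the reader keeps the partial bytes until the rest arrives.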



—SA




Hi,


Use NVARCHAR on the database side for inserting the data, and on the front end, when passing the value, you need to add the N prefix before it!

See the link below:

http://forums.asp.net/t/1427585.aspx/1?C+datatype+for+all+world+languages

Happy coding!!!

