Reading a file with 8-bit ascii char

Question

I have an ASCII text file that contains valid 8-bit character codes.
How do I read this file and have the 8-bit characters translated into valid
Unicode? I know I could UTF-8 encode the file, or read the bytes
and then encode them. But this all assumes that I know about the 8-bit
codes beforehand.

Is there any method that will read the file and automatically do the
conversion?

James Johnson

Recommended answer

Simply read with

System.IO.StreamReader reader =
   new System.IO.StreamReader(fileName,  System.Text.Encoding.ASCII);


or, more universally, auto-detect the encoding:

System.IO.StreamReader reader =
   new System.IO.StreamReader(fileName,  true);


It will give you Unicode string(s) based on your ASCII data. In principle, this is all you need. You can write it back with

bool appendOrNot = false; // true to append to the file, false to overwrite it
System.IO.StreamWriter writer =
   new System.IO.StreamWriter(fileName, appendOrNot, System.Text.Encoding.UTF8);
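Put together, the read and write-back steps above might look like the following minimal sketch. The class and method names (`Utf8ConversionSketch`, `ConvertToUtf8`) and the temporary files are hypothetical and only there for illustration. The source encoding is passed in explicitly, because the reader has to be told how to interpret the bytes.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8ConversionSketch
{
    // Reads sourcePath using sourceEncoding and rewrites the text as UTF-8.
    // Hypothetical helper; the caller is assumed to know the source encoding.
    public static void ConvertToUtf8(string sourcePath, Encoding sourceEncoding, string targetPath)
    {
        using (var reader = new StreamReader(sourcePath, sourceEncoding))
        using (var writer = new StreamWriter(targetPath, false, Encoding.UTF8))
        {
            // In memory the text is a UTF-16 string; it is re-encoded on write.
            writer.Write(reader.ReadToEnd());
        }
    }

    public static void Main()
    {
        string source = Path.GetTempFileName();
        string target = Path.GetTempFileName();

        // "Hi!" as plain 7-bit ASCII bytes.
        File.WriteAllBytes(source, new byte[] { 0x48, 0x69, 0x21 });

        ConvertToUtf8(source, Encoding.ASCII, target);
        Console.WriteLine(File.ReadAllText(target, Encoding.UTF8));

        File.Delete(source);
        File.Delete(target);
    }
}
```

Note that `Encoding.UTF8` writes a BOM at the start of a new file. If that is not wanted, `new UTF8Encoding(false)` can be passed instead.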



As your text data is, generally speaking, always Unicode, prefer using only one of the Unicode UTFs on output. The only text encoding supported internally in character and string data is UTF-16. All other encodings are supported only for persistence; they are represented in memory as arrays of bytes, with no regard to character boundaries, which can vary: in UTF-8 a character takes 1 to 4 bytes; in UTF-16, one or two 16-bit words (two words are called a surrogate pair); in UTF-32, always one 32-bit word. Please see the last two links below.
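The size differences described above can be checked with a small sketch like this one. The class name `EncodedSizeSketch`, the helper `ByteCounts` and the sample characters are assumptions chosen for illustration. `GetByteCount` counts only the encoded characters, not a BOM.

```csharp
using System;
using System.Text;

public static class EncodedSizeSketch
{
    // Returns the encoded size of text in UTF-8, UTF-16 and UTF-32, in bytes.
    public static int[] ByteCounts(string text)
    {
        return new[]
        {
            Encoding.UTF8.GetByteCount(text),
            Encoding.Unicode.GetByteCount(text), // Encoding.Unicode is UTF-16 LE
            Encoding.UTF32.GetByteCount(text)
        };
    }

    public static void Main()
    {
        // U+0041, U+00E9, U+20AC and U+1F600 (outside the Basic Multilingual Plane).
        string[] samples = { "A", "\u00E9", "\u20AC", "\U0001F600" };
        foreach (string s in samples)
        {
            int[] n = ByteCounts(s);
            Console.WriteLine("U+{0:X4}: UTF-8={1}, UTF-16={2}, UTF-32={3}, chars={4}",
                char.ConvertToUtf32(s, 0), n[0], n[1], n[2], s.Length);
        }
    }
}
```

The last sample is stored as a surrogate pair, so its `Length` is 2 even though it is a single character.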

Please see:

http://msdn.microsoft.com/en-us/library/system.io.streamreader.aspx,
http://msdn.microsoft.com/en-us/library/system.io.streamwriter.aspx,
http://msdn.microsoft.com/en-us/library/f5f5x7kt.aspx;

http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx;

You also need to understand how Unicode and BOM work:
http://unicode.org/,
http://unicode.org/faq/utf_bom.html.

BOM (or its absence) is used for auto-detection of encoding mentioned above.
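As a rough illustration of BOM-based detection, the sketch below writes a small UTF-16 file by hand and lets `StreamReader` work out the encoding. The names `BomDetectionSketch` and `ReadDetecting` and the sample bytes are hypothetical. When no BOM is present, this constructor falls back to UTF-8, so it cannot recognize an 8-bit code page.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDetectionSketch
{
    // Reads a file with BOM detection enabled and reports the encoding used.
    public static string ReadDetecting(string path, out Encoding detected)
    {
        using (var reader = new StreamReader(path, true))
        {
            string text = reader.ReadToEnd();
            // CurrentEncoding reflects the detected encoding only after a read.
            detected = reader.CurrentEncoding;
            return text;
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();

        // UTF-16 little-endian BOM (FF FE) followed by "Hi".
        File.WriteAllBytes(path, new byte[] { 0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00 });

        Encoding detected;
        string text = ReadDetecting(path, out detected);
        Console.WriteLine("{0} ({1})", text, detected.WebName);

        File.Delete(path);
    }
}
```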



Apparently, auto-detection of the encoding by BOM is needed only in one case: the encoding is some Unicode UTF and you know which one it is, but the BOM is not present. Such things happen. This is also explained in the last Unicode article referenced above.

—SA


Hello WBurgMo,

There is no such thing as a "valid 8-bit ASCII code" (see
http://en.wikipedia.org/wiki/ASCII).

If you have plain 7-bit ASCII text, you may use ASCIIEncoding to read the data. If you have some 8-bit extension of the 7-bit ASCII encoding, you must specify the code page as described in http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx (see the constructor that takes the code page as an argument).

Note: you must supply the code page information from outside, i.e. there is no way to deduce from the 8-bit ASCII-extended text which code page it uses.
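To illustrate the difference, here is a minimal sketch that decodes the same bytes once as 7-bit ASCII and once with an explicit code page. Code page 28591 (ISO-8859-1) is purely an assumption made for the example; the real code page of a given file has to be known from elsewhere. The names `CodePageDecodeSketch` and `Decode` are hypothetical.

```csharp
using System;
using System.Text;

public static class CodePageDecodeSketch
{
    // Decodes raw bytes using the encoding registered for the given code page.
    public static string Decode(byte[] data, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetString(data);
    }

    public static void Main()
    {
        // 'c', 'a', 'f', then 0xE9, which is outside the 7-bit ASCII range.
        byte[] data = { 0x63, 0x61, 0x66, 0xE9 };

        string asLatin1 = Decode(data, 28591);           // assumed code page
        string asAscii = Encoding.ASCII.GetString(data); // 7-bit ASCII only

        // Print code points rather than glyphs to stay independent of the console.
        Console.WriteLine((int)asLatin1[3]); // 233, i.e. U+00E9
        Console.WriteLine((int)asAscii[3]);  // 63, i.e. '?', the replacement
    }
}
```

The same encoding object can be passed to the reader, as in `new System.IO.StreamReader(fileName, Encoding.GetEncoding(28591))`. On newer .NET versions, other code pages such as 1252 may additionally require registering the provider from the System.Text.Encoding.CodePages package.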

Cheers

Andi

