StreamReader & UTF-8, how to decode special characters?

Problem description

Hi folks, I need your help. I have been trying for several days to get rid of this problem, but somehow I haven't managed it. I'd like to use some .NET code to read and interpret my e-mails automatically.

Basically it works fine; only some UTF characters are disrupting my work. This is what happens: the e-mail header says the mail is encoded in UTF-8. To read my mails I use ReadLine() from the StreamReader class, and I store the return values in a String object.

As far as I know, StreamReader uses UTF-8 by default. I have also read that String objects are Unicode. Since UTF-8 is also Unicode, I do not understand why I get values such as "=C3=A4" or "=E2=80=9C" within the normal text.

My original code was:
<pre lang="midl">StreamReader^ reader = gcnew StreamReader(sslstream);</pre>
I have also tried:
<pre lang="midl">StreamReader^ reader = gcnew StreamReader(sslstream, Encoding::UTF8, false);</pre>
and

<pre lang="midl">Encoding ^enc = Encoding::GetEncoding("utf-8");
StreamReader^ reader = gcnew StreamReader(sslstream, enc, false);</pre>
(where false prevents the automatic search for byte order marks at the start of the stream that identify the encoding)



Nothing changes, and I don't know why...

What I find strange is that, when debugging the StreamReader object, its "CurrentEncoding" value is shown as
CurrentEncoding = 0x00c6bfa4 { CodePageASCII=20127 ISO_8859_1=28591 ...}

I think the encoding mode is the problem. When StreamReader reads the mail in ASCII mode, it must run into trouble with special characters. The only question is how I can force it to switch to Unicode/UTF-8. Whatever I do when creating the StreamReader object, it seems to have no effect.

Can you help? Thanks a lot!

Recommended answer

First of all, you usually should not assume the encoding in StreamReader. Use one of the other constructors, those accepting the Boolean parameter bool detectEncodingFromByteOrderMarks. With that parameter set to true, the reader recognizes a BOM (byte order mark) at the beginning of the stream.

For more information, see http://en.wikipedia.org/wiki/Byte-order_mark and http://www.unicode.org/faq/utf_bom.html#BOM.

—SA

