StreamReader & UTF-8, how to decode special characters?
Question
Hi folks - I need your help. I have been trying and searching for several days to get rid of this problem, but somehow I have not managed it: I would like to use some .NET code to read and interpret my e-mails automatically.
Basically it works fine; only some UTF characters are disturbing my work. This is what happens: the e-mail header says a mail is encoded with UTF-8. To read my mails I use ReadLine() from the StreamReader class. I store the return values in a String class object.
As far as I know, StreamReader is set to UTF-8 by default. I have also read that String class objects are Unicode. Because UTF-8 is also Unicode, I do not understand why I get return values such as "=C3=A4" or "=E2=80=9C" within the normal text.
Besides:
<pre lang="midl">StreamReader^ reader = gcnew StreamReader(sslstream);</pre>
I have tried:
<pre lang="midl">StreamReader^ reader = gcnew StreamReader(sslstream, Encoding::UTF8, false);</pre>
and
<pre lang="midl">Encoding ^enc = Encoding::GetEncoding("utf-8");
StreamReader^ reader = gcnew StreamReader(sslstream, enc, false);</pre>
(where false prevents the automatic search for byte order marks at the start of the stream that identify the encoding)
Nothing changes and I don't know why...
What I find strange (when debugging the StreamReader object) is that StreamReader's "CurrentEncoding" value is set to
CurrentEncoding = 0x00c6bfa4 { CodePageASCII=20127 ISO_8859_1=28591 ...}
I think the encoding mode is the problem. If StreamReader tries to read the mail in ASCII mode, it must have a problem with special characters. The only question is how I can force it to switch to Unicode/UTF-8. Whatever I do when creating the StreamReader object seems to have no effect.
Can you help? Thanks a lot!
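A note on the sequences quoted in the question: "=C3=A4" and "=E2=80=9C" have the shape of MIME quoted-printable escapes (RFC 2045), where each non-ASCII byte is written as "=" followed by two hex digits. C3 A4 is the UTF-8 byte pair for "ä" and E2 80 9C is the UTF-8 sequence for a left double quotation mark. Assuming the message carries a "Content-Transfer-Encoding: quoted-printable" header (the thread does not show the headers, so this is an assumption), the text has to be un-escaped into bytes before any UTF-8 decoding can apply. The following is a minimal sketch of that step in plain C++ rather than C++/CLI, so it stays portable; the names `hexValue` and `decodeQuotedPrintableLine` are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Returns the numeric value of a hex digit, or -1 if c is not one.
// RFC 2045 requires uppercase digits; lowercase is accepted here leniently.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decodes one quoted-printable line (without its line terminator) into raw bytes.
// softBreak is set to true when the line ends in '=', which means the next
// line continues this one without an intervening newline.
// Simplification: trailing whitespace before a soft break is not stripped.
std::string decodeQuotedPrintableLine(const std::string& line, bool& softBreak) {
    std::string out;
    softBreak = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (c != '=') {              // ordinary character, copy through
            out.push_back(c);
            continue;
        }
        if (i + 1 == line.size()) {  // '=' at end of line: soft line break
            softBreak = true;
            break;
        }
        const bool hasTwo = i + 2 < line.size();
        const int hi = hasTwo ? hexValue(line[i + 1]) : -1;
        const int lo = hasTwo ? hexValue(line[i + 2]) : -1;
        if (hi >= 0 && lo >= 0) {    // "=XX" escape: emit the byte it encodes
            out.push_back(static_cast<char>(hi * 16 + lo));
            i += 2;
        } else {                     // malformed escape: keep the '=' literally
            out.push_back(c);
        }
    }
    return out;
}
```

The result is a byte sequence, not text. In the .NET setting of the question, those bytes would then presumably be handed to something like `Encoding::UTF8->GetString(...)` to obtain a String; which charset to use should come from the message's Content-Type header.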
Recommended answer
First of all, you usually should not assume an encoding in StreamReader. Use the other constructors, those accepting the Boolean parameter bool detectEncodingFromByteOrderMarks. This API accepts a BOM at the beginning of the stream.
For more information, see the byte order mark article at http://en.wikipedia.org/wiki/Byte-order_mark and the Unicode FAQ at http://www.unicode.org/faq/utf_bom.html#BOM
—SA
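To make the byte order mark idea from the answer concrete, here is a rough sketch of what such a check looks at: the first two or three bytes of the stream. It is written in plain C++ for portability and is not the logic StreamReader itself uses; `BomKind` and `detectBom` are hypothetical names, and UTF-32 marks are deliberately left out.

```cpp
#include <cstddef>
#include <string>

enum class BomKind { None, Utf8, Utf16LittleEndian, Utf16BigEndian };

// Inspects the leading bytes of a buffer for a Unicode byte order mark.
// Not handled: UTF-32 (its little-endian BOM starts with FF FE as well,
// so it would be reported as UTF-16 little-endian here).
BomKind detectBom(const std::string& bytes) {
    auto at = [&](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return BomKind::Utf8;                // EF BB BF
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return BomKind::Utf16LittleEndian;   // FF FE
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return BomKind::Utf16BigEndian;      // FE FF
    return BomKind::None;                    // no mark: encoding must come from elsewhere
}
```

When the function reports `BomKind::None`, nothing can be concluded from the bytes alone, and the encoding has to be taken from other information such as a declared charset.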