Possible Encoding issue?


Problem description

Hello there

I have an application that downloads emails, then parses an attachment.

The attachment is an HTML template with various bits of data in it.

When I first started writing the parsing code, I used a copy of the HTML file saved directly from an email, and everything was fine.

Now, I download an email and save the file using a function called StoreToFile() in the ActiveUp.Net.Mail library I'm using. Then I open the file with a StreamReader to run it through my parsing code.

The problem I have is that when I read the file that has been downloaded automatically, StreamReader.ReadLine() brings back a load of randomness. But when I look at the original file and the downloaded file side by side in Notepad, they are identical...

Example:
First line from original file:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

First ReadLine() from the new file (which looks identical in Notepad):
line    "<\0!\0d\0o\0c\0t\0y\0p\0e\0>\0"    string

I thought it might be encoded as Base64 or something, so I tried converting it back, but that failed with an error about invalid characters...

Does anybody have any idea what is going on here? It's driving me up the wall...

Any help would be greatly appreciated...

Recommended answer

Every second byte a null? Smells VERY strongly of a 16-bit Unicode encoding. If there isn't a byte order mark at the start of the file, your StreamReader may be trying to read it as 8-bit ASCII. I'm no expert in this particular area, but I'm sure there is a way to either get StoreToFile() to write ASCII or (preferably) get your StreamReader to read Unicode.

Have fun,
Peter
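To make the diagnosis concrete, here is a small C# sketch (an illustration, not code from the thread) that encodes the start of the DOCTYPE line as UTF-16 little-endian without a byte order mark, then decodes the same bytes two ways. One detail worth noting: when .NET's StreamReader finds no BOM it falls back to UTF-8 rather than ASCII, but the effect is the same, because each 0x00 high byte becomes its own U+0000 character, which matches the "<\0!\0..." value seen in the debugger. Notepad presumably guesses UTF-16 heuristically for BOM-less files, which would explain why both files looked identical there. The variable names are arbitrary.

```csharp
using System;
using System.Text;

// Encode the start of the DOCTYPE line the way a UTF-16 little-endian writer
// would (Encoding.Unicode in .NET). GetBytes never emits a byte order mark.
string original = "<!DOCTYPE";
byte[] utf16Bytes = Encoding.Unicode.GetBytes(original);

// Every second byte is 0x00: '<' is 3C 00, '!' is 21 00, and so on.
Console.WriteLine(BitConverter.ToString(utf16Bytes));
// 3C-00-21-00-44-00-4F-00-43-00-54-00-59-00-50-00-45-00

// Decoding as UTF-8 (StreamReader's fallback when no BOM is present) keeps
// every 0x00 byte as a separate U+0000 character: "<\0!\0D\0O\0..."
string misread = Encoding.UTF8.GetString(utf16Bytes);

// Decoding as UTF-16 LE recovers the original text.
string correct = Encoding.Unicode.GetString(utf16Bytes);

Console.WriteLine(misread.Length);       // 18: nine characters plus nine NULs
Console.WriteLine(correct == original);  // True
```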


Thanks Peter

You are right. For anyone that may come across this problem, here is what I did to get past it:
using System.IO;
using System.Text;

// Encoding.Unicode is UTF-16 little-endian, which is what the saved attachment uses.
Encoding myEncoding = Encoding.Unicode;
using (StreamReader sr = new StreamReader(@"c:\Temp\temp.txt", myEncoding))
{
    // My code
}
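Hard-coding Encoding.Unicode works as long as every attachment is saved as UTF-16. If that might vary, one option is to sniff the first bytes before reading. The sketch below is an assumption-laden illustration, not part of .NET or ActiveUp.Net.Mail: GuessEncoding and ReadFirstLine are hypothetical helper names, and the null-byte test is only a heuristic that suits mostly-ASCII text such as an HTML template.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: guess the encoding from a file's first bytes.
// A byte order mark wins if present; otherwise, if most odd-indexed bytes are
// 0x00, assume UTF-16 LE; if most even-indexed bytes are 0x00, assume UTF-16 BE;
// else fall back to UTF-8.
Encoding GuessEncoding(byte[] head)
{
    if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
        return new UTF8Encoding(true);
    if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
        return Encoding.Unicode;            // UTF-16 LE
    if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
        return Encoding.BigEndianUnicode;   // UTF-16 BE

    int evenNulls = 0, oddNulls = 0, pairs = head.Length / 2;
    for (int i = 0; i + 1 < head.Length; i += 2)
    {
        if (head[i] == 0) evenNulls++;
        if (head[i + 1] == 0) oddNulls++;
    }
    if (pairs > 0 && oddNulls > pairs / 2) return Encoding.Unicode;
    if (pairs > 0 && evenNulls > pairs / 2) return Encoding.BigEndianUnicode;
    return new UTF8Encoding(false);
}

// Hypothetical helper: read the first line of a seekable stream using the guessed encoding.
string? ReadFirstLine(Stream stream)
{
    // Peek at up to 512 bytes, then rewind so StreamReader sees them too.
    byte[] head = new byte[512];
    int n = stream.Read(head, 0, head.Length);
    Array.Resize(ref head, n);
    stream.Position = 0;

    // detectEncodingFromByteOrderMarks: true lets a real BOM override the guess.
    using var reader = new StreamReader(stream, GuessEncoding(head), detectEncodingFromByteOrderMarks: true);
    return reader.ReadLine();
}

// Simulate the downloaded attachment: UTF-16 LE with no byte order mark.
byte[] bytes = Encoding.Unicode.GetBytes("<!DOCTYPE html>\r\n<html>");
using var ms = new MemoryStream(bytes);
Console.WriteLine(ReadFirstLine(ms));  // <!DOCTYPE html>
```

For the real file, the MemoryStream would be replaced with something like File.OpenRead(@"c:\Temp\temp.txt"). Passing detectEncodingFromByteOrderMarks: true means a file that does carry a BOM is still handled by StreamReader's own detection, and the BOM is not returned as part of the first line.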



Thanks again, Peter.

