How do I convert from a possibly Windows 1252 'ANSI' encoded uploaded file to UTF8 in .NET?
Problem description
I've got a FileUpload control in an ASP.NET web page which is used to upload a file, the contents of which (in a stream) are processed in the C# code behind and output on the page later, using HtmlEncode.
But some of this output is becoming mangled: specifically, the '£' symbol is output as U+FFFD REPLACEMENT CHARACTER. I've tracked this down to the input file, which is Windows 1252 ('ANSI') encoded.
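A minimal C# sketch of why this happens, assuming the upload is decoded as UTF-8 (StreamReader's default when no encoding is given). In Windows-1252, '£' is the single byte 0xA3; on its own that byte is not valid UTF-8, so the default UTF-8 decoder replaces it with U+FFFD. The class and member names are illustrative only. The provider registration is needed on .NET Core and .NET 5+, where code page 1252 is not available by default; on .NET Framework that line can be dropped.

```csharp
using System.Text;

static class PoundSignDemo
{
    // Make code page 1252 available on .NET Core / .NET 5+.
    static PoundSignDemo() =>
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // "A£B" as written by a Windows-1252 ("ANSI") tool: '£' is the byte 0xA3.
    public static readonly byte[] AnsiBytes = { 0x41, 0xA3, 0x42 };

    // 0xA3 is a UTF-8 continuation byte with no lead byte, so the default
    // (non-throwing) UTF-8 decoder substitutes U+FFFD for it.
    public static string AsUtf8() => Encoding.UTF8.GetString(AnsiBytes);

    // Decoding with the code page the file was actually written in keeps the '£'.
    public static string AsWindows1252() =>
        Encoding.GetEncoding(1252).GetString(AnsiBytes);
}
```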
The questions are:
- How do I determine whether the file is encoded as Windows 1252 or UTF-8? It could be either.
- How do I convert it to UTF-8 if it is in Windows 1252, preserving the £ symbol etc.?
I've looked online but cannot find a satisfactory answer.
Recommended answer
If you know that the file is encoded with Windows 1252, you can open the file with a StreamReader and pass the proper encoding. That is:
StreamReader reader = new StreamReader("filename", Encoding.GetEncoding("Windows-1252"), true);
The "true" argument (detectEncodingFromByteOrderMarks) tells the reader to pick the encoding from the byte order mark at the front of the file, if there is one. Otherwise it decodes the file as Windows-1252.
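A small sketch of that BOM behaviour, taking a Stream instead of a file name so it also fits an uploaded file. The helper name is hypothetical. It returns the reader's CurrentEncoding, which only reflects the detected encoding after the first read. The provider registration is only needed on .NET Core and .NET 5+.

```csharp
using System.IO;
using System.Text;

static class BomDetectionDemo
{
    // Make code page 1252 available on .NET Core / .NET 5+.
    static BomDetectionDemo() =>
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Decodes the stream as Windows-1252 unless a byte order mark says otherwise,
    // and reports which encoding the reader actually used.
    public static (string Text, Encoding Used) Read(Stream input)
    {
        using var reader = new StreamReader(input, Encoding.GetEncoding(1252),
                                            detectEncodingFromByteOrderMarks: true);
        string text = reader.ReadToEnd();
        // CurrentEncoding is updated once the BOM check has run, i.e. after reading.
        return (text, reader.CurrentEncoding);
    }
}
```

With the bytes EF BB BF C2 A3 the reader switches to UTF-8 (and strips the BOM); with the single byte A3 it stays on code page 1252. Both decode to "£".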
You can then read the file and, if you want to convert it to UTF-8, write it to a file that you've opened with that encoding.
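A minimal sketch of that read-then-write conversion, assuming the input is Windows-1252 unless it carries a BOM. It works on Streams, so the source could be, for example, the FileUpload control's FileContent stream and the destination a FileStream or MemoryStream. The class name and the writeBom parameter are illustrative, not part of the original answer. Note that disposing the reader and writer also closes both streams.

```csharp
using System.IO;
using System.Text;

static class AnsiToUtf8
{
    // Make code page 1252 available on .NET Core / .NET 5+.
    static AnsiToUtf8() =>
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Decodes `source` (Windows-1252 unless a BOM says otherwise) and writes the
    // text to `destination` as UTF-8; `writeBom` controls the UTF-8 preamble.
    public static void Convert(Stream source, Stream destination, bool writeBom = false)
    {
        using var reader = new StreamReader(source, Encoding.GetEncoding(1252), true);
        using var writer = new StreamWriter(destination, new UTF8Encoding(writeBom));
        writer.Write(reader.ReadToEnd());
    }
}
```

The ANSI bytes 41 A3 ("A£") come out as 41 C2 A3, or EF BB BF 41 C2 A3 when a BOM is requested.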
The short answer to your first question is that there isn't a 100% satisfactory way to determine the encoding of a file. If there are byte order marks, you can determine what flavor of Unicode it is, but without the BOM, you're stuck with using heuristics to determine the encoding.
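One common heuristic for the UTF-8 vs Windows-1252 case, sketched below, is not from the answer above: honour a UTF-8 BOM if present; otherwise try a strict UTF-8 decode (throwOnInvalidBytes: true), and fall back to Windows-1252 if the bytes are not valid UTF-8. It is only a guess. It ignores UTF-16/32 BOMs and other code pages. It will also misread 1252 text whose bytes happen to form valid UTF-8 (for example "Â£"), although that is rare in practice. The class and method names are hypothetical.

```csharp
using System.Text;

static class EncodingGuesser
{
    // Make code page 1252 available on .NET Core / .NET 5+.
    static EncodingGuesser() =>
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Heuristic, not a guarantee: UTF-8 if there is a UTF-8 BOM or the bytes are
    // valid UTF-8, otherwise Windows-1252.
    public static string Decode(byte[] bytes, out Encoding detected)
    {
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        {
            detected = Encoding.UTF8;
            return Encoding.UTF8.GetString(bytes, 3, bytes.Length - 3); // skip the BOM
        }

        // With throwOnInvalidBytes: true, invalid sequences raise
        // DecoderFallbackException instead of silently becoming U+FFFD.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                          throwOnInvalidBytes: true);
        try
        {
            string text = strictUtf8.GetString(bytes);
            detected = strictUtf8;
            return text;
        }
        catch (DecoderFallbackException)
        {
            detected = Encoding.GetEncoding(1252);
            return detected.GetString(bytes);
        }
    }
}
```

Pure ASCII input is reported as UTF-8, which is harmless because ASCII bytes mean the same thing in both encodings.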
I don't have a good reference for the heuristics. You might search for "how does Notepad determine the character set". I recall seeing something about that some time ago.
In practice, I've found the following to work for most of what I do:
StreamReader reader = new StreamReader("filename", Encoding.Default, true);
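Most of the files I read are those that I create with .NET's StreamWriter, and they're in UTF-8 with the BOM. Other files that I get are typically written with some tool that doesn't understand Unicode or code pages, and I just treat them as a stream of bytes, which Encoding.Default does well. (On .NET Framework, Encoding.Default is the system's ANSI code page, for example Windows-1252 on Western European systems. On .NET Core and .NET 5+ it is always UTF-8, so there you would pass Encoding.GetEncoding(1252) explicitly, after registering CodePagesEncodingProvider.Instance.)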