How do I convert from a possibly Windows 1252 'ANSI' encoded uploaded file to UTF8 in .NET?


Question

I've got a FileUpload control in an ASP.NET web page which is used to upload a file. Its contents (as a stream) are processed in the C# code-behind and later output on the page using HtmlEncode.

However, some of this output is being mangled: specifically, the symbol '£' is output as the Unicode replacement character U+FFFD. I've tracked this down to the input file, which is Windows 1252 ('ANSI') encoded.

My questions are:

  1. How do I determine whether the file is encoded as 1252 or UTF8? It could be either.

  2. How do I convert it to UTF8 if it is in Windows 1252, preserving the symbol £ etc.?

I've looked online but cannot find a satisfactory answer.

Recommended answer

If you know that the file is encoded with Windows 1252, you can open the file with a StreamReader and pass the proper encoding. That is:

StreamReader reader = new StreamReader("filename", Encoding.GetEncoding("Windows-1252"), true);

The "true" argument tells the reader to set the encoding from the byte order mark at the front of the file, if there is one. Otherwise it opens the file as Windows-1252.
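To see the effect of that flag, here is a minimal sketch (not part of the original answer) that reads a byte array through a StreamReader and reports which encoding the reader settled on. The class and method names are made up for illustration. The reader only inspects the BOM on its first read, so the sketch reads before checking CurrentEncoding.

```csharp
using System.IO;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: returns the encoding the reader actually used.
    // If the data starts with a byte order mark, that wins; otherwise the
    // supplied fallback encoding is kept.
    public static Encoding DetectFromBom(byte[] data, Encoding fallback)
    {
        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, fallback, true))
        {
            // BOM detection happens on the first read, not in the constructor.
            reader.ReadToEnd();
            return reader.CurrentEncoding;
        }
    }
}
```

Passing bytes that begin with EF BB BF should report UTF-8 regardless of the fallback, while bytes without a BOM should report the fallback unchanged.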

You can then read the file and, if you want to convert it to UTF-8, write the text to a file that you've opened with that encoding.
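As a rough sketch of that read-then-write step (the class and method names are illustrative, not from the original answer), the following copies text from one stream to another, decoding with a caller-supplied source encoding and writing UTF-8 without a BOM. It assumes .NET 4.5 or later for the leaveOpen constructor overloads. On .NET Core and .NET 5+, Encoding.GetEncoding("Windows-1252") only works after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); on .NET Framework the code page is available without that step.

```csharp
using System.IO;
using System.Text;

public static class Transcoder
{
    // Hypothetical helper: decodes 'input' using 'sourceEncoding' (or a BOM,
    // if one is present) and writes the text to 'output' as UTF-8.
    public static void ToUtf8(Stream input, Encoding sourceEncoding, Stream output)
    {
        // leaveOpen: true so the caller keeps ownership of both streams.
        using (var reader = new StreamReader(input, sourceEncoding, true, 4096, leaveOpen: true))
        using (var writer = new StreamWriter(output, new UTF8Encoding(false), 4096, leaveOpen: true))
        {
            var buffer = new char[4096];
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                writer.Write(buffer, 0, read);
            }
        }
    }
}
```

For the upload case in the question, the input stream would presumably be the FileUpload control's FileContent property, and the output could be a MemoryStream or FileStream.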

The short answer to your first question is that there isn't a 100% satisfactory way to determine the encoding of a file. If there is a byte order mark, you can determine what flavor of Unicode it is, but without the BOM you're stuck with using heuristics to determine the encoding.

I don't have a good reference for the heuristics. You might search for "how does Notepad determine the character set". I recall seeing something about that some time ago.
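One commonly used heuristic, which is not from the original answer, is to try decoding the bytes as strict UTF-8 and fall back to a single-byte code page only if that fails. A lone 0xA3 byte (£ in Windows-1252) is not valid UTF-8, so files like the one in the question are caught by this check. The sketch below uses illustrative names and has known limits: pure ASCII passes either way, some Windows-1252 byte sequences happen to be valid UTF-8, and a leading UTF-8 BOM is not stripped by GetString.

```csharp
using System.Text;

public static class EncodingGuess
{
    // Hypothetical helper: returns the text decoded as UTF-8 when the bytes
    // are well-formed UTF-8, otherwise decoded with the fallback encoding.
    public static string DecodeUtf8OrFallback(byte[] data, Encoding fallback)
    {
        // throwOnInvalidBytes: true makes malformed sequences raise an
        // exception instead of silently becoming U+FFFD.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            return strictUtf8.GetString(data);
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(data);
        }
    }
}
```

The fallback would typically be Encoding.GetEncoding("Windows-1252") for the scenario in the question, subject to the code page provider being registered on newer runtimes.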

In practice, I've found the following to work for most of what I do:

StreamReader reader = new StreamReader("filename", Encoding.Default, true);

Most of the files I read are ones that I created with .NET's StreamWriter, so they're in UTF-8 with a BOM. The other files I get are typically written by some tool that doesn't understand Unicode or code pages, and I just treat those as a stream of bytes, which Encoding.Default handles well. (On .NET Framework, Encoding.Default is the system's ANSI code page; on .NET Core and later it is UTF-8.)
