Identify the encoding format in C#


Question

Hi All,

Does anyone have an idea how to identify the encoding of a file?
I have a file in SJIS (Shift-JIS) and another file in EUC-JP.
Is there any way to programmatically identify whether a given file, say .html, .mts or .xml, is in SJIS, EUC-JP or UTF-8? Most of the code I found on Google only identifies UTF-8. I tried the following:

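using (var reader = new System.IO.StreamReader(path, true))
{
    // Detection runs only on the first read, and it only recognizes
    // byte order marks (UTF-8/16/32); without one it stays UTF-8.
    reader.Peek();
    var currentEncoding = reader.CurrentEncoding;
}
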
But this returns UTF-8 even for SJIS-encoded files, so I'm really stuck.
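For context: the `detectEncodingFromByteOrderMarks` flag of `StreamReader` only looks for a Unicode byte order mark (BOM). Shift_JIS and EUC-JP files never carry one, so the reader falls back to the encoding it was constructed with, which is UTF-8 for `StreamReader(path, true)`. A minimal sketch of the same BOM check done by hand; `BomSniffer`, `DetectBom` and `DetectBomInFile` are hypothetical names, not a .NET API:

```csharp
using System;
using System.IO;

// Hypothetical helper: reports which Unicode BOM, if any, starts the data.
// These are the same BOMs StreamReader can detect; nothing else is inspected.
public static class BomSniffer
{
    public static string DetectBom(byte[] b)
    {
        // UTF-32 LE (FF FE 00 00) must be tested before UTF-16 LE (FF FE).
        if (b.Length >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00) return "utf-32le";
        if (b.Length >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF) return "utf-32be";
        if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "utf-8";
        if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE) return "utf-16le";
        if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF) return "utf-16be";
        // No BOM: always the case for Shift_JIS and EUC-JP, often for UTF-8 too.
        return null;
    }

    public static string DetectBomInFile(string path)
    {
        var head = new byte[4];
        int read;
        using (var fs = File.OpenRead(path))
        {
            read = fs.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, read); // the file may be shorter than 4 bytes
        return DetectBom(head);
    }
}
```

A `null` result is exactly the SJIS/EUC-JP situation from the question: the bytes carry no marker, so something else has to decide.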

The encoding declarations in the XML files:

<?xml version="1.0" encoding="SJIS"?> 
<?xml version="1.0" encoding="EUC-JP"?> 
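Since these XML files declare their encoding, one option for them specifically is to read the declaration instead of guessing. A minimal sketch, assuming the file has no BOM and is in an ASCII-compatible encoding (Shift_JIS, EUC-JP and UTF-8 all are), so the declaration itself is plain ASCII; `XmlDeclarationSniffer` and `GetDeclaredEncoding` are hypothetical names:

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts NAME from <?xml ... encoding="NAME" ... ?>.
// Assumes no BOM and an ASCII-compatible file encoding.
public static class XmlDeclarationSniffer
{
    // Encoding names per XML 1.0: a letter, then letters, digits, '.', '_' or '-'.
    private static readonly Regex EncodingAttribute = new Regex(
        @"^\s*<\?xml[^>]*?\bencoding\s*=\s*[""']([A-Za-z][A-Za-z0-9._-]*)[""']",
        RegexOptions.IgnoreCase);

    public static string GetDeclaredEncoding(byte[] head)
    {
        // Encoding.ASCII turns every non-ASCII byte into '?', which is harmless
        // because the declaration only contains ASCII characters.
        string text = Encoding.ASCII.GetString(head);
        Match match = EncodingAttribute.Match(text);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Pass in the first few hundred bytes of the file. The result is only what the file claims, and a name like `SJIS` may need mapping to one your runtime accepts (the IANA-registered name is `Shift_JIS`). On .NET Core and later, decoding Shift_JIS or EUC-JP also requires `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` first.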



Can anyone please assist me.

Thanks in advance.

Recommended answer

Of course, your attempts make no sense. Please see my comments. If the HTTP-EQUIV charset information is missing, recognition can only be based on statistics, and a missing charset is in fact a sign of a very dirty HTML page of unacceptable quality.
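As a much cruder first pass than statistics (and not what browsers do), one can rule encodings out by checking whether the bytes are even structurally valid in each. A minimal sketch; `EncodingGuesser` and `Candidates` are hypothetical names, the Shift_JIS byte ranges assume the Windows CP932 layout, and the EUC-JP check includes the SS2 (0x8E) and SS3 (0x8F) forms:

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: lists which of UTF-8, Shift_JIS and EUC-JP the bytes are
// structurally valid in. It can only rule encodings out, not pick one.
public static class EncodingGuesser
{
    public static List<string> Candidates(byte[] b)
    {
        var result = new List<string>();
        if (IsValidUtf8(b)) result.Add("utf-8");
        if (IsStructurallyShiftJis(b)) result.Add("shift_jis");
        if (IsStructurallyEucJp(b)) result.Add("euc-jp");
        return result;
    }

    // Strict UTF-8 decoding: invalid sequences throw instead of becoming U+FFFD.
    private static bool IsValidUtf8(byte[] b)
    {
        try
        {
            new UTF8Encoding(false, true).GetString(b);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Shift_JIS (CP932): single bytes 00-7F and A1-DF (half-width katakana);
    // lead bytes 81-9F or E0-FC followed by a trail byte 40-7E or 80-FC.
    private static bool IsStructurallyShiftJis(byte[] b)
    {
        for (int i = 0; i < b.Length; i++)
        {
            byte c = b[i];
            if (c <= 0x7F || (c >= 0xA1 && c <= 0xDF)) continue;
            if ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC))
            {
                if (i + 1 >= b.Length) return false;
                byte t = b[i + 1];
                if (t < 0x40 || t == 0x7F || t > 0xFC) return false;
                i++; // skip the trail byte
                continue;
            }
            return false; // 80, A0 and FD-FF are not valid here
        }
        return true;
    }

    // EUC-JP: 00-7F single bytes; A1-FE pairs (JIS X 0208); 8E + A1-DF
    // (half-width katakana); 8F + two A1-FE bytes (JIS X 0212).
    private static bool IsStructurallyEucJp(byte[] b)
    {
        for (int i = 0; i < b.Length; i++)
        {
            byte c = b[i];
            if (c <= 0x7F) continue;
            if (c == 0x8E)
            {
                if (i + 1 >= b.Length || b[i + 1] < 0xA1 || b[i + 1] > 0xDF) return false;
                i += 1;
            }
            else if (c == 0x8F)
            {
                if (i + 2 >= b.Length || !IsGr(b[i + 1]) || !IsGr(b[i + 2])) return false;
                i += 2;
            }
            else if (IsGr(c))
            {
                if (i + 1 >= b.Length || !IsGr(b[i + 1])) return false;
                i += 1;
            }
            else
            {
                return false;
            }
        }
        return true;
    }

    private static bool IsGr(byte x) => x >= 0xA1 && x <= 0xFE;
}
```

A single candidate is a strong hint, but pure ASCII passes all three checks, and short EUC-JP text often passes the Shift_JIS check too: あ is A4 A2 in EUC-JP, which Shift_JIS reads as two single-byte half-width characters. Those ambiguous cases are where the character-frequency statistics mentioned in the answer come in.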

However, I just got an idea, but using it is also not simple.

Some browsers have encoding auto-detection. I'm using Mozilla SeaMonkey, for example. It is open source, so you can download the source code and see how it is done. You can try other open-source browsers, like Chromium:
http://en.wikipedia.org/wiki/List_of_web_browsers

Before doing anything with the source code, try this feature in existing pre-built browsers to make sure it works well for the encodings you are interested in. Don't expect a simple solution: this is a difficult problem. I'm not sure solving it makes sense at all.

—SA

