Fetch the Encoding Type of a File in C#

Question


Hi,
I am trying to determine the encoding of a file: Unicode, UTF-8, UTF-8 with BOM, ANSI, and so on. I can detect every encoding except ANSI (Encoding.Default / Windows-1252); I cannot tell ANSI and UTF-8 apart. I tried several custom classes (Ude, TextFileEncodingDetector, etc.), which guess the encoding but do not always get it right. Is there a way to do this?

Recommended answer

Unless the document contains bytes >= 0x80, Windows-1252 and UTF-8 are indistinguishable (unless a BOM is present), because both encode the range below 0x80 exactly as ASCII does.

If the document does contain bytes >= 0x80, it is a matter of checking those bytes for tell-tale indicators; see:

http://en.wikipedia.org/wiki/UTF-8#Codepage_layout
http://en.wikipedia.org/wiki/Windows-1252#Codepage_layout


http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html
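
The checks described in the answer can be sketched in C# roughly as follows. This is only a heuristic under stated assumptions, not a definitive detector: the class name `EncodingGuess`, the method `Guess`, and the returned labels are hypothetical names chosen for illustration. It relies on `UTF8Encoding` being constructed with `throwOnInvalidBytes: true`, so that decoding fails on byte sequences that are not well-formed UTF-8. A file that fails that check is merely assumed to be ANSI; a Windows-1252 file whose high bytes happen to form valid UTF-8 sequences would still be reported as UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingGuess
{
    // Hypothetical helper: returns a descriptive label, not an Encoding instance.
    public static string Guess(byte[] bytes)
    {
        // 1. A BOM, when present, settles the question.
        //    UTF-32 LE is tested before UTF-16 LE because its BOM starts with FF FE too.
        if (StartsWith(bytes, 0xEF, 0xBB, 0xBF)) return "UTF-8 (BOM)";
        if (StartsWith(bytes, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32 LE (BOM)";
        if (StartsWith(bytes, 0xFF, 0xFE)) return "UTF-16 LE (BOM)";
        if (StartsWith(bytes, 0xFE, 0xFF)) return "UTF-16 BE (BOM)";

        // 2. With no byte >= 0x80 the content is plain ASCII, and
        //    UTF-8 and Windows-1252 cannot be told apart.
        bool hasHighByte = false;
        foreach (byte b in bytes)
        {
            if (b >= 0x80) { hasHighByte = true; break; }
        }
        if (!hasHighByte) return "ASCII (UTF-8 and Windows-1252 are identical here)";

        // 3. Strict UTF-8 decoding: ill-formed sequences throw.
        try
        {
            new UTF8Encoding(false, true).GetString(bytes);
            return "UTF-8 (no BOM, likely)";
        }
        catch (DecoderFallbackException)
        {
            // Assumption: anything that is not valid UTF-8 is treated as ANSI.
            return "ANSI (assumed, e.g. Windows-1252)";
        }
    }

    // True if data begins with the given byte sequence.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A call such as `EncodingGuess.Guess(File.ReadAllBytes(path))` would then label a file, where `path` is whatever file you want to inspect. The result for BOM-less input is a guess, which is consistent with the answer's point that the two encodings can only be separated by inspecting bytes >= 0x80.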

