Effective way to find any file's Encoding
Question
Yes, this is a very frequent question, but the matter is vague to me since I don't know much about it.
I would like a very precise way to find a file's encoding, as precise as Notepad++ is.
Thanks.
Recommended answer
The StreamReader.CurrentEncoding property rarely returns the correct text file encoding for me. I've had greater success determining a file's encoding and endianness by analyzing its byte order mark (BOM):
// Namespaces this method needs
using System.IO;
using System.Text;

/// <summary>
/// Determines a text file's encoding by analyzing its byte order mark (BOM).
/// Defaults to ASCII when no known BOM is found.
/// </summary>
/// <param name="filename">The text file to analyze.</param>
/// <returns>The detected encoding.</returns>
public static Encoding GetEncoding(string filename)
{
    // Read the BOM (files shorter than 4 bytes leave the remaining bytes as 0)
    var bom = new byte[4];
    int read;
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        read = file.Read(bom, 0, 4);
    }

    // Analyze the BOM. UTF-32LE is tested before UTF-16LE because both start with FF FE.
    // Note: Encoding.UTF7 is marked obsolete in .NET 5 and later.
    if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7;
    if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    if (read >= 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; // UTF-32LE
    if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; // UTF-16LE
    if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; // UTF-16BE
    if (bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); // UTF-32BE

    // No recognizable BOM
    return Encoding.ASCII;
}
As a side note, you may want to modify the last line of this method to return Encoding.Default instead, so that the encoding for the OS's current ANSI code page is returned by default. (That holds on .NET Framework; on .NET Core and later, Encoding.Default is always UTF-8.)
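Since the right fallback depends on where the files come from, one option is to make it a parameter rather than hard-coding it. The following is only a sketch under that assumption: the class name BomSniffer and the methods FromBom and ReadAllText are hypothetical names introduced here, not part of the original answer or of .NET. It applies the same BOM checks to a byte buffer (leaving out UTF-7, whose Encoding.UTF7 property is obsolete in recent .NET versions) and then reads the file with whatever encoding was chosen.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class BomSniffer
{
    // Returns the encoding indicated by a BOM at the start of 'head',
    // or 'fallback' when no known BOM is present.
    public static Encoding FromBom(byte[] head, Encoding fallback)
    {
        int n = head.Length;
        if (n >= 3 && head[0] == 0xef && head[1] == 0xbb && head[2] == 0xbf)
            return Encoding.UTF8;
        // UTF-32LE must be checked before UTF-16LE: both begin with FF FE.
        if (n >= 4 && head[0] == 0xff && head[1] == 0xfe && head[2] == 0 && head[3] == 0)
            return Encoding.UTF32;
        if (n >= 2 && head[0] == 0xff && head[1] == 0xfe)
            return Encoding.Unicode;            // UTF-16LE
        if (n >= 2 && head[0] == 0xfe && head[1] == 0xff)
            return Encoding.BigEndianUnicode;   // UTF-16BE
        if (n >= 4 && head[0] == 0 && head[1] == 0 && head[2] == 0xfe && head[3] == 0xff)
            return new UTF32Encoding(true, true); // UTF-32BE
        return fallback;
    }

    // Reads a whole file as text, using the BOM if there is one
    // and the caller's fallback otherwise.
    public static string ReadAllText(string filename, Encoding fallback)
    {
        var head = new byte[4];
        int read;
        using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
        {
            read = file.Read(head, 0, head.Length);
        }
        // Only pass along the bytes that were actually read.
        var actual = new byte[read];
        System.Array.Copy(head, actual, read);
        return File.ReadAllText(filename, FromBom(actual, fallback));
    }
}
```

A call such as BomSniffer.ReadAllText(path, Encoding.Default) would then reproduce the side note's suggestion, while passing Encoding.ASCII keeps the original behavior. A file without a BOM can still only be guessed at, so the fallback remains a judgment call.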