Effective way to find any file's Encoding
Problem description
Yes, this is a very frequent question, but the matter is still vague to me since I don't know much about it.
I would like a very precise way to find a file's encoding, as precise as Notepad++ is.
The StreamReader.CurrentEncoding property rarely returns the correct text file encoding for me. I've had greater success determining a file's encoding and endianness by analyzing its byte order mark (BOM):
using System.IO;
using System.Text;

/// <summary>
/// Determines a text file's encoding by analyzing its byte order mark (BOM).
/// Defaults to ASCII when no known BOM is found.
/// </summary>
/// <param name="filename">The text file to analyze.</param>
/// <returns>The detected encoding.</returns>
public static Encoding GetEncoding(string filename)
{
    // Read the BOM (the array stays zero-filled if the file is shorter than 4 bytes)
    var bom = new byte[4];
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        file.Read(bom, 0, 4);
    }

    // Analyze the BOM. UTF-32LE is tested before UTF-16LE because both start with FF FE.
    if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7; // marked obsolete in .NET 5+
    if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    if (bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; // UTF-32LE
    if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; // UTF-16LE
    if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; // UTF-16BE
    if (bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); // UTF-32BE
    return Encoding.ASCII;
}
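As a side note, you may want to modify the last line of the GetEncoding method to return Encoding.Default instead, so that the encoding for the OS's current ANSI code page is returned by default. This applies to .NET Framework; on .NET Core and .NET 5+, Encoding.Default is always UTF-8.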
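A BOM check alone cannot recognize files that were saved without a BOM, which is common for UTF-8. One possible refinement, which is not part of the answer above, is to run a strict UTF-8 decode over the bytes and use the fallback encoding only when that decode fails. The sketch below assumes the BOM check has already found nothing; the class name BomlessEncodingGuess and the method GuessWithoutBom are hypothetical names chosen for illustration, and reading the whole file into memory is a simplification that may not suit very large files.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomlessEncodingGuess
{
    // Hypothetical helper (not from the answer above): intended for files without a recognized BOM.
    public static Encoding GuessWithoutBom(string filename, Encoding fallback)
    {
        byte[] data = File.ReadAllBytes(filename);

        // A strict decoder throws on invalid bytes instead of silently substituting U+FFFD.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(data);
            return strictUtf8; // every byte sequence was valid UTF-8 (plain ASCII also lands here)
        }
        catch (DecoderFallbackException)
        {
            return fallback; // not valid UTF-8: assume the legacy encoding chosen by the caller
        }
    }
}
```

This is a heuristic rather than a guarantee: a file in a legacy code page can happen to contain only bytes that are also valid UTF-8, and telling one legacy code page from another would need statistical analysis that this sketch does not attempt.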