Effective way to find any file's encoding
Question
Yes, this is a very frequently asked question, but the matter is still vague to me because I don't know much about it.
But I would like a very precise way to find a file's encoding, as precise as Notepad++ is.
Recommended answer
The StreamReader.CurrentEncoding property rarely returns the correct text file encoding for me. I've had greater success determining a file's encoding, including its endianness, by analyzing its byte order mark (BOM). If the file does not have a BOM, this approach cannot determine the file's encoding.
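One documented reason CurrentEncoding can mislead is that StreamReader does not inspect the BOM until its first read, so querying the property before reading only reports the encoding passed to the constructor. A minimal sketch of that behavior, assuming a hypothetical helper named DetectWithStreamReader (not part of the original answer), forces the first read with Peek():

```csharp
using System.IO;
using System.Text;

public static class StreamReaderEncodingDemo
{
    // Hypothetical helper: returns the encoding StreamReader settles on
    // once it has looked at the start of the file.
    public static Encoding DetectWithStreamReader(string filename)
    {
        using (var reader = new StreamReader(filename, Encoding.ASCII,
                                             detectEncodingFromByteOrderMarks: true))
        {
            // Peek fills the internal buffer, which is when BOM detection runs.
            reader.Peek();
            return reader.CurrentEncoding;
        }
    }
}
```

For a file without a BOM this simply returns the ASCII fallback that was passed in, so the result says nothing about the real encoding.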
*UPDATED 4/08/2020 to include UTF-32LE detection and return the correct encoding for UTF-32BE
using System.IO;
using System.Text;

/// <summary>
/// Determines a text file's encoding by analyzing its byte order mark (BOM).
/// Defaults to ASCII when no recognized BOM is found.
/// </summary>
/// <param name="filename">The text file to analyze.</param>
/// <returns>The detected encoding.</returns>
public static Encoding GetEncoding(string filename)
{
    // Read up to the first four bytes, enough to hold any BOM checked below
    var bom = new byte[4];
    int read;
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        read = file.Read(bom, 0, 4);
    }

    // Analyze the BOM. The length checks stop the zero padding of a short file
    // from matching (a 2-byte FF FE file would otherwise look like UTF-32LE).
    // UTF-32LE must be tested before UTF-16LE because both start with FF FE.
    // Note: Encoding.UTF7 is marked obsolete in .NET 5 and later.
    if (read >= 3 && bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7;
    if (read >= 3 && bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    if (read >= 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; // UTF-32LE
    if (read >= 2 && bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; // UTF-16LE
    if (read >= 2 && bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; // UTF-16BE
    if (read >= 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); // UTF-32BE

    // We actually have no idea what the encoding is if we reach this point, so
    // you may wish to return null instead of defaulting to ASCII
    return Encoding.ASCII;
}
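When no BOM is present, one common heuristic (not part of the original answer) is to check whether the bytes decode as strict UTF-8. The sketch below assumes a hypothetical helper named LooksLikeUtf8. It can only rule UTF-8 out, never prove it: pure ASCII text and some legacy-encoded files also pass the check.

```csharp
using System.Text;

public static class Utf8Sniffer
{
    // Hypothetical helper: true when the bytes decode as UTF-8 without errors.
    public static bool LooksLikeUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes the decoder throw instead of
        // silently substituting U+FFFD for malformed sequences.
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                      throwOnInvalidBytes: true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

A caller might fall back to this only after the BOM check finds nothing, for example by passing it the result of File.ReadAllBytes(filename), and still treat a positive result as a guess rather than a guarantee.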