Effective way to find any file's Encoding


Problem description

Yes, this is a very frequently asked question, but the matter is still vague to me since I don't know much about it.

I would like a very precise way to find a file's encoding, as precise as Notepad++ is.

Solution

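The StreamReader.CurrentEncoding property rarely returns the correct text file encoding for me. I've had greater success determining a file's encoding by analyzing its byte order mark (BOM) directly. Note that this only works for files that actually start with a BOM; anything else falls through to the default at the end of the method: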

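// Requires: using System.IO; using System.Text;

/// <summary>
/// Determines a text file's encoding by analyzing its byte order mark (BOM).
/// Defaults to ASCII when the file does not start with a recognized BOM.
/// </summary>
/// <param name="filename">The text file to analyze.</param>
/// <returns>The detected encoding.</returns>
public static Encoding GetEncoding(string filename)
{
    // Read up to the first four bytes; 'read' is smaller for very short files
    var bom = new byte[4];
    int read;
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        read = file.Read(bom, 0, 4);
    }

    // Analyze the BOM. UTF-32LE (FF FE 00 00) must be tested before UTF-16LE (FF FE),
    // because the UTF-16LE BOM is a prefix of the UTF-32LE one.
    if (read >= 3 && bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7; // marked obsolete in .NET 5 and later
    if (read >= 3 && bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    if (read >= 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; //UTF-32LE
    if (read >= 2 && bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; //UTF-16LE
    if (read >= 2 && bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; //UTF-16BE
    if (read >= 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); //UTF-32BE
    return Encoding.ASCII;
}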
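
For comparison, here is a minimal sketch of letting StreamReader do the BOM sniffing itself. One thing that may explain unreliable results: CurrentEncoding only reflects the detected encoding after the reader has actually read from the stream, so the sketch calls Peek() first. The class and method names (StreamReaderProbe, DetectWithStreamReader) are made up for illustration, and the fallback parameter is whatever you want to assume for BOM-less files.

```csharp
using System.IO;
using System.Text;

public static class StreamReaderProbe
{
    // Hypothetical helper: asks StreamReader to inspect the BOM and reports
    // the encoding it settled on. Files without a BOM come back as 'fallback'.
    public static Encoding DetectWithStreamReader(string filename, Encoding fallback)
    {
        using (var reader = new StreamReader(filename, fallback, detectEncodingFromByteOrderMarks: true))
        {
            // Detection happens on the first read; before this call,
            // CurrentEncoding is still just the fallback passed in above.
            reader.Peek();
            return reader.CurrentEncoding;
        }
    }
}
```

Like the BOM check, this cannot tell anything about files without a BOM. The hand-rolled GetEncoding method above is still handy when you want the answer without constructing a reader.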

As a side note, you may want to modify the last line of GetEncoding to return Encoding.Default instead. On .NET Framework that gives you the encoding for the OS's current ANSI code page; on .NET Core and .NET 5 or later, Encoding.Default is always UTF-8.
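
If you need a better guess for files that have no BOM at all, one common heuristic is to check whether the bytes are valid UTF-8 and only fall back to a legacy code page when they are not. The sketch below is just that heuristic, not a description of how Notepad++ does it; Utf8Probe, LooksLikeUtf8 and GuessWithoutBom are hypothetical names, and legacyFallback is whatever encoding you choose to assume.

```csharp
using System.IO;
using System.Text;

public static class Utf8Probe
{
    // Hypothetical helper: true if the bytes decode as UTF-8 without errors.
    // Plain ASCII also passes, since ASCII is a subset of UTF-8.
    public static bool LooksLikeUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes the decoder throw instead of
        // silently substituting U+FFFD for malformed sequences.
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strict.GetCharCount(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Hypothetical helper: guesses UTF-8 (without BOM) when the whole file
    // validates, otherwise returns the caller's assumed legacy encoding.
    public static Encoding GuessWithoutBom(string filename, Encoding legacyFallback)
    {
        byte[] bytes = File.ReadAllBytes(filename); // reads the whole file; sample a prefix for very large files
        return LooksLikeUtf8(bytes) ? new UTF8Encoding(false) : legacyFallback;
    }
}
```

This is still a guess: a file in a single-byte code page can happen to be valid UTF-8, so treat the result as a best effort rather than a guarantee.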
