How to use ReadAllText when the file encoding is unknown


Question

I'm reading a file with ReadAllText:

    // In a verbatim (@"...") string a single backslash is already literal.
    String[] values = File.ReadAllText(@"c:\c\file.txt").Split(';');

    int i = 0;

    foreach (String s in values)
    {
        System.Console.WriteLine("output: {0} {1} ", i, s);
        i++;
    }

When I read some files, I sometimes get the wrong characters back (for ÖÜÄÀ...). They come out as '?', because of some trouble with the encoding:

output: 0 TEST
output: 1 A??O?

One solution would be to set the encoding in ReadAllText, say ReadAllText(@"c:\c\file.txt", Encoding.UTF8), which could fix the problem. But what if I still get '?' as output? What if I don't know the encoding of the file? And what if every file has a different encoding? What would be the best way to handle this in C#? Thank you.

Recommended answer

The only way to do this reliably is to look for a byte order mark (BOM) at the start of the text file. (This marker primarily indicates the endianness of the character encoding, but it also identifies the encoding itself, e.g. UTF-8, UTF-16, UTF-32.) Unfortunately, this method only works for Unicode-based encodings, and not for anything that predates them (for which much less reliable methods must be used).
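To make the byte order mark check concrete, here is a minimal sketch that inspects the first bytes of a file by hand. The class and method names (BomSniffer, DetectFromBom, DetectFromFile) are hypothetical and not part of the original answer; the byte sequences are the standard Unicode BOMs. A null result only means that no BOM was found, not that the file is not Unicode.

```csharp
using System;
using System.IO;
using System.Text;

static class BomSniffer
{
    // Returns the encoding implied by a leading byte order mark,
    // or null when the bytes do not start with a known BOM.
    public static Encoding DetectFromBom(byte[] head)
    {
        // UTF-32 LE must be tested before UTF-16 LE: both begin with FF FE.
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return Encoding.UTF32;                 // UTF-32, little-endian
        if (head.Length >= 4 && head[0] == 0x00 && head[1] == 0x00 && head[2] == 0xFE && head[3] == 0xFF)
            return new UTF32Encoding(true, true);  // UTF-32, big-endian
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;               // UTF-16, little-endian
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode;      // UTF-16, big-endian
        return null;                               // no BOM: encoding stays unknown
    }

    // Convenience wrapper: reads at most the first four bytes of the file.
    public static Encoding DetectFromFile(string path)
    {
        var head = new byte[4];
        using (var fs = File.OpenRead(path))
        {
            int n = fs.Read(head, 0, head.Length);
            Array.Resize(ref head, n);             // the file may be shorter than 4 bytes
        }
        return DetectFromBom(head);
    }
}
```

A caller would typically treat a null result as "fall back to a guess or a configured default", which is exactly the unreliable case the answer warns about.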

The StreamReader type (http://msdn.microsoft.com/en-us/library/system.io.streamreader.aspx) supports detecting these marks to determine the encoding. You simply pass true for the constructor's detectEncodingFromByteOrderMarks parameter, like so:

new System.IO.StreamReader("path", true)

You can then check the value of streamReader.CurrentEncoding to determine the encoding used by the file. Note, however, that detection only happens on the first read, so CurrentEncoding is meaningful only after you have read from the stream. If no byte order mark exists, the reader keeps the encoding it was constructed with; for the constructor shown above that is UTF-8, not Encoding.Default.
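Putting this together, a replacement for the ReadAllText call might look like the sketch below. The helper name ReadAllTextDetect and its parameters are hypothetical; it relies only on the StreamReader(string, Encoding, bool) constructor and the CurrentEncoding property. Passing an explicit fallback encoding makes the no-BOM case deterministic instead of relying on whatever the reader defaults to.

```csharp
using System;
using System.IO;
using System.Text;

static class TextLoader
{
    // Reads the whole file. A byte order mark, if present, decides the encoding;
    // otherwise the caller-supplied fallback is used.
    public static string ReadAllTextDetect(string path, Encoding fallback, out Encoding used)
    {
        using (var reader = new StreamReader(path, fallback, true))
        {
            string text = reader.ReadToEnd();  // BOM detection runs on the first read
            used = reader.CurrentEncoding;     // so query the encoding only afterwards
            return text;
        }
    }
}
```

It could be called as TextLoader.ReadAllTextDetect(@"c:\c\file.txt", Encoding.GetEncoding("iso-8859-1"), out used), where ISO-8859-1 is only an assumed fallback for files containing characters like ÖÜÄÀ. If a file has no BOM and is not actually in the fallback encoding, the text will still be decoded incorrectly.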

Reference: a CodeProject solution for detecting the encoding.
