Determine File Encoding


Question

Hi there,

Can anyone point out any really obvious flaws in the methodology below
to determine the likely encoding of a file, please? I know it checks
only a few encodings, but that is because the list of possibilities
I need to work with is small.

private string determineFileEncoding(System.IO.FileStream strm)
{
    long originalSize = strm.Length;
    // With this constructor, StreamReader decodes as UTF-8 unless it finds a byte order mark.
    System.IO.StreamReader rdr = new System.IO.StreamReader(strm);

    // Before every read: rewind the stream and discard the reader's buffered data.
    strm.Position = 0;
    rdr.DiscardBufferedData();
    System.Text.UTF8Encoding unic = new System.Text.UTF8Encoding();
    byte[] inputFile = unic.GetBytes(rdr.ReadToEnd());
    if (inputFile.Length == originalSize)
    {
        return "UTF8";
    }

    strm.Position = 0;
    rdr.DiscardBufferedData();
    System.Text.UnicodeEncoding unic2 = new System.Text.UnicodeEncoding();
    byte[] inputFile2 = unic2.GetBytes(rdr.ReadToEnd());
    if (inputFile2.Length == originalSize)
    {
        return "Unicode";
    }

    strm.Position = 0;
    rdr.DiscardBufferedData();
    System.Text.UTF7Encoding unic3 = new System.Text.UTF7Encoding();
    byte[] inputFile3 = unic3.GetBytes(rdr.ReadToEnd());
    if (inputFile3.Length == originalSize)
    {
        return "UTF7";
    }

    strm.Position = 0;
    rdr.DiscardBufferedData();
    System.Text.ASCIIEncoding unic4 = new System.Text.ASCIIEncoding();
    byte[] inputFile4 = unic4.GetBytes(rdr.ReadToEnd());
    if (inputFile4.Length == originalSize)
    {
        return "Ascii";
    }

    return "Not known";
}




Thanks in advance
Marc.
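To make the flaw in the length comparison concrete, here is a minimal C# sketch (C# 8 or later) that repeats the question's test on byte sequences whose encoding is known in advance. LengthTestDemo and LengthMatches are hypothetical names introduced for illustration; they are not part of the original code.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper that reproduces the length test from the question:
// decode the bytes with a default StreamReader, re-encode the text with a
// candidate encoding, and compare the byte counts.
static class LengthTestDemo
{
    public static bool LengthMatches(byte[] fileBytes, Encoding candidate)
    {
        // Default StreamReader: decodes as UTF-8, but switches encoding if it
        // sees a byte order mark, and strips that mark from the decoded text.
        using var reader = new StreamReader(new MemoryStream(fileBytes));
        string text = reader.ReadToEnd();
        // GetBytes never writes a byte order mark, so a file that starts
        // with one can never produce an equal length.
        return candidate.GetBytes(text).Length == fileBytes.Length;
    }
}
```

For example, "Hi" saved as UTF-16 with a BOM (FF FE 48 00 69 00) fails the UnicodeEncoding check, because re-encoding yields 4 bytes against 6 on disk. Meanwhile the plain ASCII file 48 69 already matches new UTF8Encoding(), so the method in the question would label it "UTF8" and never reach the "Ascii" branch.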

Recommended answers

Why read the entire file to determine the encoding? Can't you tell from the
indicator bytes at the beginning?

Forgive me if I don't know much about encoding, but your algorithm appears
wildly inefficient on its face.

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
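Nick's suggestion can be sketched as follows. The UTF-8, UTF-16 and UTF-32 byte order marks are two to four bytes long, so when a file has one, the first four bytes are all that need to be read. This is a minimal C# sketch under stated assumptions: the stream is seekable (as a FileStream is), BomSniffer and DetectFromBom are hypothetical names, and only those three families of marks are recognised, so the UTF-7 case from the question is left out.

```csharp
using System.IO;
using System.Text;

#nullable enable

static class BomSniffer
{
    // Hypothetical helper: looks only at the first (up to) four bytes of a
    // seekable stream and maps well-known byte order marks to an Encoding.
    // Returns null when no mark is found; that means "unknown", not "ASCII".
    public static Encoding? DetectFromBom(Stream stream)
    {
        var head = new byte[4];
        stream.Position = 0;
        int n = 0;
        // Stream.Read may return fewer bytes than requested, so keep reading
        // until four bytes are in hand or the stream ends.
        while (n < head.Length)
        {
            int read = stream.Read(head, n, head.Length - n);
            if (read == 0) break;
            n += read;
        }
        stream.Position = 0; // rewind so the caller can read the whole file

        // Test the four-byte UTF-32 marks first: FF FE 00 00 also begins
        // with the UTF-16 little-endian mark FF FE.
        if (n >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return new UTF32Encoding(bigEndian: false, byteOrderMark: true);
        if (n >= 4 && head[0] == 0x00 && head[1] == 0x00 && head[2] == 0xFE && head[3] == 0xFF)
            return new UTF32Encoding(bigEndian: true, byteOrderMark: true);
        if (n >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
        if (n >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return new UnicodeEncoding(bigEndian: false, byteOrderMark: true);
        if (n >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return new UnicodeEncoding(bigEndian: true, byteOrderMark: true);
        return null;
    }
}
```

A null result only means that no mark was found. BOM-less UTF-8 and single-byte code pages such as ASCII cannot be told apart this way, so the caller still needs a default for that case.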



I have to forgive you for not knowing too much about encoding. I know
even less. I agree that the algorithm *is* wildly inefficient, but
the fact is that I have not got a clue. :-) Such are the joys of
learning from Google.

On Wed, 1 Jun 2005 06:27:22 -0700, "Nick Malik [Microsoft]"
<ni*******@hotmail.nospam.com> wrote:
Why read the entire file to determine the encoding? Can't you tell from the
indicator bytes at the beginning?

Forgive me if I don't know much about encoding, but your algorithm appears
wildly inefficient on its face.






Check out the StreamReader constructors that take a bool argument
(detectEncodingFromByteOrderMarks) to determine the encoding from the byte
order mark. Also check out the Encoding.GetPreamble() method.
"Marc Jennings" wrote:
I have to forgive you for not knowing too much about encoding. I know
even less. I agree that the algorithm *is* wildly inefficient, but
the fact is that I have not got a clue. :-) Such are the joys of
learning from Google.
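The two APIs suggested in this last reply can be sketched like this (C# 8 or later; EncodingDetection and its method names are hypothetical). DetectWithStreamReader uses the StreamReader constructor with detectEncodingFromByteOrderMarks set to true; StreamReader.CurrentEncoding only reflects a detected mark after the first read, hence the Peek() call. DetectWithPreambles compares the file's leading bytes with each candidate's Encoding.GetPreamble(), trying longer preambles first because the UTF-32 little-endian mark begins with the UTF-16 little-endian one. In both cases the caller chooses what to assume when no mark is present.

```csharp
using System.IO;
using System.Linq;
using System.Text;

#nullable enable

static class EncodingDetection
{
    // Option 1: let StreamReader look for a BOM. If none is found, the reader
    // keeps the fallback encoding supplied by the caller.
    public static Encoding DetectWithStreamReader(Stream stream, Encoding fallback)
    {
        stream.Position = 0;
        // leaveOpen: true so disposing the reader does not close the caller's stream.
        using var reader = new StreamReader(stream, fallback,
            detectEncodingFromByteOrderMarks: true, bufferSize: 1024, leaveOpen: true);
        reader.Peek(); // CurrentEncoding is only updated once data has been read
        Encoding detected = reader.CurrentEncoding;
        stream.Position = 0; // rewind for the caller's real read
        return detected;
    }

    // Option 2: compare leading bytes with each candidate's GetPreamble().
    // Longer preambles go first, because FF FE 00 00 (UTF-32 LE) starts with
    // FF FE (UTF-16 LE). Encodings with an empty preamble never match.
    public static Encoding? DetectWithPreambles(byte[] head, params Encoding[] candidates)
    {
        foreach (Encoding enc in candidates.OrderByDescending(e => e.GetPreamble().Length))
        {
            byte[] preamble = enc.GetPreamble();
            if (preamble.Length > 0 && head.Length >= preamble.Length &&
                head.Take(preamble.Length).SequenceEqual(preamble))
            {
                return enc;
            }
        }
        return null;
    }
}
```

For example, with Encoding.ASCII as the fallback, a stream that begins FF FE 41 00 is reported as UTF-16 (code page 1200), while a stream without a mark, such as 41 42, comes back as the ASCII fallback (code page 20127).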




