Can Encode::Guess tell utf-8 from iso-8859-1?

Question

I have a string $data, encoded in utf-8. I assume that I don't know whether this string is utf-8 or iso-8859-1. I want to use the Perl Encode::Guess module to see if it's one or the other. I'm having trouble figuring out how this module works.

I have tried the following four methods (from http://perldoc.perl.org/Encode/Guess.html):

use Encode::Guess qw/utf8 latin1/;

my $decoder = guess_encoding($data);

print "$decoder\n";

Result: iso-8859-1 or utf8

use Encode::Guess qw/utf8 latin1/;

my $enc = guess_encoding($data, qw/utf8 latin1/);
ref($enc) or die "Can't guess: $enc";
my $utf8 = $enc->decode($data); 

print "$utf8\n";

Result: Can't guess: iso-8859-1 or utf8 at encodage-windows.pl line 25, line 18110.

use Encode::Guess qw/utf8 latin1/;

my $decoder = Encode::Guess->guess($data);
die $decoder unless ref($decoder);
my $utf8 = $decoder->decode($data);

print "$utf8\n";

Result: iso-8859-1 or utf8 at encodage-windows.pl line 30, line 18110.

use Encode::Guess qw/utf8 latin1/;

my $utf8 = Encode::decode("Guess", $data);

print "$utf8\n";

Result: iso-8859-1 or utf8 at /usr/local/lib/perl5/Encode.pm line 175.

My first question is: which one of these methods am I supposed to use (if any)? And my second question: what changes should I make to make this work?

Recommended answer

I normally check the possible encodings one at a time, like this:

use Encode::Guess;

my $decoder = guess_encoding($data, 'utf8');
$decoder = guess_encoding($data, 'iso-8859-1') unless ref $decoder;
die $decoder unless ref $decoder;

printf "Decoding as %s\n\n", $decoder->name;
$data = $decoder->decode($data);

This chooses UTF-8 if the data decodes cleanly as UTF-8; otherwise it tries ISO-8859-1 and either chooses that or dies. Each call becomes a simple yes/no test for one encoding, so there is no way for it to come up with two possible results (which is the error you're getting).
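To see why the combined call is ambiguous, note that every byte sequence is valid ISO-8859-1, so any valid UTF-8 input also matches the Latin-1 suspect. The following is a minimal sketch of the one-at-a-time approach, not a definitive implementation. The helper name `decode_utf8_or_latin1` and the two sample byte strings are made up for illustration and are not part of Encode::Guess or of the original question.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding

# Hypothetical helper (not part of Encode::Guess):
# try UTF-8 first, then fall back to ISO-8859-1.
sub decode_utf8_or_latin1 {
    my ($octets) = @_;

    my $decoder = guess_encoding($octets, 'utf8');
    $decoder = guess_encoding($octets, 'iso-8859-1') unless ref $decoder;

    # On failure guess_encoding returns an error string, not an object.
    die "Can't guess: $decoder" unless ref $decoder;

    return ($decoder->name, $decoder->decode($octets));
}

# Made-up sample data: the word "café" as raw bytes in each encoding.
my $utf8_bytes   = "caf\xC3\xA9";
my $latin1_bytes = "caf\xE9";

# Passing both suspects at once cannot settle on one of them for
# valid UTF-8 input, so the result is a plain string listing both.
my $both = guess_encoding($utf8_bytes, qw/utf8 iso-8859-1/);
print "Both suspects at once: ", (ref $both ? $both->name : $both), "\n";

# Checking one encoding at a time gives a single answer per input.
for my $octets ($utf8_bytes, $latin1_bytes) {
    my ($name, $text) = decode_utf8_or_latin1($octets);
    printf "Decoded as %s (%d characters)\n", $name, length $text;
}
```

With these samples, the first string should be reported as utf8 and the second as iso-8859-1. Keep in mind that the fallback is a heuristic: Latin-1 text whose high bytes happen to form valid UTF-8 sequences will be decoded as UTF-8.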
