How can I guess the encoding of a string in Perl?

Views: 104  Posted: 2020/5/25 18:41:42  Tags: perl, unicode, string


Question

I have a Unicode string and don't know what its encoding is. When a Perl program reads this string, is there a default encoding that Perl will use? If so, how can I find out what it is?

I am trying to get rid of non-ASCII characters in the input. I found this snippet on a forum, and it is supposed to do that:

my $line = encode('ascii', normalize('KD', $myutf), sub {$_[0] = ''});

How will the above work when no input encoding is specified? Should it be specified like the following?

my $line = encode('ascii', normalize('KD', decode('input-encoding', $myutf)), sub {$_[0] = ''});

Recommended answer

To find out which encoding some unknown data uses, you just have to try and look. The modules Encode::Detect and Encode::Guess automate that. (If you have trouble compiling Encode::Detect, try its fork Encode::Detective instead.)

use Encode::Detect::Detector;
my $unknown = "\x{54}\x{68}\x{69}\x{73}\x{20}\x{79}\x{65}\x{61}\x{72}\x{20}".
              "\x{49}\x{20}\x{77}\x{65}\x{6e}\x{74}\x{20}\x{74}\x{6f}\x{20}".
              "\x{b1}\x{b1}\x{be}\x{a9}\x{20}\x{50}\x{65}\x{72}\x{6c}\x{20}".
              "\x{77}\x{6f}\x{72}\x{6b}\x{73}\x{68}\x{6f}\x{70}\x{2e}";
my $encoding_name = Encode::Detect::Detector::detect($unknown);
print $encoding_name; # gb18030

use Encode;
my $string = decode($encoding_name, $unknown);
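The answer names Encode::Guess but shows only Encode::Detect. As a rough sketch of the Encode::Guess route, assuming you can narrow the candidates down to a short list of suspects (here just euc-cn, chosen for this sample), the same bytes could be handled like this. Encode::Guess only tests the suspects you give it on top of its defaults, and it reports ambiguity rather than ranking candidates, so keep the suspect list short.

```perl
use strict;
use warnings;
use Encode;
use Encode::Guess;

# Raw octets, not yet decoded; the four high bytes are the GB2312 form of 北京.
my $unknown = "This year I went to \xb1\xb1\xbe\xa9 Perl workshop.";

# The suspect list is an assumption about where the data came from.
# It is tried in addition to the defaults (ASCII, UTF-8, BOM-marked UTF-16/32).
my $decoder = guess_encoding($unknown, 'euc-cn');

# On failure guess_encoding returns an error message instead of an object.
die "Cannot guess the encoding: $decoder" unless ref $decoder;

# Decode the octets into a Perl character string.
my $string = $decoder->decode($unknown);
print "Guessed encoding: ", $decoder->name, "\n";
```

If two suspects both decode the data cleanly, guess_encoding returns a message naming the candidates, which is why the `ref` check comes before any use of `$decoder`.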

I find encode 'ascii' a lame solution for getting rid of non-ASCII characters. Everything is replaced with question marks, which is too lossy to be useful.

# Bad example; don't do this.
use utf8;
use Encode;
my $string = 'This year I went to 北京 Perl workshop.';
print encode('ascii', $string); # This year I went to ?? Perl workshop.

If you want readable ASCII text, I recommend Text::Unidecode instead. This, too, is a lossy encoding, but not as terrible as plain encode above.

use utf8;
use Text::Unidecode;
my $string = 'This year I went to 北京 Perl workshop.';
print unidecode($string); # This year I went to Bei Jing  Perl workshop.

However, avoid those lossy encodings if you can help it. If you want to be able to reverse the operation later, pick either PERLQQ or XMLCREF.

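use utf8;
use Encode qw(encode FB_PERLQQ FB_XMLCREF);
my $string = 'This year I went to 北京 Perl workshop.';
# FB_PERLQQ and FB_XMLCREF are PERLQQ and XMLCREF combined with LEAVE_SRC;
# a bare PERLQQ or XMLCREF lets encode overwrite $string in place.
print encode('ascii', $string, FB_PERLQQ), "\n";  # This year I went to \x{5317}\x{4eac} Perl workshop.
print encode('ascii', $string, FB_XMLCREF), "\n"; # This year I went to &#x5317;&#x4eac; Perl workshop.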
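To show what reversing the operation might look like, here is a minimal sketch for the XMLCREF form. The regular expression is my own illustration, not part of Encode, and it assumes the original text contained no literal `&#x...;` sequences of its own. FB_XMLCREF is used so that encode leaves `$string` untouched for the comparison.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode FB_XMLCREF);

my $string = 'This year I went to 北京 Perl workshop.';

# FB_XMLCREF is XMLCREF combined with LEAVE_SRC, so $string is not modified.
my $escaped = encode('ascii', $string, FB_XMLCREF);

# Turn each hexadecimal character reference back into its character.
(my $restored = $escaped) =~ s/&#x([0-9A-Fa-f]+);/chr hex $1/ge;

print $restored eq $string ? "round trip ok\n" : "round trip failed\n";
```

The PERLQQ form can be reversed the same way by matching `\x{...}` instead, with the same caveat about text that already contains such sequences.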
