How can I guess the encoding of a string in Perl?

Views: 21 | Published: 2021/12/10 18:10:17 | Tags: perl, unicode, string

Question

I have a Unicode string and don't know what its encoding is. When a Perl program reads this string, is there a default encoding that Perl will use? If so, how can I find out what it is?

I am trying to strip non-ASCII characters from the input. I found this snippet on a forum that does it:

my $line = encode('ascii', normalize('KD', $myutf), sub {$_[0] = ''});

How does the above work when no input encoding is specified? Should the encoding be specified as in the following?

my $line = encode('ascii', normalize('KD', decode('input-encoding', $myutf)), sub {$_[0] = ''});

Recommended answer

To find out which encoding unknown data uses, you just have to try candidate encodings and look at the result. The modules Encode::Detect and Encode::Guess automate that. (If you have trouble compiling Encode::Detect, try its fork Encode::Detective instead.)

use Encode::Detect::Detector;
my $unknown = "\x{54}\x{68}\x{69}\x{73}\x{20}\x{79}\x{65}\x{61}\x{72}\x{20}".
              "\x{49}\x{20}\x{77}\x{65}\x{6e}\x{74}\x{20}\x{74}\x{6f}\x{20}".
              "\x{b1}\x{b1}\x{be}\x{a9}\x{20}\x{50}\x{65}\x{72}\x{6c}\x{20}".
              "\x{77}\x{6f}\x{72}\x{6b}\x{73}\x{68}\x{6f}\x{70}\x{2e}";
my $encoding_name = Encode::Detect::Detector::detect($unknown);
print $encoding_name; # gb18030

use Encode;
my $string = decode($encoding_name, $unknown);
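
Encode::Guess, the other module mentioned above, ships with Perl's core Encode distribution, so it needs no compilation. The following is a minimal sketch of how it might be used on the same sample bytes. It only tests a list of suspect encodings; the choice of 'euc-cn' as the extra suspect is an assumption made for this illustration, and with several similar suspects the guess can come back as ambiguous.

```perl
use strict;
use warnings;
use Encode::Guess;   # core module; exports guess_encoding()

# Same sample as above: ASCII text with two GB2312 characters in the middle.
my $unknown = "This year I went to \x{b1}\x{b1}\x{be}\x{a9} Perl workshop.";

# ASCII, UTF-8 and BOM-marked UTF-16/32 are always tried;
# 'euc-cn' is an additional suspect picked for this illustration.
my $guess = guess_encoding($unknown, 'euc-cn');

my $string;
if (ref $guess) {
    # Success: $guess is an encoding object that can decode the bytes.
    $string = $guess->decode($unknown);
    printf "Guessed %s, decoded %d characters\n", $guess->name, length $string;
}
else {
    # Failure: $guess is a plain error message, e.g. when several suspects fit.
    warn "Could not guess the encoding: $guess\n";
}
```

Checking `ref $guess` matters because guess_encoding returns an error string rather than an object when no suspect, or more than one suspect, fits the data.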

I find encode 'ascii' a lame solution for getting rid of non-ASCII characters. Every such character is replaced with a question mark; this is too lossy to be useful.

# Bad example; don't do this.
use utf8;
use Encode;
my $string = 'This year I went to 北京 Perl workshop.';
print encode('ascii', $string); # This year I went to ?? Perl workshop.
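
For comparison, the approach from the question (decode, normalize, then drop whatever is still not ASCII) can be sketched with core modules only. This is an illustration rather than part of the original answer: it assumes the input bytes are UTF-8, and the sample string is made up.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFKD);

# Assumption: the raw input is UTF-8 encoded octets ("Café 北京").
my $bytes = "Caf\xC3\xA9 \xE5\x8C\x97\xE4\xBA\xAC";

# Octets -> Perl character string. Without this step, NFKD would see bytes.
my $chars = decode('UTF-8', $bytes);

# Compatibility decomposition: "é" becomes "e" followed by U+0301.
my $decomposed = NFKD($chars);

# The code reference is called for each character ASCII cannot represent;
# returning an empty string drops that character.
my $ascii = encode('ascii', $decomposed, sub { '' });

print "[$ascii]\n";   # [Cafe ]
```

The accented letter keeps its base letter, but the Chinese characters vanish without a trace, so this variant is still lossy.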

If you want readable ASCII text, I recommend Text::Unidecode instead. This, too, is a lossy conversion, but not as terrible as the plain encode above.

use utf8;
use Text::Unidecode;
my $string = 'This year I went to 北京 Perl workshop.';
print unidecode($string); # This year I went to Bei Jing  Perl workshop.

However, avoid lossy encodings if you can help it. If you want to reverse the operation later, pick either PERLQQ or XMLCREF.

use utf8;
use Encode qw(encode PERLQQ XMLCREF LEAVE_SRC);
my $string = 'This year I went to 北京 Perl workshop.';
# LEAVE_SRC keeps encode() from modifying $string in place when a CHECK flag is given.
print encode('ascii', $string, PERLQQ  | LEAVE_SRC); # This year I went to \x{5317}\x{4eac} Perl workshop.
print encode('ascii', $string, XMLCREF | LEAVE_SRC); # This year I went to &#x5317;&#x4eac; Perl workshop.
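
To illustrate what reversing the operation could look like, here is a small sketch that turns XMLCREF output back into the original characters with a regular expression. The substitution is an assumption of this example, not something Encode provides, and it would also convert any literal "&#x...;" sequence that was already present in the input.

```perl
use strict;
use warnings;
use Encode qw(encode FB_XMLCREF);

my $string = "This year I went to \x{5317}\x{4eac} Perl workshop.";

# FB_XMLCREF is XMLCREF combined with LEAVE_SRC, so $string stays untouched.
my $ascii = encode('ascii', $string, FB_XMLCREF);

# Turn each hexadecimal character reference back into its character.
(my $restored = $ascii) =~ s/&#x([0-9a-fA-F]+);/chr hex $1/ge;

print $restored eq $string ? "round trip ok\n" : "round trip failed\n";
```

The same idea works for PERLQQ output by matching the `\x{...}` escapes instead.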
