Perl's use encoding pragma breaking UTF strings
Question
I have a problem with Perl and the encoding pragma.
(I use UTF-8 everywhere: in input, in output, and in the Perl scripts themselves. I don't want to use any other encoding, ever.)
Now, when I write
binmode(STDOUT, ':utf8');
use utf8;
$r = "\x{ed}";
print $r;
I see the string "í", which is what I want: the Unicode character U+00ED. But when I add the "use encoding" pragma like this
binmode(STDOUT, ':utf8');
use utf8;
use encoding 'utf8';
$r = "\x{ed}";
print $r;
all I see is a box character. Why?
Moreover, when I add Data::Dumper and let Dumper print the new string like this
binmode(STDOUT, ':utf8');
use utf8;
use encoding 'utf8';
$r = "\x{ed}";
use Data::Dumper;
print Dumper($r);
I see that Perl has changed the string to "\x{fffd}". Why?
Recommended answer
use encoding 'utf8' is broken. Rather than interpreting \x{ed} as the code point U+00ED, it interprets it as the single byte 237 (0xED) and then tries to decode that byte as UTF-8. That of course fails, because 0xED on its own is an incomplete UTF-8 sequence, so Perl ends up replacing it with the replacement character U+FFFD, literally "�".
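The failing step described above can be reproduced without the pragma itself (which is deprecated in recent Perls anyway) by decoding the lone byte 0xED as UTF-8 explicitly. This is a minimal sketch using the core Encode module; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Simulate what `use encoding 'utf8'` does to the literal "\x{ed}":
# treat it as the raw byte 0xED, then decode that byte as UTF-8.
my $byte    = "\xed";                  # one byte, value 237
my $decoded = decode('UTF-8', $byte);  # 0xED alone is an incomplete UTF-8 sequence

# With the default CHECK mode, Encode substitutes U+FFFD for malformed input,
# which is exactly the "\x{fffd}" that Data::Dumper shows in the question.
printf "length=%d, U+%04X\n", length $decoded, ord $decoded;
```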
Just stick with use utf8 to specify that your source is in UTF-8, and use binmode or the open pragma to specify the encoding of your file handles.
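A minimal sketch of that setup, assuming the script file itself is saved as UTF-8. It uses the open pragma with :std so that STDIN, STDOUT and STDERR all get a UTF-8 layer; a per-handle binmode call works just as well. The sketch uses :encoding(UTF-8) rather than :utf8 because the former validates input instead of trusting it blindly.

```perl
use strict;
use warnings;
use utf8;                            # the source file is written in UTF-8
use open qw(:std :encoding(UTF-8));  # UTF-8 layers on STDIN/STDOUT/STDERR and
                                     # the default for handles from open()
# Per-handle alternative: binmode(STDOUT, ':encoding(UTF-8)');

my $r = "\x{ed}";                    # the code point U+00ED, not a byte
printf "U+%04X\n", ord $r;
print "$r\n";                        # written out as the UTF-8 bytes C3 AD
```

Without use encoding, "\x{ed}" keeps its plain Perl meaning (a character with code point 0xED), and the output layer alone takes care of turning it into UTF-8 bytes.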