Perl 的使用编码编译指示破坏 UTF 字符串 [英] Perl's use encoding pragma breaking UTF strings

查看:63
本文介绍了Perl 的使用编码编译指示破坏 UTF 字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对 Perl 和编码编译指示有疑问.

I have a problem with Perl and Encoding pragma.

(我在任何地方都使用 utf-8,在输入、输出、perl 脚本本身.我不想使用其他编码,永远不会.)

(I use utf-8 everywhere, in input, output, the perl scripts themselves. I don't want to use other encoding, never ever.)

不过.当我写

binmode(STDOUT, ':utf8');
use utf8;
$r = "\x{ed}";
print $r;

我看到字符串 "í"(这是我想要的 - 并且什么是 U+00ED Unicode 字符).但是当我像这样添加使用编码"编译指示时

I see the string "í" (which is what I want - and what is U+00ED unicode char). But when I add the "use encoding" pragma like this

binmode(STDOUT, ':utf8');
use utf8;
use encoding 'utf8';
$r = "\x{ed}";
print $r;

我看到的只是一个盒子字符.为什么?

all I see is a box character. Why?

此外,当我添加 Data::Dumper 并让 Dumper 像这样打印新字符串时

Moreover, when I add Data::Dumper and let the Dumper print the new string like this

binmode(STDOUT, ':utf8');
use utf8;
use encoding 'utf8';
$r = "\x{ed}";
use Data::Dumper;
print Dumper($r);

我看到 perl 将字符串 更改为 "\x{fffd}".为什么?

I see that perl changed the string to "\x{fffd}". Why?

推荐答案

use encoding 'utf8' 已损坏.它没有将 \x{ed} 解释为代码点 U+00ED,而是将其解释为单字节 237,然后尝试将其解释为 UTF-8.哪个当然失败了,所以它最终用替换字符 U+FFFD 替换它,字面意思是"".

use encoding 'utf8' is broken. Rather than interpreting \x{ed} as the code point U+00ED, it interprets it as the single byte 237 and then tries to interpret that as UTF-8. Which of course fails, so it winds up replacing it with the replacement character U+FFFD, literally "�".

坚持使用 use utf8 来指定您的源代码是 UTF-8,以及 binmodeopen pragma 指定文件句柄的编码.

Just stick with use utf8 to specify that your source is in UTF-8, and binmode or the open pragma to specify the encoding for your file handles.

这篇关于Perl 的使用编码编译指示破坏 UTF 字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆