Why do my Perl tests fail with use encoding 'utf8'?
Question
I'm puzzled with this test script:
#!perl
use strict;
use warnings;
use encoding 'utf8';
use Test::More 'no_plan';
ok('áá' =~ m/á/, 'ok direct match');
my $re = qr{á};
ok('áá' =~ m/$re/, 'ok qr-based match');
like('áá', $re, 'like qr-based match');
All three tests fail, but I expected use encoding 'utf8' to upgrade both the literal áá and the qr-based regexps to utf8 strings, so that the tests would pass.
If I remove the use encoding line, the tests pass as expected, but I can't figure out why they fail in utf8 mode.
I'm using perl 5.8.8 on Mac OS X (system version).
Recommended answer
Do not use the encoding pragma. It’s broken. (Juerd Waalboer gave a great talk at YAPC::EU 2k8 where he mentioned this.)
It does at least two things at once that do not belong together:
- It specifies the encoding of your source files.
- It specifies the encoding of your file input/output.
And to add injury to insult, it also does #1 in a broken fashion: it reinterprets \xNN sequences as undecoded octets instead of treating them as code points, and decodes them. That prevents you from expressing characters outside the encoding you specified, and makes your source code mean different things depending on the encoding. That’s just astonishingly wrong.
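For contrast, here is a small sketch of the reading the answer argues for, run without the encoding pragma: each \xNN escape names a single code point, and the braced \x{...} form reaches code points that no 8-bit source encoding could express. The variable names are illustrative only.

```perl
#!perl
use strict;
use warnings;

# No encoding pragma here: every \xNN escape names a code point.
my $e_acute = "\xE1";        # one character, U+00E1 (LATIN SMALL LETTER A WITH ACUTE)
my $pair    = "\xC3\xA1";    # two characters, U+00C3 and U+00A1, not a decoded 'á'

printf "length %d, U+%04X\n", length $e_acute, ord $e_acute;
printf "length %d\n", length $pair;

# Code points above 0xFF need the braced form; written in ASCII like this,
# its meaning does not depend on the encoding the file is saved in.
my $snowman = "\x{2603}";
printf "length %d, U+%04X\n", length $snowman, ord $snowman;
```

This prints "length 1, U+00E1", then "length 2", then "length 1, U+2603". Only ASCII is printed, so no output layer is needed.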
Write your source code in ASCII or UTF-8 only. In the latter case, the utf8 pragma is the correct thing to use. If you don’t want to use UTF-8 but do want to include non-ASCII characters, escape or decode them explicitly.
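A minimal sketch of the question's test script following that advice, assuming the file is saved as UTF-8: use utf8 replaces use encoding, and the three original tests are kept unchanged. The extra tests (and their names) are illustrative additions showing the two ASCII-only alternatives: a \x{e1} escape, and explicit decoding with the decode function from the core Encode module.

```perl
#!perl
use strict;
use warnings;
use utf8;    # source is saved as UTF-8: literals below become characters
use Encode qw(decode);
use Test::More 'no_plan';

# With "use utf8", 'áá' is two characters, not four UTF-8 octets.
is(length('áá'), 2, 'literal decoded to characters');

# The three tests from the question, unchanged.
ok('áá' =~ m/á/, 'ok direct match');
my $re = qr{á};
ok('áá' =~ m/$re/, 'ok qr-based match');
like('áá', $re, 'like qr-based match');

# ASCII-only alternative 1: escape the code point (U+00E1) instead of typing it.
like("\x{e1}\x{e1}", qr/\x{e1}/, 'escaped code point matches');

# ASCII-only alternative 2: decode UTF-8 octets explicitly into characters.
my $decoded = decode('UTF-8', "\xc3\xa1");    # the two UTF-8 octets of U+00E1
is($decoded, 'á', 'explicitly decoded octets equal the literal');
```

All six tests should pass; the only change from the failing script is the pragma on line 4.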
And use I/O layers explicitly, or set them with the open pragma, so that I/O is automatically transcoded properly.
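A sketch of both approaches, assuming a UTF-8 source file and a hypothetical scratch file name: an explicit :encoding(UTF-8) layer on a single handle, a :raw read to show the undecoded octets, and the open pragma setting a default input layer for every open() in a lexical block.

```perl
#!perl
use strict;
use warnings;
use utf8;    # this source file is saved as UTF-8

my $word = 'áá';                   # two characters, U+00E1 twice
my $file = 'encoding-demo.txt';    # hypothetical scratch file name

# Explicit layer on one handle: characters are encoded to UTF-8 on output.
open my $out, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!";
print {$out} "$word\n";
close $out or die "Cannot close $file: $!";

# Raw read: no decoding layer, so we get the undecoded octets back.
open my $raw, '<:raw', $file or die "Cannot read $file: $!";
chomp(my $octets = <$raw>);
close $raw;

# Default layer via the open pragma, lexically scoped to this block:
# a plain open() inside it decodes UTF-8 input automatically.
my $chars;
{
    use open IN => ':encoding(UTF-8)';
    open my $in, '<', $file or die "Cannot read $file: $!";
    chomp($chars = <$in>);
    close $in;
}

# STDOUT needs a layer too before it receives wide characters.
binmode STDOUT, ':encoding(UTF-8)';
printf "octets: %d, characters: %d\n", length $octets, length $chars;
print "$chars\n";

unlink $file;
```

On Mac OS X or Linux this prints "octets: 4, characters: 2" followed by áá. Scoping the pragma to a block keeps the default layer from leaking into unrelated file handles elsewhere in the program.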