为什么我的 Perl 测试因使用编码“utf8"而失败? [英] Why do my Perl tests fail with use encoding 'utf8'?

查看:21
本文介绍了为什么我的 Perl 测试因使用编码“utf8"而失败?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对这个测试脚本感到困惑:

I'm puzzled with this test script:

#!perl

use strict;
use warnings;
use encoding 'utf8';
use Test::More 'no_plan';

ok('áá' =~ m/á/, 'ok direct match');

my $re = qr{á};
ok('áá' =~ m/$re/, 'ok qr-based match');

like('áá', $re, 'like qr-based match');

三个测试都失败了,但我期望 use encoding 'utf8' 会同时升级文字 ááqr-基于正则表达式到 utf8 字符串,从而通过测试.

The three tests fail, but I was expecting that the use encoding 'utf8' would upgrade both the literal áá and the qr-based regexps to utf8 strings, and thus passing the tests.

如果我删除 use encoding 行,测试会按预期通过,但我不知道为什么它们会在 utf8 模式下失败.

If I remove the use encoding line the tests pass as expected, but I can't figure it out why would they fail in utf8 mode.

我在 Mac OS X(系统版本)上使用 perl 5.8.8.

I'm using perl 5.8.8 on Mac OS X (system version).

推荐答案

不要使用 编码 pragma.它坏了.(Juerd Waalboer 在 YAPC::EU 2k8 上做了一个很棒的演讲,他提到了这一点.)

Do not use the encoding pragma. It’s broken. (Juerd Waalboer gave a great talk where he mentioned this at YAPC::EU 2k8.)

它至少同时做两件不属于一起的事情:

It does at least two things at once that do not belong together:

  1. 它为您的源文件指定编码.
  2. 它指定文件输入/输出的编码.

为了增加侮辱的伤害,它也以一种破碎的方式做 #1:它重新解释 xNN 序列作为未解码的八位字节,而不是像代码点一样对待它们,并对它们进行解码,防止你能够表达您指定的编码之外的字符,并使您的源代码根据编码具有不同的含义.这简直是​​大错特错.

And to add injury to insult it also does #1 in a broken fashion: it reinterprets xNN sequences as being undecoded octets as opposed to treating them like codepoints, and decodes them, preventing you from being able to express characters outside the encoding you specified and making your source code mean different things depending on the encoding. That’s just astonishingly wrong.

仅使用 ASCII 或 UTF-8 编写源代码.在后一种情况下,utf8 pragma 是正确的使用方法.如果您不想使用 UTF-8,但确实想包含非 ASCII 字符,请显式转义或解码它们.

Write your source code in ASCII or UTF-8 only. In the latter case, the utf8 pragma is the correct thing to use. If you don’t want to use UTF-8, but you do want to include non-ASCII charcters, escape or decode them explicitly.

并显式使用 I/O 层或使用 open pragma 让 I/O 自动正确转码.

And use I/O layers explicitly or set them using the open pragma to have I/O automatically transcoded properly.

这篇关于为什么我的 Perl 测试因使用编码“utf8"而失败?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆