'use utf8;' 的使用给我“印刷中的宽字符" [英] Use of 'use utf8;' gives me 'Wide character in print'

Question

If I run the following Perl program:

perl -e 'use utf8; print "鸡\n";'

I get this warning:

Wide character in print at -e line 1.

If I run this Perl program:

perl -e 'print "鸡\n";'

I get no warning.

I thought use utf8 was required to use UTF-8 characters in a Perl script. Why does this not work, and how can I fix it? I'm using Perl 5.16.2. I have the same issue if this is in a file instead of being a one-liner on the command line.

Answer

Without use utf8, Perl interprets your string as a sequence of single-byte characters. As you can see from this, there are four bytes in your string:

$ perl -E 'say join ":", map { ord } split //, "鸡\n";'
233:184:161:10

The first three bytes make up your character, the last one is the line-feed.
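The same byte view can be produced from the other direction. This is a minimal sketch, assuming the core Encode module (bundled with Perl 5.16): it encodes the single code point U+9E21 (鸡) as UTF-8 and lists the resulting byte values, which match the first three numbers above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+9E21 is the code point of the character 鸡 (one character).
my $char = "\x{9E21}";

# Encoding it as UTF-8 gives the byte string Perl saw without 'use utf8'.
my $bytes = encode('UTF-8', $char);

printf "characters: %d, bytes: %d\n", length $char, length $bytes;   # characters: 1, bytes: 3
printf "byte values: %s\n", join ':', map { ord } split //, $bytes;  # byte values: 233:184:161
```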

The call to print sends these four characters to STDOUT. Your console then works out how to display them. If your console is set to use UTF-8, it will interpret those three bytes as your single character, and that is what is displayed.

If we add the utf8 pragma, things are different. In this case, Perl interprets your string as just two characters.

$ perl -Mutf8 -E 'say join ":", map { ord } split //, "鸡\n";'
40481:10
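The switch from the byte view to the character view can be reproduced with the built-in utf8::decode function, which needs no module. This sketch starts from the same four bytes and decodes them in place; treating this as what use utf8 does to string literals is an approximation for illustration, not a description of the parser's internals.

```perl
use strict;
use warnings;

# The three UTF-8 bytes of 鸡 followed by a newline, as stored in the source.
my $string = "\xE9\xB8\xA1\n";
print join(':', map { ord } split //, $string), "\n";    # 233:184:161:10

# Reinterpret the bytes as UTF-8 in place; returns false on invalid input.
utf8::decode($string) or die "not valid UTF-8\n";
print join(':', map { ord } split //, $string), "\n";    # 40481:10
```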

By default, Perl's IO layer assumes that it is working with single-byte characters. So when you try to print a character whose code point is above 255 (a "wide character"), Perl cannot write it as a single byte: it gives you a warning and then outputs the character's internal UTF-8 bytes instead. As ever, you can get more explanation for this warning by including use diagnostics. It will say this:

(S utf8) Perl met a wide character (>255) when it wasn't expecting one. This warning is by default on for I/O (like print). The easiest way to quiet this warning is simply to add the :utf8 layer to the output, e.g. binmode STDOUT, ':utf8'. Another way to turn off the warning is to add no warnings 'utf8'; but that is often closer to cheating. In general, you are supposed to explicitly mark the filehandle with an encoding, see open and perlfunc/binmode.
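A small sketch that reproduces the warning without a terminal: it prints U+9E21 to an in-memory filehandle with no encoding layer (a stand-in for STDOUT, assumed here only so the output can be inspected) and collects warnings through $SIG{__WARN__}. The names $out, $buf and @warnings are illustrative. It shows that Perl warns and then writes the character's UTF-8 bytes anyway, which is why the output still looks right on a UTF-8 console.

```perl
use strict;
use warnings;

# Collect warnings in an array instead of letting them reach STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# In-memory filehandle with no encoding layer (stands in for STDOUT).
open my $out, '>', \my $buf or die "open: $!";
print {$out} "\x{9E21}\n";    # U+9E21 is the character 鸡
close $out or die "close: $!";

# Report the captured warning and the bytes that actually got written.
print "caught: $_" for @warnings;
printf "bytes written: %s\n", join ':', map { ord } split //, $buf;
```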

As others have pointed out, you need to tell Perl to accept multi-byte output. There are many ways to do this (see the Perl Unicode Tutorial for some examples). One of the simplest is the -CS command-line flag, which tells Perl to treat the three standard filehandles (STDIN, STDOUT and STDERR) as UTF-8.

$ perl -Mutf8 -e 'print "鸡\n";'
Wide character in print at -e line 1.
鸡

Compare that with:

$ perl -Mutf8 -CS -e 'print "鸡\n";'
鸡
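Inside a script file, a comparable fix for output is to put an encoding layer on the handle yourself, for example binmode STDOUT, ':encoding(UTF-8)'; the use open qw(:std :encoding(UTF-8)) pragma is another common option. This sketch applies the layer to an in-memory filehandle instead of STDOUT, purely so the written bytes can be checked; $out, $buf and @warnings are illustrative names.

```perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# In-memory filehandle standing in for STDOUT.
open my $out, '>', \my $buf or die "open: $!";

# In a real script: binmode STDOUT, ':encoding(UTF-8)';
binmode $out, ':encoding(UTF-8)' or die "binmode: $!";

print {$out} "\x{9E21}\n";    # U+9E21 is 鸡; encoded by the layer, no warning
close $out or die "close: $!";

printf "warnings: %d, bytes written: %s\n",
    scalar @warnings, join ':', map { ord } split //, $buf;
```

Because the handle now knows its encoding, Perl encodes the character on the way out instead of warning, and the bytes written are the same three UTF-8 bytes plus the newline.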

Unicode is a big and complex area. As you've seen, many simple programs appear to do the right thing, but for the wrong reasons. When you start to fix part of the program, things will often get worse until you've fixed all of the program.
