开放式STDIN/STDOUT如何正确处理并使用utf8编码? [英] How open STDIN/STDOUT handles and work with utf8 encoding correctly?

查看:306
本文介绍了开放式STDIN/STDOUT如何正确处理并使用utf8编码?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的代码中包含utf8字符.因此,我这样做:

I have utf8 characters in my code. So I do:

use utf8;

my $line =  'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
print $line; # Wide character in print at ...

然后我认为我的STDOUT应该在utf8中:

Then I thought that my STDOUT should be in utf8:

use utf8;
use open IO => ':utf8 :std';

my $line =  'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
print $line; # Wide character in print at ...

为什么当我说Perl在源代码包含utf8个字符时使用utf8时出现错误?

Why when I say perl to use utf8 while my source code has utf8 characters I get the error?

同时:

没有错误:

my $line =  'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
print $line;

没有错误:

use open IO => ':utf8 :std';

my $line =  'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
print $line;

我应该如何打开文件句柄并与utf8一起正常使用?

How I should open my filehandles and work correctly with utf8?

UPD
其实我有这段代码.不匹配:

UPD
Actually I have this code. It do not match:

use open IO => ':utf8 :std';

my $line =  'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
my @match =  $line =~ m/(вiд|от|від)/i;
print "$line -> $1 \n";

很遗憾,正则表达式不匹配.输出为:

Unfortunately regex is not matched. The output is:

ЗГ. РАХ. №382 ВIД 03.02.2020Р ->

然后我添加utf8 pragma:

Then I add utf8 pragma:

use utf8;
use open IO => ':utf8 :std';

my $line =  'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
my @match =  $line =~ m/(вiд|от|від)/i;
print "$line -> $1 \n";

现在正则表达式已匹配,但发出警告

Now regex is matched, but warning is issued

Wide character in print at t2.pl line 17.
ЗГ. РАХ. №382 ВIД 03.02.2020Р -> ВIД

推荐答案

在IRC中感谢@Grinnz

Thank @Grinnz in IRC

下一个代码有效:

use utf8;
use open ':encoding(UTF-8)', ':std';

my $line =  'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
my @match =  $line =~ m/(вiд|от|від)/i;
print "$line -> $1 \n";

注意: @Grinnz建议使用 https://metacpan.org/pod/open::layers 因为:std is not a layer, it must be its own argument in the list

Notices: @Grinnz adviced to use https://metacpan.org/pod/open::layers because :std is not a layer, it must be its own argument in the list

我也不应该使用:utf8 因为

警告:请勿使用此层从UTF-8字节转换,因为无效的UTF-8或二进制数据将导致Perl字符串格式错误.用于输出时,它不太可能产生无效的UTF-8,尽管它将在EBCDIC系统上产生UTF-EBCDIC.最好使用:encoding(UTF-8)层(带连字符),因为它将确保有效的UTF-8字节和有效的Unicode字符之间的转换.

CAUTION: Do not use this layer to translate from UTF-8 bytes, as invalid UTF-8 or binary data will result in malformed Perl strings. It is unlikely to produce invalid UTF-8 when used for output, though it will instead produce UTF-EBCDIC on EBCDIC systems. The :encoding(UTF-8) layer (hyphen is significant) is preferred as it will ensure translation between valid UTF-8 bytes and valid Unicode characters.

这篇关于开放式STDIN/STDOUT如何正确处理并使用utf8编码?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆