How does the open pragma differ with different utf8 layers?

Question

Do these three versions all behave differently?

use open qw( :encoding(UTF-8) :std );  
use open qw( :encoding(UTF8) :std );  
use open qw( :utf8 :std );  

Answer

First, :utf8 only marks the text as UTF-8; it does not check that it is valid. See this post on PerlMonks for more information.

:encoding is an Extension Layer to PerlIO. From perldoc perliol:

":encoding": "use Encoding;" makes this layer available, although PerlIO.pm "knows" where to find it. It is an example of a layer which takes an argument, as it is called thus: open( $fh, "<:encoding(iso-8859-7)", $pathname );
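
Here is a minimal sketch of the perliol example in action. It is not part of the original answer. The file contents are hypothetical: three raw ISO-8859-7 bytes (0xE1, 0xE2, 0xE3, the Greek letters alpha, beta and gamma) written to a temporary file created with File::Temp. The :encoding(iso-8859-7) layer decodes them into Unicode characters on read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input: three Greek letters encoded in ISO-8859-7.
my ($tmp, $pathname) = tempfile(UNLINK => 1);
binmode $tmp;                    # write the bytes unchanged
print {$tmp} "\xE1\xE2\xE3";     # alpha, beta, gamma in ISO-8859-7
close $tmp or die "close: $!";

# The :encoding layer takes its argument in parentheses, as in perliol.
open(my $fh, "<:encoding(iso-8859-7)", $pathname) or die "open: $!";
my $text = do { local $/; <$fh> };
close $fh;

# Print the decoded code points in hex, dot-separated.
printf "%vX\n", $text;           # 3B1.3B2.3B3
```

The layer does the decoding, so the program never handles the ISO-8859-7 bytes itself.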

From the FAQ, perldoc perlunifaq:

What is the difference between ":encoding" and ":utf8"? Because UTF-8 is one of Perl's internal formats, you can often just skip the encoding or decoding step, and manipulate the UTF8 flag directly. Instead of ":encoding(UTF-8)", you can simply use ":utf8", which skips the encoding step if the data was already represented as UTF8 internally. This is widely accepted as good behavior when you're writing, but it can be dangerous when reading, because it causes internal inconsistency when you have invalid byte sequences. Using ":utf8" for input can sometimes result in security breaches, so please use ":encoding(UTF-8)" instead. Instead of "decode" and "encode", you could use "_utf8_on" and "_utf8_off", but this is considered bad style. Especially "_utf8_on" can be dangerous, for the same reason that ":utf8" can. There are some shortcuts for oneliners; see "-C" in perlrun.
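
The FAQ's warning can be reproduced with Encode directly. This sketch is illustrative and not from the original answer. It uses a hypothetical byte string ending in a lone continuation byte (0x80), which is not valid UTF-8. The sketch contrasts three approaches:

- strict decoding with FB_CROAK, which rejects the input;
- default decoding, which substitutes U+FFFD;
- Encode::_utf8_on, which, like :utf8 on input as described above, only sets the flag without checking.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: "caf" followed by a lone continuation byte.
my $bytes = "caf\x80";

# Strict decoding with FB_CROAK dies on malformed input.
my $copy = $bytes;    # FB_CROAK may modify its source, so use a copy
my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
print $ok ? "decoded\n" : "strict decode rejected the input\n";

# With the default check, the bad byte becomes U+FFFD.
my $replaced = decode('UTF-8', $bytes);
printf "%vX\n", $replaced;    # 63.61.66.FFFD

# Flipping the flag accepts the same bytes unchecked, leaving a string
# whose internal representation is malformed.
my $flagged = $bytes;
Encode::_utf8_on($flagged);
print utf8::valid($flagged) ? "valid\n" : "flagged string is malformed\n";
```

The last case is the "internal inconsistency" the FAQ refers to. Later operations on $flagged may misbehave or warn.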

What's the difference between "UTF-8" and "utf8"? "UTF-8" is the official standard. "utf8" is Perl's way of being liberal in what it accepts. If you have to communicate with things that aren't so liberal, you may want to consider using "UTF-8". If you have to communicate with things that are too liberal, you may have to use "utf8". The full explanation is in Encode. "UTF-8" is internally known as "utf-8-strict". The tutorial uses UTF-8 consistently, even where utf8 is actually used internally, because the distinction can be hard to make, and is mostly irrelevant. For example, utf8 can be used for code points that don't exist in Unicode, like 9999999, but if you encode that to UTF-8, you get a substitution character (by default; see "Handling Malformed Data" in Encode for more ways of dealing with this.) Okay, if you insist: the "internal format" is utf8, not UTF-8. (When it's not some other encoding.)
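
A small sketch (not from the original answer) makes this distinction visible. It shows that "UTF-8" and "UTF8" resolve to differently named Encode objects. It then encodes the FAQ's example code point 9999999 both ways. The lax utf8 encoding keeps Perl's extended byte sequence. The strict UTF-8 encoding emits a substitution character under the default check.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding);

# The two spellings resolve to different encodings.
print find_encoding('UTF-8')->name, "\n";   # utf-8-strict
print find_encoding('UTF8')->name, "\n";    # utf8

# 9999999 (0x98967F) lies above U+10FFFF, so it is not a Unicode code point.
my $char = chr(9999999);

my $lax    = encode('utf8',  $char);   # Perl's liberal extended encoding
my $strict = encode('UTF-8', $char);   # strict: substitution instead

printf "utf8:  %vX\n", $lax;
printf "UTF-8: %vX\n", $strict;        # EF.BF.BD (U+FFFD)
```

So in the question, :encoding(UTF8) selects the lax "utf8" encoding, while :encoding(UTF-8) selects the strict one.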

The open pragma (i.e., use open) only sets the default PerlIO layers for input and output. About :std, perldoc open says:

The ":std" subpragma on its own has no effect, but if combined with the ":utf8" or ":encoding" subpragmas, it converts the standard filehandles (STDIN, STDOUT, STDERR) to comply with encoding selected for input/output handles. For example, if both input and output are chosen to be ":encoding(utf8)", a ":std" will mean that STDIN, STDOUT, and STDERR are also in ":encoding(utf8)". On the other hand, if only output is chosen to be in ":encoding(koi8r)", a ":std" will cause only the STDOUT and STDERR to be in "koi8r". The ":locale" subpragma implicitly turns on ":std".

So :std is a subpragma (specific to open.pm) that applies the selected layer, for example :utf8 as above, to the standard streams STDIN, STDOUT, and STDERR.
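
To see both effects, here is a small sketch that is not from the original answer. It assumes a temporary file created with File::Temp. A handle opened without explicit layers picks up the pragma's default. STDOUT gains an encoding layer because of :std. The exact layer list printed can differ between builds, so the check relies only on an encoding(...) layer being present.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Default layers for open() in this lexical scope; :std also applies
# them to STDIN, STDOUT and STDERR.
use open qw( :encoding(UTF-8) :std );

# A handle opened without explicit layers uses the pragma's default.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;
open(my $out, '>', $path) or die "open $path: $!";
print {$out} "\x{263A}";                 # U+263A WHITE SMILING FACE
close $out or die "close: $!";

# An explicit :raw layer overrides the default, exposing the stored bytes.
open(my $in, '<:raw', $path) or die "open $path: $!";
my $bytes = do { local $/; <$in> };
close $in;
printf "stored bytes: %vX\n", $bytes;    # E2.98.BA

# Because of :std, STDOUT also carries an encoding layer.
print join(' ', PerlIO::get_layers(*STDOUT)), "\n";
```

Removing :std from the use open line would leave the file handling unchanged. STDOUT, however, would keep its default layers.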