Am I using utf8::is_utf8 correctly?


Question

Does this work correctly? Some error messages are already decoded, and some need to be decoded to produce correct output.

#!/usr/bin/env perl
use warnings;
use strict;
use utf8;
use open qw(:utf8 :std);
use Encode qw(decode_utf8);

# ...

if ( not eval{
    # some error-messages (utf8) are decoded some are not
    1 }
) {
    if ( utf8::is_utf8 $@ ) {
        print $@;
    }
    else {
        print decode_utf8( $@ );
    }
}

Recommended answer

Am I using utf8::is_utf8 correctly?

No. Any use of utf8::is_utf8 is incorrect, as you should never use it! Using utf8::is_utf8 to guess at the semantics of a string is an instance of what's known as The Unicode Bug. Except for inspecting the internal state of variables when debugging Perl or an XS module, utf8::is_utf8 has no use.

It does not indicate whether the value in a variable is encoded using UTF-8 or not; it only reports which internal storage format Perl is using for the string. In fact, whether a value is encoded is impossible to know reliably. For example, does "\xC3\xA9" produce a string that's encoded using UTF-8 or not? Well, there's no way to know! It depends on whether I meant "Ã©", "é" or something entirely different.
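To make the point concrete, here is a minimal sketch (the variable names are illustrative, not from the question). It shows two strings that compare equal but give different utf8::is_utf8 results, and a two-byte string whose flag says nothing about what the author meant.

```perl
use strict;
use warnings;

# Two strings holding the same single character, U+00E9.
my $native   = "\xE9";
my $upgraded = "\xE9";
utf8::upgrade($upgraded);    # changes only the internal storage format

# The strings are equal as far as Perl code is concerned...
my $same_value    = $native eq $upgraded       ? 1 : 0;
# ...yet the internal flag differs between them.
my $flag_native   = utf8::is_utf8($native)     ? 1 : 0;
my $flag_upgraded = utf8::is_utf8($upgraded)   ? 1 : 0;

# Two characters, C3 and A9. This may be "é" encoded as UTF-8 or the
# decoded text "Ã©"; the flag cannot tell the two cases apart.
my $ambiguous      = "\xC3\xA9";
my $flag_ambiguous = utf8::is_utf8($ambiguous) ? 1 : 0;

printf "same value: %d, flags: %d vs %d\n",
    $same_value, $flag_native, $flag_upgraded;
printf "ambiguous: length %d, flag %d\n",
    length($ambiguous), $flag_ambiguous;
```

The flag tracks how Perl stores the string, so it can change (for example through utf8::upgrade or concatenation with a wide-character string) without the string's value changing at all.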

If the variable may contain both encoded and decoded strings, it's up to you to track that using a second variable. I strongly advise against this, though. Just decode everything as it comes in from the outside.
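A rough sketch of decoding at the boundary could look like the following. It assumes the incoming data is UTF-8; decode_input is a hypothetical helper name, and $raw stands in for bytes read from a file, socket, or external library.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn bytes received from outside into decoded text.
# FB_CROAK makes malformed input die instead of passing through silently;
# LEAVE_SRC keeps the caller's byte string intact.
sub decode_input {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

my $raw   = "caf\xC3\xA9";        # five bytes as they arrive from outside
my $text  = decode_input($raw);   # four characters from here on
my $chars = length $text;

# Encode again only when the data leaves the program.
my $out = encode('UTF-8', $text);

printf "characters: %d, output bytes: %d\n", $chars, length $out;
```

With this arrangement every string inside the program is decoded text, so there is nothing to guess at when an error message is finally printed.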

If you really can't, your best bet is to try to decode $@ and ignore errors. It's very unlikely that something readable that isn't UTF-8 would be valid UTF-8.

# $@ is sometimes encoded. If it's not,
# the following will leave it unchanged.
utf8::decode($@);

print $@;
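The behaviour this relies on can be checked with a small sketch (the sample strings are made up for illustration). utf8::decode returns false and leaves the string untouched when the contents are not well-formed UTF-8 or contain characters above 0xFF.

```perl
use strict;
use warnings;

# UTF-8 encoded bytes: decoded in place, returns true.
my $encoded    = "caf\xC3\xA9";
my $ok_encoded = utf8::decode($encoded) ? 1 : 0;

# Already-decoded text with a character above 0xFF: left unchanged.
my $decoded    = "\x{263A} ok";
my $ok_decoded = utf8::decode($decoded) ? 1 : 0;

# Not valid UTF-8 (a lone 0xE9): left unchanged as well.
my $latin1    = "caf\xE9";
my $ok_latin1 = utf8::decode($latin1) ? 1 : 0;

printf "encoded: ok=%d length=%d\n", $ok_encoded, length $encoded;
printf "decoded: ok=%d length=%d\n", $ok_decoded, length $decoded;
printf "latin1:  ok=%d length=%d\n", $ok_latin1,  length $latin1;
```

The remaining risk is the ambiguous case described earlier: decoded text that happens to form valid UTF-8, such as "Ã©", would be converted too. That is the "very unlikely" situation the answer accepts.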
