How do I match only fully-composed characters in a Unicode string in Perl?

Question

I'm looking for a way to match only fully composed characters in a Unicode string.

Is [:print:] dependent upon locale in any regular expression implementation that incorporates this character class? For example, will it match the Japanese character 'あ', since it is not a control character, or is [:print:] always going to be ASCII codes 0x20 to 0x7E?

Is there any character class, including Perl REs, that can be used to match anything other than a control character? If [:print:] includes only characters in the ASCII range, I would assume [:cntrl:] does too.

Recommended answer

echo あ | perl -nle 'BEGIN { binmode STDIN, ":utf8" } print "[$_]"; print /[[:print:]]/ ? "YES" : "NO"'

This mostly works, though it generates a "Wide character" warning, because STDOUT has no UTF-8 layer. But it gives you the idea: you must be sure you're dealing with a real Unicode string (check utf8::is_utf8). Or just read perlunicode; the whole subject still makes my head spin.
