How do I match only fully-composed characters in a Unicode string in Perl?
Problem description
I'm looking for a way to match only fully composed characters in a Unicode string.
Is [:print:] dependent upon locale in any regular expression implementation that incorporates this character class? For example, will it match the Japanese character 'あ', since it is not a control character, or is [:print:] always going to be ASCII codes 0x20 to 0x7E?
Is there any character class, including those in Perl REs, that can be used to match anything other than a control character? If [:print:] includes only characters in the ASCII range, I would assume [:cntrl:] does too.
Recommended answer
echo あ| perl -nle 'BEGIN{binmode STDIN,":utf8"} print"[$_]"; print /[[:print:]]/ ? "YES" : "NO"'
This mostly works, though it emits a "Wide character" warning because STDOUT has no encoding layer. But it gives you the idea: you must be sure you're dealing with a real Unicode string, that is, one that has been decoded into characters rather than left as raw bytes (check utf8::is_utf8). Or just read perlunicode; the whole subject still makes my head spin.