Regex for word characters in any language
Question
Testing the PHP regex engine, I see that it considers only [0-9A-Za-z_] to be word characters. Letters of non-ASCII languages, such as Hebrew, are not matched as word characters with [\w]. Are there any PHP or Perl regex escape sequences which will match a letter in any language? I could add ranges for each alphabet that I expect to be used, but users will always surprise us with unexpected languages!
Note that this is not for security filtering but rather for tokenizing a text.
Recommended answer
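Try [\pL_] (see the reference at http://php.net/manual/en/regexp.reference.unicode.php). \pL matches a letter in any script. In PHP, add the u modifier so that the pattern and the subject are treated as UTF-8.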
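As a rough illustration of the suggestion above, here is a minimal sketch of tokenizing mixed-script text with [\pL_]. The helper name tokenize_words and the sample string are made up for this example, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original answer): returns every run of
// Unicode letters and underscores found in a UTF-8 string.
function tokenize_words(string $text): array
{
    // \pL matches a letter in any script; the trailing "u" modifier makes
    // PCRE treat both the pattern and the subject as UTF-8.
    preg_match_all('/[\pL_]+/u', $text, $matches);
    return $matches[0];
}

// Sample input mixing Latin, Hebrew, and Cyrillic text.
print_r(tokenize_words('Hello שלום мир foo_bar'));
```

Note that [\pL_] leaves out digits, which \w does include. If digits should count as word characters, [\pL\pN_] is one option, and adding \pM would keep combining marks (such as Hebrew vowel points) attached to their letters.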