如何在 python 正则表达式中实现 \p{L} [英] How to implement \p{L} in python regex
本文介绍了如何在 python 正则表达式中实现 \p{L}的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我试图匹配任何语言中包含一个单词的所有字符串.我的搜索使我找到了 Python 的 Re 模块中没有的 \p{...}.但我找到了 https://pypi.python.org/pypi/regex.它应该与 \p{...} 命令一起使用.虽然没有.
I was trying to match all the string that contain one word in any language. My search led me to \p{...} which was absent in python's Re module. But I found https://pypi.python.org/pypi/regex. It should work with \p{...} commands. Although it doesn't.
我尝试解析这些行:
7652167371 apéritif
78687 attaché
78687 époque
78678 kunngjøre
78678 ærbødig
7687 vår
12312 dfsdf
23123 322432
1321 23123
2312 привер
32211 оипвыола
与:
def Pattern_compile(pattern_array):
regexes = [regex.compile(p) for p in pattern_array]
return regexes
def main():
for line in sys.stdin:
for regexp in Pattern_compile(p_a):
if regexp.search (line):
print line.strip('\n')
if __name__ == '__main__':
p_a = ['^\d+\t(\p{L}|\p{M})+$', ]
main()
结果只是拉丁字符单词:
The result is only latin-character word:
12312 dfsdf
推荐答案
你应该通过 unicode.(正则表达式和字符串)
You should pass unicode. (Both regular expression and the string)
import sys
import regex
def main(patterns):
patterns = [regex.compile(p) for p in patterns]
for line in sys.stdin:
line = line.decode('utf8')
for regexp in patterns:
if regexp.search (line):
print line.strip('\n')
if __name__ == '__main__':
main([ur'^\d+\t(\p{L}|\p{M})+$', ])
这篇关于如何在 python 正则表达式中实现 \p{L}的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文