如何在 python 正则表达式中实现 \p{L} [英] How to implement \p{L} in python regex

查看:98
本文介绍了如何在 python 正则表达式中实现 \p{L}的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图匹配任何语言中包含一个单词的所有字符串.我的搜索使我找到了 Python 的 Re 模块中没有的 \p{...}.但我找到了 https://pypi.python.org/pypi/regex.它应该与 \p{...} 命令一起使用.虽然没有.

I was trying to match all the string that contain one word in any language. My search led me to \p{...} which was absent in python's Re module. But I found https://pypi.python.org/pypi/regex. It should work with \p{...} commands. Although it doesn't.

我尝试解析这些行:

7652167371  apéritif
78687   attaché
78687   époque
78678   kunngjøre
78678   ærbødig
7687    vår
12312   dfsdf
23123   322432
1321    23123
2312    привер
32211   оипвыола

与:

def Pattern_compile(pattern_array):
    regexes = [regex.compile(p) for p in pattern_array]
    return regexes

def main():
    for line in sys.stdin:
        for regexp in Pattern_compile(p_a):
            if regexp.search (line):
                print line.strip('\n')

if __name__ == '__main__':
    p_a = ['^\d+\t(\p{L}|\p{M})+$', ]
    main()

结果只是拉丁字符单词:

The result is only latin-character word:

12312   dfsdf

推荐答案

你应该通过 unicode.(正则表达式和字符串)

You should pass unicode. (Both regular expression and the string)

import sys

import regex


def main(patterns):
    patterns = [regex.compile(p) for p in patterns]
    for line in sys.stdin:
        line = line.decode('utf8')
        for regexp in patterns:
            if regexp.search (line):
                print line.strip('\n')

if __name__ == '__main__':
    main([ur'^\d+\t(\p{L}|\p{M})+$', ])

这篇关于如何在 python 正则表达式中实现 \p{L}的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆