Matching only a Unicode letter in Python re

Views: 38 · Published: 2021/7/6 19:11:41 · Tags: python regex unicode character-properties


Question

I have a string from which I want to extract 3 groups:

'19 janvier 2012' -> '19', 'janvier', '2012'

The month name may contain non-ASCII characters, so [A-Za-z] does not work for me:

>>> import re
>>> re.search(ur'(\d{,2}) ([A-Za-z]+) (\d{4})', u'20 janvier 2012', re.UNICODE).groups()
(u'20', u'janvier', u'2012')
>>> re.search(ur'(\d{,2}) ([A-Za-z]+) (\d{4})', u'20 février 2012', re.UNICODE).groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'
>>>

I could use \w, but it also matches digits and underscores:

>>> re.search(ur'(\w+)', u'février', re.UNICODE).groups()
(u'f\xe9vrier',)
>>> re.search(ur'(\w+)', u'fé_q23vrier', re.UNICODE).groups()
(u'f\xe9_q23vrier',)
>>>

I tried [:alpha:], but it does not work:

>>> re.search(ur'[:alpha:]+', u'février', re.UNICODE).groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'
>>>

Ideally I would match \w without [_0-9], but I don't know how. And even if I find out how to do this, is there a ready-made shortcut like [:alpha:] that works in Python?

Solution

You can construct a new character class:

[^\W\d_]

instead of \w. Translated into English, it means "Any character that is not a non-alphanumeric character ([^\W] is the same as \w), but that is also not a digit and not an underscore".

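Therefore, it will only allow Unicode letters, provided the pattern is Unicode-aware: in Python 2 that means passing the re.UNICODE flag, as in the question; in Python 3, str patterns are Unicode-aware by default.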
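
As a rough illustration, here is how the class might be applied to the date string from the question. This is a minimal sketch written for Python 3, not the asker's original Python 2 session. The names `DATE_RE` and `split_date` are made up for this example, and the day is matched with `\d{1,2}` rather than the question's `\d{,2}` (which also accepts zero digits); that change is my assumption about the intent.

```python
import re

# [^\W\d_] = word characters minus digits and underscore, i.e. letters.
# Python 3 str patterns are Unicode-aware by default, so no flag is passed.
# DATE_RE and split_date are hypothetical names used only for illustration.
DATE_RE = re.compile(r'(\d{1,2}) ([^\W\d_]+) (\d{4})')


def split_date(text):
    """Return (day, month, year) as strings, or None if text does not match."""
    match = DATE_RE.search(text)
    return match.groups() if match else None


print(split_date('19 janvier 2012'))      # ('19', 'janvier', '2012')
print(split_date('20 février 2012'))      # ('20', 'février', '2012')
print(split_date('20 fé_q23vrier 2012'))  # None: '_' and digits are rejected
```

Returning None instead of calling `.groups()` directly avoids the AttributeError seen in the question when nothing matches.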
