How to get the group name of a regular expression match in Python?
The question is very basic, but I do not know how to figure out the group name from a match. Let me explain in code:
import re
a = list(re.finditer(r'(?P<name>[^\W\d_]+)|(?P<number>\d+)', 'Ala ma kota'))
How do I get the group name of the a[0].group(0) match, assuming that the number of named patterns can be larger?
The example is simplified to learn the basics.
I could invert a[0].groupdict(), but that would be slow.
You can get this information from the compiled expression:
>>> pattern = re.compile(r'(?P<name>\w+)|(?P<number>\d+)')
>>> pattern.groupindex
{'name': 1, 'number': 2}
This uses the RegexObject.groupindex attribute:
A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers. The dictionary is empty if no symbolic groups were used in the pattern.
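Because groupindex maps names to numbers, one way to answer "which name matched?" is to invert it once and look the name up by group number. This is a minimal sketch using the pattern from the question. number_to_name is an illustrative name, and the lookup by match.lastindex assumes the named groups are top-level alternatives, so exactly one of them matches.

```python
import re

pattern = re.compile(r'(?P<name>[^\W\d_]+)|(?P<number>\d+)')

# groupindex maps name -> number; invert it once to look names up by number.
number_to_name = {number: name for name, number in pattern.groupindex.items()}

for match in pattern.finditer('Ala ma 3'):
    # lastindex is the number of the last capturing group that matched.
    print(number_to_name[match.lastindex], repr(match.group(0)))
```

The inversion is done once per pattern, not once per match, so the per-match cost is a single dictionary lookup.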
If you only have access to the match object, you can get to the pattern with the MatchObject.re attribute:
>>> a = list(re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'Ala ma kota'))
>>> a[0]
<_sre.SRE_Match object at 0x100264ad0>
>>> a[0].re.groupindex
{'name': 1, 'number': 2}
If all you want to know is which group matched, look at the values; None means a group was never used in the match:
>>> a[0].groupdict()
{'name': 'Ala', 'number': None}
The number group was never used to match anything, because its value is None.
You can then find the names of the groups that took part in the match with:
names_used = [name for name, value in matchobj.groupdict().items() if value is not None]
or, if only one group can ever match, you can use MatchObject.lastgroup:
name_used = matchobj.lastgroup
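Both approaches can be compared side by side on a pattern with more than two named alternatives, which is the case the question asks about. This is a sketch only: the TOKEN pattern, its punct group, the matched_names helper, and the sample text are made up for illustration.

```python
import re

# Hypothetical pattern with three named alternatives (illustrative only).
TOKEN = re.compile(r'(?P<number>\d+)|(?P<name>[^\W\d_]+)|(?P<punct>[^\w\s])')

def matched_names(match):
    """Return the names of all named groups that took part in the match."""
    return [name for name, value in match.groupdict().items() if value is not None]

for match in TOKEN.finditer('Ala ma 2 koty!'):
    # With top-level alternatives only one group matches, so both agree.
    print(match.lastgroup, matched_names(match), repr(match.group(0)))
```

lastgroup avoids building a dictionary for every match, but it reports a single name, so the groupdict() filter is the safer choice when several named groups can take part in the same match.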
As a side note, a pattern that uses \w+ for name, as in the examples above, has a fatal flaw: everything that \d matches is also matched by \w. You'll never see number used where name can match first. Reverse the order of the alternatives to avoid this:
>>> for match in re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'word 42'):
...     print(match.lastgroup)
...
name
name
>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word 42'):
...     print(match.lastgroup)
...
name
number
but take into account that words starting with digits will still confuse things for your simple case:
>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word42 42word'):
...     print(match.lastgroup, repr(match.group(0)))
...
name 'word42'
number '42'
name 'word'
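One possible way to tighten this is to anchor the number alternative with word boundaries, so that a run of digits only counts as a number when it forms a whole token. This is a sketch under that assumption about the desired behaviour, not something stated in the question. The sample text is made up for illustration.

```python
import re

# Assumption: a token is a number only if it consists entirely of digits.
pattern = re.compile(r'(?P<number>\b\d+\b)|(?P<name>\w+)')

for match in pattern.finditer('word42 42word 42'):
    print(match.lastgroup, repr(match.group(0)))
```

Here 42word is reported as a single name token, because \b cannot match between 2 and w, so the number alternative fails and \w+ consumes the whole word.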