How to get the group name of a regular expression match in Python?


Problem description

The question is very basic, but I do not know how to figure out the group name from a match. Let me explain in code:

import re
a = list(re.finditer(r'(?P<name>[^\W\d_]+)|(?P<number>\d+)', 'Ala ma kota'))

How do I get the name of the group that produced the a[0].group(0) match, assuming the number of named patterns can be larger?

The example is simplified to learn the basics.

I could invert a[0].groupdict(), but that would be slow.

Solution

You can get this information from the compiled expression:

>>> pattern = re.compile(r'(?P<name>\w+)|(?P<number>\d+)')
>>> pattern.groupindex
{'name': 1, 'number': 2}

This uses the RegexObject.groupindex attribute:

A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers. The dictionary is empty if no symbolic groups were used in the pattern.
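Since groupindex maps names to numbers, it can be inverted once per pattern rather than once per match. The following is a minimal sketch of that idea, written for Python 3; the helper name matched_group_name and the sample text are made up for illustration, and it relies on the match object's lastindex attribute to find the group number.

```python
import re

pattern = re.compile(r'(?P<name>[^\W\d_]+)|(?P<number>\d+)')

# groupindex maps group name -> group number; invert it once, up front.
index_to_name = {index: name for name, index in pattern.groupindex.items()}

def matched_group_name(match):
    """Return the name of the last capturing group that matched, or None."""
    # match.lastindex is the number of the last matched capturing group.
    return index_to_name.get(match.lastindex)

names = [matched_group_name(m) for m in pattern.finditer('Ala ma 42')]
print(names)
```

The inversion cost is paid once for the pattern, so each lookup per match is a single dictionary access.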

If you only have access to the match object, you can get to the pattern with the MatchObject.re attribute:

>>> a = list(re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'Ala ma kota'))
>>> a[0]
<_sre.SRE_Match object at 0x100264ad0>
>>> a[0].re.groupindex
{'name': 1, 'number': 2}

If all you want to know is which group matched, look at the values; None means a group was never used in the match:

>>> a[0].groupdict()
{'name': 'Ala', 'number': None}

The number group was never used to match anything, which is why its value is None.

You can then find the names of the groups used in the match with:

names_used = [name for name, value in matchobj.groupdict().items() if value is not None]
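To see the filter return more than one name, here is a small sketch in Python 3 with a pattern where several named groups can take part in the same match. The key/value pattern and the function wrapper are assumptions chosen only for illustration; they are not part of the original question.

```python
import re

# Hypothetical pattern: "key" always matches, "value" is optional.
pattern = re.compile(r'(?P<key>\w+)(?:=(?P<value>\w+))?')

def names_used(match):
    """Return the names of the groups that took part in the match."""
    return [name for name, value in match.groupdict().items() if value is not None]

print(names_used(pattern.match('debug')))    # only "key" participated
print(names_used(pattern.match('level=3')))  # both "key" and "value" participated
```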

Or, if only one group can ever match, you can use MatchObject.lastgroup:

name_used = matchobj.lastgroup
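A typical use of lastgroup is a small tokenizer in which every alternative is a named group. The sketch below assumes Python 3; the TOKEN pattern, the extra space group, and the tokenize function are illustrative names, not something from the question.

```python
import re

# Each alternative is one named group, so exactly one group matches per token.
TOKEN = re.compile(r'(?P<number>\d+)|(?P<name>[^\W\d_]+)|(?P<space>\s+)')

def tokenize(text):
    """Yield (group_name, matched_text) pairs, skipping whitespace."""
    for match in TOKEN.finditer(text):
        if match.lastgroup != 'space':
            yield match.lastgroup, match.group()

tokens = list(tokenize('Ala ma 2 koty'))
print(tokens)
```

This only works cleanly when the named groups are top-level alternatives; with nested or multiple participating groups, lastgroup reports just one of them.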

As a side note, a pattern like the one above has a fatal flaw: everything that \d matches is also matched by \w, so you will never see number used where name can match first. Reverse the order of the alternatives to avoid this:

>>> for match in re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'word 42'):
...     print(match.lastgroup)
...
name
name
>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word 42'):
...     print(match.lastgroup)
...
name
number

But take into account that words starting with digits will still confuse things in your simple case:

>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word42 42word'):
...     print(match.lastgroup, repr(match.group(0)))
...
name 'word42'
number '42'
name 'word'
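One possible way to keep such mixed tokens in one piece is to anchor the number alternative with word boundaries. This is only a sketch under the assumption that a token counts as a number when it consists entirely of digits and as a name otherwise; the pattern and sample text are illustrative, and your real rules may differ.

```python
import re

# \b on both sides means "number" only matches a run of digits that is a
# whole word; anything else falls through to the "name" alternative.
pattern = re.compile(r'(?P<number>\b\d+\b)|(?P<name>\w+)')

results = [(m.lastgroup, m.group(0)) for m in pattern.finditer('word42 42word 42')]
print(results)
```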
