Regexes: How to access multiple matches of a group?

Problem description

I am putting together a fairly complex regular expression. One part of the expression matches strings such as '+a', '-57', etc.: a + or a - followed by one or more letters or digits. I want to match zero or more strings matching this pattern.

This is the expression I came up with:

([\+-][a-zA-Z0-9]+)*

If I were to search the string '-56+a' using this pattern I would expect to get two matches:

+a and -56

However, I only get the last match returned:

>>> import re
>>> m = re.match(r"([\+-][a-zA-Z0-9]+)*", '-56+a')
>>> m.groups()
('+a',)

Looking at the Python docs, I see that:

If a group matches multiple times, only the last match is accessible:

>>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
>>> m.group(1)                        # Returns only the last match.
'c3'

So, my question is: how do you access multiple group matches?

Solution

Drop the * from your regex (so it matches exactly one instance of your pattern). Then use either re.findall(...) or re.finditer(...) (see the Python re module documentation) to return all matches.
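To make that concrete, here is a minimal sketch using the pattern from the question with the * and the capturing parentheses removed. The names TOKEN, tokens and spans are illustrative only and do not appear in the question.

```python
import re

# One signed token: a '+' or '-' followed by one or more letters or digits.
# (Inside a character class the '+' does not need escaping.)
TOKEN = re.compile(r"[+-][a-zA-Z0-9]+")

text = "-56+a"

# findall returns every non-overlapping match as a list of strings.
tokens = TOKEN.findall(text)
print(tokens)  # ['-56', '+a']

# finditer yields match objects, which also expose the match positions.
spans = [(m.group(0), m.start(), m.end()) for m in TOKEN.finditer(text)]
print(spans)  # [('-56', 0, 3), ('+a', 3, 5)]
```

Note that if the pattern contains a capturing group, re.findall returns the group contents instead of the whole match. With a single group wrapping the whole pattern, as in the question, the result is the same.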

Update:

It sounds like you're essentially building a recursive descent parser. For relatively simple parsing tasks, it is quite common and entirely reasonable to do that by hand. If you're interested in a library solution (in case your parsing task may become more complicated later on, for example), have a look at pyparsing.
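As a rough illustration of the by-hand approach, the sketch below walks the string token by token and rejects anything that does not fit the pattern. The function name parse_signed_tokens and the (sign, value) output shape are assumptions made for this example, not something from the question.

```python
import re

TOKEN = re.compile(r"[+-][a-zA-Z0-9]+")


def parse_signed_tokens(text):
    """Split text into (sign, value) pairs; raise ValueError on stray input."""
    pos = 0
    result = []
    while pos < len(text):
        # Pattern.match(string, pos) only matches starting exactly at pos.
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        token = m.group(0)
        result.append((token[0], token[1:]))
        pos = m.end()
    return result


print(parse_signed_tokens("-56+a"))  # [('-', '56'), ('+', 'a')]
```

Unlike a bare re.findall, this loop does not silently skip characters between tokens, which may matter if the surrounding expression has to be validated as a whole.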
