regexes: How to access multiple matches of a group?
Problem description
I am putting together a fairly complex regular expression. One part of the expression matches strings such as '+a', '-57', etc.: a + or a - followed by one or more letters or digits. I want to match 0 or more strings matching this pattern.
This is the expression I came up with:
([\+-][a-zA-Z0-9]+)*
If I were to search the string '-56+a' using this pattern I would expect to get two matches:
+a and -56
However, I only get the last match returned:
>>> m = re.match(r"([\+-][a-zA-Z0-9]+)*", '-56+a')
>>> m.groups()
('+a',)
Looking at the Python docs I see that:
If a group matches multiple times, only the last match is accessible:
>>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
>>> m.group(1)                        # Returns only the last match.
'c3'
So, my question is: how do you access multiple group matches?
Drop the * from your regex (so it matches exactly one instance of your pattern). Then use either re.findall(...) or re.finditer(...), both documented in Python's re module, to return all matches.
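As a minimal sketch of that suggestion, using the '-56+a' input from the question: the name TOKEN is made up for illustration, and the capturing parentheses are dropped as well so that re.findall returns the whole match.

```python
import re

# Pattern for a single token: a sign followed by one or more letters or digits.
# The trailing * from the original expression is dropped.
TOKEN = re.compile(r"[+-][a-zA-Z0-9]+")

text = "-56+a"

# findall returns every non-overlapping match as a list of strings.
tokens = TOKEN.findall(text)
print(tokens)  # ['-56', '+a']

# finditer yields match objects, which also expose the position of each match.
spans = [(m.group(0), m.start(), m.end()) for m in TOKEN.finditer(text)]
print(spans)  # [('-56', 0, 3), ('+a', 3, 5)]
```

re.finditer is probably the better fit when the positions matter or the input is large, since it produces matches lazily.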
Update:
It sounds like you're essentially building a recursive descent parser. For relatively simple parsing tasks, it is quite common and entirely reasonable to do that by hand. If you're interested in a library solution (in case your parsing task may become more complicated later on, for example), have a look at pyparsing.
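A hand-written version can stay quite small. The sketch below is not a full recursive descent parser, only one possible two-step approach using the standard re module: first check that the whole input is a sequence of terms, then pull the terms out one by one. The helper parse_terms and the names TERM, TERM_RE and SEQUENCE_RE are hypothetical, and it assumes the terms make up the entire input rather than one part of a larger expression.

```python
import re

# One term: a sign followed by one or more letters or digits.
TERM = r"[+-][a-zA-Z0-9]+"
TERM_RE = re.compile(TERM)
# The whole input must consist of zero or more terms.
SEQUENCE_RE = re.compile(rf"(?:{TERM})*")


def parse_terms(text):
    """Return (sign, value) pairs; raise ValueError if text is not a term sequence."""
    if SEQUENCE_RE.fullmatch(text) is None:
        raise ValueError(f"not a sequence of signed terms: {text!r}")
    # Validation passed, so findall splits the input into exactly its terms.
    return [(term[0], term[1:]) for term in TERM_RE.findall(text)]


print(parse_terms("-56+a"))  # [('-', '56'), ('+', 'a')]
```

Separating validation from extraction keeps the repeated group for checking the overall shape, while each individual match remains accessible.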