Get all possible matches for a regex (in Python)?
I have a regex that can match a string in multiple, possibly overlapping ways. However, it seems to capture only one of the possible matches in the string. How can I get all possible matches? I've tried finditer with no success, but maybe I'm using it wrong.
The string I'm trying to parse is:
foo-foobar-foobaz
The regex I'm using is:
(.*)-(.*)
>>> import re
>>> s = "foo-foobar-foobaz"
>>> matches = re.finditer(r'(.*)-(.*)', s)
>>> [match.group(1) for match in matches]
['foo-foobar']
I want the match (foo and foobar-foobaz), but it seems to only get (foo-foobar and foobaz).
No problem:
>>> import re
>>> regex = r"([^-]*-)(?=([^-]*))"
>>> for result in re.finditer(regex, "foo-foobar-foobaz"):
...     print("".join(result.groups()))
...
foo-foobar
foobar-foobaz
By putting the second capturing parenthesis in a lookahead assertion, you can capture its contents without consuming it in the overall match.
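To see what "without consuming" means in practice, here is a small sketch that prints the span of the overall match next to the span of the group captured inside the lookahead. The helper name describe_matches is made up for this illustration; the pattern is the one from the answer.

```python
import re

# Same pattern as in the answer: group 2 sits inside a lookahead.
pattern = re.compile(r"([^-]*-)(?=([^-]*))")


def describe_matches(text):
    """Return (overall span, group 1, group 2 span, group 2) per match.

    describe_matches is a hypothetical helper used only for this sketch.
    """
    rows = []
    for m in pattern.finditer(text):
        rows.append((m.span(0), m.group(1), m.span(2), m.group(2)))
    return rows


for row in describe_matches("foo-foobar-foobaz"):
    print(row)
# ((0, 4), 'foo-', (4, 10), 'foobar')
# ((4, 11), 'foobar-', (11, 17), 'foobaz')
```

The overall match ends at index 4, but group 2 covers indices 4 to 10, so the next search can start at index 4 and reuse "foobar" as the start of the second match.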
I've also used [^-]* instead of .* because the dot also matches the separator -, which you probably don't want.
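Note that the lookahead pattern pairs each field with the one right after it. If what you are after is instead every way of splitting the string at a single hyphen, which is how the pairs (foo, foobar-foobaz) and (foo-foobar, foobaz) from the question read, a different approach may be simpler. The following is only a sketch under that assumption: it locates each hyphen with finditer and slices around it, rather than getting everything out of one regex. The function name hyphen_splits is made up for this example.

```python
import re


def hyphen_splits(text):
    """Return every (left, right) pair produced by splitting at one hyphen.

    hyphen_splits is a hypothetical helper, not part of the re module.
    """
    return [(text[:m.start()], text[m.end():]) for m in re.finditer("-", text)]


print(hyphen_splits("foo-foobar-foobaz"))
# [('foo', 'foobar-foobaz'), ('foo-foobar', 'foobaz')]
```

This works because re.finditer only ever returns non-overlapping matches, so the overlapping alternatives have to be built from the separator positions instead.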