Python re.findall() is not working as expected


Problem description

I have this code:

import re
sequence = "aabbaa"
rexp = re.compile("(aa|bb)+")
print(rexp.findall(sequence))

This returns ['aa'].

If we instead have

import re
sequence = "aabbaa"
rexp = re.compile("(aa|cc)+")
print(rexp.findall(sequence))

we get ['aa', 'aa'].

Why is there a difference, and why (for the first) do we not get ['aa', 'bb', 'aa']?

Thanks!

Recommended answer

Let me explain what you are doing:

regex = re.compile("(aa|bb)+")

You are creating a regex that looks for aa or bb, then checks whether more aa or bb follow, and keeps going until it finds neither. For "aabbaa" that is a single match covering the whole string. Because the pattern contains a capturing group, findall() returns the group rather than the whole match, and a repeated group keeps only its last repetition, so you get just the final aa. With (aa|cc)+ the run stops at bb, so there are two separate matches, each capturing aa.
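To make the "last repetition" behaviour visible, here is a small sketch that uses finditer() to show the whole match next to the captured group. The variable name pairs is only illustrative; the pattern and input are the ones from the question.

```python
import re

sequence = "aabbaa"
rexp = re.compile("(aa|bb)+")

# Each match object exposes both the whole match (group 0)
# and the capturing group (group 1).
pairs = [(m.group(0), m.group(1)) for m in rexp.finditer(sequence)]
print(pairs)  # [('aabbaa', 'aa')]

# findall() reports only the capturing group, so the whole match is hidden.
print(rexp.findall(sequence))  # ['aa']
```

The single match covers all six characters, but group 1 was overwritten on each repetition and ends up holding only the final aa.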

However, if you have a string like aaxaabbxaa, you will get aa, bb, aa. The regex first finds aa, looks for more, and finds only an x, so that is the first match and its group is aa. It then finds another aa followed by bb and then an x, so it stops there; the second match is aabb and its group holds the last repetition, bb. Finally it finds another aa. So the final result is aa, bb, aa.
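The walk-through above can be checked with a short sketch. The names whole_matches and last_groups are made up for this illustration.

```python
import re

rexp = re.compile("(aa|bb)+")
text = "aaxaabbxaa"

# The x characters split the string into three separate matches.
whole_matches = [m.group(0) for m in rexp.finditer(text)]
print(whole_matches)  # ['aa', 'aabb', 'aa']

# findall() returns the last repetition of the group for each match.
last_groups = rexp.findall(text)
print(last_groups)  # ['aa', 'bb', 'aa']
```

Note that the middle match is aabb as a whole; the bb in the result is just the final repetition of the group within that match.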

I hope this explains what your regex is doing, and that the result is as expected. To get every occurrence of aa or bb, remove the +, which tells the regex to consume repeated groups before returning a match, and let the regex return each aa or bb as its own match.

So your regex should be:

regex = re.compile("(aa|bb)")
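As a quick check of the fix, the sketch below runs the corrected pattern on the original input. The last two variants are not part of the answer above; they are additions showing that the parentheses are optional here, and that a non-capturing group (?:...) is one way to keep the + and get the whole run back instead.

```python
import re

sequence = "aabbaa"

# Without the +, every aa or bb is its own match.
each = re.findall("(aa|bb)", sequence)
print(each)  # ['aa', 'bb', 'aa']

# With no capturing group at all, findall() returns the whole matches.
plain = re.findall("aa|bb", sequence)
print(plain)  # ['aa', 'bb', 'aa']

# A non-capturing group keeps the repetition and returns the full run.
runs = re.findall("(?:aa|bb)+", sequence)
print(runs)  # ['aabbaa']
```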

Cheers.
