Difference between two regular expressions: [abc]+ and ([abc])+

Question

In [29]: re.findall("([abc])+","abc")
Out[29]: ['c']

In [30]: re.findall("[abc]+","abc")
Out[30]: ['abc']

I'm confused by the grouped one. What difference does the group make?

Recommended answer

There are two things that need to be explained here: the behavior of quantified groups, and the design of the findall() method.

In your first example, [abc] matches the a, which is captured in group #1. Then it matches the b and captures it in group #1, overwriting the a. The same happens again with the c, and that is what's left in group #1 at the end of the match.
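The overwriting described above can be observed directly with re.search(). This is a minimal sketch; the variable name m is only illustrative.

```python
import re

# The group sits inside the quantifier, so it is re-captured on every
# repetition and only the last capture survives.
m = re.search(r"([abc])+", "abc")

print(m.group(0))  # abc  (the overall match)
print(m.group(1))  # c    (the last character the group captured)
print(m.span(1))   # (2, 3)  (the position of that last capture)
```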

But it does match the whole string. If you were using search() or finditer(), you could inspect the match object and see that group(0) contains abc and group(1) contains c. But findall() returns strings (or tuples of strings when there are several groups), not match objects. If the pattern has no groups, it returns a list of the overall matches; if it has groups, the list contains only what the groups captured, not the overall match.
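To make the contrast concrete, here is a small sketch comparing finditer() and findall() on the same pattern. The sample text and the variable names are assumptions chosen for illustration, not part of the original question.

```python
import re

pattern = re.compile(r"([abc])+")
text = "abc xx cab"  # hypothetical sample input with two matches

# finditer() yields match objects, so the overall match is still available.
overall = [m.group(0) for m in pattern.finditer(text)]

# findall() returns only what group 1 held at the end of each match.
captures = pattern.findall(text)

# With more than one group, findall() returns one tuple per match.
pairs = re.findall(r"([ab])([cd])", "ac bd")

print(overall)   # ['abc', 'cab']
print(captures)  # ['c', 'b']
print(pairs)     # [('a', 'c'), ('b', 'd')]
```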

So both of your regexes match the whole string, but the first one also captures each character individually and discards all but the last (which is kind of pointless). It's only the unexpected behavior of findall() that makes it look like you're getting different results.
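If the parentheses are only there for grouping, two common ways to get the whole match back from findall() are a non-capturing group or a capturing group placed around the quantifier. The answer itself does not propose these; they are offered here as a hedged sketch.

```python
import re

# Non-capturing group: no groups for findall() to report,
# so it returns the overall matches.
non_capturing = re.findall(r"(?:[abc])+", "abc")

# Capturing group around the quantified class: the group now spans
# the whole run instead of a single character.
outer_group = re.findall(r"([abc]+)", "abc")

print(non_capturing)  # ['abc']
print(outer_group)    # ['abc']
```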
