re.findall not returning full match?


Problem description


I have a file that includes a bunch of strings like "size=XXX;". I am trying Python's re module for the first time and am a bit mystified by the following behavior: if I use a pipe for 'or' in a regular expression, only that part of the match is returned. E.g.:

>>> import re
>>> myfile = open('testfile.txt', 'r').read()
>>> print(re.findall('size=50;', myfile))
['size=50;', 'size=50;', 'size=50;', 'size=50;']
>>> print(re.findall('size=51;', myfile))
['size=51;', 'size=51;', 'size=51;']
>>> print(re.findall('size=(50|51);', myfile))
['51', '51', '51', '50', '50', '50', '50']
>>> print(re.findall(r'size=(50|51);', myfile))
['51', '51', '51', '50', '50', '50', '50']

The "size=" part of the match is gone. (Yet it is certainly used in the search, otherwise there would be more results). What am I doing wrong?

Solution

The problem is that when the pattern passed to re.findall contains capturing groups (i.e. portions of the regex enclosed in parentheses), re.findall returns what the groups captured rather than the whole matched string.
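
To make that concrete, here is a minimal sketch of how the return value changes with the number of capturing groups. The sample string is made up for illustration and merely stands in for the contents of testfile.txt.

```python
import re

# Hypothetical sample text standing in for the asker's file contents.
text = 'size=50;size=51;size=52;'

# No capturing group: findall returns the full matched strings.
no_group = re.findall(r'size=50;', text)

# One capturing group: findall returns only what the group captured.
one_group = re.findall(r'size=(50|51);', text)

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r'(size)=(50|51);', text)

print(no_group)    # ['size=50;']
print(one_group)   # ['50', '51']
print(two_groups)  # [('size', '50'), ('size', '51')]
```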

One way to solve this issue is to use non-capturing groups (prefixed with ?:).

>>> import re
>>> s = 'size=50;size=51;'
>>> re.findall('size=(?:50|51);', s)
['size=50;', 'size=51;']

If the pattern passed to re.findall contains no capturing groups, it returns the whole of each matched string.
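
If the capturing group is needed for some other reason, one alternative (not part of the answer above, only a sketch) is re.finditer, which yields match objects; group(0) on a match object is always the entire match, whatever groups the pattern contains. The sample string is again made up.

```python
import re

# Hypothetical sample text, not the asker's actual file.
text = 'size=50;size=51;size=52;'

# Keep the capturing group in the pattern, but read the match objects directly.
pattern = re.compile(r'size=(50|51);')

# group(0) is the whole match; group(1) is the first capturing group.
full_matches = [m.group(0) for m in pattern.finditer(text)]
captured = [m.group(1) for m in pattern.finditer(text)]

print(full_matches)  # ['size=50;', 'size=51;']
print(captured)      # ['50', '51']
```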

Although using character classes might be the simplest option in this particular case, non-capturing groups provide a more general solution.
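
For comparison, a character-class version might look like the sketch below (sample string made up). It only works here because the two alternatives differ in a single character; for alternatives of different lengths, such as 50 and 120, a non-capturing group is still needed.

```python
import re

# Hypothetical sample text, not the asker's actual file.
text = 'size=50;size=51;size=52;'

# [01] matches exactly one character, either '0' or '1'.
# There is no group in the pattern, so findall returns the full matches.
matches = re.findall(r'size=5[01];', text)

print(matches)  # ['size=50;', 'size=51;']
```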
