Odd behavior with Python regular expressions - findall only finds the "()?" portion


Problem description

I'm currently writing a regular expression to find the units and size (or it could work as dimensions) in a string. For example: "Product: A, 2 x 3.5 gallon bottles"

For simplicity, I'm removing all whitespace, so this becomes:

"Product:A,2x3.5gallonbottles"
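The question doesn't show how the whitespace is removed. A minimal sketch, assuming every kind of whitespace should be dropped, is a single `re.sub` call (the variable names `raw` and `compact` are illustrative only):

```python
import re

raw = "Product: A, 2 x 3.5 gallon bottles"

# \s+ matches any run of whitespace (spaces, tabs, newlines); replace each run with nothing
compact = re.sub(r"\s+", "", raw)
print(compact)  # Product:A,2x3.5gallonbottles
```

`"".join(raw.split())` gives the same result without a regex.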

My regex is as follows:

import re
numAndSize = re.compile(r'\d+[xX]\d+(\.\d+)?')

But when I try to use findall, this happens:

In [47]: numAndSize.findall("Product:A,2x3.5gallonbottles")
Out[47]: ['.5']

I get -only- the '.5' from this string, instead of the entire match.

Using search and group, however, works as expected:

In [50]: numAndSize.search("Product:A,2x3.5gallonbottles").group(0)
Out[50]: '2x3.5'
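The match object already hints at what findall is doing: group(0) is the whole match, while group(1) is what the optional `(\.\d+)` group captured, which is exactly the '.5' findall returned. The sketch below uses a made-up second product (`4x6gallonjugs`) to also show that `search()` stops at the first match, whereas `re.finditer()` yields a match object for every match, so you can still collect whole matches from all of them:

```python
import re

num_and_size = re.compile(r'\d+[xX]\d+(\.\d+)?')
# Hypothetical test string: the second product is invented for illustration
text = "Product:A,2x3.5gallonbottles,4x6gallonjugs"

m = num_and_size.search(text)
print(m.group(0))  # 2x3.5 (the whole match)
print(m.group(1))  # .5    (what capturing group 1 matched)

# search() only reports the first match; finditer() yields one match object per match
whole_matches = [match.group(0) for match in num_and_size.finditer(text)]
print(whole_matches)  # ['2x3.5', '4x6']
```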

From there, I tried changing my regex to not include the optional decimal, and ran findall on that.

In [51]: numAndSize = re.compile(r'\d+[xX]\d+')
In [52]: numAndSize.findall("Product:A,2x3.5gallonbottles")
Out[52]: ['2x3']

Is there a reason behind this behavior? For my purposes I can certainly use .search().group(), but I personally like findall since the output gives back a lot more information in a clean format.

Solution

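If the regular expression contains any capturing groups, re.findall() returns those groups instead of the entire match: with one group, each list item is the text that group matched (here '.5'); with several groups, each item is a tuple of them. That is also why dropping the group gave '2x3': with no groups, findall returns whole matches. To get the entire match while keeping the optional decimal, use a non-capturing group, (?:...):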

>>> numAndSize = re.compile(r'\d+[xX]\d+(?:\.\d+)?')
>>> numAndSize.findall("Product:A,2x3.5gallonbottles")
['2x3.5']
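To compare the result shapes side by side, here is a small sketch on a hypothetical two-product string (the `4x6gallonjugs` entry is invented to show a match where the optional decimal is absent):

```python
import re

text = "Product:A,2x3.5gallonbottles,4x6gallonjugs"

# No capturing groups: each item is the whole match
print(re.findall(r'\d+[xX]\d+(?:\.\d+)?', text))      # ['2x3.5', '4x6']

# Exactly one group: each item is that group's text; an optional group
# that did not take part in a match shows up as an empty string
print(re.findall(r'\d+[xX]\d+(\.\d+)?', text))        # ['.5', '']

# Two or more groups: each item is a tuple with one entry per group
print(re.findall(r'(\d+)[xX](\d+(?:\.\d+)?)', text))  # [('2', '3.5'), ('4', '6')]
```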

Or you could take advantage of this behavior to have it return a tuple of the dimensions (or units, or whatever they are):

>>> numAndSize = re.compile(r'(\d+)[xX](\d+(?:\.\d+)?)')
>>> numAndSize.findall("Product:A,2x3.5gallonbottles")
[('2', '3.5')]
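findall always returns strings, so if the dimensions are needed as numbers they still have to be converted. A minimal sketch, assuming the first group is always an integer count and the second a size with an optional decimal; `parse_dimensions` is a hypothetical helper name, not part of `re`:

```python
import re

# Same two-group pattern as above: group 1 = count, group 2 = size (decimal optional)
dims = re.compile(r'(\d+)[xX](\d+(?:\.\d+)?)')

def parse_dimensions(text):
    """Hypothetical helper: return (count, size) pairs as int/float."""
    return [(int(count), float(size)) for count, size in dims.findall(text)]

print(parse_dimensions("Product:A,2x3.5gallonbottles"))  # [(2, 3.5)]
```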
