Python regular expression pattern * is not working as expected
Question
While working through Google's 2010 Python class, I found the following documentation:
'*' -- 0 or more occurrences of the pattern to its left
But when I tried the following:
re.search(r'i*','biiiiiiiiiiiiiig').group()
I expected 'iiiiiiiiiiiiii' as output but got ''. Why?
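Answer
* means zero or more, and re.search returns only the first (leftmost) match. The search starts at index 0, where the character is b. i* cannot consume b, but it is allowed to match zero i's, so it succeeds right there with an empty string. That empty string is the first match, and it is what you get as output.
Change * to +, which requires at least one i, to get the desired output.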
>>> re.search(r'i*','biiiiiiiiiiiiiig').group()
''
>>> re.search(r'i+','biiiiiiiiiiiiiig').group()
'iiiiiiiiiiiiii'
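To see where each match actually lands, it can help to print the match span as well as the matched text. The following is a minimal sketch; the variable names and the run length of 14 are illustrative assumptions, not part of the original question.

```python
import re

# Illustrative input: a run of i's between two other characters.
# The run length (14) is an arbitrary choice for this sketch.
text = "b" + "i" * 14 + "g"

star = re.search(r"i*", text)
plus = re.search(r"i+", text)

# i* succeeds immediately at index 0 by matching zero characters.
print(star.span(), repr(star.group()))   # (0, 0) ''

# i+ needs at least one i, so the search has to move on to index 1.
print(plus.span())                       # (1, 15)
print(plus.group() == "i" * 14)          # True
```

The span (0, 0) shows that the empty result is a successful match at the very start of the string, not a failed search (a failed search would return None).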
Consider this example.
>>> re.search(r'i*','biiiiiiiiiiiiiig').group()
''
>>> re.search(r'i*','iiiiiiiiiiiiiig').group()
'iiiiiiiiiiiiii'
In the second call, i* returns the whole run of i's. The regex engine starts at the first character and tries to match zero or more i's. Because the string begins with i, it greedily consumes every consecutive i, and that run is the result.
When i is not the first character (as in the first call, where the string begins with b), i* still succeeds at every position where it cannot consume an i, by matching the empty string at that position. In our case it matches the empty strings before b and before g. Because re.search returns only the first match, you get the empty string found before the leading b.
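The leftmost-first, greedy behaviour described above can be imitated in a few lines of plain Python. This is only a rough sketch of the idea for a single repeated character, not how the re module is implemented; the function name first_run and its min_count parameter are hypothetical helpers introduced for illustration (min_count=0 plays the role of *, min_count=1 the role of +).

```python
import re

def first_run(text, char="i", min_count=0):
    """Return the (start, end) span of the leftmost greedy run of `char`
    that contains at least `min_count` characters, or None."""
    for start in range(len(text) + 1):
        end = start
        # Greedy step: consume as many consecutive `char`s as possible.
        while end < len(text) and text[end] == char:
            end += 1
        # With min_count=0 this is always true at start=0, which is
        # why i* yields an empty match when the string begins with b.
        if end - start >= min_count:
            return start, end
    return None

for sample in ("biiig", "iiig"):
    # Cross-check the sketch against the real regex engine.
    assert first_run(sample) == re.search(r"i*", sample).span()
    assert first_run(sample, min_count=1) == re.search(r"i+", sample).span()
    print(sample, first_run(sample), first_run(sample, min_count=1))
```

For "biiig" the sketch reports (0, 0) for the * case and (1, 4) for the + case, which mirrors the two re.search results discussed above.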
Why did I get three empty strings as output in the example below?
>>> re.findall(r'i*','biiiiiiiiiiiiiig')
['', 'iiiiiiiiiiiiii', '', '']
As I explained earlier, every position where the pattern cannot consume a character yields an empty string as a match. The regex engine scans the input from left to right.
The first empty string appears because the pattern i* cannot match the character b, but it does match the empty string just before b. The engine then moves to the next character, i, which the pattern does match, so it greedily consumes all the following i's. That run is the second item in the list. After matching all the i's, the engine reaches g, which i* cannot match, so it matches the empty string before g. That is the third item. Finally, i* matches the empty string at the end of the string, after g. That is the fourth item.
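The four matches can be inspected one by one with re.finditer, which exposes the position of each match. The sketch below uses a shortened input string ("biiig", an assumption made to keep the output readable) and also shows that switching to i+ leaves only the non-empty run.

```python
import re

text = "biiig"  # shortened stand-in for the string in the question

# Collect the span and text of every match of i*, left to right.
matches = [(m.span(), m.group()) for m in re.finditer(r"i*", text)]
for span, value in matches:
    print(span, repr(value))
# (0, 0) ''     empty match before b
# (1, 4) 'iii'  the greedy run of i's
# (4, 4) ''     empty match before g
# (5, 5) ''     empty match at the end of the string

# With + instead of *, empty matches are impossible.
only_runs = re.findall(r"i+", text)
print(only_runs)  # ['iii']
```

If the empty strings are unwanted, i+ is usually the simpler fix compared with filtering the findall result afterwards.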