re.sub(".*", "(replacement)", "text") doubles replacement on Python 3.7
Problem description
On Python 3.7 (tested on 64-bit Windows), replacing a string using the regex .* gives the replacement string repeated twice!
On Python 3.7.2:
>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)(replacement)'
On Python 3.6.4:
>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)'
On Python 2.7.5 (32 bits):
>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)'
What is going on? How can I fix it?
Recommended answer
This is not a bug, but a bug fix in Python 3.7 from the commit fbb490fd2f38bd817d99c20c05121ad0168a38ee.
In regex, a non-zero-width match moves the pointer to the end of the match, so that the next match attempt, zero-width or not, continues from the position following the match. So in your example, after .* greedily matches and consumes the entire string, the pointer sits at the end of the string, which still leaves "room" for a zero-width match at that position. This is evident from the following code, which behaves the same in Python 2.7, 3.6 and 3.7:
>>> re.findall(".*", 'sample text')
['sample text', '']
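To make the positions of those two matches visible, here is a minimal sketch using re.finditer that records the span of each match. The variable names (text, spans) are illustrative and not part of the original question.

```python
import re

text = "sample text"

# Collect (start, end, matched_text) for every match of ".*" in the input.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(".*", text)]

for start, end, matched in spans:
    # The first match covers the whole string; the second is empty and
    # sits at the very end of the string.
    print(start, end, repr(matched))
```

The second tuple has equal start and end offsets, which is the zero-width match that Python 3.7 now also replaces.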
So the bug fix, which concerns the replacement of a zero-width match immediately after a non-zero-width match, now correctly replaces both matches with the replacement text.
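The answer above does not spell out how to get a single replacement again. The following is a sketch of three possible workarounds, assuming a single-line input like the one in the question; the variable names are made up for illustration, and which option fits depends on what the pattern is meant to do.

```python
import re

text = "sample text"

# Option 1: anchor the pattern at the start of the string, so the empty
# match at the end of the string can no longer qualify.
anchored = re.sub(r"^.*", "(replacement)", text)

# Option 2: keep the pattern but cap the number of substitutions at one.
limited = re.sub(".*", "(replacement)", text, count=1)

# Option 3: require at least one character, which rules out empty matches.
# Note that this leaves an empty input string unchanged.
at_least_one = re.sub(".+", "(replacement)", text)

print(anchored, limited, at_least_one)
```

All three give '(replacement)' for this input. They differ on edge cases: with an empty input string, options 1 and 2 still insert the replacement once, while option 3 returns the empty string unchanged.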