Retrieving Python 3.6 handling of re.sub() with zero-length matches in Python 3.7
The handling of zero-length matches changed in Python 3.7. Consider the following with Python 3.6 (and earlier):
>>> import re
>>> print(re.sub('a*', 'x', 'bac'))
xbxcx
>>> print(re.sub('.*', 'x', 'bac'))
x
With Python 3.7, we get the following:
>>> import re
>>> print(re.sub('a*', 'x', 'bac'))
xbxxcx
>>> print(re.sub('.*', 'x', 'bac'))
xx
I understand this is the standard behavior of PCRE. Furthermore, re.finditer() seems to have always detected the additional match:
>>> for m in re.finditer('a*', 'bac'):
...     print(m.start(0), m.end(0), m.group(0))
...
0 0
1 2 a
2 2
3 3
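In the examples above, the extra output in Python 3.7 comes from empty matches that begin exactly where the previous, non-empty match ended: 3.7's re.sub() replaces them, while 3.6's skipped them. As a minimal sketch (adjacent_empty_matches is a hypothetical helper, not part of the re module), such matches can be listed from re.finditer():

```python
import re

def adjacent_empty_matches(pattern, string):
    """Hypothetical helper: return the (start, end) spans of empty matches
    reported by re.finditer() that start exactly where the previous match
    ended."""
    spans = []
    prev_end = None  # end offset of the previous match, if any
    for m in re.finditer(pattern, string):
        # Empty match (start == end) beginning at the previous match's end
        if m.start() == m.end() == prev_end:
            spans.append(m.span())
        prev_end = m.end()
    return spans

print(adjacent_empty_matches('a*', 'bac'))  # [(2, 2)]
print(adjacent_empty_matches('.*', 'bac'))  # [(3, 3)]
```

Each flagged span corresponds to one extra x in the Python 3.7 outputs (xbxxcx and xx).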
That said, I'm interested in reproducing the behavior of Python 3.6 (this is for a hobby project implementing sed in Python).
I came up with the following solution:
import re

def sub36(regex, replacement, string):
    compiled = re.compile(regex)

    class Match(object):
        def __init__(self):
            self.prevmatch = None

        def __call__(self, match):
            try:
                # Drop an empty match that starts where the previous match ended
                if match.group(0) == '' and self.prevmatch and match.start(0) == self.prevmatch.end(0):
                    return ''
                else:
                    # Match.expand() is the public equivalent of re._expand():
                    # it expands backreferences such as \1 in the replacement
                    return match.expand(replacement)
            finally:
                self.prevmatch = match

    return compiled.sub(Match(), string)
which gives:
>>> print(re.sub('a*', 'x', 'bac'))
xbxxcx
>>> print(sub36('a*', 'x', 'bac'))
xbxcx
>>> print(re.sub('.*', 'x', 'bac'))
xx
>>> print(sub36('.*', 'x', 'bac'))
x
However, this feels tailored to these particular examples.
What would be the right way to reproduce Python 3.6's handling of zero-length matches in re.sub() under Python 3.7?
Your solution may be the third-party regex package.
From the regex package introduction:
This regex implementation is backwards-compatible with the standard ‘re’ module, but offers additional functionality. The re module’s behaviour with zero-width matches changed in Python 3.7, and this module will follow that behaviour when compiled for Python 3.7.
Installation:
pip install regex
Usage:
With regex, you can specify the version (V0 or V1) with which a pattern is compiled, e.g.:
# Running on Python 3.7 and later (no version flag)
>>> import regex
>>> regex.sub('.*', 'x', 'test')
'xx'
>>> regex.sub('.*?', '|', 'test')
'|||||||||'
# Running on Python 3.6 and earlier
>>> import regex
>>> regex.sub('(?V0).*', 'x', 'test')
'x'
>>> regex.sub('(?V1).*', 'x', 'test')
'xx'
>>> regex.sub('(?V0).*?', '|', 'test')
'|t|e|s|t|'
>>> regex.sub('(?V1).*?', '|', 'test')
'|||||||||'
Note:
The version can be indicated by the VERSION0 or V0 flag, or by (?V0) in the pattern.
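If adding a dependency is not an option, the pre-3.7 rules can also be approximated with the standard re module alone by driving Pattern.search() by hand rather than filtering matches in a callback. A callback only ever sees the matches Python 3.7 produces, and since 3.7 a non-empty match may start where an empty match was just found (for instance, '.*?' on 'test' yields the span (0, 0) and then (0, 1)), which Python 3.6 did not allow. The sketch below uses a hypothetical name, sub_py36, and follows the substitution loop of CPython 3.6's _sre module as understood here (an assumption worth checking against that source); it handles str input only and aims at the examples in this thread rather than every corner case:

```python
import re

def sub_py36(pattern, repl, string, count=0, flags=0):
    """Hypothetical helper: re.sub() with pre-3.7 handling of empty matches.

    Rule 1: after an empty match the next search starts one character later,
            so no match can begin where an empty match was just found.
    Rule 2: an empty match starting exactly where the previous replaced
            match ended is skipped.
    repl is a template string (backreferences expanded with Match.expand)
    or a callable taking a match object, as with re.sub(). str input only.
    """
    compiled = re.compile(pattern, flags)
    pieces = []
    last = 0  # end of the last replaced match; text before it is copied
    pos = 0   # offset where the next search begins
    n = 0     # replacements made so far
    while not count or n < count:
        m = compiled.search(string, pos)
        if m is None:
            break
        start, end = m.span()
        if not (start == end == last and n > 0):  # Rule 2
            pieces.append(string[last:start])  # unmatched text before match
            pieces.append(repl(m) if callable(repl) else m.expand(repl))
            last = end
            n += 1
        if end == len(string):
            break  # the match reached the end of the input
        pos = end + 1 if start == end else end  # Rule 1
    pieces.append(string[last:])  # remaining tail after the last replacement
    return ''.join(pieces)
```

For example, sub_py36('a*', 'x', 'bac') returns 'xbxcx' and sub_py36('.*?', '|', 'test') returns '|t|e|s|t|', the pre-3.7 results shown in the question and in the (?V0) example.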
Sources:
Regex thread - issue2636
regex 2018.11.22