Do Python regular expressions have an equivalent to Ruby's atomic grouping?

Question

Ruby's regular expressions have a feature called atomic grouping, (?>regexp), described here. Is there an equivalent in Python's re module?

Recommended answer

Before version 3.11, Python's re module did not directly support this feature (3.11 added the (?>...) syntax natively). On any version, you can emulate it with a zero-width lookahead assertion ((?=RE)), which matches from the current point with the semantics you want. Put a named group ((?P<name>RE)) inside the lookahead, then use a named backreference ((?P=name)) to consume exactly what the assertion matched. Because the engine never backtracks into a lookahead once it has succeeded, the combination behaves like an atomic group, at the cost of an additional capturing group and a lot of syntax.

For example, the link you provided gives this Ruby example:

/"(?>.*)"/.match('"Quote"') #=> nil

We can emulate that in Python as such:

re.search(r'"(?=(?P<tmp>.*))(?P=tmp)"', '"Quote"') # => None
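
To make the contrast concrete, here is a minimal sketch that runs the ordinary greedy pattern next to the emulated atomic one on the same input. The variable names plain and emulated are only illustrative.

```python
import re

subject = '"Quote"'

# Ordinary greedy group: .* first swallows the closing quote, then
# backtracks one character so the final " can match.
plain = re.search(r'".*"', subject)

# Emulated atomic group: the lookahead captures everything .* can take,
# the backreference consumes exactly that text, and there is nothing
# left to give back, so the final " can never match.
emulated = re.search(r'"(?=(?P<tmp>.*))(?P=tmp)"', subject)

print(plain.group(0))  # "Quote"
print(emulated)        # None
```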

We can show that I'm doing something useful and not just spewing line noise, because if we change it so that the inner group doesn't eat the final ", it still matches:

re.search(r'"(?=(?P<tmp>[A-Za-z]*))(?P=tmp)"', '"Quote"').groupdict()
# => {'tmp': 'Quote'}

You can also use anonymous groups and numeric backreferences, but this gets awfully full of line-noise:

re.search(r'"(?=(.*))\1"', '"Quote"') # => None
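
If the idiom is needed in more than one place, it could be wrapped in a small helper. The sketch below is not part of the re module: atomic() and its default group name _atomic are hypothetical names chosen for illustration.

```python
import re


def atomic(pattern, name="_atomic"):
    """Return a regex fragment emulating (?>pattern).

    Hypothetical helper: a lookahead captures what `pattern` matches
    into the named group `name`, then a backreference consumes it.
    """
    return "(?=(?P<{0}>{1}))(?P={0})".format(name, pattern)


# The same two cases as above, built with the helper.
quoted_greedy = re.compile('"' + atomic(".*") + '"')
quoted_letters = re.compile('"' + atomic("[A-Za-z]*") + '"')

print(quoted_greedy.search('"Quote"'))            # None
print(quoted_letters.search('"Quote"').group(0))  # "Quote"
```

Group names must be unique within one pattern, so a pattern that uses the helper more than once needs a different name argument for each use.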

(Full disclosure: I learned this trick from perl's perlre documentation, which mentions it under the documentation for (?>...).)
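
For readers on a recent interpreter: Python 3.11 added native atomic groups and possessive quantifiers to re, so the emulation is only needed on older versions. The following sketch guards the native syntax behind a version check, since older versions reject it when the pattern is compiled. The variable names are illustrative.

```python
import re
import sys

subject = '"Quote"'

# Works on every Python version: lookahead plus backreference.
emulated = re.search(r'"(?=(?P<tmp>.*))(?P=tmp)"', subject)
print(emulated)  # None

if sys.version_info >= (3, 11):
    # Native atomic group, same syntax as Ruby and Perl.
    native = re.search(r'"(?>.*)"', subject)
    # Possessive quantifier, shorthand for the atomic group above.
    possessive = re.search(r'".*+"', subject)
    print(native, possessive)
else:
    native = possessive = None
    print("native (?>...) requires Python 3.11 or later")
```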

In addition to having the right semantics, this also has the appropriate performance properties. If we port an example out of perlre:

[nelhage@anarchique:~/tmp]$ cat re.py
import re
import timeit


re_1 = re.compile(r'''\(
                           (
                             [^()]+           # x+
                           |
                             \( [^()]* \)
                           )+
                       \)
                   ''', re.X)
re_2 = re.compile(r'''\(
                           (
                             (?=(?P<tmp>[^()]+ ))(?P=tmp) # Emulate (?> x+)
                           |
                             \( [^()]* \)
                           )+
                       \)''', re.X)

# print() is called as a function so the script also runs on Python 3
print(timeit.timeit("re_1.search('((()' + 'a' * 25)",
                    setup  = "from __main__ import re_1",
                    number = 10))

print(timeit.timeit("re_2.search('((()' + 'a' * 25)",
                    setup  = "from __main__ import re_2",
                    number = 10))

We see a dramatic improvement:

[nelhage@anarchique:~/tmp]$ python re.py
96.0800571442
7.41481781006e-05

The gap only gets more dramatic as we extend the length of the search string.
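
To watch the scaling without waiting minutes, the comparison can be rerun on shorter inputs. This is a rough sketch that reuses the two patterns from re.py; the lengths in LENGTHS are arbitrary small values picked so the backtracking version still finishes quickly, and actual timings will vary by machine and Python version.

```python
import re
import timeit

# Same two patterns as in re.py above.
slow_re = re.compile(r'''\(
                           (
                             [^()]+           # x+
                           |
                             \( [^()]* \)
                           )+
                       \)''', re.X)
fast_re = re.compile(r'''\(
                           (
                             (?=(?P<tmp>[^()]+ ))(?P=tmp) # Emulate (?> x+)
                           |
                             \( [^()]* \)
                           )+
                       \)''', re.X)

LENGTHS = (10, 12, 14, 16)  # assumed values, kept small on purpose

results = []
for n in LENGTHS:
    subject = '((()' + 'a' * n
    slow = timeit.timeit(lambda: slow_re.search(subject), number=1)
    fast = timeit.timeit(lambda: fast_re.search(subject), number=1)
    results.append((slow_re.search(subject), fast_re.search(subject)))
    print("n=%d  plain: %.6fs  emulated atomic: %.6fs" % (n, slow, fast))
```

Neither pattern finds a match on these inputs; the difference is only in how long each takes to give up.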
