Determine if a regular expression only matches fixed-length strings


Question


Is there a way to determine whether a regular expression only matches fixed-length strings? My idea would be to scan for *, + and ?. Then some additional logic would be required to look for {m,n} where m != n. It is not necessary to take the | operator into account.
Small example: ^\d{4} is fixed-length; ^\d{4,5} and ^\d+ are variable-length.
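
The scanning idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name has_variable_quantifier is hypothetical, the | operator is ignored as the question allows, and engine-specific syntax such as \Q...\E or POSIX classes is not handled. It is also conservative: a quantifier inside a zero-width lookaround is still reported as variable.

```python
import re

# Matches {m}, {m,} and {m,n} starting at a given position.
_BRACES = re.compile(r"\{(\d+)(?:(,)(\d*))?\}")


def has_variable_quantifier(pattern):
    """Return True if pattern contains *, +, ?, {m,} or {m,n} with m != n.

    Hypothetical helper: skips escaped characters, character classes and
    the '?' that introduces group syntax such as (?: or (?=.
    """
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the escaped character
            continue
        if ch == "[":
            i += 1
            if i < n and pattern[i] == "^":
                i += 1
            if i < n and pattern[i] == "]":
                i += 1  # a leading ] is a literal member of the class
            while i < n and pattern[i] != "]":
                i += 2 if pattern[i] == "\\" else 1
            i += 1  # step past the closing bracket
            continue
        if pattern.startswith("(?", i):
            i += 2  # group syntax, the ? is not a quantifier here
            continue
        if ch in "*+?":
            return True
        if ch == "{":
            m = _BRACES.match(pattern, i)
            if m:
                lo, comma, hi = m.groups()
                if comma and (hi == "" or int(hi) != int(lo)):
                    return True
                i = m.end()
                continue
        i += 1
    return False


for p in (r"^\d{4}", r"^\d{4,5}", r"^\d+"):
    print(p, "variable" if has_variable_quantifier(p) else "fixed")
```

A hand-written scanner like this has to re-implement part of the regex grammar, which is the main argument for letting an existing parser do the work instead.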

I'm using PCRE.

Thanks.

Paul Praet

Recommended answer


Well, you could make use of the fact that Python's regex engine only allows fixed-length regular expressions in lookbehind assertions:

import re
regexes = [r".x{2}(abc|def)", # fixed
           r"a|bc",           # variable/finite
           r"(.)\1",          # fixed
           r".{0,3}",         # variable/finite
           r".*"]             # variable/infinite

for regex in regexes:
    try:
        re.compile("(?<=" + regex + ")")  # lookbehind bodies must be fixed-width
    except re.error:  # raised when the lookbehind is rejected
        print("Not fixed length: {}".format(regex))
    else:
        print("Fixed length: {}".format(regex))

This will output:

Fixed length: .x{2}(abc|def)
Not fixed length: a|bc
Fixed length: (.)\1
Not fixed length: .{0,3}
Not fixed length: .*


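I'm assuming that the regex itself is valid. Any compile error is reported as "not fixed length", so results can differ between Python versions: recent Python 3 releases reject a backreference to a group defined inside the same lookbehind, which may cause (.)\1 to be reported as not fixed length there.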
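
If the validity assumption cannot be taken for granted, one possible refinement is to compile the pattern on its own first, so that an invalid regex is not mistaken for a variable-length one. The sketch below is only an illustration; the function name classify and its three return values are hypothetical choices, not part of any library. Patterns that are valid alone but rejected inside a lookbehind for reasons other than width are still lumped in with "variable".

```python
import re


def classify(pattern):
    """Return 'invalid', 'variable' or 'fixed' for a Python regex.

    Hypothetical helper built on the lookbehind trick: Python refuses to
    compile a lookbehind whose body is not fixed-width.
    """
    try:
        re.compile(pattern)  # step 1: is the pattern valid at all?
    except re.error:
        return "invalid"
    try:
        re.compile("(?<=" + pattern + ")")  # step 2: fixed-width check
    except re.error:
        return "variable"
    return "fixed"


for p in (r"\d{4}", r"\d+", r"a|bc", r"("):
    print(p, "->", classify(p))
```

Note that this checks Python's regex dialect, so a PCRE pattern that uses syntax Python does not support would be reported as invalid.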


Now, how does Python know whether the regex is fixed-length or not? Just read the source - in sre_parse.py, there is a method called getwidth() that returns a tuple consisting of the lowest and the highest possible length, and if these are not equal in a lookbehind assertion, re.compile() will raise an error. The getwidth() method walks through the regex recursively:

def getwidth(self):
    # determine the width (min, max) for this subpattern
    if self.width:
        return self.width
    lo = hi = 0
    UNITCODES = (ANY, RANGE, IN, LITERAL, NOT_LITERAL, CATEGORY)
    REPEATCODES = (MIN_REPEAT, MAX_REPEAT)
    for op, av in self.data:
        if op is BRANCH:
            i = sys.maxsize
            j = 0
            for av in av[1]:
                l, h = av.getwidth()
                i = min(i, l)
                j = max(j, h)
            lo = lo + i
            hi = hi + j
        elif op is CALL:
            i, j = av.getwidth()
            lo = lo + i
            hi = hi + j
        elif op is SUBPATTERN:
            i, j = av[1].getwidth()
            lo = lo + i
            hi = hi + j
        elif op in REPEATCODES:
            i, j = av[2].getwidth()
            lo = lo + int(i) * av[0]
            hi = hi + int(j) * av[1]
        elif op in UNITCODES:
            lo = lo + 1
            hi = hi + 1
        elif op == SUCCESS:
            break
    self.width = int(min(lo, sys.maxsize)), int(min(hi, sys.maxsize))
    return self.width
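
Instead of triggering the lookbehind error, the width bounds can also be read directly from the parsed pattern. The following is a minimal sketch, assuming that relying on CPython's internal parser module is acceptable: it is an undocumented implementation detail (named sre_parse in older releases and re._parser from Python 3.11 on) and may change without notice. The helper names width_bounds and is_fixed_length are hypothetical.

```python
try:
    # Python 3.11+: the parser lives in the private module re._parser
    from re import _parser as regex_parser
except ImportError:
    # Older Python versions expose it as sre_parse
    import sre_parse as regex_parser


def width_bounds(pattern):
    """Return (min_width, max_width) as computed by getwidth()."""
    return regex_parser.parse(pattern).getwidth()


def is_fixed_length(pattern):
    """Hypothetical helper: True if every match has the same length."""
    lo, hi = width_bounds(pattern)
    return lo == hi


for p in (r"^\d{4}", r"^\d{4,5}", r"^\d+", r".x{2}(abc|def)"):
    print(p, width_bounds(p), is_fixed_length(p))
```

Unlike the lookbehind trick, this also exposes the minimum and maximum lengths, which makes it possible to tell bounded patterns such as .{0,3} apart from unbounded ones such as .*.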
