Count overlapping regex matches once again

Problem description

How can I obtain the number of overlapping regex matches using Python?

I've read and tried the suggestions from several related questions, but found none that work for my scenario. Here it is:

  • input example string: akka
  • search pattern: a.*k

A proper function should return 2 as the number of matches, since there are two possible end positions (the two k letters).

The pattern might also be more complicated, for example a.*k.*a should also be matched twice in akka (since there are two k's in the middle).
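
To make the problem concrete, here is a minimal sketch of why two common approaches report only one match for this input. The lookahead variant is an assumption about what the related questions suggest; the variable names are illustrative.

```python
import re

text = "akka"
pattern = "a.*k"

# Plain findall is greedy and non-overlapping: it consumes "akk" in one go.
plain = re.findall(pattern, text)

# A capturing lookahead yields at most one candidate per start position,
# so the shorter match "ak" (same start, earlier end) is never reported.
lookahead = [m.group(1) for m in re.finditer("(?=(%s))" % pattern, text)]

print(plain)      # ['akk']
print(lookahead)  # ['akk']
```

Both lists have length 1, whereas the desired answer is 2.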

Solution

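Yes, it is ugly and unoptimized, but it seems to work. It is a brute-force attempt that tries all possible variants and keeps the unique ones. Note that mycount takes a pattern with a bare * as the wildcard (in place of .*) and returns the set of matched variants, so the number of matches is len(mycount(pattern, text)).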

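import re


def myregex(pattern, text, direction=0):
    # Yield the current match, then search again in shortened copies of it.
    m = re.search(pattern, text)
    if m:
        yield m.group(0)
        if len(m.group('suffix')):
            # Drop the first character of the 'suffix' group and retry.
            for r in myregex(pattern, "%s%s%s" % (m.group('prefix'), m.group('suffix')[1:], m.group('end')), 1):
                yield r
            if direction < 1:
                # Drop the last character of the 'suffix' group and retry.
                for r in myregex(pattern, "%s%s%s" % (m.group('prefix'), m.group('suffix')[:-1], m.group('end')), -1):
                    yield r


def myprocess(pattern, text):
    # '*' is the wildcard here; split the pattern into its literal parts.
    parts = pattern.split("*")
    # Build one regex per wildcard: wildcard i becomes a greedy '.*' inside
    # the 'suffix' group, every other wildcard becomes a lazy '.*?'.
    for i in range(0, len(parts) - 1):
        res = ""
        for j in range(0, len(parts)):
            if j == 0:
                res += "(?P<prefix>"
            if j == i:
                res += ")(?P<suffix>"
            res += parts[j]
            if j == i + 1:
                res += ")(?P<end>"
            if j < len(parts) - 1:
                if j == i:
                    res += ".*"
                else:
                    res += ".*?"
            else:
                res += ")"
        for r in myregex(res, text):
            yield r


def mycount(pattern, text):
    # Collect the unique variants; len() of the result is the match count.
    return set(myprocess(pattern, text))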
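
To see what myprocess actually feeds to re.search, the following sketch rebuilds the same regex strings in a more compact way. build_patterns is a hypothetical helper written for illustration, not part of the answer; it is meant to mirror the string-building loop above and nothing more.

```python
def build_patterns(pattern):
    """Return the regexes generated for a '*'-wildcard pattern (illustrative)."""
    parts = pattern.split("*")
    patterns = []
    for i in range(len(parts) - 1):
        # Everything before wildcard i, joined lazily, goes into 'prefix'.
        prefix = "".join(p + ".*?" for p in parts[:i])
        # Wildcard i itself is greedy and sits inside 'suffix'.
        suffix = parts[i] + ".*" + parts[i + 1]
        # Everything after goes into 'end', again joined lazily.
        end = "".join(".*?" + p for p in parts[i + 2:])
        patterns.append(
            "(?P<prefix>%s)(?P<suffix>%s)(?P<end>%s)" % (prefix, suffix, end)
        )
    return patterns


for p in build_patterns("a*b*c"):
    print(p)
# (?P<prefix>)(?P<suffix>a.*b)(?P<end>.*?c)
# (?P<prefix>a.*?)(?P<suffix>b.*c)(?P<end>)
```

Each generated regex makes exactly one wildcard greedy, and myregex then shrinks the text captured by the suffix group one character at a time.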

Test (output as printed by Python 2; Python 3 displays the same sets as {...}):

>>> mycount('a*b*c','abc')
set(['abc'])
>>> mycount('a*k','akka')
set(['akk', 'ak'])
>>> mycount('b*o','bboo')
set(['bbo', 'bboo', 'bo', 'boo'])
>>> mycount('b*o','bb123oo')
set(['b123o', 'bb123oo', 'bb123o', 'b123oo'])
>>> mycount('b*o','ffbfbfffofoff')
set(['bfbfffofo', 'bfbfffo', 'bfffofo', 'bfffo'])
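
For comparison, here is a different and simpler technique, not the method used above: count every (start, end) substring that the pattern matches in full. It is a minimal sketch that takes ordinary regex syntax (a.*k rather than a*k), ignores empty substrings, and uses the hypothetical name count_overlapping_spans.

```python
import re


def count_overlapping_spans(pattern, text):
    """Count non-empty substrings of text that match pattern in full."""
    compiled = re.compile(pattern)
    count = 0
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # fullmatch with pos/endpos tests exactly the slice text[start:end].
            if compiled.fullmatch(text, start, end):
                count += 1
    return count


print(count_overlapping_spans("a.*k", "akka"))           # 2
print(count_overlapping_spans("b.*o", "bboo"))           # 4
print(count_overlapping_spans("b.*o", "ffbfbfffofoff"))  # 4
```

Because it counts distinct substrings, it reports 1 for a.*k.*a in akka, not the 2 the question asks for: both ways of placing the middle k cover the same substring. It also runs one regex attempt per substring, so it is only practical for short inputs.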
