Count overlapping regex matches once again

Views: 56  Published: 2021/6/14 20:17:24  Tags: python regex string pattern-matching

Problem description

How can I obtain the number of overlapping regex matches using Python?

I've read and tried the suggestions from this, that and a few other questions, but found none that would work for my scenario. Here it is:

input example string: akka
search pattern: a.*k

A correct function should return 2 as the number of matches, since there are two possible end positions (the two k letters).

The pattern might also be more complicated, for example a.*k.*a should also be matched twice in akka (since there are two k's in the middle).
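
As a baseline for the first example, here is a minimal brute-force sketch (it is not taken from the answer below). It tests every non-empty substring with fullmatch and collects the (start, end) spans that match. The function name overlapping_spans is made up for illustration. Because it counts distinct spans rather than distinct ways of matching, it gives 2 for a.*k on akka but only 1 for a.*k.*a, so it does not cover the second requirement.

```python
import re

def overlapping_spans(pattern, text):
    """Return every non-empty (start, end) span of text that the whole pattern matches."""
    regex = re.compile(pattern)
    spans = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # With pos/endpos, fullmatch succeeds only if text[start:end]
            # is matched completely by the pattern.
            if regex.fullmatch(text, start, end):
                spans.append((start, end))
    return spans

print(len(overlapping_spans(r"a.*k", "akka")))     # 2: 'ak' and 'akk'
print(len(overlapping_spans(r"a.*k.*a", "akka")))  # 1: only 'akka'
```

This makes a quadratic number of fullmatch calls, so it is only reasonable for short inputs.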

Solution

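Yes, it is ugly and unoptimized, but it seems to work. It simply tries all possible variants and keeps only the unique ones. Note that the pattern passed to mycount uses a simplified syntax in which * stands for .* (so a.*k is written as a*k).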

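import re

def myregex(pattern, text, direction=0):
    # Yield the leftmost match, then retry on the matched text with one
    # character trimmed from the 'suffix' group to surface other matches.
    m = re.search(pattern, text)
    if m:
        yield m.group(0)
        if len(m.group('suffix')):
            # Retry with the first character of the suffix dropped.
            for r in myregex(pattern, "%s%s%s" % (m.group('prefix'), m.group('suffix')[1:], m.group('end')), 1):
                yield r
            if direction < 1:
                # Retry with the last character of the suffix dropped
                # (skipped on branches that already trimmed from the front).
                for r in myregex(pattern, "%s%s%s" % (m.group('prefix'), m.group('suffix')[:-1], m.group('end')), -1):
                    yield r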


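def myprocess(pattern, text):
    # Each '*' in the simplified pattern marks a gap between literal parts.
    parts = pattern.split("*")
    # For every gap i, build a regex with the named groups prefix, suffix
    # and end, where gap i is greedy (.*) and all other gaps are lazy (.*?).
    for i in range(0, len(parts) - 1):
        res = ""
        for j in range(0, len(parts)):
            if j == 0:
                res += "(?P<prefix>"
            if j == i:
                # The suffix group starts at part i ...
                res += ")(?P<suffix>"
            res += parts[j]
            if j == i + 1:
                # ... and ends after part i+1, where the end group starts.
                res += ")(?P<end>"
            if j < len(parts) - 1:
                if j == i:
                    res += ".*"
                else:
                    res += ".*?"
            else:
                res += ")"
        for r in myregex(res, text):
            yield r

def mycount(pattern, text):
    # Collect the matched strings in a set so duplicates are dropped.
    return set(myprocess(pattern, text))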

Test (the output is shown in Python 2 set notation; the order of the elements is arbitrary):

>>> mycount('a*b*c','abc')
set(['abc'])
>>> mycount('a*k','akka')
set(['akk', 'ak'])
>>> mycount('b*o','bboo')
set(['bbo', 'bboo', 'bo', 'boo'])
>>> mycount('b*o','bb123oo')
set(['b123o', 'bb123oo', 'bb123o', 'b123oo'])
>>> mycount('b*o','ffbfbfffofoff')
set(['bfbfffofo', 'bfbfffo', 'bfffofo', 'bfffo'])
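
To make the mechanism concrete, the sketch below replays by hand what happens for mycount('a*k', 'akka'). The regex string is what myprocess appears to build for that input (traced manually from the code above, so treat it as an assumption), and the variable names are made up for illustration. It shows the first greedy match and the single trimming step that produces the second result.

```python
import re

# Regex that myprocess seems to generate for the simplified pattern 'a*k'
# (hand-traced assumption): empty prefix, suffix 'a.*k', empty end.
generated = r"(?P<prefix>)(?P<suffix>a.*k)(?P<end>)"

m = re.search(generated, "akka")
first = m.group(0)  # greedy match that ends at the last k

# Trimming step used by myregex: drop the last character of the suffix
# and search again in the rebuilt, shorter text.
shorter = m.group("prefix") + m.group("suffix")[:-1] + m.group("end")
m2 = re.search(generated, shorter)
second = m2.group(0)

print(first, second)  # akk ak
```

Since mycount returns a set of matched strings, len(mycount('a*k', 'akka')) gives the count of 2 that the question asks for.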
