Generating a list of values a regex COULD match in Python

Problem description

I'm trying to take a regex as input and generate all the possible values that the regex would match.

So, for example, if the regex is "three-letter words starting with a and ending in c," then the code would generate a list with the values [aac, abc, acc, adc, a1c, ...].

Is there an easy way to do this? I'm using Python.

Recommended answer

Here's a brute-force solution that should work. Its running time is O(L^max_length), where L is the size of the alphabet, so use it at your own risk.

import re

def all_matching_strings(alphabet, max_length, regex):
    """Yield all strings over 'alphabet' of length up to 'max_length' that match 'regex'."""

    if max_length == 0:
        return

    L = len(alphabet)
    for N in range(1, max_length + 1):
        # indices acts as a base-L counter; indices[0] changes fastest
        indices = [0] * N
        for z in range(L ** N):
            r = ''.join(alphabet[i] for i in indices)
            if regex.match(r):
                yield r

            # advance the counter by one, carrying into the next position
            i = 0
            indices[i] += 1
            while (i < N) and (indices[i] == L):
                indices[i] = 0
                i += 1
                if i < N:
                    indices[i] += 1

Example usage:

alphabet = 'abcdef1234567890'
regex = re.compile('f*[1-3]+$')
for r in all_matching_strings(alphabet, 5, regex):
    print(r)

This outputs all strings of length up to 5 that start with a (possibly empty) sequence of f's, continue with a non-empty sequence of the digits 1-3, and then end:

1
2
3
f1
11
21
31
f2
12
22
32
f3
13
23
33
ff1
[more output omitted...]
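
For comparison, the same brute-force idea can be sketched with itertools.product in Python 3. This is an alternative formulation, not the answer's code: the names matching_strings and candidate_count are illustrative, re.fullmatch stands in for match plus a trailing $, and results come out with the last character varying fastest, so the order differs from the output above. The short alphabet "abcd1" is an assumption chosen to mirror the question's example.

```python
import itertools
import re


def matching_strings(alphabet, max_length, pattern):
    """Yield every string over `alphabet`, of length 1..max_length, that `pattern` fully matches."""
    regex = re.compile(pattern)
    for length in range(1, max_length + 1):
        # product() enumerates all len(alphabet)**length candidates of this length
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if regex.fullmatch(candidate):
                yield candidate


def candidate_count(alphabet_size, max_length):
    """Number of candidate strings the brute-force search has to test."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


# The question's example: three characters, starting with 'a' and ending with 'c'.
words = list(matching_strings("abcd1", 3, r"a.c"))
print(words)                   # ['aac', 'abc', 'acc', 'adc', 'a1c']
print(candidate_count(16, 5))  # 1118480
```

The count shows why the approach only suits small alphabets and short lengths: a 16-character alphabet with max_length 5 already means testing over a million candidates.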
