Python looping through a string and matching it with a wildcard pattern

Problem description

string1="abc"
string2="abdabcdfg"

I want to find out whether string1 is a substring of string2. However, the pattern may contain wildcard characters: "." matches any letter, y matches "a" or "d", and x matches "b" or "c". As a result, ".yx" would be a substring of string2.
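The wildcard rules above can be written down as a lookup table from each pattern symbol to the characters it accepts. The sketch below is only an illustration: `WILDCARDS` and `char_matches` are hypothetical names, and it assumes that "any letter" means any ASCII letter and that every other symbol matches only itself.

```python
import string

# Hypothetical encoding of the question's wildcards: each pattern symbol
# maps to the set of characters it may stand for.
WILDCARDS = {
    ".": set(string.ascii_letters),  # "." matches any ASCII letter
    "y": {"a", "d"},                 # y matches "a" or "d"
    "x": {"b", "c"},                 # x matches "b" or "c"
}


def char_matches(symbol: str, ch: str) -> bool:
    """Return True if pattern symbol `symbol` accepts character `ch`.

    Symbols that are not wildcards only match themselves.
    """
    return ch in WILDCARDS.get(symbol, {symbol})


print(char_matches("y", "d"))  # True
print(char_matches("x", "a"))  # False
print(char_matches("q", "q"))  # True
```

With a per-character test like this, matching a whole pattern at one position reduces to checking each pattern symbol against the character at the same offset.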

How can I code this using only one loop? I want to loop through string2 and make comparisons at each index. I tried a dictionary, but I want to use a loop. My code:

def wildcard(string, substring):
    result = ""
    # maps characters of `string` to pattern symbols
    table = {'A': '.', 'C': '.', 'G': '.', 'T': '.', 'A': 'x', 'T': 'x', 'C': 'y', 'G': 'y'}
    for c in string:
        if (c in table) and table[c] not in result:
            result += table[c]
        elif c not in table:
            result += c
    if result == substring:
        return True
    else:
        return False

print(wildcard("TTAGTTA", "xyT."))  # the asker expects True; this version prints False
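One reason the attempt above cannot work is the table itself. A Python dict holds exactly one value per key, and when a dict literal repeats a key, the last value wins. The short check below copies the asker's literal to show what actually ends up in it:

```python
# The asker's table literal, copied verbatim.
table = {'A': '.', 'C': '.', 'G': '.', 'T': '.',
         'A': 'x', 'T': 'x', 'C': 'y', 'G': 'y'}

# Repeated keys keep their first position but take the last value,
# so every '.' entry is silently discarded.
print(table)  # {'A': 'x', 'C': 'y', 'G': 'y', 'T': 'x'}
```

Mapping each text character to a single symbol therefore cannot express "A may be matched by '.' or by 'x'". Keying the table by pattern symbol (symbol to allowed characters), as the answer does, avoids this problem.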

Recommended answer

I know you are specifically asking for a solution using a loop. However, I would suggest a different approach: you can easily translate your pattern into a regular expression, a language for describing string patterns that is much more powerful. You can then use the re module to check whether that regular expression (and thus your substring pattern) can be found in the string. The examples below write '#' for the question's y and '+' for its x.

import re

def to_regex(pattern, table):
    # Join substitutions from table; any other character is passed through
    # re.escape so that it is matched literally
    return ''.join(table.get(c, re.escape(c)) for c in pattern)

symbols = {'.': '[a-z]', '#': '[ad]', '+': '[bc]'}
print(re.findall(to_regex('.+#', symbols), 'abdabcdfg'))
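`re.findall` only reports non-overlapping matches. If overlapping occurrences matter, a common regex technique is to wrap the pattern in a zero-width lookahead with a capturing group. The sketch below assumes Python 3; `find_overlapping` is a hypothetical helper name, and `to_regex` is repeated (with literal characters escaped) so the block runs on its own.

```python
import re


def to_regex(pattern, table):
    # Translate each pattern symbol; characters without an entry are
    # escaped so they are matched literally.
    return ''.join(table.get(c, re.escape(c)) for c in pattern)


def find_overlapping(pattern, table, text):
    # The lookahead consumes no characters, so the engine tries every
    # starting position; group 1 captures the text that matched there.
    regex = re.compile('(?=(' + to_regex(pattern, table) + '))')
    return [m.group(1) for m in regex.finditer(text)]


symbols = {'.': '[a-z]', '#': '[ad]', '+': '[bc]'}
print(find_overlapping('..', symbols, 'abc'))      # ['ab', 'bc']
print(re.findall(to_regex('..', symbols), 'abc'))  # ['ab']
```

For the pattern '.+#' on 'abdabcdfg' both functions return the same two matches, because those matches do not overlap.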

If you prefer a more "hands-on" solution, here is a version that uses loops instead:

def find_matches(pattern, table, string):
    # Yield every substring of `string` that matches `pattern`
    for i in range(len(string) - len(pattern) + 1):
        # for each possible starting position, check the pattern
        for j, c in enumerate(pattern):
            # characters without a table entry only match themselves
            if string[i + j] not in table.get(c, c):
                break  # character does not match
        else:
            # loop completed without triggering the break
            yield string[i:i + len(pattern)]

symbols = {'.': 'abcdefghijklmnopqrstuvwxyz', '#': 'ad', '+': 'bc'}
print(list(find_matches('.+#', symbols, 'abdabcdfg')))
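Since the question asks for a single loop over string2, here is one way to keep only one explicit `for` statement: loop over the starting indices and hand the per-character comparison to the built-in `all()`, which still iterates internally but stops at the first mismatch. This is a sketch, not part of the original answer; `SYMBOLS` and `contains_pattern` are hypothetical names, and the table uses the question's own symbols ('.', 'y', 'x') with '.' assumed to cover any ASCII letter.

```python
import string

# Hypothetical table using the question's own wildcard symbols.
SYMBOLS = {'.': string.ascii_letters, 'y': 'ad', 'x': 'bc'}


def contains_pattern(text, pattern, table=SYMBOLS):
    """Return True if `pattern` occurs in `text` under the wildcard table."""
    n = len(pattern)
    # The only explicit loop: each possible starting index in `text`.
    for i in range(len(text) - n + 1):
        window = text[i:i + n]
        # all() compares the window with the pattern position by position,
        # stopping at the first mismatch; a non-wildcard matches only itself.
        if all(ch in table.get(p, p) for p, ch in zip(pattern, window)):
            return True
    return False


print(contains_pattern('abdabcdfg', '.yx'))  # True ("dab" matches)
print(contains_pattern('abdabcdfg', 'xx'))   # True ("bc" matches)
print(contains_pattern('abdabcdfg', 'yyy'))  # False
```

Returning on the first hit suits a yes/no substring test; to collect every match instead, yield `window` rather than returning.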

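The output in both cases is ['abd', 'bcd']: with these substitutions, the pattern occurs twice in the string. Note that re.findall returns only non-overlapping matches, whereas the loop version also reports overlapping ones, so the two approaches can give different results for other inputs.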
