Python: looping through a string and matching it with a wildcard pattern
Problem description
string1="abc"
string2="abdabcdfg"
I want to find out whether string1 is a substring of string2. However, there are wildcard characters: "." can be any letter, "y" can be "a" or "d", and "x" can be "b" or "c". As a result, ".yx" would be a substring of string2.
How can I code this using only one loop? I want to loop through string2 and make a comparison at each index. I tried using a dictionary, but I want to use a loop. My code:
def wildcard(string, substring):
    sum = ""
    table = {'A': '.', 'C': '.', 'G': '.', 'T': '.', 'A': 'x', 'T': 'x', 'C': 'y', 'G': 'y'}
    for c in string:
        if (c in table) and table[c] not in sum:
            sum += table[c]
        elif c not in table:
            sum += c
    if sum == substring:
        return True
    else:
        return False

print(wildcard("TTAGTTA", "xyT."))  # should be true
Recommended answer
I know you are specifically asking for a solution using a loop. However, I would suggest a different approach: you can easily translate your pattern into a regular expression. This is a similar language for string patterns, just much more powerful. You can then use the re module to check whether that regular expression (and thus your substring pattern) can be found in the string.
import re

def to_regex(pattern, table):
    # join substitutions from table, using c itself as default
    return ''.join(table.get(c, c) for c in pattern)

symbols = {'.': '[a-z]', '#': '[ad]', '+': '[bc]'}
print(re.findall(to_regex('.+#', symbols), 'abdabcdfg'))
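One caveat with the translation above: characters that are not in the table are copied into the regex unchanged, so a literal such as "*" or "(" in the pattern would be read as regex syntax. The following is a minimal sketch of a variant that escapes literals and returns a plain yes/no answer. The names to_safe_regex and matches_anywhere are hypothetical and not part of the original answer.

```python
import re

def to_safe_regex(pattern, table):
    # Wildcards become their character class; any other character is
    # escaped so that it only matches itself.
    return ''.join(table[c] if c in table else re.escape(c) for c in pattern)

def matches_anywhere(pattern, table, string):
    # re.search scans the whole string, so this is a substring test.
    return re.search(to_safe_regex(pattern, table), string) is not None

symbols = {'.': '[a-z]', '#': '[ad]', '+': '[bc]'}
print(matches_anywhere('.+#', symbols, 'abdabcdfg'))  # True
print(matches_anywhere('###', symbols, 'abdabcdfg'))  # False
print(matches_anywhere('c*', symbols, 'abc*'))        # True, "*" is literal
```

re.search stops at the first match, which is enough when only the existence of a match matters.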
If you prefer a more "hands-on" solution, you can use the following version, which uses loops.
def find_matches(pattern, table, string):
    for i in range(len(string) - len(pattern) + 1):
        # for each possible starting position, check the pattern
        for j, c in enumerate(pattern):
            if string[i+j] not in table.get(c, c):
                break  # character does not match
        else:
            # loop completed without triggering the break
            yield string[i : i + len(pattern)]

symbols = {'.': 'abcdefghijklmnopqrstuvwxyz', '#': 'ad', '+': 'bc'}
print(list(find_matches('.+#', symbols, 'abdabcdfg')))
The output in both cases is ['abd', 'bcd'], i.e. the pattern is found twice using these substitutions.
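Since the question asks for a single loop and a True/False result, the same check can also be written with one while loop that tracks two indices. This is only a sketch under some assumptions: contains_pattern is a hypothetical name, and the table uses the wildcard letters from the question ("y" and "x"), which means those two letters can no longer appear as literals in a pattern.

```python
def contains_pattern(pattern, table, string):
    """Return True if pattern matches some substring of string."""
    if not pattern:
        return True  # the empty pattern matches trivially
    i = 0  # start of the current candidate window in string
    j = 0  # current position within pattern
    while i + len(pattern) <= len(string):
        # Allowed characters for this pattern position; a character
        # that is not a wildcard only matches itself.
        allowed = table.get(pattern[j], pattern[j])
        if string[i + j] in allowed:
            j += 1
            if j == len(pattern):
                return True  # every pattern position matched
        else:
            i += 1  # mismatch: slide the window by one
            j = 0   # and restart the pattern
    return False

symbols = {'.': 'abcdefghijklmnopqrstuvwxyz', 'y': 'ad', 'x': 'bc'}
print(contains_pattern('.yx', symbols, 'abdabcdfg'))  # True ("dab" matches)
print(contains_pattern('abc', symbols, 'abdabcdfg'))  # True (literal match)
print(contains_pattern('yyy', symbols, 'abdabcdfg'))  # False
```

This is still the naive comparison at every index, just flattened into one loop, so the worst-case running time stays proportional to len(string) * len(pattern).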