Python switching multiple positions in string each to multiple letters
Question
I am trying to write Python code that finds restriction enzyme sites within a DNA sequence. Restriction enzymes cut at specific DNA sequences, but some are not so strict. For example, XmnI cuts this sequence:
GAANNNNTTC
where N can be any nucleotide (A, C, G, or T). If my math is right, that's 4^4 = 256 unique sequences that it can cut. I want to make a list of these 256 short sequences, then check each one against a (longer) input DNA sequence. However, I'm having a hard time generating the 256 sequences. Here's what I have so far:
cutsequencequery = "GAANNNNTTC"
Nseq = ["A", "C", "G", "T"]
querylist = []
if "N" in cutsequencequery:
    Nlist = [cutsequencequery.replace("N", t) for t in Nseq]
    for j in list(Nlist):
        querylist.append(j)
for i in querylist:
    print(i)
print(len(querylist))
This is the output:
GAAAAAATTC
GAACCCCTTC
GAAGGGGTTC
GAATTTTTTC
4
So it's switching every N to the same letter (A, C, G, or T) at once, but I think I need another loop (or 3?) to generate all 256 combinations. Is there an efficient way to do this that I'm not seeing?
Recommended answer
Maybe you should take a look at Python's itertools library, which includes product. It creates an iterable that yields every combination of elements drawn from the input iterables, therefore:
from itertools import product

cutsequencequery = "GAANNNNTTC"
nseq = ["A", "C", "G", "T"]
size = cutsequencequery.count('N')
possibilities = product(*[nseq for i in range(size)])
# = ('A', 'A', 'A', 'A'), ... , ('T', 'T', 'T', 'T')
# len(list(possibilities)) = 256 = 4^4, as expected
s = set()
for n in possibilities:
    print(''.join(n))  # = 'AAAA', ..., 'TTTT'
    # replacing 'N' * size only works when all the Ns are contiguous, as they are here
    new_sequence = cutsequencequery.replace('N' * size, ''.join(n))
    s.add(new_sequence)
    print(new_sequence)  # = 'GAAAAAATTC', ..., 'GAATTTTTTC'
print(len(s))  # 256 unique sequences
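The question also asks about checking the generated sequences against a longer DNA sequence, and other recognition sites may have Ns that are not adjacent. As a rough sketch of one way to cover both cases (not part of the original answer), product can be given one pool of choices per position instead of one per N. The function names expand_sites and find_sites and the sample dna string are made up for illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def expand_sites(pattern, wildcard="N"):
    """Return every concrete sequence that the pattern can stand for."""
    # One pool per position: all four bases for a wildcard,
    # otherwise just the fixed base itself.
    pools = [NUCLEOTIDES if base == wildcard else base for base in pattern]
    return ["".join(combo) for combo in product(*pools)]


def find_sites(dna, pattern):
    """Return sorted (start_index, matched_site) pairs found in dna."""
    hits = []
    for site in expand_sites(pattern):
        start = dna.find(site)
        while start != -1:
            hits.append((start, site))
            # Keep searching from the next position to catch repeats.
            start = dna.find(site, start + 1)
    return sorted(hits)


sites = expand_sites("GAANNNNTTC")
print(len(sites))  # 256

# Hypothetical input sequence, invented for this example.
dna = "TTGAAACGTTTCAAGAATTTTTTCGG"
print(find_sites(dna, "GAANNNNTTC"))
# [(2, 'GAAACGTTTC'), (14, 'GAATTTTTTC')]
```

Because the pools are built position by position, a pattern such as "ANTN" would expand to 16 sequences without relying on the Ns being next to each other.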