Python: switching multiple positions in a string, each to multiple letters


Question

I am trying to write Python code that finds restriction enzyme sites within a DNA sequence. Restriction enzymes cut at specific DNA sequences, but some are less strict; for example, XmnI cuts this sequence:

GAANNNNTTC

Where N can be any nucleotide (A, C, G, or T). If my math is right, that's 4^4 = 256 unique sequences that it can cut. I want to make a list of these 256 short sequences, then check each one against a (longer) input DNA sequence. However, I'm having a hard time generating the 256 sequences. Here's what I have so far:

cutsequencequery = "GAANNNNTTC"
Nseq = ["A", "C", "G", "T"]
querylist = []
if "N" in cutsequencequery:
    Nlist = [cutsequencequery.replace("N", t) for t in Nseq]
    for j in list(Nlist):
        querylist.append(j)

for i in querylist:
    print(i)
print(len(querylist))

Here's the output:

GAAAAAATTC
GAACCCCTTC
GAAGGGGTTC
GAATTTTTTC
4

So it switches all the Ns to A, C, G, or T together, but I think I need another loop (or three?) to generate all 256 combinations. Is there an efficient way to do this that I'm not seeing?
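Why only four: str.replace("N", t) substitutes the same letter t into every N at once, so one call per letter yields exactly one sequence per letter. The "one more loop per N" idea does work when each N gets its own loop. A minimal sketch, hard-coded for the four Ns of GAANNNNTTC (the names `querylist` and `via_product` are illustrative), that also checks itertools.product gives the same list:

```python
from itertools import product

nseq = ["A", "C", "G", "T"]

# One explicit loop per N, hard-coded for the four Ns in GAANNNNTTC.
querylist = [
    "GAA" + a + b + c + d + "TTC"
    for a in nseq
    for b in nseq
    for c in nseq
    for d in nseq
]
print(len(querylist))  # 256

# itertools.product runs the same nested loops for any number of Ns,
# in the same order (the rightmost position changes fastest).
via_product = ["GAA" + "".join(p) + "TTC" for p in product(nseq, repeat=4)]
print(via_product == querylist)  # True
```

The hard-coded version must be rewritten whenever the number of Ns changes; product(nseq, repeat=k) takes that count as a parameter instead.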

Recommended answer

Maybe you should take a look at Python's itertools module, which includes product(). It returns an iterator over the Cartesian product of its input iterables, i.e. every combination that takes one element from each, so:

from itertools import product

cutsequencequery = "GAANNNNTTC"
nseq = ["A", "C", "G", "T"]

size = cutsequencequery.count('N')

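# product(nseq, repeat=size) is shorthand for product(*[nseq for i in range(size)]).
# It yields ('A', 'A', 'A', 'A'), ..., ('T', 'T', 'T', 'T'): 4^4 = 256 tuples in total.
# It is a one-shot iterator, so don't call list() on it before the loop below.
possibilities = product(nseq, repeat=size)

s = set()
for n in possibilities:
    fill = ''.join(n)
    print(fill)  # 'AAAA', ..., 'TTTT'
    # replace('N' * size, ...) assumes the Ns form one contiguous run, as in GAANNNNTTC.
    new_sequence = cutsequencequery.replace('N' * size, fill)
    s.add(new_sequence)
    print(new_sequence)  # 'GAAAAAATTC', ..., 'GAATTTTTTC'
print(len(s))  # 256 unique sequences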
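The replace('N' * size, ...) step relies on the Ns forming a single run, which holds for GAANNNNTTC. A minimal sketch that drops that assumption fills the Ns one at a time, then scans a longer DNA string window by window against the resulting set (the question's stated goal). `expand_n`, `find_sites`, the pattern "ANCN", and the sample DNA string are hypothetical illustrations, not part of the original answer; the sketch assumes an uppercase A/C/G/T input.

```python
from itertools import product

def expand_n(pattern, alphabet="ACGT"):
    """Every concrete sequence matching `pattern`, where each 'N' may be any
    letter of `alphabet`. Unlike replace('N' * size, ...), this also works
    when the Ns are not in one contiguous run."""
    results = []
    for fill in product(alphabet, repeat=pattern.count("N")):
        letters = iter(fill)
        # Substitute the Ns left to right with this combination's letters.
        results.append("".join(next(letters) if ch == "N" else ch for ch in pattern))
    return results

def find_sites(dna, pattern):
    """0-based start positions of every (possibly overlapping) match."""
    targets = set(expand_n(pattern))
    k = len(pattern)
    return [i for i in range(len(dna) - k + 1) if dna[i:i + k] in targets]

print(len(expand_n("GAANNNNTTC")))  # 256
print(expand_n("ANCN")[:3])  # ['AACA', 'AACC', 'AACG']
print(find_sites("CCGAACGTATTCGG", "GAANNNNTTC"))  # [2]
```

The set is built once, after which each position of the input costs one slice and one set lookup. For patterns with many Ns (4^k variants), comparing each window position by position and treating N as a wildcard would avoid building the set at all.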
