Reverse complement of DNA strand using Python

Problem description

I have a DNA sequence and would like to get its reverse complement using Python. It is in one of the columns of a CSV file, and I'd like to write the reverse complement to another column in the same file. The tricky part is that a few cells contain something other than A, T, G and C. I was able to get the reverse complement with this piece of code:

def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    bases = list(seq)
    bases = [complement[base] for base in bases]
    return ''.join(bases)

def reverse_complement(s):
    return complement(s[::-1])

print("Reverse Complement:")
print(reverse_complement("TCGGGCCC"))

However, when I try to find the item which is not present in the complement dictionary, using the code below, I just get the complement of the last base. It doesn't iterate. I'd like to know how I can fix it.

def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    bases = list(seq)
    for element in bases:
        if element not in complement:
            print(element)
        letters = [complement[base] for base in element]
        return ''.join(letters)
def reverse_complement(seq):
    return complement(seq[::-1])

print("Reverse Complement:")
print(reverse_complement("TCGGGCCCCX"))

Recommended answer

The get method of a dictionary lets you specify a default value to return if the key is not in the dictionary. As a preprocessing step, I would map all your non-'ATGC' tokens to single characters (letters, punctuation, digits, or anything else that won't show up in your sequence), then reverse and complement the sequence, then replace the single-character placeholders with their originals. Alternatively, you could reverse it first and then search and replace the reversed tokens, for example replacing sni with ins.
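
On why the loop in the question stops early: its return statement is indented inside the for loop, so the function exits during the first iteration, which after reversing is the last base of the input. Below is a minimal sketch of the get-with-default idea that also reports unknown symbols. The names COMPLEMENT and complement_with_report are made up for illustration and are not part of the original answer.

```python
# Hypothetical helper names; only the complement table comes from the question.
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def complement_with_report(seq):
    """Return (complemented sequence, list of symbols not in COMPLEMENT)."""
    # Collect every symbol that has no entry in the table.
    unknown = [base for base in seq if base not in COMPLEMENT]
    # dict.get(key, default) falls back to the symbol itself when it has no complement.
    complemented = ''.join(COMPLEMENT.get(base, base) for base in seq)
    # The return sits after the whole sequence has been processed, not inside a loop.
    return complemented, unknown

result, unknown = complement_with_report("TCGGGCCCCX"[::-1])
print(result)   # XGGGGCCCGA
print(unknown)  # ['X']
```

The answer's own code, which additionally handles multi-letter tokens such as ins, follows.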

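alt_map = {'ins': '0'}
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def reverse_complement(seq):
    # Swap each multi-letter token for its single-character placeholder
    for k, v in alt_map.items():
        seq = seq.replace(k, v)
    bases = list(seq)
    # Unknown symbols (including placeholders) are passed through unchanged
    bases = reversed([complement.get(base, base) for base in bases])
    bases = ''.join(bases)
    # Restore the original tokens
    for k, v in alt_map.items():
        bases = bases.replace(v, k)
    return bases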

>>> seq = "TCGGinsGCCC"
>>> print("Reverse Complement:")
Reverse Complement:
>>> print(reverse_complement(seq))
GGGCinsCCGA
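
The question also asks about writing the result to another column of a CSV file. One possible sketch uses the standard csv module. The column names sequence and reverse_complement, the helper add_reverse_complement_column, and the in-memory sample data are all assumptions for illustration; a real script would pass file objects from open() instead.

```python
import csv
import io

ALT_MAP = {'ins': '0'}
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def reverse_complement(seq):
    # Same approach as the answer above, written for Python 3.
    for token, placeholder in ALT_MAP.items():
        seq = seq.replace(token, placeholder)
    result = ''.join(COMPLEMENT.get(base, base) for base in reversed(seq))
    for token, placeholder in ALT_MAP.items():
        result = result.replace(placeholder, token)
    return result

def add_reverse_complement_column(src, dst,
                                  seq_col='sequence',
                                  out_col='reverse_complement'):
    """Copy CSV rows from src to dst, appending a reverse-complement column.

    seq_col and out_col are assumed column names, not taken from the question.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames) + [out_col]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    for row in reader:
        row[out_col] = reverse_complement(row[seq_col])
        writer.writerow(row)

# Made-up sample data standing in for the real file.
sample = io.StringIO("id,sequence\n1,TCGGGCCC\n2,TCGGinsGCCC\n")
output = io.StringIO()
add_reverse_complement_column(sample, output)
print(output.getvalue())
```

Writing to a separate output and replacing the original file afterwards avoids reading and overwriting the same file at once.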
