How to convert a set of DNA sequences into protein sequences using Python?

Problem description

I am using Python to write a program that converts a set of DNA sequences into amino acid (protein) sequences. I then need to find a specific subsequence and count the number of sequences that contain it. This is the code I have so far:

#Open cDNA_sequences file and read in line by line
with open('cDNA_sequences.csv', 'r') as results:

    for line in results:

        columns = line.rstrip("\n").split(",") #remove end of line characters and split commas to produce a list
        ensemblID = columns[0] #ensemblID is first element in our list
        dna_seq = columns[1] #dna_seq is second element in our list
        genetic code = {


        "UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
        "UCU":"S", "UCC":"s", "UCA":"S", "UCG":"S",
        "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
        "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
        "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
        "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
        "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
        "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
        "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
        "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
        "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
        "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
        "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
        "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
        "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
        "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",} #genetic code, telling into which amino acids the DNA triplets translate

        for i in range (0, len(dna_seq), 3):
            codon = dna_seq[i:i+3]
            protein += genetic_code [codon]
        print (protein)                    

    enterokinase_motif = "DDDDK"
    proline_motif = "DDDDKP"
    motif_number = 0
    if enterokinase_motif in line:
        motif_number = motif_number + 1;
    elif proline_number in line:
        motif_number = motif_number;
    else: 
        motif_number = motif_number
    print ("The number of sequences containing one or more enterokinase motifs is []".format(motif_number))

I am having trouble writing the code that converts the DNA sequences into protein sequences.

Recommended answer

You should read about Biopython. It comes with handy functions and classes for biology and bioinformatics.

It has a function that does what you are looking for: Bio.Seq.translate

Here is a code example:

>>> from Bio.Seq import translate
>>> coding_dna = "GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
>>> translate(coding_dna)
'VAIVMGR*KGAR*'
>>> translate(coding_dna, stop_symbol="@")
'VAIVMGR@KGAR@'
>>> translate(coding_dna, to_stop=True)
'VAIVMGR'
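
If installing Biopython is not an option, the same idea can be sketched in plain Python. The version below is an illustration under a few assumptions, not the asker's or Biopython's implementation: the helper names `translate_dna` and `count_motif` are made up for this example, the codon table is the standard genetic code keyed by DNA codons (T rather than U, since the input is DNA), unknown codons become "X", and the sample rows are invented. It also works around the problems in the question's code: the `protein` accumulator is initialised, the motif is searched for in the translated protein rather than in the raw CSV line, and the format placeholder is `{}` rather than `[]`. The question does not say how `DDDDKP` should be treated, so this sketch only counts `DDDDK`.

```python
import csv
import io
from itertools import product

# Standard genetic code keyed by DNA codons (T, not U).
# The amino-acid string follows the usual TCAG codon-table order; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_dna(dna_seq, to_stop=False):
    """Translate a coding DNA string into a one-letter protein string."""
    dna_seq = dna_seq.strip().upper()
    protein = []
    # Walk over complete codons only; one or two trailing bases are ignored.
    for i in range(0, len(dna_seq) - len(dna_seq) % 3, 3):
        amino_acid = GENETIC_CODE.get(dna_seq[i:i + 3], "X")  # "X" = unknown codon
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


def count_motif(csv_file, motif="DDDDK"):
    """Count rows whose translated sequence contains the motif at least once."""
    count = 0
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        if motif in translate_dna(row[1]):
            count += 1
    return count


# Invented rows in the "ID,sequence" layout the question describes.
sample = io.StringIO("id1,ATGGATGATGATGATAAATAA\nid2,ATGGCCATTTAA\n")
print("Sequences containing the motif: {}".format(count_motif(sample)))
```

For a real file, the open file handle can be passed in place of `sample`, for example the result of `open('cDNA_sequences.csv', newline='')` used inside a `with` block.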
