How to compare elements in a list of lists and compare keys in a list of lists in Python?


Question


I have the following sequence:

seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]

Here is a dictionary that maps each codon (a triplet of bases such as ATG, GCT, etc.) to the amino acid it codes for.

aminoacid = {
    'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L', 'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L',
    'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M', 'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V',
    'TCT': 'S', 'TCC': 'S', 'TCA': 'S', 'TCG': 'S', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
    'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T', 'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'TAT': 'Y', 'TAC': 'Y', 'TAA': 'STOP', 'TAG': 'STOP', 'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
    'AAT': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K', 'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
    'TGT': 'C', 'TGC': 'C', 'TGA': 'STOP', 'TGG': 'W', 'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
    'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R', 'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'}
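Typing all 64 entries by hand is error-prone. A duplicated key in a dict literal raises no error, and the later value silently replaces the earlier one. The sketch below is one alternative. It assumes the standard genetic code (NCBI translation table 1) is the one wanted and builds the table from the compact 64-letter string NCBI publishes for that table. `build_codon_table` and `AMINO_ACIDS` are hypothetical names chosen here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), one letter per codon.
# Codons are enumerated with bases in the order T, C, A, G; the first base varies slowest.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def build_codon_table():
    """Return a dict mapping each of the 64 codons to its amino acid ('STOP' for stop codons)."""
    codons = ("".join(bases) for bases in product("TCAG", repeat=3))
    return {codon: ("STOP" if aa == "*" else aa)
            for codon, aa in zip(codons, AMINO_ACIDS)}

aminoacid = build_codon_table()
print(len(aminoacid))                      # 64
print(aminoacid["AGC"], aminoacid["AGG"])  # S R
```

Because every codon is generated exactly once, the table cannot contain a duplicated or missing key.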

As you can see, several codons can code for the same amino acid (e.g. GGT, GGC, GGA and GGG all code for glycine, G). Such codons are synonymous (PSyn); if codons code for different amino acids, they are non-synonymous (PNonsyn).
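The sketch below makes the definition concrete. It looks up both codons in the dict and compares the amino acids they map to. The helper `is_synonymous_change` is a hypothetical name, and the dict is only a subset of the table above.

```python
# Subset of the codon table, for illustration only.
aminoacid = {'ATG': 'M', 'GAC': 'D', 'GAT': 'D', 'GAA': 'E', 'CCT': 'P',
             'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'}

def is_synonymous_change(codon1, codon2, table=aminoacid):
    """True if the codons differ but code for the same amino acid."""
    return codon1 != codon2 and table[codon1] == table[codon2]

print(is_synonymous_change('GGT', 'GGC'))  # True: both glycine (G)
print(is_synonymous_change('GAC', 'GAA'))  # False: D vs E
print(is_synonymous_change('ATG', 'ATG'))  # False: same codon, no base change
```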

In this code, I need to do the following:

  1. For each element (site) in the list of lists: if the bases change AND all the codons code for the same amino acid, increase the count of PSyn by 1; if the codons code for different amino acids, increase the count of PNonsyn by 1.

    Here,

    ATG: all code for M. However, all four codons are ATG, so the bases do not change and no count is incremented.

    GAC, GAT for D; GAA for E; and CCT for P: these code for three different amino acids, so PNonsyn is incremented by 1.

    GCC, GCG, GCA, GCT for A: different bases, but all code for the same amino acid, so PSyn is incremented by 1.

    Output: CountPsyn = 1, CountPNonsyn = 1

  2. Generate a list with one entry per site of the above seq, such that:

    Output: ['ATG', 'nonsyn', 'A']  # Sites with different amino acids should say 'nonsyn'; sites with identical bases should list the codon; sites with different bases but the same amino acid should list that amino acid.
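Both requirements reduce to a three-way rule applied to each site. The sketch below assumes each site is a list of codons that are all present in the table. The helper `classify_site` and its labels are hypothetical, and the dict is a subset covering only the example seq.

```python
# Subset of the codon table covering the example seq (illustration only).
aminoacid = {'ATG': 'M', 'GAC': 'D', 'GAT': 'D', 'GAA': 'E', 'CCT': 'P',
             'GCC': 'A', 'GCG': 'A', 'GCA': 'A', 'GCT': 'A'}

def classify_site(site, table=aminoacid):
    """Classify one site (a list of codons) and return (label, output_entry)."""
    acids = {table[codon] for codon in site}
    if len(acids) > 1:
        return 'nonsyn', 'nonsyn'     # different amino acids: non-synonymous
    if len(set(site)) == 1:
        return 'invariant', site[0]   # identical codons: list the codon itself
    return 'syn', acids.pop()         # different codons, same amino acid: synonymous

seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]
results = [classify_site(site) for site in seq]
print(results)  # [('invariant', 'ATG'), ('nonsyn', 'nonsyn'), ('syn', 'A')]
```

Counting the `'syn'` and `'nonsyn'` labels then gives CountPsyn and CountPNonsyn. The second element of each tuple gives the requested output list.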

I need help modifying the following code to get the program to work. I am not confident about how to look up values in the dictionary and check them against all the elements.

Code attempted:

countPsyn = 0
countPnonsyn = 0
listofaa =[]

for i in seq:
    for base, value in enumerate(i):        
        if value[i] == value[i+1]: #eg. ['ATG','ATG','ATG','ATG'] 
            listofaa.append(value)

        if value[i] != value[i+1]: 
            if aminoacid[value][i] ==  aminoacid[value][i+1]: #eg.['GCC','GCG','GCA','GCT']
                countPsyn =+ 1
                listofaa.append(aminoacid)
            else: #eg. ['GAC','GAT','GAA','CCT']
                countPnonsyn =+ 1
                listofaa.append('nonsyn')
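Several things break in the attempt above:

- `enumerate(i)` yields `(index, codon)` pairs, so `value` is a single codon string, and `value[i]` tries to index a string with a list, which raises a `TypeError`.
- `countPsyn =+ 1` assigns `+1` instead of incrementing.
- `listofaa.append(aminoacid)` appends the whole dict.

The sketch below keeps the original loop-and-compare idea, but compares every codon of a site with that site's first codon. The dict is a subset, for illustration only.

```python
# Subset of the codon table covering the example seq (illustration only).
aminoacid = {'ATG': 'M', 'GAC': 'D', 'GAT': 'D', 'GAA': 'E', 'CCT': 'P',
             'GCC': 'A', 'GCG': 'A', 'GCA': 'A', 'GCT': 'A'}
seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]

countPsyn = 0
countPnonsyn = 0
listofaa = []

for site in seq:
    first = site[0]
    # Compare every other codon in the site with the first one.
    same_bases = all(codon == first for codon in site[1:])
    same_acid = all(aminoacid[codon] == aminoacid[first] for codon in site[1:])
    if same_bases:
        listofaa.append(first)               # no change in bases: list the codon
    elif same_acid:
        countPsyn += 1                       # += increments; =+ 1 would just assign 1
        listofaa.append(aminoacid[first])    # append the amino acid, not the whole dict
    else:
        countPnonsyn += 1
        listofaa.append('nonsyn')

print(countPsyn, countPnonsyn, listofaa)  # 1 1 ['ATG', 'nonsyn', 'A']
```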

The file output can be found here: https://eval.in/669107

Solution

Here is my stab at the solution.

aminoacid = {
    'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L', 'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L',
    'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M', 'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V',
    'TCT': 'S', 'TCC': 'S', 'TCA': 'S', 'TCG': 'S', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
    'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T', 'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'TAT': 'Y', 'TAC': 'Y', 'TAA': 'STOP', 'TAG': 'STOP', 'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
    'AAT': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K', 'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
    'TGT': 'C', 'TGC': 'C', 'TGA': 'STOP', 'TGG': 'W', 'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
    'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R', 'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'}

seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]

Psyn = 0
PNonsyn = 0
output = []

# loop through each list in your list of lists
for sublist in seq:
    acids = [aminoacid[base] for base in sublist]
    if len(set(acids)) != 1:  # different amino acids: non-synonymous
        output.append('nonsyn')
        PNonsyn += 1
    else:  # same amino acid
        if len(set(sublist)) == 1:  # same bases
            output.append(sublist[0])
        else:  # different bases, same amino acid: synonymous
            output.append(acids[0])
            Psyn += 1

print("Psyn = " + str(Psyn))
print("PNonsyn = " + str(PNonsyn))
print(output)
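The lookup `aminoacid[base]` raises `KeyError` for any codon that is not exactly one of the dict's keys. Codons read from a file might carry stray whitespace or lowercase letters. That is an assumption; the example seq does not. If it applies, a small hypothetical helper can normalise each codon first and use `dict.get`, which returns `None` instead of raising:

```python
def translate(codon, table):
    """Look up a codon after normalising whitespace and case; return None if unknown."""
    return table.get(codon.strip().upper())

table = {'ATG': 'M', 'GCT': 'A'}  # tiny illustrative subset
print(translate(' atg', table))   # M
print(translate('GCT\n', table))  # A
print(translate('NNN', table))    # None
```

Whether an unknown codon should be skipped, counted separately or treated as an error is a decision for the caller.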

Admittedly, this is not a modification of your code, but there is a neat trick here to avoid the double for loop. Given a list mylist, you can find all unique elements in it by calling set(mylist). For example:

>>> a = ['AGT','AGT','ACG']
>>> set(a)
set(['AGT', 'ACG'])
>>> len(set(a))
2
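Applied to the example seq, the same trick counts the distinct codons and the distinct amino acids at each site. Those two numbers are all the solution above needs. The dict is a subset, for illustration only.

```python
# Subset of the codon table covering the example seq (illustration only).
aminoacid = {'ATG': 'M', 'GAC': 'D', 'GAT': 'D', 'GAA': 'E', 'CCT': 'P',
             'GCC': 'A', 'GCG': 'A', 'GCA': 'A', 'GCT': 'A'}
seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]

unique_codons = [len(set(site)) for site in seq]
# A set comprehension translates and de-duplicates in one step.
unique_acids = [len({aminoacid[codon] for codon in site}) for site in seq]
print(unique_codons)  # [1, 4, 4]
print(unique_acids)   # [1, 3, 1]
```

Reading the two numbers together gives the classification for each site:

- One distinct codon means no change.
- More than one codon but one amino acid means synonymous.
- More than one amino acid means non-synonymous.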
