Frequency of letters in columns in Python


Question

I want to calculate the frequency of occurrence of each letter in every column. For example, I have these three sequences:

seq1 = 'AATC'
seq2 = 'GCCT'
seq3 = 'ATCA'

Here, in the first column the frequency of 'A' is 2 and of 'G' is 1. In the second column the frequency of 'A' is 1, of 'C' is 1, and of 'T' is 1 (and likewise for the remaining columns). First, I tried to write the code that calculates the frequencies for a single string.

For example:

s='AATC'

dic={}
for x in s:
    dic[x]=s.count(x)

This gives {'A': 2, 'T': 1, 'C': 1}. Now I want to apply this to the columns. For that I use this instruction:

f=list(zip(seq1,seq2,seq3))

which gives:

[('A', 'G', 'A'), ('A', 'C', 'T'), ('T', 'C', 'C'), ('C', 'T', 'A')]

So here I want to calculate the frequency of the letters inside each tuple. How can I do this?

If I work with a file of sequences, how can I apply this code to the sequences in the file? For example, my file contains 100 sequences, and each time I take three sequences and apply this code.

Recommended answer

As with my answer to your last question, you should wrap your functionality in a function:

def lettercount(pos):
    return {c: pos.count(c) for c in pos}
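
As a quick check, the function can be tried on a single column. The sketch below also shows `collections.Counter` from the standard library, which is not part of the original answer but should give the same counts in one pass over the data; the variable name `column` is only for illustration.

```python
from collections import Counter

def lettercount(pos):
    # Same function as above: count each distinct letter in one column.
    return {c: pos.count(c) for c in pos}

# First column of the three example sequences (hypothetical variable name).
column = ('A', 'G', 'A')

print(lettercount(column))    # {'A': 2, 'G': 1}
print(dict(Counter(column)))  # {'A': 2, 'G': 1}
```

For three letters per column the difference is negligible; `Counter` mainly helps when a column holds many letters, since `pos.count(c)` rescans the whole tuple for every element.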

Then you can easily apply it to the tuples from zip:

counts = [lettercount(t) for t in zip(seq1, seq2, seq3)]
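
Put together with the three sequences from the question, a minimal runnable sketch could look like this. Note that `zip` stops at the shortest sequence, so this assumes all sequences have the same length.

```python
seq1 = 'AATC'
seq2 = 'GCCT'
seq3 = 'ATCA'

def lettercount(pos):
    # Count each distinct letter in one column (a tuple of letters).
    return {c: pos.count(c) for c in pos}

# One dictionary per column, in column order.
counts = [lettercount(t) for t in zip(seq1, seq2, seq3)]

for number, column_counts in enumerate(counts, start=1):
    print(number, column_counts)
# 1 {'A': 2, 'G': 1}
# 2 {'A': 1, 'C': 1, 'T': 1}
# 3 {'T': 1, 'C': 2}
# 4 {'C': 1, 'T': 1, 'A': 1}
```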

Or combine it into the existing loop:

...
counts = []
for position in zip(seq1, seq2, seq3): # sets at same position
    counts.append(lettercount(position))
    for pair in combinations(position, 2): # pairs within set
        ...
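
The answer above does not cover the file case from the question. One possible sketch, under loudly stated assumptions: the file holds one sequence per line, and "three sequences each time" means consecutive, non-overlapping groups of three. The helper names `groups_of` and `column_counts` are hypothetical, and an `io.StringIO` object stands in for the real file so the example runs on its own.

```python
from collections import Counter
from itertools import islice
import io

def column_counts(seqs):
    # One dict of letter counts per column across the given sequences.
    return [dict(Counter(col)) for col in zip(*seqs)]

def groups_of(lines, n=3):
    # Yield consecutive groups of n non-empty, stripped lines (assumed layout).
    stripped = (line.strip() for line in lines if line.strip())
    while True:
        group = list(islice(stripped, n))
        if not group:
            break
        yield group

# Stand-in for an open file; with a real file, use open(...) on your own path.
handle = io.StringIO("AATC\nGCCT\nATCA\nTTGA\nCCGA\nTAGA\n")

results = [column_counts(group) for group in groups_of(handle)]
for group_number, counts in enumerate(results, start=1):
    print(group_number, counts)
```

With 100 sequences, the last group would contain only one sequence, so it may need separate handling depending on what is wanted.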
