Count the number of a certain triplet in a file (DNA codon analysis)


Problem description

This question comes from DNA codon analysis. Put simply, suppose I have a file like this:
atgaaaccaaag...
I want to count the number of 'aaa' triplets in this file. Importantly, the triplets are read in frame from the very beginning (that is, atg, aaa, cca, aag, ...), so the result for this example should be 1, not 2.
Is there a Python or shell script way to do this? Thanks!

Recommended answer

First, read in the file:

with open("some.txt") as f:
    file_data = f.read()
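
If the sequence is wrapped over several lines or the file ends with a newline, those whitespace characters would become part of the triplets and shift the reading frame. The following is a minimal sketch for that case, assuming the file holds only sequence letters and whitespace. `read_sequence` is a hypothetical helper name, and lowercasing the input is an assumption made so that it matches the lowercase 'aaa' in the question.

```python
def read_sequence(path):
    """Return the file's contents with all whitespace removed, lowercased."""
    with open(path) as f:
        # str.split() with no arguments splits on any run of whitespace,
        # so joining the pieces drops spaces, tabs, and newlines.
        return "".join(f.read().split()).lower()
```

The returned string can then be used in place of `file_data` in the next step.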

Then split it into chunks of three:

codons = [file_data[i:i+3] for i in range(0,len(file_data),3)]

Then count them:

print(codons.count('aaa'))
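
The three steps above can also be wrapped in a single function that compares triplets as it goes instead of building the full list first. This is only a sketch: `count_codon` is a hypothetical name, and the sequence is assumed to be already loaded into a string. The last line shows why a plain substring count is not enough for this problem.

```python
def count_codon(sequence, codon="aaa"):
    """Count in-frame occurrences of `codon`, reading triplets from index 0."""
    return sum(
        1
        # Stop at len - 2 so a trailing partial triplet is never compared.
        for i in range(0, len(sequence) - 2, 3)
        if sequence[i:i + 3] == codon
    )


print(count_codon("atgaaaccaaag"))   # prints 1: only the in-frame 'aaa'
print("atgaaaccaaag".count("aaa"))   # prints 2: str.count ignores the reading frame
```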

Like this:

>>> my_codons = 'atgaaaccaaag'
>>> codons = [my_codons[i:i+3] for i in range(0,len(my_codons),3)]
>>> codons
['atg', 'aaa', 'cca', 'aag']
>>> codons.count('aaa')
1
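
If counts for several codons are needed, the same chunking idea can feed `collections.Counter` from the standard library, which tallies every triplet in one pass. A sketch under the same assumptions as the session above; `codon_counts` is a hypothetical name, and dropping a trailing partial codon is a design choice rather than something the original answer specifies.

```python
from collections import Counter


def codon_counts(sequence):
    """Tally every complete in-frame triplet in `sequence`."""
    # Ignore one or two leftover bases at the end that do not form a codon.
    usable = len(sequence) - len(sequence) % 3
    return Counter(sequence[i:i + 3] for i in range(0, usable, 3))


counts = codon_counts("atgaaaccaaag")
print(counts["aaa"])  # prints 1
```

A `Counter` returns 0 for codons that never occur, so looking up an absent triplet does not raise an error.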
