How to calculate the entropy of a DNA sequence in a FASTA file
Problem description
I need to calculate the entropy of a DNA sequence in a FASTA file, but only for the region from the 10,000th to the 11,000th base. Here is what I have so far; it computes the entropy over the whole sequence:
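from math import log
from Bio import SeqIO

def logent(x):
    # Contribution of one frequency to the entropy; 0*log(0) is taken as 0.
    if x <= 0:
        return 0
    else:
        return -x * log(x)

def entropy(lis):
    # Shannon entropy (natural log, so in nats) of a list of frequencies.
    return sum([logent(elem) for elem in lis])

for i in SeqIO.parse("hsvs.fasta", "fasta"):
    # Frequencies of A, C, G and T over the whole record.
    lisfreq1 = [i.seq.count(base) * 1.0 / len(i.seq) for base in ["A", "C", "G", "T"]]
    print(entropy(lisfreq1))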
Recommended answer
Your sequence behaves like a string, so you can simply slice it, e.g.
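# Assuming positions are 1-based and inclusive: Python slices are 0-based
# and half-open, so bases 10,000 to 11,000 correspond to seq[9999:11000].
seq_start = 10000 - 1
seq_end = 11000
for i in SeqIO.parse("hsvs.fasta", "fasta"):
    sub_seq = i.seq[seq_start:seq_end]
    lisfreq1 = [sub_seq.count(base) * 1.0 / len(sub_seq) for base in ["A", "C", "G", "T"]]
    print(entropy(lisfreq1))  # entropy() as defined in the question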
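The slicing idea can also be wrapped in a small helper. The following is a minimal sketch, not part of the original answer: the names shannon_entropy and window_entropy, the alphabet and base parameters, and the choice to ignore symbols outside A, C, G, T (such as N) are all assumptions made for illustration. It works on plain Python strings, so it needs nothing beyond the standard library, and it treats positions as 1-based and inclusive.

```python
from collections import Counter
from math import log


def shannon_entropy(seq, alphabet="ACGT", base=None):
    """Shannon entropy of the base frequencies in seq.

    Uses the natural log by default (as in the question's code);
    pass base=2 to get bits. Symbols outside `alphabet` are ignored
    (an assumption of this sketch, unlike the question's code, which
    divides by the full sequence length).
    """
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in alphabet)
    if total == 0:
        # Empty window, or a window with no recognised bases.
        raise ValueError("no bases from the alphabet in this sequence")
    entropy = 0.0
    for b in alphabet:
        p = counts[b] / total
        if p > 0:  # 0 * log(0) is taken as 0
            entropy -= p * (log(p) if base is None else log(p, base))
    return entropy


def window_entropy(seq, first, last, **kwargs):
    """Entropy of the 1-based, inclusive window first..last of seq."""
    window = seq[first - 1:last]  # convert to a 0-based, half-open slice
    return shannon_entropy(window, **kwargs)
```

With Biopython, each record from SeqIO.parse("hsvs.fasta", "fasta") could be passed in as window_entropy(str(record.seq), 10000, 11000). Raising an error on an empty window is a deliberate choice here: a record shorter than 10,000 bases would otherwise cause a division by zero.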