Parsing a FASTA file using a generator (Python)

Published: 2017/10/26 20:56:37. Tags: python, file, parsing, fasta


Problem description

I am trying to parse a large FASTA file and I am running into out-of-memory errors. Suggestions for improving the data handling would be appreciated. Currently the program prints the names correctly, but partway through the file I get a MemoryError.

Here is the generator:

def readFastaEntry( fp ):
    name = ""
    seq = ""
    for line in fp:
        if line.startswith( ">" ):
            tmp = []
            tmp.append( name )
            tmp.append( seq )
            name = line
            seq = ""
            yield tmp
        else:
            seq = seq.join( line )

And here is the caller stub; more will be added once this part works:

fp = open( sys.argv[1], 'r' )

for seq in readFastaEntry( fp ) :
    print seq[0]

For those not familiar with the FASTA format, here is an example:

>1 (PB2)
AATATATTCAATATGGAGAGAATAAAAGAACTAAGAGATCTAATGTCACAGTCTCGCACTCGCGAGATAC
TCACCAAAACCACTGTGGACCACATGGCCATAATCAAAAAGTACACATCAGGAAGGCAAGAGAAGAACCC
TGCACTCAGGATGAAGTGGATGATG
>2 (PB1)
AACCATTTGAATGGATGTCAATCCGACTTTACTTTTCTTGAAAGTTCCAGCGCAAAATGCCATAAGCACC
ACATTTCCCTATACTGGAGACCCTCC

Each entry starts with a ">" line stating the name and so on; the next N lines are data. The data has no defined end other than the next line that begins with ">".

Recommended answer

Have you considered using BioPython? It has a sequence reader that can read FASTA files. And if you are interested in coding one yourself, you can take a look at BioPython's code.
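
For the BioPython route, a minimal sketch might look like the following. It assumes Biopython is installed (for example via `pip install biopython`); `print_fasta_names` is a hypothetical helper name chosen for illustration, not part of the library. `SeqIO.parse` returns an iterator of records, so only one entry needs to be held in memory at a time.

```python
def print_fasta_names(path):
    """Print the ID and sequence length of each entry in a FASTA file.

    Returns the number of entries read. The function name is hypothetical.
    """
    # Imported inside the function so this module still loads
    # on systems where Biopython is not installed.
    from Bio import SeqIO

    count = 0
    # SeqIO.parse yields one SeqRecord at a time instead of loading the file.
    for record in SeqIO.parse(path, "fasta"):
        # record.id is the first word of the ">" header line.
        print(record.id, len(record.seq))
        count += 1
    return count
```

Calling `print_fasta_names(sys.argv[1])` from the caller stub (after `import sys`) would replace the hand-written generator entirely.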

Edit: added code

def read_fasta(fp):
    name, seq = None, []
    for line in fp:
        line = line.rstrip()
        if line.startswith(">"):
            if name: yield (name, ''.join(seq))
            name, seq = line, []
        else:
            seq.append(line)
    if name: yield (name, ''.join(seq))

with open('f.fasta') as fp:
    for name, seq in read_fasta(fp):
        print(name, seq)
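
As a side note on the MemoryError in the question: `seq = seq.join( line )` does not append `line` to `seq`. `str.join` inserts `seq` between every character of `line`, so the string grows by roughly a factor of the line length on each iteration. A small sketch with made-up four-base lines (the helper name `join_lengths` is hypothetical) illustrates the growth:

```python
def join_lengths(lines):
    """Return len(seq) after each step of the question's seq = seq.join(line)."""
    seq = ""
    lengths = []
    for line in lines:
        # seq becomes the separator placed between the characters of line.
        seq = seq.join(line)
        lengths.append(len(seq))
    return lengths


lengths = join_lengths(["AAAA\n", "CCCC\n", "GGGG\n"])
print(lengths)  # [5, 25, 105]
```

With 70-character lines the length is multiplied by about 70 per line, which exhausts memory after only a few lines of a single entry. Collecting the lines in a list and joining once per entry, as `read_fasta` does, keeps the cost proportional to the sequence length. The original generator also yields an empty first entry and never yields the last one; the trailing `if name: yield ...` in `read_fasta` handles both cases.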
