Add multiple sequences from a FASTA file to a list in Python


Question

I'm trying to organize a file that contains multiple sequences. To do so, I want to add the names to one list and the sequences to a separate list that is parallel to the name list. I figured out how to add the names to a list, but I can't figure out how to add the sequence that follows each name as its own entry. I tried appending the sequence lines to an empty string, but that appended all the lines of all the sequences into a single string.

All of the names start with >

def Name_Organizer(FASTA,output):

    import os
    import re

    in_file=open(FASTA,'r')
    dir,file=os.path.split(FASTA)
    temp = os.path.join(dir,output)
    out_file=open(temp,'w')

    data=''
    name_list=[]

    for line in in_file:

        line=line.strip()
        for i in line:
            if i=='>':
                name_list.append(line)
                break
            else:
                line=line.upper()
        if all([k==k.upper() for k in line]):
            data=data+line

    print data

How do I add the sequences to a list as a set of strings?

The input file looks like this

Recommended answer

You need to reset the string each time you hit a marker line (a line starting with >), like this:

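def Name_Organizer(FASTA,output):

    import os

    dir,file=os.path.split(FASTA)
    temp = os.path.join(dir,output)

    data=''
    name_list=[]
    seq_list=[]

    # out_file is opened as in the question, but nothing is written to it here
    with open(FASTA,'r') as in_file, open(temp,'w') as out_file:
        for line in in_file:

            line=line.strip()
            if line.startswith('>'):
                # marker line: store the name and flush the previous sequence
                name_list.append(line)
                if data:
                    seq_list.append(data)
                    data=''
            else:
                # sequence line: add it to the sequence being built
                data=data+line.upper()

    # the last sequence has no marker line after it, so flush it here
    if data:
        seq_list.append(data)

    print(seq_list)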

Of course, depending on how large your files are, it might also be faster to collect the lines in a list and join them once per sequence, rather than repeatedly concatenating strings:

data = []

# ...

data.append(line) # repeatedly

# ...

seq_list.append(''.join(data)) # each time you get to a new marker line
data = []
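
Put together, the join-based approach might look something like the sketch below. It is written for Python 3. The function name read_fasta and the sample lines are hypothetical, chosen only for illustration and not taken from the question. The sketch also appends the last sequence after the loop, because no marker line follows it.

```python
def read_fasta(lines):
    """Split FASTA-style lines into parallel lists of names and sequences."""
    name_list = []
    seq_list = []
    data = []  # lines of the sequence currently being read

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            if name_list:
                # a previous record exists, so close it before starting the next
                seq_list.append(''.join(data))
            name_list.append(line)
            data = []
        elif name_list:
            # sequence line; text before the first marker is ignored
            data.append(line.upper())

    if name_list:
        # the last record has no marker line after it, so close it here
        seq_list.append(''.join(data))

    return name_list, seq_list


# Hypothetical sample input, not taken from the question
sample = [
    ">seq1",
    "acgt",
    "ttga",
    ">seq2",
    "ggcc",
]
names, seqs = read_fasta(sample)
for name, seq in zip(names, seqs):
    print(name, seq)
```

Because the function only iterates over lines, an open file object can be passed in place of the sample list. Appending one sequence per marker, even an empty one, keeps the two lists the same length, so zip pairs each name with its own sequence.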
