Add multiple sequences from a FASTA file to a list in Python


Question

I'm trying to organize a file that contains multiple sequences. To do so, I want to add the names to one list and the sequences to a separate list that is parallel to the name list. I figured out how to add the names to a list, but I can't figure out how to add the sequence that follows each name to the second list as its own entry. I tried appending the sequence lines to an empty string, but that put all the lines of all the sequences into a single string.

All the names start with a '>'.

def Name_Organizer(FASTA,output):

    import os
    import re

    in_file=open(FASTA,'r')
    dir,file=os.path.split(FASTA)
    temp = os.path.join(dir,output)
    out_file=open(temp,'w')

    data=''
    name_list=[]

    for line in in_file:

        line=line.strip()
        for i in line:
            if i=='>':
                name_list.append(line)
                break
            else:
                line=line.upper()
        if all([k==k.upper() for k in line]):
            data=data+line

    print data

How can I add the sequences to a list as a set of strings?

The input file looks like this

Recommended answer

You need to reset the string when you hit marker lines, like this:

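def Name_Organizer(FASTA,output):

    import os

    dir,file=os.path.split(FASTA)
    temp = os.path.join(dir,output)

    data=''
    name_list=[]
    seq_list=[]

    # Open both files in one with-block so they are closed on exit
    with open(FASTA,'r') as in_file, open(temp,'w') as out_file:
        for line in in_file:
            line=line.strip()
            if line.startswith('>'):
                # Marker line: store the name and flush the previous sequence
                name_list.append(line)
                if data:
                    seq_list.append(data)
                    data=''
            else:
                # Sequence line: add it to the current sequence
                data=data+line.upper()

    # The last sequence has no marker line after it, so flush it here
    if data:
        seq_list.append(data)

    print(seq_list)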

Of course, depending on how large your files are, it might also be faster to collect the lines in a list and join them once, rather than repeatedly concatenating strings:

data = []

# ...

data.append(line) # repeatedly

# ...

seq_list.append(''.join(data)) # each time you get to a new marker line
data = []
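
The fragments above can be put together into a complete function. What follows is only a minimal sketch, not the original poster's code: the name `read_fasta`, its choice to take any iterable of lines rather than a file path, and the sample input are assumptions made for illustration.

```python
def read_fasta(lines):
    """Split FASTA-formatted lines into parallel name and sequence lists."""
    name_list = []
    seq_list = []
    data = []  # pieces of the sequence currently being read

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # New marker line: store the sequence of the previous name, if any
            if name_list:
                seq_list.append(''.join(data))
            name_list.append(line)
            data = []
        elif name_list:
            # Sequence line belonging to the most recent name
            data.append(line.upper())

    # Store the final sequence, which has no marker line after it
    if name_list:
        seq_list.append(''.join(data))

    return name_list, seq_list


# Hypothetical sample input, made up for this example
sample = [">seq1", "acgt", "ACGT", ">seq2", "ttga"]
names, seqs = read_fasta(sample)
print(names)
print(seqs)
```

Because an open file is itself an iterable of lines, the same function should also work as `read_fasta(open(path))`. Appending one entry per name, even when a name has no sequence lines, keeps the two lists the same length.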
