Add multiple sequences from a FASTA file to a list in Python
Problem description
I'm trying to organize a file that contains multiple sequences. To do so, I'm adding the names to one list and the sequences to a separate list that is parallel to the name list. I figured out how to add the names to a list, but I can't figure out how to add the sequence that follows each name to the separate list. I tried appending the sequence lines to an empty string, but that put all the lines of all the sequences into a single string.
All the names start with a '>'.
def Name_Organizer(FASTA,output):
    import os
    import re
    in_file=open(FASTA,'r')
    dir,file=os.path.split(FASTA)
    temp = os.path.join(dir,output)
    out_file=open(temp,'w')
    data=''
    name_list=[]
    for line in in_file:
        line=line.strip()
        for i in line:
            if i=='>':
                name_list.append(line)
                break
            else:
                line=line.upper()
        if all([k==k.upper() for k in line]):
            data=data+line
    print(data)
How can I add the sequences to a list as a set of strings?
The input file looks like this
Recommended answer
You need to reset the string when you hit marker lines, like this:
def Name_Organizer(FASTA,output):
    import os
    import re
    in_file=open(FASTA,'r')
    dir,file=os.path.split(FASTA)
    temp = os.path.join(dir,output)
    out_file=open(temp,'w')
    data=''
    name_list=[]
    seq_list=[]
    for line in in_file:
        line=line.strip()
        for i in line:
            if i=='>':
                name_list.append(line)
                # a new marker line ends the previous sequence
                if data:
                    seq_list.append(data)
                data=''
                break
            else:
                line=line.upper()
        if all([k==k.upper() for k in line]):
            data=data+line
    # the last sequence has no marker line after it, so store it here
    if data:
        seq_list.append(data)
    print(seq_list)
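The same reset-on-marker idea can be written more compactly. The following is only a sketch, assuming every header line starts with '>' and everything else is sequence data; the function name `read_fasta_lists` is made up for illustration and is not part of the original code. It tests the start of each line with `str.startswith` instead of scanning every character, and it stores the final sequence after the loop, because no marker line follows the last record.

```python
def read_fasta_lists(path):
    """Return (name_list, seq_list) as parallel lists.

    Hypothetical helper: assumes header lines start with '>'.
    """
    name_list = []
    seq_list = []
    data = ''
    with open(path) as in_file:
        for line in in_file:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith('>'):
                # A new header closes the previous record, if there was one.
                if name_list:
                    seq_list.append(data)
                name_list.append(line)
                data = ''  # reset for the next sequence
            else:
                data += line.upper()
    # The last record is not followed by a header, so store it here.
    if name_list:
        seq_list.append(data)
    return name_list, seq_list
```

Appending on `if name_list` rather than `if data` is a deliberate choice in this sketch: a header with no sequence lines still gets an empty string, so the two lists stay the same length.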
Of course, depending on how large your files are, it might also be faster to use string joining rather than continually appending:
data = []
# ...
data.append(line) # repeatedly
# ...
seq_list.append(''.join(data)) # each time you get to a new marker line
data = []
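Filled in, the fragment above might look like the sketch below. It assumes the input is any iterable of lines (an open file object works) and that header lines start with '>'. The name `parse_fasta_lines` and the sample records are invented for illustration.

```python
def parse_fasta_lines(lines):
    """Collect names and joined sequences from an iterable of FASTA lines.

    Hypothetical helper: assumes header lines start with '>'.
    """
    names = []
    seqs = []
    chunks = None  # stays None until the first header is seen
    for raw in lines:
        line = raw.strip()
        if line.startswith('>'):
            if chunks is not None:
                # Join the pieces of the previous record once, at its end.
                seqs.append(''.join(chunks))
            names.append(line)
            chunks = []
        elif line and chunks is not None:
            chunks.append(line.upper())
    if chunks is not None:
        # The final record has no header after it.
        seqs.append(''.join(chunks))
    return names, seqs


# Made-up sample input, not taken from the original question.
sample = [">seq1", "acgt", "ttga", ">seq2", "GGCC"]
names, seqs = parse_fasta_lines(sample)
print(names)  # ['>seq1', '>seq2']
print(seqs)   # ['ACGTTTGA', 'GGCC']
```

With a real file, the call would be along the lines of `parse_fasta_lines(open(path))`, ideally inside a `with` block so the file is closed afterwards.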