Search for word (from list of words) in line (from list of lines) and append values to new list. Python
Question
If you had a list of names:
query = ['link','zelda','saria','ganon','volvagia']
and a list of lines from a file:
data = ['>link is the first','OIGFHFH','AGIUUIIUFG','>peach is the second',
'AGFDA','AFGDSGGGH','>luigi is the third','SAGSGFFG','AFGDFGDFG',
'DSGSFGAAA','>ganon is the fourth','ADGGHHHHHH','>volvagia is the last',
'AFGDAAFGDA','ADFGAFD','ADFDFFDDFG','AHUUERR','>ness is another','ADFGGGGH',
'HHHDFDA']
How would you look at all lines that start with '>' and, if a line contains one of the names in query, collect that '>' line and the sequences following it (the sequences are always uppercase) into two separate lists?
#example output file
name_list = ['>link is the first','>ganon is the fourth','>volvagia is the last']
seq_list = ['OIGFHFHAGIUUIIUFG','ADGGHHHHHH','AFGDAAFGDAADFGAFDADFDFFDDFGAHUUERR']
I would rather not use a dictionary for this, as I've been prompted to do in similar situations.
So far I have:
import re

for line, name in zip(data, query):
    if bool(line[0] == '>' and re.search(name, line)) == True:
        # but then i'm stuck because len(query) and len(data) are not equal
        pass
Any help would be greatly appreciated.
Recommended answer
result = []
names = ['link', 'zelda', 'saria', 'ganon', 'volvagia']
lines = iter(data)
for line in lines:
    while line.startswith(">") and any(name in line for name in names):
        name = line
        upper_seq = []
        for line in lines:
            if not line.isupper():
                break
            upper_seq.append(line)
        else:
            line = ""  # guard against infinite loop at EOF
        result.append((name, ''.join(upper_seq)))
If there are many names, then a set() might be faster for finding names in a line than any(...). Note that the set version matches whole whitespace-separated words, whereas any(name in line ...) also matches substrings:
names = set(names)
# ...
if line.startswith(">") and names.intersection(line[1:].split()):
    # ...
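Since the fragment above elides the surrounding loop, here is a minimal self-contained sketch of the set-based idea. The function name extract_records is hypothetical, and the sketch makes one simplifying assumption: every line that does not start with '>' is treated as sequence data, rather than being checked with isupper() as in the answer's loop.

```python
def extract_records(data, names):
    """Return (header, sequence) pairs for headers that mention a wanted name.

    Hypothetical helper: assumes any line not starting with '>' is sequence data.
    """
    wanted = set(names)
    records = []
    current = None  # (header, parts) for the record being collected, if any
    for line in data:
        if line.startswith(">"):
            # Whole-word match: split the text after '>' on whitespace.
            if wanted.intersection(line[1:].split()):
                current = (line, [])
                records.append(current)
            else:
                current = None  # header not wanted, skip its sequences
        elif current is not None:
            current[1].append(line)
    return [(header, "".join(parts)) for header, parts in records]


query = ['link', 'zelda', 'saria', 'ganon', 'volvagia']
data = ['>link is the first', 'OIGFHFH', 'AGIUUIIUFG', '>peach is the second',
        'AGFDA', 'AFGDSGGGH', '>luigi is the third', 'SAGSGFFG', 'AFGDFGDFG',
        'DSGSFGAAA', '>ganon is the fourth', 'ADGGHHHHHH', '>volvagia is the last',
        'AFGDAAFGDA', 'ADFGAFD', 'ADFDFFDDFG', 'AHUUERR', '>ness is another',
        'ADFGGGGH', 'HHHDFDA']

records = extract_records(data, query)
```

Because the match is on whole words, a header such as '>linked is the first' would not be picked up here, while the any(name in line ...) version would accept it.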
Result
[('>link is the first', 'OIGFHFHAGIUUIIUFG'),
('>ganon is the fourth', 'ADGGHHHHHH'),
('>volvagia is the last', 'AFGDAAFGDAADFGAFDADFDFFDDFGAHUUERR')]
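The question asks for two separate lists rather than a list of pairs. One way to get there, sketched here with the result shown above pasted in so the snippet runs on its own, is to split the pairs with two list comprehensions:

```python
# The pairs produced by the answer's loop, copied from the result above.
result = [
    ('>link is the first', 'OIGFHFHAGIUUIIUFG'),
    ('>ganon is the fourth', 'ADGGHHHHHH'),
    ('>volvagia is the last', 'AFGDAAFGDAADFGAFDADFDFFDDFGAHUUERR'),
]

# Split the (header, sequence) pairs into the two lists from the question.
name_list = [name for name, _ in result]
seq_list = [seq for _, seq in result]
```

Comprehensions are used instead of zip(*result) so that an empty result simply yields two empty lists.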