Search for word (from list of words) in line (from list of lines) and append values to new list. Python


Problem description

If you had a list of names . . .

query = ['link','zelda','saria','ganon','volvagia']

and a list of lines from a file:

data = ['>link is the first','OIGFHFH','AGIUUIIUFG','>peach is the second',
'AGFDA','AFGDSGGGH','>luigi is the third','SAGSGFFG','AFGDFGDFG',
'DSGSFGAAA','>ganon is the fourth','ADGGHHHHHH','>volvagia is the last',
 'AFGDAAFGDA','ADFGAFD','ADFDFFDDFG','AHUUERR','>ness is another','ADFGGGGH',
'HHHDFDA']

How would you look at every line that starts with '>' and, if it contains one of the names in query, collect that '>' line in one list and the sequences that follow it (these are always uppercase) in a second list?

#example output file

name_list = ['>link is the first','>ganon is the fourth','>volvagia is the last']
seq_list = ['OIGFHFHAGIUUIIUFG','ADGGHHHHHH','AFGDAAFGDAADFGAFDADFDFFDDFGAHUUERR']

I would rather not use a dictionary for this, as I have been prompted to do in similar situations.

So far I have:

for line,name in zip(data,query):
    if bool(line[0] == '>' and re.search(name,line))==True:
        #but then i'm stuck because len(query) and len(data) are not equal

Any help would be greatly appreciated.

Recommended answer

result = []
names = ['link', 'zelda', 'saria', 'ganon', 'volvagia']
lines = iter(data)
for line in lines:
    while line.startswith(">") and any(name in line for name in names):
        name = line
        upper_seq = []
        for line in lines:
            if not line.isupper():
                break
            upper_seq.append(line)
        else:
            line = "" # guard against infinite loop at EOF 

        result.append((name, ''.join(upper_seq)))

If there are many names, a set() lookup might be faster than any(...) for finding names in a line. Note that it matches whole whitespace-separated words only, whereas `name in line` also matches substrings:

names = set(names)
# ...
    # keep `while` (not `if`) so a header that ends the previous sequence is re-checked
    while line.startswith(">") and names.intersection(line[1:].split()):
        # ...
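
For reference, the set-based variant could be filled in roughly as follows. This is only a sketch: the function name `collect_matches` and its parameters are hypothetical and not part of the original answer, and it assumes every header starts with '>' and every sequence line is entirely uppercase.

```python
def collect_matches(data, names):
    """Return (header, joined_sequence) pairs for headers naming a wanted word.

    `collect_matches` is a hypothetical helper used for illustration only.
    """
    wanted = set(names)
    result = []
    lines = iter(data)
    for line in lines:
        # `while` rather than `if`: the inner loop stops on the next header,
        # and that header must itself be checked before moving on.
        while line.startswith(">") and wanted.intersection(line[1:].split()):
            header = line
            upper_seq = []
            for line in lines:
                if not line.isupper():
                    break  # reached the next header (or a non-sequence line)
                upper_seq.append(line)
            else:
                line = ""  # input exhausted; makes the while condition false
            result.append((header, "".join(upper_seq)))
    return result
```

Called as `collect_matches(data, query)` with the lists from the question, this should yield the same pairs as the loop above.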

Result

[('>link is the first', 'OIGFHFHAGIUUIIUFG'),
 ('>ganon is the fourth', 'ADGGHHHHHH'),
 ('>volvagia is the last', 'AFGDAAFGDAADFGAFDADFDFFDDFGAHUUERR')]
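
The question asks for two separate lists rather than a list of pairs. One possible way to get there, shown as a sketch that starts from the `result` value listed above, is a pair of list comprehensions. No dictionary is involved, and unlike `zip(*result)` this also works when `result` is empty.

```python
# `result` as produced by the answer's loop (copied from the output above).
result = [
    ('>link is the first', 'OIGFHFHAGIUUIIUFG'),
    ('>ganon is the fourth', 'ADGGHHHHHH'),
    ('>volvagia is the last', 'AFGDAAFGDAADFGAFDADFDFFDDFGAHUUERR'),
]

# Split the (header, sequence) pairs into the two lists from the question.
name_list = [name for name, _ in result]
seq_list = [seq for _, seq in result]
```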
