当它有多个分隔符时,如何在Python中列出列表? [英] How to make a list of lists in Python when it has multiple separators?

查看:115
本文介绍了当它有多个分隔符时,如何在Python中列出列表?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

示例文件如下所示(全部一行,为了便于阅读而包装):

The sample file looks like this (all on one line, wrapped for legibility):

 ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
  '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
  '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
  '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n','\n',
  '$$$\n', '\n',
  '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
  '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
  '>B5\n', 'TTCGTGGGTATT\n', '>B6\n','TTCGGGGGTATC\n',
  '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
  '>B9\n', 'TTCGGGGGTATC\n','>B10\n', 'TTCGGGGGTATC\n',
  '>B42\n', 'TT-GTGGGTATC\n']

$$$将两组分隔.我需要使用.strip函数并删除\n和所有标题".

The $$$ separates the two sets. I need to use .strip function and remove the \n and all the "headers".

我需要列出一个列表(如下所示),并用Z替换-"(再次全部放在一行上;为便于阅读,在此包装):

I need to make a list of lists (as below) and replace "-" with Z (again, all on one line; wrapped here for legibility):

  [['TCCGGGGGTATC','TCCGTGGGTATC','TCCGTGGGTATC', 'TCCGGGGGTATC',
    'TCCGTGGGTATC',CGTGGGTATC','TCCGTGGGTATC', 'TCCGGGGGTATC'],
   ['ATCGGGGGTATT', 'TT-GTGGGAATC','TTCGTGGGAATC', 'TT-GTGGGTATC',
    'TTCGTGGGTATT', 'TTCGGGGGTATC','TT-GTGGGTATC', 'TTCGGGGGAATC',
    'TTCGGGGGTATC', 'TTCGGGGGTATC','TT-GTGGGTATC]]

推荐答案

这是Moses Koledoye答案的一种变体,它检查>的第一个字符,并丢弃所有匹配项以及任何空元素.我还包括将-"替换为"Z".

Here is a variation of Moses Koledoye's answer which examines the first character for > and discards any matches as well as any empty elements. I also included replacing "-" with "Z".

lst = ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
   '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
   '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
   '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n','\n',
   '$$$\n', '\n',
   '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
   '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
   '>B5\n', 'TTCGTGGGTATT\n', '>B6\n','TTCGGGGGTATC\n',
   '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
   '>B9\n', 'TTCGGGGGTATC\n','>B10\n', 'TTCGGGGGTATC\n',
   '>B42\n', 'TT-GTGGGTATC\n']

result = [[]]
for x in lst:
    if x.startswith('>'):
        continue
    if x.startswith('$$$'):
        result.append([])
        continue
    x = x.strip()
    if x:
        result[-1].append(x.replace("-", "Z"))
print(result)

这避免了为任何元素的长度分配任何特殊的意义.

This avoids assigning any particular significance to the length of any element.

这篇关于当它有多个分隔符时,如何在Python中列出列表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆