Regex to remove new lines up to a specific character

Views: 36 · Posted: 2021/7/6 20:55:21 · Tags: python, regex, fasta


Question

I have a series of strings in a file in the following format:

>HEADER_Text1
Information here, yada yada yada
Some more information here, yada yada yada
Even some more information here, yada yada yada
>HEADER_Text2
Information here, yada yada yada
Some more information here, yada yada yada
Even some more information here, yada yada yada
>HEADER_Text3
Information here, yada yada yada
Some more information here, yada yada yada
Even some more information here, yada yada yada

I am trying to find a regex pattern that removes the newline characters between the lines that follow one > header and the next > header, so that each record's body ends up on a single line. The final result would look like:

>HEADER_Text1
Information here, yada yada yada Some more information here, yada yada yada Even some more information here, yada yada yada
>HEADER_Text2
Information here, yada yada yada Some more information here, yada yada yada Even some more information here, yada yada yada
>HEADER_Text3
Information here, yada yada yada Some more information here, yada yada yada Even some more information here, yada yada yada

Does anyone know how I can come up with a regex pattern to do this?

Side note: This format is common in computational science, where it is known as the FASTA format.

Thanks!

Recommended answer

As noted in the comments, your best bet is to use an existing FASTA parser. Why not?

That said, here's how I would join the lines, using the leading greater-than sign to tell headers from body lines:

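def joinup(f):
    """Yield header lines unchanged and each record's body lines joined by spaces."""
    buf = []
    for line in f:
        if line.startswith('>'):
            # A new header: flush the body collected for the previous record.
            if buf:
                yield " ".join(buf)
            yield line.rstrip()
            buf = []
        else:
            buf.append(line.rstrip())
    # Flush the final record, if it has any body lines.
    if buf:
        yield " ".join(buf)

with open("...") as f:  # file path left elided, as in the original
    for joined_line in joinup(f):
        print(joined_line)  # replace with your own processing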
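
Since the question asks specifically for a regex, here is a minimal sketch of one possible pattern. It is not part of the original answer. It assumes the whole file fits in memory as one string and uses "\n" line endings, and the function name join_fasta_lines is hypothetical. The pattern matches a non-header line together with its trailing newline, but only when the next line is neither a header nor the end of the text, and replaces that newline with a space.

```python
import re

# Matches one non-header line plus its newline, but only if the following
# line is not a '>' header and the newline is not the last character.
# (Assumption: "\n" line endings; "\r\n" files would need normalising first.)
_JOIN_PATTERN = re.compile(r'^(?!>)(.*)\n(?!>|\Z)', flags=re.MULTILINE)


def join_fasta_lines(text):
    """Join the body lines of each record with single spaces (hypothetical helper)."""
    return _JOIN_PATTERN.sub(r'\1 ', text)


sample = (
    ">HEADER_Text1\n"
    "Information here\n"
    "Some more information here\n"
    "Even some more information here\n"
    ">HEADER_Text2\n"
    "Information here\n"
    "Some more information here\n"
)

print(join_fasta_lines(sample), end="")
```

For large files the generator approach above is likely the better fit, since it processes one line at a time rather than loading everything into a single string.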
