Regex to remove new lines up to a specific character
Problem description
I have a series of strings in a file of the format:
>HEADER_Text1
Information here, yada yada yada
Some more information here, yada yada yada
Even some more information here, yada yada yada
>HEADER_Text2
Information here, yada yada yada
Some more information here, yada yada yada
Even some more information here, yada yada yada
>HEADER_Text3
Information here, yada yada yada
Some more information here, yada yada yada
Even some more information here, yada yada yada
I am trying to find a regex pattern that removes the newline characters in the lines below each > character, up to the next > character. So the final result would look like:
>HEADER_Text1
Information here, yada yada yada Some more information here, yada yada yada Even some more information here, yada yada yada
>HEADER_Text2
Information here, yada yada yada Some more information here, yada yada yada Even some more information here, yada yada yada
>HEADER_Text3
Information here, yada yada yada Some more information here, yada yada yada Even some more information here, yada yada yada
Does anyone know how I can come up with a regex pattern to do this?
Side note: This format is common in computational science, where it is known as the FASTA format.
Thanks!
Recommended answer
As noted in the comments, your best bet is to use an existing FASTA parser. Why not?
Here's how I would join lines based on the leading greater-than:
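def joinup(f):
    """Yield each '>' header line, then its following lines joined into one."""
    buf = []
    for line in f:
        if line.startswith('>'):
            # A new header: flush the previous record's body first.
            if buf:
                yield " ".join(buf)
            yield line.rstrip()
            buf = []
        else:
            buf.append(line.rstrip())
    # Flush the body of the last record, if there is one.
    if buf:
        yield " ".join(buf)

with open("...") as f:
    for joined_line in joinup(f):
        print(joined_line)  # blah blah...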
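If a regex really is wanted, as the question asks, something along the following lines might work. This is only a sketch under a few assumptions: the whole file fits in memory as one string, lines end in "\n", and every header starts with ">" in the first column. The function name join_records is made up for illustration.

```python
import re

def join_records(text):
    # Match a non-header line together with its trailing newline, but only
    # when the next line is not a header and we are not at the end of the
    # text. The newline is then replaced by a single space.
    return re.sub(r'^(?!>)(.*)\n(?!>|\Z)', r'\1 ', text, flags=re.MULTILINE)

sample = ">HEADER_Text1\nline one\nline two\nline three\n>HEADER_Text2\nline four\nline five\n"
print(join_records(sample))
```

The newline after each header and the newline before the next header are left alone, so each record ends up as two lines. For real FASTA data a dedicated parser is still likely to be the safer choice.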