Python Regular Expression matching multiple lines (re.DOTALL)

Views: 198 | Published: 2020/7/1 2:31:41 | Tags: python, regex, multilinestring


Problem description

I'm trying to parse a string with multiple lines.

Let's say it is:

text = '''
Section1
stuff belonging to section1
stuff belonging to section1
stuff belonging to section1
Section2
stuff belonging to section2
stuff belonging to section2
stuff belonging to section2
'''

I want to use the finditer method of the re module to get dictionaries like these:

{'section': 'Section1', 'section_data': 'stuff belonging to section1\nstuff belonging to section1\nstuff belonging to section1\n'}
{'section': 'Section2', 'section_data': 'stuff belonging to section2\nstuff belonging to section2\nstuff belonging to section2\n'}

I tried the following:

import re
re_sections=re.compile(r"(?P<section>Section\d)\s*(?P<section_data>.+)", re.DOTALL)
sections_it = re_sections.finditer(text)

for m in sections_it:
    print(m.groupdict())

But this results in:

{'section': 'Section1', 'section_data': 'stuff belonging to section1\nstuff belonging to section1\nstuff belonging to section1\nSection2\nstuff belonging to section2\nstuff belonging to section2\nstuff belonging to section2\n'}

So section_data also swallows Section2 and everything after it.

I also tried to make the second group match everything except the first group, but this produces no output at all:

re_sections=re.compile(r"(?P<section>Section\d)\s+(?P<section_data>^(?P=section))", re.DOTALL)

I know I could use the following regex, but I'm looking for a version where I don't have to spell out what the second group looks like:

re_sections=re.compile(r"(?P<section>Section\d)\s+(?P<section_data>[a-z12\s]+)", re.DOTALL)

Thanks a lot!

Recommended answer

Use a look-ahead to match everything up to the next section header, or the end of the string:

re_sections=re.compile(r"(?P<section>Section\d)\s*(?P<section_data>.+?)(?=(?:Section\d|$))", re.DOTALL)

Note that this also needs a non-greedy .+?. A greedy .+ would first consume everything to the end of the string, where the $ branch of the look-ahead succeeds, so it would still match all the way to the end.
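To make the difference concrete, here is a minimal sketch (Python 3) that runs the look-ahead pattern twice against the sample text from the question, once with a greedy .+ and once with the non-greedy .+?. The names greedy, lazy, greedy_matches and lazy_matches are illustrative and not part of the original answer.

```python
import re

# Sample text from the question.
text = '''
Section1
stuff belonging to section1
stuff belonging to section1
stuff belonging to section1
Section2
stuff belonging to section2
stuff belonging to section2
stuff belonging to section2
'''

# Same look-ahead in both patterns; only the quantifier differs.
greedy = re.compile(
    r"(?P<section>Section\d)\s*(?P<section_data>.+)(?=(?:Section\d|$))",
    re.DOTALL,
)
lazy = re.compile(
    r"(?P<section>Section\d)\s*(?P<section_data>.+?)(?=(?:Section\d|$))",
    re.DOTALL,
)

greedy_matches = [m.groupdict() for m in greedy.finditer(text)]
lazy_matches = [m.groupdict() for m in lazy.finditer(text)]

# Greedy: a single match whose data swallows Section2 as well.
print(len(greedy_matches))  # 1
# Non-greedy: one match per section.
print(len(lazy_matches))    # 2
```

With the greedy quantifier the look-ahead is only checked after .+ has run to the end of the string, so the whole remainder ends up in one section_data value.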

Demo:

>>> re_sections=re.compile(r"(?P<section>Section\d)\s*(?P<section_data>.+?)(?=(?:Section\d|$))", re.DOTALL)
>>> for m in re_sections.finditer(text): print(m.groupdict())
...
{'section': 'Section1', 'section_data': 'stuff belonging to section1\nstuff belonging to section1\nstuff belonging to section1\n'}
{'section': 'Section2', 'section_data': 'stuff belonging to section2\nstuff belonging to section2\nstuff belonging to section2'}
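In the demo, the section_data for Section2 lacks its final \n, unlike the output the question asks for. Without re.MULTILINE, $ matches both at the very end of the string and just before a trailing newline, so the non-greedy group stops one character early. If that newline matters, one option (an addition here, not part of the original answer) is to use \Z, which matches only at the very end of the string. A sketch in Python 3, with re_sections_z and results as illustrative names:

```python
import re

# Sample text from the question.
text = '''
Section1
stuff belonging to section1
stuff belonging to section1
stuff belonging to section1
Section2
stuff belonging to section2
stuff belonging to section2
stuff belonging to section2
'''

# \Z anchors only at the absolute end of the string, so the last
# section keeps its trailing newline.
re_sections_z = re.compile(
    r"(?P<section>Section\d)\s*(?P<section_data>.+?)(?=Section\d|\Z)",
    re.DOTALL,
)

results = [m.groupdict() for m in re_sections_z.finditer(text)]
for d in results:
    print(d)
```

Both dictionaries then end with '\n', matching the shape requested in the question.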
