Python regex: match a full paragraph including newlines


Problem description

I have a text file from which I want to match whole paragraph blocks, but my current regex does not match the full paragraph, including its newlines.

Sample text:

NOMEAR JOSIAS CARLOS BORRHER do cargo em comissão
OTHER TEXT GOES HERE
....................
020007/002832/2020.

EXONERAR DOUGLAS ALVES BORRHER do cargo em comissão
OTHER TEXT GOES HERE
....................
020007/002832/2020.

NOMEAR RAFAEL DOS SANTOS PASSAGEM para exercer o cargo
OTHER TEXT GOES HERE
....................
020007/002832/2020.

From the text above, I want to match every full paragraph that starts with the word NOMEAR:

NOMEAR JOSIAS CARLOS BORRHER do cargo em comissão
OTHER TEXT GOES HERE
....................
020007/002832/2020.


NOMEAR RAFAEL DOS SANTOS PASSAGEM para exercer o cargo
OTHER TEXT GOES HERE
....................
020007/002832/2020.

My attempt

import re
pattern = re.compile("NOMEAR (.*)", re.DOTALL)

with open('pdf_text_tika.txt') as file:
    for i, line in enumerate(file):
        for match in pattern.finditer(line):
            print('Found on line %s: %s' % (i + 1, match.group()))

Output:

Found on line 1305: NOMEAR JOSIAS CARLOS BORRHER do cargo em comissão

Found on line 1316: NOMEAR RAFAEL DOS SANTOS PASSAGEM para exercer o cargo

Recommended answer

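The attempt above applies the pattern to one line at a time, so it can never match across newlines, regardless of re.DOTALL. Instead, read the whole file into a single string and use this simpler regex in MULTILINE mode: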

^NOMEAR.+(?:\n.+)*
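
To make the pieces of this pattern easier to read, here is the same regex written with re.VERBOSE, as a minimal sketch. The name PARAGRAPH and the short sample string are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as above, split across lines with comments (re.VERBOSE).
PARAGRAPH = re.compile(
    r"""
    ^NOMEAR      # with re.MULTILINE, ^ matches at the start of every line
    .+           # rest of the first line; "." does not match a newline here
    (?:\n.+)*    # zero or more following lines that are not empty
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical sample: three paragraphs separated by blank lines.
sample = "NOMEAR A\nline 2\n\nEXONERAR B\nline 2\n\nNOMEAR C\nline 2\n"

matches = PARAGRAPH.findall(sample)
print(matches)
```

The match stops at the first empty line because `\n.+` needs at least one non-newline character after the newline. Note that a separator line containing only spaces would still count as non-empty, so such paragraphs would be merged.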

In Python:

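import re

# ^ anchors at each line start (MULTILINE); (?:\n.+)* extends over non-empty lines
pattern = re.compile(r'^NOMEAR.+(?:\n.+)*', re.MULTILINE)

# Read the whole file into one string so a match can span several lines
with open('pdf_text_tika.txt', 'r') as file:
    data = file.read()

print(pattern.findall(data))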
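
The original attempt also printed the line number of each hit. One possible way to keep that with the multi-line regex is to count the newlines before each match start. The sketch below uses the sample text from the question instead of the file, and the helper name find_paragraphs is a hypothetical one introduced for illustration.

```python
import re

PARAGRAPH = re.compile(r'^NOMEAR.+(?:\n.+)*', re.MULTILINE)


def find_paragraphs(text):
    """Return (line_number, paragraph) pairs for each NOMEAR paragraph."""
    results = []
    for match in PARAGRAPH.finditer(text):
        # 1-indexed line number: newlines before the match start, plus one.
        line_number = text.count("\n", 0, match.start()) + 1
        results.append((line_number, match.group()))
    return results


# Sample text copied from the question (stands in for file.read()).
sample = (
    "NOMEAR JOSIAS CARLOS BORRHER do cargo em comissão\n"
    "OTHER TEXT GOES HERE\n"
    "....................\n"
    "020007/002832/2020.\n"
    "\n"
    "EXONERAR DOUGLAS ALVES BORRHER do cargo em comissão\n"
    "OTHER TEXT GOES HERE\n"
    "....................\n"
    "020007/002832/2020.\n"
    "\n"
    "NOMEAR RAFAEL DOS SANTOS PASSAGEM para exercer o cargo\n"
    "OTHER TEXT GOES HERE\n"
    "....................\n"
    "020007/002832/2020.\n"
)

results = find_paragraphs(sample)
for line_number, paragraph in results:
    print(f"Found on line {line_number}:\n{paragraph}\n")
```

finditer is used here instead of findall because the match object exposes start(), which is what the line count is based on.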

