Python regex: match a full paragraph including newlines
Question
I have a text file from which I want to match whole paragraph blocks, but my current regex does not match the full paragraph including the newlines.
Sample text:
NOMEAR JOSIAS CARLOS BORRHER do cargo em comissão
OTHER TEXT GOES HERE
....................
020007/002832/2020.
EXONERAR DOUGLAS ALVES BORRHER do cargo em comissão
OTHER TEXT GOES HERE
....................
020007/002832/2020.
NOMEAR RAFAEL DOS SANTOS PASSAGEM para exercer o cargo
OTHER TEXT GOES HERE
....................
020007/002832/2020.
From the text block above, I want to match every full paragraph that starts with the word NOMEAR:
NOMEAR JOSIAS CARLOS BORRHER do cargo em comissão
OTHER TEXT GOES HERE
....................
020007/002832/2020.
NOMEAR RAFAEL DOS SANTOS PASSAGEM para exercer o cargo
OTHER TEXT GOES HERE
....................
020007/002832/2020.
My attempt:
import re

pattern = re.compile("NOMEAR (.*)", re.DOTALL)

# The file is read one line at a time, so each match is limited to a single line.
for i, line in enumerate(open('pdf_text_tika.txt')):
    for match in re.finditer(pattern, line):
        print('Found on line %s: %s' % (i+1, match.group()))
Output:
Found on line 1305: NOMEAR JOSIAS CARLOS BORRHER do cargo em comissão
Found on line 1316: NOMEAR RAFAEL DOS SANTOS PASSAGEM para exercer o cargo
Recommended answer
You may use this simpler regex in MULTILINE mode:
^NOMEAR.+(?:\n.+)*
In Python:
import re

pattern = re.compile(r'^NOMEAR.+(?:\n.+)*', re.MULTILINE)

# Read the whole file into one string so a match can span several lines.
with open('pdf_text_tika.txt', 'r') as file:
    data = file.read()

print(pattern.findall(data))
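To see what the pattern captures without needing the original file, here is a minimal sketch that runs it on an in-memory string. The pattern matches a line starting with NOMEAR and then every following non-empty line, so it stops at the first blank line. The sketch therefore assumes the paragraphs are separated by blank lines; the `SAMPLE` string and the helper name `find_nomear_paragraphs` are made up for illustration and are not part of the original question or answer.

```python
import re

# Hypothetical sample text. Assumption: paragraphs are separated by blank lines.
SAMPLE = """\
NOMEAR JOSIAS CARLOS BORRHER do cargo em comissão
OTHER TEXT GOES HERE
020007/002832/2020.

EXONERAR DOUGLAS ALVES BORRHER do cargo em comissão
OTHER TEXT GOES HERE
020007/002832/2020.

NOMEAR RAFAEL DOS SANTOS PASSAGEM para exercer o cargo
OTHER TEXT GOES HERE
020007/002832/2020.
"""

# ^NOMEAR.+   a line that begins with NOMEAR (MULTILINE makes ^ match at each line start)
# (?:\n.+)*   any number of following lines that contain at least one character
PATTERN = re.compile(r'^NOMEAR.+(?:\n.+)*', re.MULTILINE)


def find_nomear_paragraphs(text):
    """Return every paragraph in text whose first line starts with NOMEAR."""
    # The pattern has no capturing groups, so findall returns the full matches.
    return PATTERN.findall(text)


paragraphs = find_nomear_paragraphs(SAMPLE)
for paragraph in paragraphs:
    print(paragraph)
    print('---')
```

If the file has no blank lines between paragraphs, the `(?:\n.+)*` part keeps consuming lines until the end of the text, so the pattern would need a different stopping condition in that case.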