Repeatedly extract a line between two delimiters in a text file, Python
Problem description
I have a text file in the following format:
```
DELIMITER1
extract me
extract me
extract me
DELIMITER2
```
I'd like to extract every block of `extract me` lines between DELIMITER1 and DELIMITER2 in the .txt file.
This is my current code, which does not work:
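```python
import re

def GetTheSentences(file):
    fileContents = open(file)
    # Note: 'DELIMITER' also matches DELIMITER2 lines.
    start_rx = re.compile('DELIMITER')
    end_rx = re.compile('DELIMITER2')

    line_iterator = iter(fileContents)
    start = False
    # Skip ahead to the first line that matches the start pattern.
    for line in line_iterator:
        if re.findall(start_rx, line):
            start = True
            break
    # Print lines until the end delimiter is reached.
    while start:
        next_line = next(line_iterator)
        if re.findall(end_rx, next_line):
            break
        print(next_line)
        continue
    # Nothing loops back to look for the next DELIMITER1,
    # so only the first block is ever printed.
    next(line_iterator)
```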
Any ideas?
Recommended answer
You can simplify this to a single regular expression by using `re.S`, the DOTALL flag, which makes `.` match newline characters as well, so one match can span several lines.
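```python
import re

def GetTheSentences(infile):
    with open(infile) as fp:
        # Read the whole file and capture everything between each
        # DELIMITER1/DELIMITER2 pair; re.S lets '.' cross line breaks.
        for result in re.findall(r'DELIMITER1(.*?)DELIMITER2', fp.read(), re.S):
            print(result)

# Output for the sample file above:
# extract me
# extract me
# extract me
```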
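To see what `re.S` changes, the sketch below runs the same pattern on an in-memory string instead of a file. The helper name `extract_blocks` and the sample text are illustrative assumptions, not part of the original answer.

```python
import re

# Hypothetical sample text: two blocks, each spanning several lines.
SAMPLE = "DELIMITER1\nextract me\nextract me\nDELIMITER2\nnoise\nDELIMITER1\nand me\nDELIMITER2\n"

PATTERN = r"DELIMITER1(.*?)DELIMITER2"

def extract_blocks(text):
    """Return the text between each DELIMITER1/DELIMITER2 pair (hypothetical helper)."""
    return re.findall(PATTERN, text, re.S)

# With re.S, '.' also matches '\n', so each multi-line block is captured.
with_dotall = extract_blocks(SAMPLE)
# Without re.S, '.' stops at newlines, so no multi-line block can match.
without_dotall = re.findall(PATTERN, SAMPLE)

print(with_dotall)     # ['\nextract me\nextract me\n', '\nand me\n']
print(without_dotall)  # []
```

Each captured block keeps the newlines that surround it; if those are unwanted, `block.strip()` removes them.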
The pattern also uses the non-greedy quantifier `.*?`, so each match stops at the first DELIMITER2 after its DELIMITER1, and every non-overlapping DELIMITER1/DELIMITER2 block in the file is found.
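A minimal comparison of what happens if the `?` is dropped, assuming a hypothetical string that holds two consecutive blocks:

```python
import re

# Hypothetical input: two consecutive DELIMITER1/DELIMITER2 blocks.
text = "DELIMITER1\nfirst\nDELIMITER2\nDELIMITER1\nsecond\nDELIMITER2\n"

# Greedy '.*' first runs to the end of the string and backtracks only as far
# as the LAST DELIMITER2, so both blocks (and the delimiters between them)
# end up in a single match.
greedy = re.findall(r"DELIMITER1(.*)DELIMITER2", text, re.S)

# Non-greedy '.*?' stops at the FIRST DELIMITER2 after each DELIMITER1,
# giving one match per block.
lazy = re.findall(r"DELIMITER1(.*?)DELIMITER2", text, re.S)

print(greedy)  # ['\nfirst\nDELIMITER2\nDELIMITER1\nsecond\n']
print(lazy)    # ['\nfirst\n', '\nsecond\n']
```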