Repeatedly extract a line between two delimiters in a text file, Python

Problem description

I have a text file in the following format:

DELIMITER1
extract me
extract me
extract me
DELIMITER2

I'd like to extract every block of "extract me" lines between DELIMITER1 and DELIMITER2 in the .txt file.

This is my current, non-working code:

import re
def GetTheSentences(file):
    fileContents = open(file)
    start_rx = re.compile('DELIMITER')
    end_rx = re.compile('DELIMITER2')

    line_iterator = iter(fileContents)
    start = False
    for line in line_iterator:
        if re.findall(start_rx, line):
            start = True
            break
    while start:
        next_line = next(line_iterator)
        if re.findall(end_rx, next_line):
            break
        print next_line
        continue
    line_iterator.next()

Any ideas?

Recommended answer

You can simplify this to a single regular expression using re.S, the DOTALL flag, which lets "." match newline characters as well.

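import re

def GetTheSentences(infile):
    # Read the whole file so the pattern can match across line breaks.
    with open(infile) as fp:
        # re.S (DOTALL) lets "." match newlines; ".*?" is non-greedy.
        for result in re.findall(r'DELIMITER1(.*?)DELIMITER2', fp.read(), re.S):
            # strip() drops the newlines that sit next to the delimiters.
            print(result.strip())
# extract me
# extract me
# extract me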

This also makes use of the non-greedy operator .*?, so each match stops at the nearest DELIMITER2 and multiple non-overlapping DELIMITER1-DELIMITER2 blocks will all be found.
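
To illustrate why the non-greedy form matters, here is a minimal sketch comparing .*? with the greedy .* on a small in-memory string. The sample text and the helper name extract_blocks are assumptions made for this illustration and are not part of the original answer.

```python
import re

# Hypothetical sample text with two delimited blocks (not from the original post).
text = (
    "DELIMITER1\n"
    "first a\n"
    "first b\n"
    "DELIMITER2\n"
    "noise\n"
    "DELIMITER1\n"
    "second a\n"
    "DELIMITER2\n"
)

def extract_blocks(text, greedy=False):
    """Return the stripped text captured between DELIMITER1 and DELIMITER2."""
    body = "(.*)" if greedy else "(.*?)"
    pattern = "DELIMITER1" + body + "DELIMITER2"
    # re.S (DOTALL) lets "." match newlines so a block can span several lines.
    return [match.strip() for match in re.findall(pattern, text, re.S)]

non_greedy = extract_blocks(text)            # one entry per delimited block
greedy = extract_blocks(text, greedy=True)   # a single entry spanning everything

print(len(non_greedy), "blocks with .*?")
print(len(greedy), "block with .*")
```

With the greedy pattern, the single match runs from the first DELIMITER1 to the last DELIMITER2, so the "noise" line and the inner delimiters end up inside the captured text. The non-greedy pattern returns each block separately.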
