How to grep lines between two patterns in a big file with Python


Question

I have a very big file, like this:


[PATTERN1]
line1
line2
line3 
...
...
[END PATTERN]
[PATTERN2]
line1 
line2
...
...
[END PATTERN]

I need to extract, into another file, the lines between a variable starter pattern such as [PATTERN1] and a fixed end pattern, [END PATTERN], but only for certain specific starter patterns.
For example:

[PATTERN2]
line1 
line2
...
...
[END PATTERN]

I already do the same thing with a smaller file, using this code:

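# Read the whole file into memory (this is the part that does not scale).
with open('myfile') as f:
    FILE = f.readlines()

newfile = []
for n in name_list:
    # Slice from the first line containing the starter pattern to the end of the file.
    A = FILE[[s for s, name in enumerate(FILE) if n in name][0]:]
    # Cut that slice just after the first line containing 'END PATTERN'.
    B = A[:[e + 1 for e, end in enumerate(A) if 'END PATTERN' in end][0]]
    newfile.append(B)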

Where 'name_list' is a list of the specific starter patterns that I need.

It works, but I suppose there is a better way to do this for big files, without using .readlines().
Can anyone help me?

Thank you very much!

Recommended answer

Use something like:

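import re

# Placeholder patterns; adjust them to the real start/end marker lines.
START_PATTERN = r'^START-PATTERN$'
END_PATTERN = r'^END-PATTERN$'

# Open the output once, so a second matching block does not overwrite the first.
with open('myfile') as file, open('my_new_file.txt', 'w') as newfile:
    match = False  # True while we are between a start line and an end line

    for line in file:
        if re.match(START_PATTERN, line):
            match = True
            continue  # the start marker itself is not copied
        elif re.match(END_PATTERN, line):
            match = False
            continue  # the end marker itself is not copied
        elif match:
            # `line` already ends with its own newline, so write it unchanged.
            newfile.write(line)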

This iterates over the file line by line without reading it all into memory. It also writes directly to the new file rather than appending to a list in memory; with a large enough source, that list could itself become a problem.

You will probably need to adapt this code to your case. For instance, a regex may not be necessary to match the start/end lines, in which case replace the re.match calls with something like if 'xyz' in line.
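
As an illustration of that adaptation, here is a minimal sketch that uses plain substring tests and accepts several starter patterns, as in the question. The function name extract_blocks and its parameters are hypothetical, not part of the original answer. Unlike the snippet above, it copies the marker lines too (matching the expected output in the question), and unlike the question's code it copies every matching block in file order rather than only the first occurrence of each starter.

```python
def extract_blocks(src_path, dst_path, starters, end_marker='END PATTERN'):
    """Copy every block that begins with one of `starters` from src to dst.

    Hypothetical helper for illustration. Returns the number of blocks copied.
    """
    copied = 0
    inside = False  # True while we are inside a wanted block
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:  # the file object yields one line at a time
            # A block starts on any line containing one of the starter strings.
            if not inside and any(s in line for s in starters):
                inside = True
            if inside:
                dst.write(line)  # the line keeps its own trailing newline
                if end_marker in line:
                    inside = False
                    copied += 1
    return copied
```

With the names from the question, a call might look like extract_blocks('myfile', 'my_new_file.txt', name_list). Only the current line is held in memory, so the size of the input file should not matter much.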
