Fastest Way to Delete a Line from Large File in Python

Question

I am working with a very large (~11GB) text file on a Linux system. I am running it through a program which is checking the file for errors. Once an error is found, I need to either fix the line or remove the line entirely. And then repeat...

Eventually once I'm comfortable with the process, I'll automate it entirely. For now however, let's assume I'm running this by hand.

What would be the fastest (in terms of execution time) way to remove a specific line from this large file? I thought of doing it in Python...but would be open to other examples. The line might be anywhere in the file.

If Python, assume the following interface:

def removeLine(filename, lineno):

Thanks

-aj

Recommended answer

You can have two file objects for the same file at the same time (one for reading, one for writing):

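def removeLine(filename, lineno):
    # lineno is zero-based: removeLine(path, 0) deletes the first line
    fro = open(filename, "rb")

    # skip the lines that come before the one to delete
    current_line = 0
    while current_line < lineno:
        fro.readline()
        current_line += 1

    # open a second handle for writing, positioned at the start
    # of the line to delete
    seekpoint = fro.tell()
    frw = open(filename, "r+b")
    frw.seek(seekpoint, 0)

    # read the line we want to discard
    fro.readline()

    # now move the rest of the lines in the file
    # one line back
    chars = fro.readline()
    while chars:
        # write(), not writelines(): on Python 3, writelines() on a
        # bytes object fails because iterating bytes yields ints
        frw.write(chars)
        chars = fro.readline()

    fro.close()
    # cut off the leftover bytes at the end of the file
    frw.truncate()
    frw.close()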
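
Since the question asks about execution time, one possible variation on the same two-handle idea is to shift the tail of the file in large blocks instead of line by line. The following is only a sketch under stated assumptions: the name remove_line_blockwise, the bufsize parameter, and the True/False return value are hypothetical and not part of the original answer, and no timing was measured for it. It relies on shutil.copyfileobj from the standard library, and on the fact that the write position always stays behind the read position, so the write never overwrites bytes that have not been read yet.

```python
import shutil


def remove_line_blockwise(filename, lineno, bufsize=16 * 1024 * 1024):
    """Delete the zero-based line `lineno` from `filename` in place.

    Hypothetical helper: returns True if a line was removed, False if
    the file has no such line (the file is then left unchanged).
    """
    with open(filename, "rb") as src, open(filename, "r+b") as dst:
        # Skip the lines that stay where they are.
        for _ in range(lineno):
            if not src.readline():
                return False  # file has fewer lines than lineno

        # Remember where the doomed line starts, then read past it.
        start = src.tell()
        if not src.readline():
            return False  # lineno is exactly one past the last line

        # Copy everything after the removed line back over it,
        # bufsize bytes at a time.
        dst.seek(start)
        shutil.copyfileobj(src, dst, bufsize)

        # Drop the leftover bytes at the end of the file.
        dst.truncate()
    return True
```

A call such as remove_line_blockwise("big.txt", 41) would delete the 42nd line, because lineno is zero-based here as in removeLine above ("big.txt" is a placeholder path). Like the original, this rewrites everything after the deleted line, so the cost still grows with the distance from that line to the end of the file, and an interruption midway can leave the file partially shifted.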
