Fastest Way to Delete a Line from Large File in Python

Question

I am working with a very large (~11GB) text file on a Linux system. I am running it through a program which is checking the file for errors. Once an error is found, I need to either fix the line or remove the line entirely. And then repeat...

Eventually, once I'm comfortable with the process, I'll automate it entirely. For now, however, let's assume I'm running this by hand.

What would be the fastest (in terms of execution time) way to remove a specific line from this large file? I thought of doing it in Python...but would be open to other examples. The line might be anywhere in the file.

If Python, assume the following interface:

def removeLine(filename, lineno):

Thanks,

-aj

Recommended answer

You can have two file objects for the same file at the same time (one for reading, one for writing):

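def removeLine(filename, lineno):
    """Remove line number lineno (counting from 0) from the file, in place."""
    # two handles on the same file: fro reads ahead, frw writes behind it
    with open(filename, "rb") as fro, open(filename, "r+b") as frw:
        # skip the lines before the one we want to discard
        for _ in range(lineno):
            fro.readline()

        # the writer starts where the discarded line begins
        frw.seek(fro.tell(), 0)

        # read the line we want to discard
        fro.readline()

        # now move the rest of the lines in the file
        # one line back
        chars = fro.readline()
        while chars:
            # write(), not writelines(): in Python 3, writelines() on a
            # bytes object iterates over ints and raises TypeError
            frw.write(chars)
            chars = fro.readline()

        # cut off the leftover bytes at the end of the file
        frw.truncate()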
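
Since the question asks about execution time on an ~11GB file, here is a hedged variant of the same two-handle idea that shifts the tail of the file in large blocks instead of line by line. It is only a sketch: the name remove_line_chunked, the chunk_size parameter, the 1 MiB default, and the IndexError on a too-short file are assumptions added for illustration and are not part of the original answer. As above, lineno counts from 0.

```python
def remove_line_chunked(filename, lineno, chunk_size=1024 * 1024):
    """Remove the zero-based line `lineno` from `filename`, in place.

    Hypothetical variant of removeLine: the tail of the file is moved in
    blocks of `chunk_size` bytes (1 MiB by default, an arbitrary choice).
    """
    with open(filename, "rb") as reader, open(filename, "r+b") as writer:
        # Skip the lines that come before the one to delete.
        for _ in range(lineno):
            if not reader.readline():
                raise IndexError("file has fewer than %d lines" % (lineno + 1))

        # The writer starts at the first byte of the line to delete.
        writer.seek(reader.tell())

        # Consume the line to delete so the reader sits just past it.
        if not reader.readline():
            raise IndexError("file has fewer than %d lines" % (lineno + 1))

        # Shift everything after the deleted line toward the start of the
        # file. The writer always stays behind the reader, so no unread
        # bytes are overwritten.
        while True:
            chunk = reader.read(chunk_size)
            if not chunk:
                break
            writer.write(chunk)

        # Drop the now-duplicated bytes left at the end of the file.
        writer.truncate()
```

A call such as remove_line_chunked("data.txt", 41) would delete the 42nd line of a hypothetical data.txt. Either version still has to rewrite every byte after the deleted line, so removing a line near the start of the file costs far more than removing one near the end. Both versions also modify the file in place, so an interruption partway through can leave it in an inconsistent state.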
