Lazy Method for Reading Big File in Python?


Problem Description

I have a very big file (4 GB), and when I try to read it my computer hangs. So I want to read it piece by piece, and after processing each piece, store the processed piece into another file and read the next piece.

Is there any method to yield these pieces?

I would like to have a lazy method.

Recommended Answer

To write a lazy function, just use yield:

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        process_data(piece)
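The question also asks to store each processed piece into another file. A minimal end-to-end sketch of that workflow (the file names and the uppercasing "processing" step are hypothetical stand-ins):

```python
import os

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy generator that yields the file piece by piece."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

# Create a small throwaway input; the same pattern scales to 4 GB.
with open('big_input.dat', 'wb') as f:
    f.write(b'abc' * 1000)  # 3000 bytes

with open('big_input.dat', 'rb') as src, open('processed.dat', 'wb') as dst:
    for piece in read_in_chunks(src, chunk_size=1024):
        dst.write(piece.upper())  # stand-in for real processing

print(os.path.getsize('processed.dat'))  # 3000, same size as the input
```

Only `chunk_size` bytes are held in memory at any moment, which is what keeps a 4 GB input from hanging the machine.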

---

Another option would be to use iter and a helper function:

f = open('really_big_file.dat')
def read1k():
    return f.read(1024)

# '' is the sentinel: iter() stops when read1k() returns an empty string
for piece in iter(read1k, ''):
    process_data(piece)
f.close()
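Note that the sentinel must match the read mode: '' for text files, b'' for binary files. A tidier variant of the same idea (a sketch; the file name is hypothetical and a small sample file is created so it runs end to end) uses functools.partial instead of a named helper:

```python
from functools import partial

# Create a small sample file so the sketch is self-contained.
with open('really_big_file.dat', 'wb') as f:
    f.write(b'x' * 2500)

chunk_sizes = []
with open('really_big_file.dat', 'rb') as f:
    # iter(callable, sentinel) keeps calling f.read(1024) until it returns b''
    for piece in iter(partial(f.read, 1024), b''):
        chunk_sizes.append(len(piece))

print(chunk_sizes)  # [1024, 1024, 452]
```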

---

If the file is line-based, the file object is already a lazy generator of lines:

with open('really_big_file.dat') as f:
    for line in f:
        process_data(line)
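As a quick sanity check (file name hypothetical), iterating a file object yields one line at a time, trailing newline included, without loading the whole file:

```python
# Write a small sample file, then iterate it line by line.
with open('lines_demo.txt', 'w') as f:
    f.write('first\nsecond\nthird\n')

with open('lines_demo.txt') as f:
    lines = [line for line in f]

print(lines)  # ['first\n', 'second\n', 'third\n']
```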

