Python huge file reading


Problem Description

I need to read a big data file (~200GB), line by line, using a Python script.

I have tried the regular line-by-line methods, but they use a large amount of memory. I want to be able to read the file chunk by chunk.

Is there a better way to load a large file line by line, say:

a) by explicitly specifying the maximum number of lines that can be loaded into memory at any one time? Or

b) by loading it in chunks of a given size, say 1024 bytes, provided the last line of each chunk loads completely without being truncated?
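
For reference, option (b) could look something like the sketch below (a minimal sketch; the helper name read_complete_lines and the chunk_size default are hypothetical, not part of the original question). The trailing partial line of each chunk is carried over into the next read, so no line is ever truncated:

def read_complete_lines(path, chunk_size=1024):
    # Hypothetical helper: yields complete lines (without the trailing
    # newline) while reading the file in fixed-size chunks.
    with open(path) as f:
        leftover = ""
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # End of file; emit any remaining partial line
                if leftover:
                    yield leftover
                return
            chunk = leftover + chunk
            lines = chunk.split("\n")
            # The last piece may be an incomplete line; carry it over
            leftover = lines.pop()
            for line in lines:
                yield line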

Recommended Answer

Instead of reading it all at once, try reading it line by line:

with open("myFile.txt") as f:
    for line in f:
        # Do stuff with your line
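
Iterating over the file object like this is lazy: Python reads the file through an internal buffer and holds roughly one line in memory at a time, so memory use stays flat even for a ~200GB file.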

Or, if you want to read N lines at a time:

with open("myFile.txt") as myfile:
    head = [next(myfile) for x in range(N)]  # N = number of lines to read
    print(head)

To handle the StopIteration error raised when next() hits the end of the file, a simple try/except works (although there are plenty of ways):

try:
    head = [next(myfile) for x in range(N)]
except StopIteration:
    # Fewer than N lines were left; collect whatever remains
    rest_of_lines = [line for line in myfile]

Or you can handle those last lines however you want.
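
Putting it together, one way to process the entire file N lines at a time is itertools.islice (a minimal sketch, assuming Python 3; the batch size and the processing step are placeholders). islice simply yields fewer lines at the end of the file, so no StopIteration handling is needed:

from itertools import islice

N = 1000  # placeholder batch size

with open("myFile.txt") as myfile:
    while True:
        batch = list(islice(myfile, N))  # up to N lines
        if not batch:
            break  # file exhausted
        # Do stuff with this batch of lines
        print(len(batch))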
