Python huge file reading
Question
I need to read a big data file (~200 GB) line by line using a Python script.
I have tried the regular line-by-line methods, but those use a large amount of memory. I want to be able to read the file chunk by chunk.
Is there a better way to load a large file line by line, say a) by explicitly specifying the maximum number of lines the file could load at any one time in memory? Or b) by loading it in chunks of a given size, say 1024 bytes, provided the last line of each chunk loads completely without being truncated?
Answer
Instead of reading it all at once, try reading it line by line:
with open("myFile.txt") as f:
    for line in f:
        # Do stuff with your line
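Iterating over the file object this way keeps only one line in memory at a time. A minimal usage sketch of the loop above (the "ERROR" filter is just a made-up example of "do stuff"):

count = 0
with open("myFile.txt") as f:
    for line in f:
        if "ERROR" in line:  # hypothetical per-line work
            count += 1
print(count)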
Or, if you want to read N lines in at a time:
with open("myFile.txt") as myfile:
    head = [next(myfile) for _ in range(N)]  # N = number of lines to read
print(head)
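If pulling in itertools is acceptable, islice does the same batching more compactly; a minimal sketch (the filename and the batch size N are placeholders):

from itertools import islice

N = 1000  # assumed batch size

with open("myFile.txt") as myfile:
    while True:
        batch = list(islice(myfile, N))  # read up to N lines
        if not batch:  # empty list means end of file
            break
        # Do stuff with this batch of lines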
To handle the StopIteration error that comes from hitting the end of the file, a simple try/except works (although there are plenty of ways):
try:
    head = [next(myfile) for _ in range(N)]
except StopIteration:
    rest_of_lines = [line for line in myfile]
Or you can read those last lines in however you want.
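As for option (b) in the question, reading fixed-size byte chunks while keeping every line intact, one possible approach (not part of the original answer) is to carry any truncated trailing line over into the next chunk. A sketch under that assumption; read_in_chunks and chunk_size are made-up names:

def read_in_chunks(path, chunk_size=1024):
    """Yield complete lines while reading the file in fixed-size chunks."""
    leftover = ""
    with open(path) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            chunk = leftover + chunk
            lines = chunk.split("\n")
            leftover = lines.pop()  # possibly truncated; carry into next chunk
            for line in lines:
                yield line
    if leftover:
        yield leftover  # final line with no trailing newline

for line in read_in_chunks("myFile.txt"):
    pass  # Do stuff with your line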