What is the most efficient way to read a large binary file in Python

Problem description

I have a large (21 GB) file which I want to read into memory and then pass to a subroutine which processes the data transparently for me. I am on Python 2.6.6 on CentOS 6.5, so upgrading the operating system or Python is not an option. Currently, I am using

f = open(image_filename, "rb")
image_file_contents = f.read()
f.close()
transparent_subroutine(image_file_contents)

which is slow (~15 minutes). Before I start reading the file, I know how big the file is, because I call os.stat(image_filename).st_size

so I could pre-allocate some memory if that made sense.
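
For illustration, a minimal sketch of that pre-allocation idea, assuming transparent_subroutine can accept a bytearray rather than a str: allocate the full buffer from st_size up front, then fill it with readinto() so no intermediate copy is built.

import os

size = os.stat(image_filename).st_size  # known before reading, as noted above
buf = bytearray(size)                   # pre-allocated to the full file size
f = open(image_filename, "rb")
n = f.readinto(buf)                     # read directly into the existing buffer
f.close()
# Note: a single readinto() may return fewer bytes than requested for a
# file this large, so looping over the remainder may be needed in practice.
transparent_subroutine(buf)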

Thank you

Recommended answer

Following Dietrich's suggestion, I measured that this mmap technique is 20% faster than one big read for a 1.7 GB input file:

import mmap
from zlib import adler32 as compute_crc

def checksum_mmap(fn, n_chunk=1024**2):
    # Map the file read-only and checksum it in 1 MiB chunks, so the
    # whole file never has to exist as a single in-memory string.
    crc = 0
    f = open(fn, "rb")
    mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ, flags=mmap.MAP_PRIVATE)
    while True:
        buf = mm.read(n_chunk)
        if not buf:
            break
        crc = compute_crc(buf, crc)
    mm.close()
    f.close()
    return crc
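
As a side note, not part of the original answer: an mmap object in Python 2 also supports the buffer interface, so the mapped file could be handed to the asker's transparent_subroutine without first materializing a 21 GB string (whether the subroutine accepts a buffer-like object instead of a str is an assumption):

import mmap

def process_mapped(fn, consumer):
    # Hypothetical helper: map the file read-only and pass a zero-copy
    # view to the consumer instead of a fully read-in string.
    f = open(fn, "rb")
    try:
        mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ, flags=mmap.MAP_PRIVATE)
        try:
            consumer(buffer(mm))  # buffer() wraps the mapping without copying (Python 2)
        finally:
            mm.close()
    finally:
        f.close()

For example: process_mapped(image_filename, transparent_subroutine).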
