Reading an entire binary file into Python

Question

I need to read a binary file into Python. The contents are signed 16-bit integers, big endian.

The following Stack Overflow question suggests how to pull in several bytes at a time, but is this the way to scale up to reading a whole file?

Receiving 16-bit integers in Python

I thought to create a function like:

import os
import struct

import numpy as np

def readmyfile(filename, bytes=2, endian='>h'):
    # Number of values = file size divided by bytes per value
    totalBytes = os.path.getsize(filename)
    values = np.empty(totalBytes // bytes)
    with open(filename, 'rb') as f:
        # One read and one unpack per value: this loop is the slow part
        for i in range(len(values)):
            values[i] = struct.unpack(endian, f.read(bytes))[0]
    return values

filecontents = readmyfile('filename')

But this is quite slow (the file is 165924350 bytes). Is there a better way?

Recommended answer

I would read directly until EOF (that is, check whether read() returns an empty result), which removes the need for range() and getsize.
Alternatively, using xrange instead of range should help on Python 2, especially for memory usage; on Python 3, range is already lazy.
Moreover, as Falmarri suggested, reading more data per call would improve performance considerably.
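
As a rough sketch of the "read until EOF, more data per call" idea, the function below reads the file in large chunks and unpacks each chunk with a single struct.unpack call. The function name, the chunk_values parameter, and its default are assumptions made for illustration; they do not come from the question or the answer.

```python
import struct

def read_int16_be_chunked(filename, chunk_values=65536):
    """Read big-endian signed 16-bit integers, many values per read call.

    Hypothetical helper: the name and chunk size are illustrative only.
    """
    chunk_bytes = 2 * chunk_values
    values = []
    with open(filename, 'rb') as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:  # an empty bytes object signals EOF
                break
            # Ignore a trailing odd byte, if the file has one
            usable = len(chunk) - (len(chunk) % 2)
            count = usable // 2
            # '>Nh' unpacks N big-endian signed shorts in one call
            values.extend(struct.unpack('>%dh' % count, chunk[:usable]))
    return values
```

This still builds a Python list, so memory use for a file of this size remains high; it only removes the per-value read and unpack overhead.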

That said, I would not expect miracles, partly because I am not sure a list is the most efficient way to store that much data.
What about using NumPy's array and its facilities for reading and writing binary files? In this link there is a section about reading raw binary files using numpyio.fread. I believe this should be exactly what you need.
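
For illustration only, here is a minimal sketch of reading the whole file into a NumPy array in one call. It uses numpy.fromfile with an explicit big-endian dtype, which is a different function from the numpyio.fread mentioned above, swapped in here as an assumption about what a current NumPy installation offers. The function name read_int16_be is hypothetical.

```python
import numpy as np

def read_int16_be(filename):
    """Load a file of big-endian signed 16-bit integers as a NumPy array.

    Hypothetical helper; uses numpy.fromfile, not numpyio.fread.
    """
    # '>i2' means: big-endian ('>'), signed integer ('i'), 2 bytes wide
    return np.fromfile(filename, dtype='>i2')
```

The returned array keeps the big-endian dtype; calling .astype(np.int16) on it gives a copy in the machine's native byte order if later code expects that.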

Note: personally, I have never used NumPy; however, its main raison d'être is precisely handling large sets of data, which is what you are doing in your question.
