Reading and storing arbitrary byte length integers from a file


Question

I am attempting to speed up a binary file parser I wrote last year by doing the parsing and data accumulation in numpy. numpy's ability to define customized data structures and slurp data from a binary file into them looks like what I need, except that some of the fields in these files are unsigned integers of "nonstandard" length (e.g. 6 bytes). Since I am using Python 2.7, I made my own emulated version of int.from_bytes to handle these fields, but if there is any way to read these fields natively as integers in numpy, that would obviously be much faster and preferable.
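For context, a pure-Python emulation of int.from_bytes along these lines might look like the sketch below. The function name `uint_from_bytes` and its signature are hypothetical; this is not the asker's actual code, and it handles unsigned values only. It processes one byte per loop iteration, which is why a vectorised numpy approach is attractive for large files.

```python
def uint_from_bytes(data, byteorder="big"):
    """Hypothetical stand-in for int.from_bytes (unsigned values only)."""
    octets = bytearray(data)  # iterating a bytearray yields ints on Python 2 and 3
    if byteorder == "little":
        octets = reversed(octets)
    elif byteorder != "big":
        raise ValueError("byteorder must be 'big' or 'little'")
    value = 0
    for octet in octets:
        # Shift the accumulated value up one byte and append the next octet
        value = (value << 8) | octet
    return value
```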

Recommended answer

Numpy doesn't support integers of arbitrary byte length, and using ctypes bitfields would be more trouble than it's worth.

I'd suggest using vectorised slicing to convert your data to the next larger standard integer size (here, 6-byte big-endian values widened to 8 bytes):

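import numpy as np

# Three big-endian 6-byte records (a bytes literal, so this also runs on Python 3)
buf = b"000000111111222222"
a = np.ndarray(len(buf), np.dtype('>i1'), buf)
# One zero-initialised 8-byte big-endian integer per 6-byte record
e = np.zeros(len(buf) // 6, np.dtype('>i8'))
# Copy each record's three 16-bit words into the low three words of its 8-byte slot
for i in range(3):
    e.view(dtype='>i2')[i + 1::4] = a.view(dtype='>i2')[i::3]
[hex(x) for x in e]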
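The same idea can be sketched for any field width from 1 to 8 bytes by copying bytes rather than 16-bit words. The helper name `widen_uint_be` and the record layout below are hypothetical illustrations, not part of numpy or of the original answer, and the sketch assumes big-endian unsigned fields. It right-aligns each field in a zeroed 8-byte row and then reinterprets the rows as 64-bit integers.

```python
import numpy as np

def widen_uint_be(raw, nbytes):
    """Widen big-endian unsigned integers of `nbytes` bytes (1..8) to uint64.

    `raw` is a bytes-like object whose length is a multiple of `nbytes`.
    """
    if not 1 <= nbytes <= 8:
        raise ValueError("nbytes must be between 1 and 8")
    if len(raw) % nbytes:
        raise ValueError("buffer length is not a multiple of nbytes")
    records = np.frombuffer(raw, dtype=np.uint8).reshape(-1, nbytes)
    padded = np.zeros((records.shape[0], 8), dtype=np.uint8)
    padded[:, 8 - nbytes:] = records  # right-align; the high bytes stay zero
    # Reinterpret each 8-byte row as one big-endian integer, then convert
    # to the platform's native uint64
    return padded.view('>u8').ravel().astype(np.uint64)

# Hypothetical record layout: a 2-byte id followed by a 6-byte counter,
# with the counter read as six raw bytes and widened afterwards.
layout = np.dtype([('id', '>u2'), ('counter', 'u1', (6,))])
data = (b"\x00\x01" + b"\x00\x00\x00\x00\x01\x00" +
        b"\x00\x02" + b"\x00\x00\x00\x00\x00\x07")
records = np.frombuffer(data, dtype=layout)
counters = widen_uint_be(records['counter'].tobytes(), 6)
```

Reading the odd-sized field as raw `u1` bytes keeps the structured dtype aligned with the file layout, so the widening step only has to run once per field over the whole array.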
