Can someone explain Python struct unpacking?
Question
I have a binary file made from C structs that I want to parse in Python. I know the exact format and layout of the binary, but I am confused about how to use Python's struct unpacking to read this data.
Would I have to traverse the whole binary unpacking a certain number of bytes at a time based on what the members of the struct are?
C file format:
typedef struct {
    int data1;
    int data2;
    int data4;
} datanums;

typedef struct {
    datanums numbers;
    char *name;
} personal_data;
Let's say the binary file has personal_data structs repeated one after another.
Recommended answer
Assuming the layout is a static binary structure that can be described by a simple struct pattern, and the file is just that structure repeated over and over again, then yes, "traverse the whole binary unpacking a certain number of bytes at a time" is exactly what you'd do.
For example:
import struct

record = struct.Struct('>HB10cL')

def read_records(path='myfile.bin'):
    with open(path, 'rb') as f:
        while True:
            buf = f.read(record.size)
            if len(buf) < record.size:
                break  # EOF (or a truncated trailing record)
            yield record.unpack(buf)
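To make the format string less opaque, here is a small sketch of what '>HB10cL' describes and what unpack returns for it. The sample values and the '10s' variant are illustrative assumptions, not part of the original answer.

```python
import struct

record = struct.Struct('>HB10cL')
# '>' means big-endian with standard sizes and no padding, so the size is
# H (2 bytes) + B (1) + ten c (1 each) + L (4) = 17 bytes.
size = record.size

# Build one sample record; '10s' packs the ten chars from a single bytes value.
raw = struct.pack('>HB10sL', 1, 2, b'abcdefghij', 3)

# '10c' yields ten separate 1-byte values, so the tuple has 13 items.
fields = record.unpack(raw)

# '10s' keeps the same ten bytes together as one bytes object.
compact = struct.unpack('>HB10sL', raw)
```

If the ten characters are really one fixed-width string, '10s' is usually the more convenient code to unpack with.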
If you're worried about the efficiency of reading only 17 bytes at a time and you want to wrap that up by buffering 8K at a time or something… well, first make sure it's an actual problem worth optimizing; then, if it is, loop over unpack_from instead of unpack. Something like this (untested, top-of-my-head code):
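def read_records_buffered(path='myfile.bin'):
    buf, offset = b'', 0
    with open(path, 'rb') as f:
        while True:
            if len(buf) - offset < record.size:
                chunk = f.read(8192)
                if not chunk:
                    break  # EOF; a trailing partial record is dropped
                # keep the unconsumed tail and append the new chunk
                buf, offset = buf[offset:] + chunk, 0
                continue
            yield record.unpack_from(buf, offset)
            offset += record.size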
Or, even simpler, as long as the file isn't too big for your vmsize, just mmap the whole thing and unpack_from on the mmap itself:
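import mmap

def read_records_mmap(path='myfile.bin'):
    with open(path, 'rb') as f:
        # mmap needs a file descriptor, not the file object
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for offset in range(0, len(m) - record.size + 1, record.size):
                yield record.unpack_from(m, offset)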
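Applied to the structs in the question, a rough sketch might look like the following. It assumes the file was produced by writing personal_data structs directly (for example with fwrite) on the same platform that reads them, so the native '@' layout matches the C compiler's; the format string '@3iP' and the helper name parse_people are hypothetical. Note that a char * stored this way is only a memory address from the writing process, so the name text itself would not be recoverable unless it was written to the file separately.

```python
import struct

# Hypothetical layout for personal_data: three C ints, then a pointer.
# '@' selects native sizes and alignment; 'P' (void *) is only valid there.
personal_data = struct.Struct('@3iP')

def parse_people(blob):
    """Yield (data1, data2, data4) tuples, ignoring the stored pointer."""
    # iter_unpack requires a whole number of records, so trim any partial tail.
    usable = len(blob) - len(blob) % personal_data.size
    for data1, data2, data4, _name_ptr in personal_data.iter_unpack(blob[:usable]):
        yield data1, data2, data4

# Round-trip two sample records packed with the same assumed layout.
sample = personal_data.pack(1, 2, 3, 0) + personal_data.pack(4, 5, 6, 0)
people = list(parse_people(sample))
```

If the file comes from a different platform, an explicit byte order and size such as '<3i' plus a fixed-width field for the pointer slot would be needed instead, and the exact padding would have to be checked against the writer.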