Unpacking a struct ending with an ASCIIZ string
Problem description
I am trying to use struct.unpack() to take apart a data record that ends with an ASCII string.
The record (it happens to be a TomTom ov2 record) has this format (stored little-endian):
- 1 byte
- 4 byte int for total record size (including this field)
- 4 byte int
- 4 byte int
- variable-length string, null-terminated
unpack() requires that the string's length be included in the format you pass it. I can use the second field and the known size of the rest of the record (13 bytes) to get the string length:
    str_len = struct.unpack("<xi", record[:5])[0] - 13
    fmt = "<biii{0}s".format(str_len)
I can then proceed with the full unpacking, but since the string is null-terminated, I really wish unpack() would do it for me. It would also be nice to have this in case I run across a struct that doesn't include its own size.
How can I make that happen?
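For reference, the size-field approach described above can be sketched end to end as follows. This is only a minimal illustration, assuming Python 3 and the 13-byte fixed part from the layout above; the helper name unpack_ov2_record and the sample bytes are made up for the example.

```python
import struct

FIXED_LEN = 13  # 1-byte field + three 4-byte ints, per the layout above


def unpack_ov2_record(record):
    """Hypothetical helper: unpack one record using its embedded size field."""
    # Skip the first byte ('x') and read the total record size.
    total_size = struct.unpack("<xi", record[:5])[0]
    str_len = total_size - FIXED_LEN  # still includes the trailing NUL
    fmt = "<biii{0}s".format(str_len)
    return struct.unpack(fmt, record[:total_size])


# Invented sample: type 2, total size 16, two ints, then b"hi" plus NUL
sample = struct.pack("<biii3s", 2, 16, 100, 200, b"hi\x00")
fields = unpack_ov2_record(sample)
```

Note that the string comes back with its NUL still attached, which is exactly the inconvenience the question is about.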
The size-less record is actually fairly easy to handle, since struct.calcsize() will tell you how many bytes a given format string expects. You can use that and the actual length of the data to construct a new format string for unpack() that includes the correct string length.
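As a small illustration of that arithmetic, here is a sketch assuming Python 3 and a hypothetical record with no size field (one byte, two ints, then the string); the sample bytes are invented.

```python
import struct

# Invented sample data: 9 fixed bytes followed by b"abc" and a NUL
dat = struct.pack("<bii", 1, 7, 9) + b"abc\x00"

fixed_len = struct.calcsize("<bii")   # size of everything before the string
str_len = len(dat) - fixed_len        # whatever is left must be the string
fmt = "<bii{0}s".format(str_len)      # e.g. "<bii4s"

fields = struct.unpack(fmt, dat)
```

This only works when dat holds exactly one record, since the string length is inferred from len(dat).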
This function is just a wrapper for unpack(), allowing a new format character in the last position that will drop the terminal NUL:
    import struct

    def unpack_with_final_asciiz(fmt, dat):
        """
        Unpack binary data, handling a null-terminated string at the end
        (and only at the end) automatically.

        The first argument, fmt, is a struct.unpack() format string with the
        following modifications:
        If fmt's last character is 'z', the returned string will drop the NUL.
        If it is 's' with no length, the string including NUL will be returned.
        If it is 's' with a length, behavior is identical to normal unpack().
        """
        # Just pass on if no special behavior is required.
        # fmt[-2:-1] (rather than fmt[-2]) avoids an IndexError when fmt is just 's'.
        if fmt[-1] not in ('z', 's') or (fmt[-1] == 's' and fmt[-2:-1].isdigit()):
            return struct.unpack(fmt, dat)

        # Use format string to get size of contained string and rest of record
        non_str_len = struct.calcsize(fmt[:-1])
        str_len = len(dat) - non_str_len

        # Set up new format string
        # If passed 'z', treat terminating NUL as a "pad byte"
        if fmt[-1] == 'z':
            str_fmt = "{0}sx".format(str_len - 1)
        else:
            str_fmt = "{0}s".format(str_len)
        new_fmt = fmt[:-1] + str_fmt

        return struct.unpack(new_fmt, dat)
>>> dat = b'\x02\x1e\x00\x00\x00z\x8eJ\x00\xb1\x7f\x03\x00Down by the river\x00'
>>> unpack_with_final_asciiz("<biiiz", dat)
(2, 30, 4886138, 229297, b'Down by the river')
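One limitation of the wrapper is that it relies on len(dat), so dat must contain exactly one record. If a buffer holds several size-less records back to back, one possible alternative (not part of the answer above) is to search for the terminating NUL instead. A minimal sketch, assuming Python 3 and that each record is a fixed-size part followed by exactly one NUL-terminated string; the name unpack_asciiz_at and the sample data are hypothetical.

```python
import struct


def unpack_asciiz_at(fixed_fmt, buf, offset=0):
    """Hypothetical helper: unpack fixed fields, then a NUL-terminated string.

    Returns (fields, next_offset): fields ends with the string (NUL removed)
    and next_offset points just past the NUL.
    """
    fixed_len = struct.calcsize(fixed_fmt)
    fixed = struct.unpack_from(fixed_fmt, buf, offset)
    str_start = offset + fixed_len
    # bytes.index raises ValueError if no terminator is found
    str_end = buf.index(b"\x00", str_start)
    return fixed + (buf[str_start:str_end],), str_end + 1


# Two invented records stored back to back
buf = (struct.pack("<bi", 1, 10) + b"first\x00" +
       struct.pack("<bi", 2, 20) + b"second\x00")
rec1, pos = unpack_asciiz_at("<bi", buf)
rec2, pos = unpack_asciiz_at("<bi", buf, pos)
```

The search starts after the fixed-size fields, so zero bytes inside the integer fields are not mistaken for the terminator.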