Python: Similar functionality in struct and array vs ctypes
Question
Python provides the following three modules that deal with C types and how to handle them: struct, array, and ctypes.
While ctypes seems more general and flexible than struct and array (its main task being "a foreign function library for Python"), there seems to be significant overlap in functionality between these three modules when the task is to read binary data structures. For example, if I wanted to read a C struct
struct MyStruct {
    int a;
    float b;
    char c[12];
};
I could use struct as follows:
import struct

# 'if12s' = native int, native float, 12 raw bytes
a, b, c = struct.unpack('if12s', b'\x11\0\0\0\x12\x34\x56\x78hello world\0')
print(a, b, c)
# 17 1.7378244361449504e+34 b'hello world\x00'
On the other hand, using ctypes works equally well (although it is a bit more verbose):
import ctypes

class MyStruct(ctypes.Structure):
    _fields_ = [
        ('a', ctypes.c_int),
        ('b', ctypes.c_float),
        ('c', ctypes.c_char * 12),
    ]

s = MyStruct.from_buffer_copy(b'\x11\0\0\0\x12\x34\x56\x78hello world\0')
print(s.a, s.b, s.c)
# 17 1.7378244361449504e+34 b'hello world'
(Aside: I do wonder where the trailing '\0' went in this version, though…)
This seems to me like it violates the principles in "The Zen of Python":
- There should be one-- and preferably only one --obvious way to do it.
So how did this situation, with several similar modules for binary data handling, arise? Is there a historical or practical reason? (For example, I could imagine omitting the struct module entirely and simply adding a more convenient API for reading/writing C structs to ctypes.)
Recommended answer
Disclaimer: this post is speculation based on my understanding of the "division of labor" in the Python stdlib, not on factual, referenceable information.
Your question stems from the fact that "C structs" and "binary data" tend to be used interchangeably, which, while usually fine in practice, is wrong in a technical sense. The struct documentation is also misleading: it claims to work on "C structs", while a better description would be "binary data", with some disclaimers about C compatibility.
Fundamentally, struct, array and ctypes do different things. struct deals with converting Python values to and from binary in-memory formats. array deals with efficiently storing a lot of values. ctypes deals with the C language(*). The overlap in functionality stems from the fact that for C, the "binary in-memory formats" are native, and that "efficiently storing values" means packing them into a C-like array.
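As a rough illustration of that division of labor, here is a minimal sketch using only the standard library. The sample values and the name IntArray3 are made up for this example and do not come from the question.

```python
import array
import ctypes
import struct

# struct: convert Python values to and from an explicit binary layout.
packed = struct.pack('<if', 17, 1.5)     # little-endian int32 + float32
a, b = struct.unpack('<if', packed)

# array: compact, homogeneous storage of many values of one C type.
numbers = array.array('i', range(1000))
payload_size = len(numbers) * numbers.itemsize   # itemsize = size of a C int here

# ctypes: model C types themselves, in this case a C array of three ints.
IntArray3 = ctypes.c_int * 3             # hypothetical name for illustration
c_numbers = IntArray3(1, 2, 3)

print(a, b)
print(payload_size, "bytes of payload")
print(list(c_numbers))
```

All three end up holding C-style binary data, which is where the apparent overlap comes from, but each one approaches it from a different job.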
You will also note that struct lets you easily specify endianness, because its job is to pack and unpack binary data in any of the ways it can be laid out. In ctypes it is more awkward to get a non-native byte order, because it defaults to the byte order that is native to C on the current platform.
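The difference in how byte order is chosen can be sketched as follows. The four sample bytes and the class name BigEndianHeader are assumptions made for this example; ctypes.BigEndianStructure is the standard base class for a big-endian layout.

```python
import ctypes
import struct

data = b'\x00\x00\x00\x11'   # made-up sample bytes

# struct: the byte order is just the first character of the format string.
(big,) = struct.unpack('>I', data)       # read as big-endian uint32
(little,) = struct.unpack('<I', data)    # read as little-endian uint32

# ctypes: a non-native byte order needs a dedicated base class.
class BigEndianHeader(ctypes.BigEndianStructure):   # hypothetical name
    _fields_ = [('value', ctypes.c_uint32)]

header = BigEndianHeader.from_buffer_copy(data)

print(big, little, header.value)
```

With struct, switching byte order is a one-character change per call; with ctypes it is a property of the whole structure type.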
If your task is reading binary data structures, there are increasing levels of abstraction:
- Manually splitting the byte array and converting the parts with int.from_bytes and the like
- Describing the data with a format string and using struct to unpack it in one go
- Using a library like Construct to describe the structure declaratively in logical terms.
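The first two levels in the list above can be compared side by side in a small sketch. The record layout (a uint16, a uint32 and a 4-byte tag, all little-endian) is invented for this example; the third level is left out because Construct is a third-party library.

```python
import struct

record = b'\x01\x00\x10\x00\x00\x00DATA'   # made-up sample record, 10 bytes

# Level 1: slice the bytes by hand and convert each piece.
kind = int.from_bytes(record[0:2], 'little')
length = int.from_bytes(record[2:6], 'little')
tag = record[6:10]

# Level 2: describe the whole layout once and unpack it in one go.
# '<' = little-endian without padding; H = uint16, I = uint32, 4s = 4 raw bytes.
kind2, length2, tag2 = struct.unpack('<HI4s', record)

print(kind, length, tag)
print(kind2, length2, tag2)
```

Both approaches yield the same values; the format string simply keeps the offsets and sizes in one place instead of scattering them across slices.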
ctypes doesn't even figure here, because for this task, using ctypes is pretty much taking a round-trip through a different programming language. The fact that it works just as well for your example is incidental; it works because C is natively suited to expressing many ways of packing binary data. But if your struct were mixed-endian, for instance, it would be very difficult to express in ctypes. Another example is the half-precision float, which doesn't have a C equivalent.
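For comparison, here is how the two cases mentioned above might look with struct. This is only a sketch: the sample bytes are made up, the 'e' format code for half-precision floats requires Python 3.6 or later, and since a format string carries a single byte order, the mixed-endian data is read with two separate calls.

```python
import struct

# Half-precision float: format code 'e' (16-bit IEEE 754 binary16).
half = struct.pack('<e', 1.5)
(value,) = struct.unpack('<e', half)

# Mixed endianness: one byte order per format string,
# so each differently ordered part is unpacked on its own.
data = b'\x00\x01\x01\x00'                       # made-up sample bytes
(first,) = struct.unpack_from('>H', data, 0)     # big-endian uint16 at offset 0
(second,) = struct.unpack_from('<H', data, 2)    # little-endian uint16 at offset 2

print(half, value)
print(first, second)
```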
In this sense, it would also be very reasonable for ctypes to use struct: after all, "packing and unpacking binary data" is a subtask of "interfacing with C".
On the other hand, it would make no sense for struct to use ctypes: it would be like using the email library for character encoding conversions because that is a task an e-mail library happens to be able to do.
(*) Well, basically. More precise would be something like "C-based environments", i.e., how modern computers work at a low level due to their co-evolution with C as the primary systems language.