Python struct.pack() behavior


Problem description

data = 5 
Result1 = struct.pack("<L", data)

  1. The integer data is converted to a long (64 bit), that is, 01000000 00010100 00000000 00000000 00000000 00000000 00000000 00000000?
  2. Then the bits are reversed as bytes and stored in Result1 as a byte string, that is, 00000000 00000000 00000000 00000000 00000000 00000000 00010100 01000000?

Is this exactly what happens with that code, or did I misunderstand anything?

Solution

From [Python 2.Docs]: struct - Interpret bytes as packed binary data:

This module performs conversions between Python values and C structs represented as Python strings.

This means that it will produce the memory representation of the argument(s) as a sequence of chars. Memory (and everything that resides in it) is a sequence of bytes. Each byte has a value in [0..255] (for simplicity's sake I use unsigned).
So, when a byte is displayed, a char whose ASCII code matches the byte value is searched for first; if such a (printable) char is found, it becomes the representation of that byte, otherwise the representation is the byte value (in hex) preceded by \x (the convention for representing non-printable chars). As a side note, (non-extended) ASCII chars have values between 0 and 127.

Example:

  • A byte value of 65 (hex 0x41) will be represented as 'A' (as A's ASCII code is 65)

  • A byte value of 217 (hex 0xd9) will be simply represented as '\xd9' (the value lies outside the ASCII range, so there's no printable ASCII char for it)
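
The two examples above can be checked with a short sketch. It assumes Python 3, where byte strings are displayed with a b prefix (the answer itself targets Python 2, where that prefix is absent); the variable names are only illustrative.

```python
# Assumes Python 3: byte strings are shown with a leading b.
printable = bytes([65])       # one byte with value 65 (0x41)
non_printable = bytes([217])  # one byte with value 217 (0xd9)

print(repr(printable))        # b'A'
print(repr(non_printable))    # b'\xd9'

# Both objects hold exactly one byte; only the way it is displayed differs.
print(len(printable), len(non_printable))  # 1 1
```

The escape sequence \xd9 is four characters on screen, but it still stands for a single byte.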

Before going further, a few words are needed about endianness: that is how data (numbers in our case) is laid out in computer memory. Many resources on the topic can be found on the internet.

I'll try to briefly explain the difference between big and little endian (again, for simplicity's sake I'll stick with the 8 bit atomic element size only).

Imagine we're doing some memory representation exercises on a piece of paper, or better: on a blackboard. If we were to represent the blackboard as the computer memory, then the upper left corner would be its beginning (address 0) and the addresses would increase as we go to the right (and also down below to the next line when we reach the right edge).
We want to represent the number 0x12345678 as a 4 byte number, starting from the upper left corner (each byte consists of exactly 2 hex digits):

╔═══════════╦══════════╦══════════╦══════════╦══════════╗
║   Byte    ║    01    ║    02    ║    03    ║    04    ║
╠═══════════╬══════════╬══════════╬══════════╬══════════╣
║   Value   ║   0x12   ║   0x34   ║   0x56   ║   0x78   ║
╚═══════════╩══════════╩══════════╩══════════╩══════════╝

Our number's most significant byte is stored at the lowest memory address (and the least significant byte is stored at the highest), which is big endian. For little endian, our number's bytes are stored in the reverse order.
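
The blackboard exercise can be reproduced with struct itself. This is a minimal sketch, assuming Python 3.5 or later (for bytes.hex()); the names number, big and little are illustrative only.

```python
import struct

number = 0x12345678

big = struct.pack(">L", number)     # most significant byte first
little = struct.pack("<L", number)  # least significant byte first

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# The little endian result is the big endian one with its bytes reversed.
print(little == big[::-1])  # True
```

Note that the hex digits inside each byte keep their order (0x78 does not become 0x87); only whole bytes change places.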

As a conclusion, humans think "big endianly".

Another topic that I want to cover is types (int, to be more precise). The struct format codes are named after C types, and their native sizes come from C: an unsigned long takes 4 bytes on some platforms / architectures and 8 on others. With a byte order prefix such as "<", however, struct uses standard sizes, so "L" is always 4 bytes, not 8. A 4 byte unsigned int has a value in [0..4294967295]. But even a smaller value, 5 for example (which only requires 1 byte), still occupies 4 bytes: the unused (most significant) bytes are padded with 0s. So, our number as a 4 byte unsigned int is (hex): 0x00000005.
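
The sizes can be inspected with struct.calcsize. The sketch below assumes Python 3 for the display of the byte string; native_size and padded are illustrative names, and the native size printed depends on the platform the code runs on.

```python
import struct

# With an explicit byte order prefix, struct uses standard sizes.
print(struct.calcsize("<L"))  # 4
print(struct.calcsize("<Q"))  # 8

# Without a prefix, native size and alignment apply, so this value
# depends on the platform (commonly 4 or 8).
native_size = struct.calcsize("L")
print(native_size)

# 5 fits in one byte but is still padded to four.
padded = struct.pack(">L", 5)
print(padded)  # b'\x00\x00\x00\x05'
```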

Now, back to our problem(s): as stated above, 5 is 0x05 (or 0x00000005 as a 4 byte unsigned int), or in chars: "\x00\x00\x00\x05". But that is the reverse of the order that struct.pack displays; I think you already guessed why: the output is in little endian representation. That is determined by the 1st (fmt) argument (the "<" part, to be more precise) given to [Python 2.Docs]: struct.pack(fmt, v1, v2, ...) (possible values are listed on the same page: [Python 2.Docs]: struct - Byte Order, Size, and Alignment). Note that only the order of the bytes changes; the bits inside each byte are not reversed. For 55555, things are just the same. Its hex representation is 0xd903, or 0x0000d903 as a 4 byte value.
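
To compare this with the two steps listed in the question, here is a small sketch that prints the packed bytes bit by bit. It assumes Python 3 (iterating over a bytes object yields integers there); result, bits and restored are illustrative names.

```python
import struct

data = 5
result = struct.pack("<L", data)

# Show each packed byte as 8 bits, in memory order.
bits = " ".join(format(byte, "08b") for byte in result)
print(len(result))  # 4
print(bits)         # 00000101 00000000 00000000 00000000

# Unpacking with the same format recovers the original value.
(restored,) = struct.unpack("<L", result)
print(restored)     # 5

print(struct.pack("<L", 55555))  # b'\x03\xd9\x00\x00'
```

So the result is 4 bytes rather than 8, and the value 5 appears as 00000101 in the first byte with the usual bit order.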

If it doesn't make sense yet, take this slightly modified version of your code and play with it by entering different values in data_set and checking the outputs:

code.py:

import struct
fmt = "<L"
data_set = [5, 55555, 0x12345678]

for data in data_set:
    output_str = "{} - {}".format(hex(data), repr(struct.pack(fmt, data)).strip("'"))  # This is just for formatting output string to be displayed to the user
    print(output_str)  # Python3 compatible (however the formatting above won't behave nicely)

Output:

c:\Work\Dev\StackOverflow\q037990060>"C:\Install\x64\HPE\OPSWpython\2.7.10__00\python.exe" "code.py"
0x5 - \x05\x00\x00\x00
0xd903 - \x03\xd9\x00\x00
0x12345678 - xV4\x12
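
The script above was run with Python 2.7. Under Python 3 the repr of the packed value starts with a b prefix, so the strip("'") call no longer cleans it up. A possible variant, assuming Python 3 and using the hypothetical names packed and lines, slices the prefix and quotes off instead:

```python
import struct

fmt = "<L"
data_set = [5, 55555, 0x12345678]

lines = []
for data in data_set:
    packed = struct.pack(fmt, data)
    # repr() of a bytes object looks like b'...'; drop the b prefix and quotes.
    lines.append("{} - {}".format(hex(data), repr(packed)[2:-1]))

print("\n".join(lines))
```

The last line still reads xV4\x12, because 0x78, 0x56 and 0x34 are the ASCII codes of 'x', 'V' and '4', while 0x12 has no printable char.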
