Does Python bytearray use signed integers in the C representation?
Question
I have written a small Cython tool for in-place sorting of structures that expose the buffer protocol in Python. It's a work in progress; please forgive any mistakes. This is just for me to learn.
In my unit tests, I run the in-place sort across many different kinds of buffer-exposing data structures, each holding many types of underlying data. I can verify that it works as expected in most cases, but the case of bytearray is very peculiar.
If you take it for granted that my imported module b in the code below is just performing a straightforward heap sort in Cython, in place on the bytearray, then the following code sample shows the issue:
In [42]: a #NumPy array
Out[42]: array([ 9, 148, 115, 208, 243, 197], dtype=uint8)
In [43]: byt = bytearray(a)
In [44]: byt
Out[44]: bytearray(b'\t\x94s\xd0\xf3\xc5')
In [45]: list(byt)
Out[45]: [9, 148, 115, 208, 243, 197]
In [46]: byt1 = copy.deepcopy(byt)
In [47]: b.heap_sort(byt1)
In [48]: list(byt1)
Out[48]: [148, 197, 208, 243, 9, 115]
In [49]: list(bytearray(sorted(byt)))
Out[49]: [9, 115, 148, 197, 208, 243]
What you can see is that when using sorted, the values are iterated and treated like Python integers for the purpose of sorting, then placed back into a new bytearray.
But the in-place sort, in lines 47-48, shows that the bytes are being interpreted as signed integers and sorted by their two's complement value, which puts numbers >= 128 toward the left, since they are negative.
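The ordering above can be reproduced in pure Python without the Cython module. This is only a sketch of the effect, not the code in the package: signed_value is a hypothetical helper that maps each byte to the value it would have as an 8-bit two's complement integer.

```python
data = bytearray([9, 148, 115, 208, 243, 197])

def signed_value(byte):
    """Value of an unsigned byte when read as 8-bit two's complement."""
    return byte - 256 if byte >= 128 else byte

# Sorting by the signed interpretation puts bytes >= 128 first.
signed_order = sorted(data, key=signed_value)
# Sorting the plain integers gives the order that sorted() produces.
unsigned_order = sorted(data)

print(signed_order)    # [148, 197, 208, 243, 9, 115]
print(unsigned_order)  # [9, 115, 148, 197, 208, 243]
```

The first list matches the output of the in-place sort, which is consistent with the bytes being compared as signed values.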
I can confirm it by running over the whole range 0-255:
In [50]: byt = bytearray(range(0,256))
In [51]: b.heap_sort(byt)
In [52]: list(byt)
Out[52]:
[128,
129,
130,
131,
132,
133,
134,
135,
136,
137,
138,
139,
140,
141,
142,
143,
144,
145,
146,
147,
148,
149,
150,
151,
152,
153,
154,
155,
156,
157,
158,
159,
160,
161,
162,
163,
164,
165,
166,
167,
168,
169,
170,
171,
172,
173,
174,
175,
176,
177,
178,
179,
180,
181,
182,
183,
184,
185,
186,
187,
188,
189,
190,
191,
192,
193,
194,
195,
196,
197,
198,
199,
200,
201,
202,
203,
204,
205,
206,
207,
208,
209,
210,
211,
212,
213,
214,
215,
216,
217,
218,
219,
220,
221,
222,
223,
224,
225,
226,
227,
228,
229,
230,
231,
232,
233,
234,
235,
236,
237,
238,
239,
240,
241,
242,
243,
244,
245,
246,
247,
248,
249,
250,
251,
252,
253,
254,
255,
0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43,
44,
45,
46,
47,
48,
49,
50,
51,
52,
53,
54,
55,
56,
57,
58,
59,
60,
61,
62,
63,
64,
65,
66,
67,
68,
69,
70,
71,
72,
73,
74,
75,
76,
77,
78,
79,
80,
81,
82,
83,
84,
85,
86,
87,
88,
89,
90,
91,
92,
93,
94,
95,
96,
97,
98,
99,
100,
101,
102,
103,
104,
105,
106,
107,
108,
109,
110,
111,
112,
113,
114,
115,
116,
117,
118,
119,
120,
121,
122,
123,
124,
125,
126,
127]
I know this is difficult to reproduce. You can build the linked package with Cython if you want, and then import src.buffersort as b to get the same sort functions I am using.
I've tried reading through the source code for bytearray in Objects/bytearrayobject.c, but I see some references to long and a few calls to PyInt_FromLong ...
This makes me suspect that the underlying C-level data of a bytearray is represented as a signed integer in C, but the conversion to Python int from raw bytes means it is unsigned, between 0 and 255, in Python. I can only assume this is true ... though I don't see why Python should interpret the C long as unsigned, unless that is merely a convention for bytearray that I didn't see in the code. But if so, why wouldn't an unsigned integer be used on the C side as well, if the bytes are always treated by Python as unsigned?
If true, what should be considered the "right" result of the in-place sort? Since they are "just bytes", either interpretation is valid, I guess, but in the Python spirit I think there should be one way which is considered the standard.
To match the output of sorted, will it be sufficient on the C side to cast values to unsigned long when dealing with bytearray?
Recommended answer
Does Python bytearray use signed integers in the C representation?
It uses chars. Whether those are signed depends on the compiler. You can see this in Include/bytearrayobject.h. Here's the 2.7 version:
/* Object layout */
typedef struct {
    PyObject_VAR_HEAD
    /* XXX(nnorwitz): should ob_exports be Py_ssize_t? */
    int ob_exports; /* how many buffer exports */
    Py_ssize_t ob_alloc; /* How many bytes allocated */
    char *ob_bytes;
} PyByteArrayObject;
And here's the 3.5 version:
typedef struct {
    PyObject_VAR_HEAD
    Py_ssize_t ob_alloc; /* How many bytes allocated in ob_bytes */
    char *ob_bytes; /* Physical backing buffer */
    char *ob_start; /* Logical start inside ob_bytes */
    /* XXX(nnorwitz): should ob_exports be Py_ssize_t? */
    int ob_exports; /* How many buffer exports */
} PyByteArrayObject;
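To illustrate why the signedness of char matters here, the same stored bytes can be read both ways from Python with the standard struct module, whose "b" and "B" format codes correspond to C signed char and unsigned char. This is a small sketch with made-up sample bytes, not something taken from the CPython source:

```python
import struct

raw = bytes([9, 148, 243])  # the same three bytes a bytearray would hold

read_signed = struct.unpack("3b", raw)    # read as C signed char
read_unsigned = struct.unpack("3B", raw)  # read as C unsigned char

print(read_signed)    # (9, -108, -13)
print(read_unsigned)  # (9, 148, 243)
```

The memory is identical in both cases; only the interpretation applied when the bytes are compared or converted differs.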
If true, what should be considered the "right" result of the in-place sort?
A Python bytearray represents a sequence of integers in the range 0 <= elem < 256, regardless of whether the compiler considers chars to be signed. You should probably sort it as a sequence of integers in the range 0 <= elem < 256, rather than as a sequence of signed chars.
To match the output of sorted, will it be sufficient on the C side to cast values to unsigned long when dealing with bytearray?
I don't know enough about Cython to say what the correct code change would be.
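As a plain-Python illustration of the unsigned ordering (not the Cython change itself), the sketch below runs an in-place heap sort through a memoryview cast to format "B", so every element is compared as an integer from 0 to 255. The function name heap_sort_unsigned is hypothetical, and the approach assumes a writable buffer of single-byte items; on a buffer of wider items the cast would sort the raw bytes rather than the elements.

```python
def heap_sort_unsigned(buf):
    """In-place heap sort of a writable byte buffer, comparing bytes as 0..255."""
    view = memoryview(buf).cast("B")  # "B" = unsigned char, one item per byte
    n = len(view)

    def sift_down(root, end):
        # Restore the max-heap property for the subtree rooted at `root`,
        # considering only indices below `end`.
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end and view[child] < view[child + 1]:
                child += 1
            if view[root] >= view[child]:
                return
            view[root], view[child] = view[child], view[root]
            root = child

    # Build the heap, then repeatedly move the current maximum to the end.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n)
    for end in range(n - 1, 0, -1):
        view[0], view[end] = view[end], view[0]
        sift_down(0, end)


data = bytearray([9, 148, 115, 208, 243, 197])
heap_sort_unsigned(data)
print(list(data))  # [9, 115, 148, 197, 208, 243]
```

The result agrees with list(bytearray(sorted(byt))) from the question, because the comparisons never see a negative value.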