Does Python bytearray use signed integers in the C representation?

Question

I have written a small Cython tool for in-place sorting of structures exposing the buffer protocol in Python. It's a work in progress; please forgive any mistakes. This is just for me to learn.

In my set of unit tests, I am working on testing the in-place sort across many different kinds of buffer-exposing data structures, each with many types of underlying data contained in them. I can verify it is working as expected for most cases, but the case of bytearray is very peculiar.

If you take it for granted that my imported module b in the code below is just performing a straightforward heap sort in Cython, in-place on the bytearray, then the following code sample shows the issue:

In [42]: a #NumPy array
Out[42]: array([  9, 148, 115, 208, 243, 197], dtype=uint8)

In [43]: byt = bytearray(a)

In [44]: byt
Out[44]: bytearray(b'\t\x94s\xd0\xf3\xc5')

In [45]: list(byt)
Out[45]: [9, 148, 115, 208, 243, 197]

In [46]: byt1 = copy.deepcopy(byt)

In [47]: b.heap_sort(byt1)

In [48]: list(byt1)
Out[48]: [148, 197, 208, 243, 9, 115]

In [49]: list(bytearray(sorted(byt)))
Out[49]: [9, 115, 148, 197, 208, 243]

What you can see is that when using sorted, the values are iterated and treated like Python integers for the purpose of sorting, then placed back into a new bytearray.

But the in-place sort in lines 47-48 shows that the bytes are being interpreted as signed integers and sorted by their two's-complement value, which moves every number >= 128 (negative when read as signed) towards the left.
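
The signed reading can be reproduced without the Cython module. The sketch below uses a hypothetical helper, `sort_as_signed` (not part of buffersort), that views the same memory as C signed chars via `memoryview.cast("b")`, sorts those values and writes them back. On the bytes above it gives the same order as Out[48], and on bytearray(range(256)) it gives 128..255 followed by 0..127.

```python
def sort_as_signed(buf: bytearray) -> None:
    """Sort buf in place, comparing each byte as a C signed char (-128..127).

    Hypothetical helper, used only to mimic the behaviour seen in Out[48].
    """
    with memoryview(buf) as raw, raw.cast("b") as signed:
        ordered = sorted(signed)          # plain Python ints in -128..127
        for i, value in enumerate(ordered):
            signed[i] = value             # write back through the signed view


byt1 = bytearray([9, 148, 115, 208, 243, 197])
sort_as_signed(byt1)
print(list(byt1))  # [148, 197, 208, 243, 9, 115]
```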

I can confirm it by running over the whole range 0-255:

In [50]: byt = bytearray(range(0,256))

In [51]: b.heap_sort(byt)

In [52]: list(byt)
Out[52]: 
[128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159,
 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207,
 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223,
 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255,
 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]

I know this is difficult to reproduce. You can build the linked package with Cython if you want, and then import src.buffersort as b to get the same sort functions I am using.

I've tried reading through the source code for bytearray in Objects/bytearrayobject.c, but I see some references to long and a few calls to PyInt_FromLong ...

This makes me suspect that the underlying C-level data of a bytearray is represented as a signed integer in C, but the conversion to Python int from raw bytes means it is unsigned between 0 and 255 in Python. I can only assume this is true ... though I don't see why Python should interpret the C long as unsigned, unless that is merely a convention for bytearray that I didn't see in the code. But if so, why wouldn't an unsigned integer be used on the C side as well, if the bytes are always treated by Python as unsigned?

If true, what should be considered the "right" result of the in-place sort? Since they are "just bytes", either interpretation is valid, I guess, but in the Python spirit I think there should be one way which is considered the standard.
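
Both readings of the same six bytes can be produced with the standard struct module, whose format codes "B" and "b" mean unsigned char and signed char respectively. This only illustrates that the raw bytes are ambiguous on their own; it says nothing yet about which reading Python treats as canonical.

```python
import struct

raw = bytes([9, 148, 115, 208, 243, 197])

# "B" = unsigned char (0..255), "b" = signed char (-128..127)
as_unsigned = struct.unpack(f"{len(raw)}B", raw)
as_signed = struct.unpack(f"{len(raw)}b", raw)

print(as_unsigned)          # (9, 148, 115, 208, 243, 197)
print(as_signed)            # (9, -108, 115, -48, -13, -59)
print(sorted(as_unsigned))  # [9, 115, 148, 197, 208, 243]
```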

To match output of sorted, will it be sufficient on the C side to cast values to unsigned long when dealing with bytearray?

Answer

Does Python bytearray use signed integers in the C representation?

It uses chars. Whether those are signed depends on the compiler. You can see this in Include/bytearrayobject.h. Here's the 2.7 version:

/* Object layout */
typedef struct {
    PyObject_VAR_HEAD
    /* XXX(nnorwitz): should ob_exports be Py_ssize_t? */
    int ob_exports; /* how many buffer exports */
    Py_ssize_t ob_alloc; /* How many bytes allocated */
    char *ob_bytes;
} PyByteArrayObject;

Here's the 3.5 version:

typedef struct {
    PyObject_VAR_HEAD
    Py_ssize_t ob_alloc; /* How many bytes allocated in ob_bytes */
    char *ob_bytes;      /* Physical backing buffer */
    char *ob_start;      /* Logical start inside ob_bytes */
    /* XXX(nnorwitz): should ob_exports be Py_ssize_t? */
    int ob_exports;      /* How many buffer exports */
} PyByteArrayObject;
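
The struct only says `char *`. What a buffer-protocol consumer (such as a Cython typed memoryview) is actually told about the element type is the format string of the exported buffer. A quick check from Python, offered as an illustration rather than as part of the quoted sources, shows that bytearray exports format "B" (unsigned char), while a cast to "b" reads the same memory as signed char. Code that ignores the format and reads the data as plain char would presumably get whatever signedness the compiler gives char.

```python
data = bytearray([9, 148, 115])

with memoryview(data) as view:
    print(view.format, view.itemsize)  # B 1  (unsigned char, one byte per item)
    print(view[1])                     # 148
    with view.cast("b") as signed:     # same memory, reinterpreted as signed char
        print(signed[1])               # -108
```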

If true, what should be considered the "right" result of the in-place sort?

A Python bytearray represents a sequence of integers in the range 0 <= elem < 256, regardless of whether the compiler considers chars to be signed. You should probably sort it as a sequence of integers in the range 0 <= elem < 256, rather than as a sequence of signed chars.
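
One way to follow that advice, sketched in pure Python rather than Cython: `heap_sort` below is a generic textbook heap sort (hypothetical, not the asker's buffersort code), and the only thing that decides the result is the element type of the view it is handed. Through the bytearray's own "B" view it matches sorted(); through a "b" cast it reproduces the signed order from the question.

```python
def heap_sort(seq) -> None:
    """Textbook in-place heap sort for any mutable, indexable sequence."""
    n = len(seq)

    def sift_down(root: int, end: int) -> None:
        # Push seq[root] down until seq[:end] satisfies the max-heap property.
        child = 2 * root + 1
        while child < end:
            if child + 1 < end and seq[child] < seq[child + 1]:
                child += 1                      # use the larger child
            if seq[root] >= seq[child]:
                return
            seq[root], seq[child] = seq[child], seq[root]
            root = child
            child = 2 * root + 1

    for start in range(n // 2 - 1, -1, -1):     # build a max-heap
        sift_down(start, n)
    for end in range(n - 1, 0, -1):             # move the current max to the back
        seq[0], seq[end] = seq[end], seq[0]
        sift_down(0, end)


original = [9, 148, 115, 208, 243, 197]

data = bytearray(original)
with memoryview(data) as unsigned_view:         # format "B": values 0..255
    heap_sort(unsigned_view)
print(list(data))  # [9, 115, 148, 197, 208, 243], same as sorted()

data = bytearray(original)
with memoryview(data) as raw, raw.cast("b") as signed_view:  # format "b": -128..127
    heap_sort(signed_view)
print(list(data))  # [148, 197, 208, 243, 9, 115], the order from Out[48]
```

In a Cython or C implementation, the analogous step would presumably be to read the elements as unsigned char (or uint8_t) rather than plain char; that translation is an assumption, not something shown in the question.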

To match output of sorted, will it be sufficient on the C side to cast values to unsigned long when dealing with bytearray?

I don't know enough about Cython to say what the correct code change would be.
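
The C conversion rule behind the "cast to unsigned long" question can at least be worked through in Python: converting an integer to an unsigned type of N bits yields the value modulo 2**N. The sketch assumes a 64-bit unsigned long (it is 32 bits on some platforms, e.g. Windows) and uses a hypothetical `to_unsigned` helper. It suggests that casting a signed char straight to unsigned long would still give the same ordering as sorted(), but the values would be sign-extended rather than 0..255, so converting through unsigned char first looks like the cleaner route.

```python
def to_unsigned(value: int, bits: int) -> int:
    """Emulate C's conversion of an integer to an unsigned type of `bits` width.

    C reduces the value modulo 2**bits; Python's % with a positive modulus
    gives the same non-negative result. Hypothetical helper for illustration.
    """
    return value % (1 << bits)


signed_chars = range(-128, 128)          # every value a signed char can hold

# Straight to a (assumed 64-bit) unsigned long: the value is sign-extended.
print(to_unsigned(-108, 64))             # 18446744073709551508
# Through unsigned char: the byte value Python itself reports.
print(to_unsigned(-108, 8))              # 148

# Both conversions put the signed chars in the same relative order.
by_ulong = sorted(signed_chars, key=lambda v: to_unsigned(v, 64))
by_uchar = sorted(signed_chars, key=lambda v: to_unsigned(v, 8))
print(by_ulong == by_uchar)              # True
```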