Python representation of negative integers

Question

>>> x = -4
>>> print("{} {:b}".format(x, x))
-4 -100
>>> mask = 0xFFFFFFFF
>>> print("{} {:b}".format(x & mask, x & mask))
4294967292 11111111111111111111111111111100
>>> 
>>> x = 0b11111111111111111111111111111100
>>> print("{} {:b}".format(x, x))
4294967292 11111111111111111111111111111100
>>> print("{} {:b}".format(~(x ^ mask), ~(x ^ mask)))
-4 -100

I am having trouble figuring out how Python represents negative integers, and therefore how bit operations work. It is my understanding that Python attempts to emulate two's complement, but with any number of bits. Therefore, it is common to use 32-bit masks to force Python to set a standard size on integers before bit operations.

As you can see in my example, -4 & 0xFFFFFFFF yields a large positive number. Why does Python seem to read this as an unsigned integer, instead of a two's complement negative number? Later, the operation ~(x ^ mask), which should yield the exact same two's complement bit pattern as the large positive, instead gives -4. What causes the conversion to a signed int?

Thanks!

Recommended answer

TL;DR: CPython's integer type stores the sign in a specific field of a structure. When performing a bitwise operation, CPython replaces negative numbers by their two's complement and sometimes (!) performs the reverse operation (i.e. replaces the two's complement result by a negative number).

The internal representation of an integer is a PyLongObject struct, which contains a PyVarObject struct. (When CPython creates a new PyLong object, it allocates memory for the structure plus trailing space for the digits.) What matters here is that the PyLong is sized: the ob_size field of the embedded PyVarObject struct contains the size of the integer in digits (each digit holds either 15 or 30 bits). If the integer is negative, this size is minus the number of digits.

(References: https://github.com/python/cpython/blob/master/Include/object.h https://github.com/python/cpython/blob/master/Include/longobject.h )
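The sign-plus-magnitude layout described above can be observed indirectly from Python itself. The following is a minimal sketch, not a view into the actual struct: `estimated_digits` is a hypothetical helper that only estimates how many internal digits a value would need, based on `sys.int_info.bits_per_digit` and `int.bit_length()`.

```python
import sys

# Number of bits stored in each internal digit (15 or 30 depending on the build).
BITS_PER_DIGIT = sys.int_info.bits_per_digit


def estimated_digits(n: int) -> int:
    """Hypothetical helper: estimate how many internal digits |n| occupies."""
    # bit_length() ignores the sign, so it measures the magnitude only.
    return -(-n.bit_length() // BITS_PER_DIGIT)  # ceiling division


for value in (4, -4, 2**64, -(2**64)):
    # A number and its negation have the same magnitude, hence the same
    # bit length and the same estimated digit count; only the sign differs.
    print(value, value.bit_length(), estimated_digits(value))
```

Note that `(-4).bit_length()` equals `(4).bit_length()`: nothing in the magnitude changes when the sign flips, which is consistent with the sign being kept apart from the digits.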

As you can see, CPython's internal representation of an integer is quite far from the usual binary representation. Yet CPython has to provide bitwise operations for various purposes. Let's take a look at the comments in the code:

static PyObject *
long_bitwise(PyLongObject *a,
             char op,  /* '&', '|', '^' */
             PyLongObject *b)
{
    /* Bitwise operations for negative numbers operate as though
       on a two's complement representation.  So convert arguments
       from sign-magnitude to two's complement, and convert the
       result back to sign-magnitude at the end. */

    /* If a is negative, replace it by its two's complement. */
    /* Same for b. */
    /* Complement result if negative. */
}
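The strategy described in those comments can be imitated in pure Python. This is only a sketch under stated assumptions: `emulate_long_bitwise` is a hypothetical name, it works on whole integers rather than digit by digit, and it picks a finite width large enough for both operands instead of reproducing CPython's actual C code.

```python
import operator


def emulate_long_bitwise(a: int, b: int, op) -> int:
    """Sketch: apply `op` via sign-magnitude -> two's complement -> sign-magnitude."""
    # One extra bit beyond the larger magnitude leaves room for a sign bit.
    width = max(a.bit_length(), b.bit_length()) + 1
    modulus = 1 << width
    mask = modulus - 1

    def to_twos_complement(n: int) -> int:
        # Negative numbers become modulus - |n|; non-negative ones are unchanged.
        return (modulus - abs(n)) & mask if n < 0 else n

    raw = op(to_twos_complement(a), to_twos_complement(b)) & mask

    # If the top bit is set, the pattern denotes a negative number:
    # convert it back to a sign and a magnitude.
    if raw >> (width - 1):
        return -(modulus - raw)
    return raw


print(emulate_long_bitwise(-4, 0xFFFFFFFF, operator.and_))
print(emulate_long_bitwise(-4, -6, operator.or_))
```

For every pair of small integers, the sketch agrees with Python's built-in `&`, `|` and `^`, which is the behaviour the comments promise.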

To handle negative integers in bitwise operations, CPython uses two's complement (actually, a two's complement computed digit by digit, but I won't go into the details). But note the "Sign Rule" (the name is mine): the sign of the result is the bitwise operator applied to the signs of the operands. More precisely, the result is negative if nega <op> negb == 1 (negx = 1 for negative, 0 for non-negative). Simplified code:

switch (op) {
    case '^': negz = nega ^ negb; break;
    case '&': negz = nega & negb; break;
    case '|': negz = nega | negb; break;
    default: ...
}
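The "Sign Rule" can be checked empirically from Python. The snippet below is a small sketch, with `sign_rule_holds` and the sample values chosen only for illustration; it compares the sign of each real result against the operator applied to the two sign flags.

```python
import operator
from itertools import product

OPS = {"&": operator.and_, "|": operator.or_, "^": operator.xor}


def sign_rule_holds(a: int, b: int, symbol: str) -> bool:
    """Return True if the sign of `a <op> b` matches `nega <op> negb`."""
    op = OPS[symbol]
    nega, negb = int(a < 0), int(b < 0)  # 1 for negative, 0 otherwise
    negz = op(nega, negb)                # predicted sign flag of the result
    return (op(a, b) < 0) == bool(negz)


samples = [-(2**40), -4, -1, 0, 1, 4, 0xFFFFFFFF, 2**40]
violations = [
    (a, b, symbol)
    for a, b in product(samples, repeat=2)
    for symbol in OPS
    if not sign_rule_holds(a, b, symbol)
]
print(violations)
```

An empty list means that no sampled pair breaks the rule.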

Binary formatting

On the other hand, the formatter does not perform the two's complement, even for the binary representation: [format_long_internal](https://github.com/python/cpython/blob/master/Python/formatter_unicode.c#L839) calls [long_format_binary](https://github.com/python/cpython/blob/master/Objects/longobject.c#L1934) and removes the two leading characters, but keeps the sign. See the code:

        /* Is a sign character present in the output?  If so, remember it
           and skip it */
        if (PyUnicode_READ_CHAR(tmp, inumeric_chars) == '-') {
            sign_char = '-';
            ++prefix;
            ++leading_chars_to_skip;
        }

The long_format_binary function does not perform any two's complement: it just outputs the number in base 2, preceded by the sign.

    if (negative)                                                   \
        *--p = '-'; \
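Since the formatter only ever prints a sign followed by the magnitude, a fixed-width two's complement string has to be produced by masking first. A minimal sketch, where `twos_complement_bits` is a hypothetical helper and the 32-bit default width is an assumption taken from the question:

```python
def twos_complement_bits(n: int, width: int = 32) -> str:
    """Render n as a fixed-width two's complement bit string (hypothetical helper)."""
    mask = (1 << width) - 1
    # Masking yields a non-negative integer whose bits match the
    # two's complement pattern of n truncated to `width` bits.
    return format(n & mask, f"0{width}b")


print(format(-4, "b"))            # sign and magnitude, as the formatter does
print(twos_complement_bits(-4))   # 32-bit two's complement pattern
print(twos_complement_bits(4, 8))
```

The first line shows the formatter's behaviour, and the other two show the bit patterns the question was expecting to see.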

Your question

I will follow your REPL sequence:

>>> x = -4
>>> print("{} {:b}".format(x, x))
-4 -100

Nothing surprising here: the formatter does not apply two's complement, it prints a sign followed by the magnitude.

>>> mask = 0xFFFFFFFF
>>> print("{} {:b}".format(x & mask, x & mask))
4294967292 11111111111111111111111111111100

The number -4 is negative. Hence, it is replaced by its two's complement, digit by digit, before the logical and. You expected the result to be turned back into a negative number, but remember the "Sign Rule":

>>> nega=1; negb=0
>>> nega & negb
0

Hence: 1. the result does not get a negative sign; 2. the result is not converted back from two's complement. Your result complies with the "Sign Rule", even if this rule doesn't seem very intuitive.
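If the masked value is meant to be read as a signed 32-bit quantity, the conversion has to be done explicitly, because Python will not do it. The following is a sketch only: `to_signed32` is a hypothetical helper, and the 32-bit width is an assumption taken from the mask used in the question.

```python
def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's complement integer."""
    value &= 0xFFFFFFFF
    # If bit 31 is set, the pattern denotes a negative number: subtract 2**32.
    if value & 0x80000000:
        return value - (1 << 32)
    return value


masked = -4 & 0xFFFFFFFF
print(masked)               # large positive number, as in the REPL session
print(to_signed32(masked))  # back to the signed interpretation
```

This keeps the mask for fixing the width and adds the sign interpretation as a separate, explicit step.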

Now, the last part:

>>> x = 0b11111111111111111111111111111100
>>> print("{} {:b}".format(x, x))
4294967292 11111111111111111111111111111100
>>> print("{} {:b}".format(~(x ^ mask), ~(x ^ mask)))
-4 -100

Here x has been reassigned to the positive integer 0b11111111111111111111111111111100, and mask is positive as well, so neither operand of the XOR is negative and no two's complement conversion takes place. x ^ mask is simply 0b11 (3), and the "Sign Rule" gives a non-negative result:

>>> nega=0; negb=0
>>> nega ^ negb
0

The negative sign comes from the unary ~, which does not go through long_bitwise: for any integer n, ~n is defined as -(n + 1), so ~3 is -4. The formatter then prints the sign followed by the magnitude, hence -100.

Therefore, the result gets the negative sign, as you expected, but because of how ~ is defined rather than because a 32-bit pattern was reinterpreted as signed.
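As a quick check of this last step, the sketch below replays the REPL sequence and verifies the identity ~n == -(n + 1), which is how the Python language reference defines bitwise inversion for integers. The variable names `step` and `result` are introduced here only for illustration.

```python
mask = 0xFFFFFFFF
x = 0b11111111111111111111111111111100  # a positive integer, 4294967292

step = x ^ mask   # both operands are non-negative, so is the result
result = ~step    # unary inversion: -(step + 1)

print(step, format(step, "b"))
print(result, format(result, "b"))

# The identity holds for any integer, whatever its size or sign.
identity_ok = all(~n == -(n + 1) for n in (-(2**70), -4, -1, 0, 3, 2**70))
print(identity_ok)
```

The intermediate value is small and positive, and the negative result only appears once ~ is applied.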

Conclusion: I guess there was no perfect way to support arbitrarily long signed numbers together with bitwise operations, but the documentation does not say much about the choices that were made.
