对于Python 3.x整数,比位移快两倍吗? [英] Times-two faster than bit-shift, for Python 3.x integers?

查看:82
本文介绍了对于Python 3.x整数,比位移快两倍吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在查看 sorted_containers 的来源,惊讶地看到此行:

I was looking at the source of sorted_containers and was surprised to see this line:

self._load, self._twice, self._half = load, load * 2, load >> 1

此处load是整数.为什么在一个位置使用位移,而在另一个位置使用乘法呢?移位可能比整数除以2快,但这是合理的,但是为什么不还用移位代替乘法呢?我对以下情况进行了基准测试:

Here load is an integer. Why use bit shift in one place, and multiplication in another? It seems reasonable that bit shifting may be faster than integral division by 2, but why not replace the multiplication by a shift as well? I benchmarked the the following cases:

  1. (次数,除法)
  2. (班次,班次)
  3. (时间,班次)
  4. (平移,除法)

发现#3始终比其他替代方法快:

and found that #3 is consistently faster than other alternatives:

# self._load, self._twice, self._half = load, load * 2, load >> 1

import random
import timeit
import pandas as pd

x = random.randint(10 ** 3, 10 ** 6)

def test_naive():
    a, b, c = x, 2 * x, x // 2

def test_shift():
    a, b, c = x, x << 1, x >> 1    

def test_mixed():
    a, b, c = x, x * 2, x >> 1    

def test_mixed_swapped():
    a, b, c = x, x << 1, x // 2

def observe(k):
    print(k)
    return {
        'naive': timeit.timeit(test_naive),
        'shift': timeit.timeit(test_shift),
        'mixed': timeit.timeit(test_mixed),
        'mixed_swapped': timeit.timeit(test_mixed_swapped),
    }

def get_observations():
    return pd.DataFrame([observe(k) for k in range(100)])

问题:

我的考试有效吗?如果是这样,为什么(乘法,移位)比(移位,移位)快?

Is my test valid? If so, why is (multiply, shift) faster than (shift, shift)?

我在Ubuntu 14.04上运行Python 3.5.

I run Python 3.5 on Ubuntu 14.04.

修改

以上是问题的原始说明.丹·盖茨(Dan Getz)在回答中提供了出色的解释.

Above is the original statement of the question. Dan Getz provides an excellent explanation in his answer.

为完整起见,以下是在不应用乘法优化的情况下较大的x的示例插图.

For the sake of completeness, here are sample illustrations for larger x when multiplication optimizations do not apply.

推荐答案

这似乎是因为小数的乘法在CPython 3.5中得到了优化,而小数的左移则没有.正向左移始终会创建一个较大的整数对象,以存储结果,作为计算的一部分,而对于您在测试中使用的排序的乘法,特殊的优化可避免这种情况,并创建正确大小的整数对象.可以在 Python的整数实现的源代码中看到.

This seems to be because multiplication of small numbers is optimized in CPython 3.5, in a way that left shifts by small numbers are not. Positive left shifts always create a larger integer object to store the result, as part of the calculation, while for multiplications of the sort you used in your test, a special optimization avoids this and creates an integer object of the correct size. This can be seen in the source code of Python's integer implementation.

由于Python中的整数是任意精度的,因此它们存储为整数数字"的数组,每个整数位数的位数受到限制.因此,在一般情况下,涉及整数的运算不是单个运算,而是需要处理多个数字"的情况.在 pyport.h 中,此位限制定义为在64位平台上为30位,否则为15位. (为了简化说明,我从这里开始将其称为30.但是请注意,如果您使用的是针对32位编译的Python,则基准测试的结果取决于x是否小于32,768.)

Because integers in Python are arbitrary-precision, they are stored as arrays of integer "digits", with a limit on the number of bits per integer digit. So in the general case, operations involving integers are not single operations, but instead need to handle the case of multiple "digits". In pyport.h, this bit limit is defined as 30 bits on 64-bit platform, or 15 bits otherwise. (I'll just call this 30 from here on to keep the explanation simple. But note that if you were using Python compiled for 32-bit, your benchmark's result would depend on if x were less than 32,768 or not.)

当操作的输入和输出保持在此30位限制内时,可以以优化的方式而不是一般的方式来处理该操作. 整数乘法实现的开头如下:

When an operation's inputs and outputs stay within this 30-bit limit, the operation can be handled in an optimized way instead of the general way. The beginning of the integer multiplication implementation is as follows:

static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
    PyLongObject *z;

    CHECK_BINOP(a, b);

    /* fast path for single-digit multiplication */
    if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
        stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
#ifdef HAVE_LONG_LONG
        return PyLong_FromLongLong((PY_LONG_LONG)v);
#else
        /* if we don't have long long then we're almost certainly
           using 15-bit digits, so v will fit in a long.  In the
           unlikely event that we're using 30-bit digits on a platform
           without long long, a large v will just cause us to fall
           through to the general multiplication code below. */
        if (v >= LONG_MIN && v <= LONG_MAX)
            return PyLong_FromLong((long)v);
#endif
    }

因此,当两个整数相乘时,每个整数都适合一个30位数字,这由CPython解释器直接进行乘法运算,而不是将整数作为数组使用. (在正整数对象上调用的 MEDIUM_VALUE() 只会获得其前30个位结果.)如果结果适合单个30位数字,请 PyLong_FromLongLong() 将在相对较少的操作中注意到这一点,并创建一个单位整数对象来存储它.

So when multiplying two integers where each fits in a 30-bit digit, this is done as a direct multiplication by the CPython interpreter, instead of working with the integers as arrays. (MEDIUM_VALUE() called on a positive integer object simply gets its first 30-bit digit.) If the result fits in a single 30-bit digit, PyLong_FromLongLong() will notice this in a relatively small number of operations, and create a single-digit integer object to store it.

相反,左移并没有以这种方式优化,每个左移都将整数作为数组进行处理.特别是,如果您查看 long_lshift() 的源代码,请在如果是小而正的左移,则总是创建一个2位整数对象,如果只是将其长度在以后截断为1:(我在/*** ***/中的注释)

In contrast, left shifts are not optimized this way, and every left shift deals with the integer being shifted as an array. In particular, if you look at the source code for long_lshift(), in the case of a small but positive left shift, a 2-digit integer object is always created, if only to have its length truncated to 1 later: (my comments in /*** ***/)

static PyObject *
long_lshift(PyObject *v, PyObject *w)
{
    /*** ... ***/

    wordshift = shiftby / PyLong_SHIFT;   /*** zero for small w ***/
    remshift  = shiftby - wordshift * PyLong_SHIFT;   /*** w for small w ***/

    oldsize = Py_ABS(Py_SIZE(a));   /*** 1 for small v > 0 ***/
    newsize = oldsize + wordshift;
    if (remshift)
        ++newsize;   /*** here newsize becomes at least 2 for w > 0, v > 0 ***/
    z = _PyLong_New(newsize);

    /*** ... ***/
}


整数除法

您没有问整数地板除法与右移相比有更差的表现,因为它符合您(和我)的期望.但是,将一个小正数除以另一个小正数也不会像小乘法那样优化.每个//都使用函数和余数. > long_divrem() .余数是使用乘法存储在新分配的整数对象中,在这种情况下,该对象立即被丢弃.


Integer division

You didn't ask about the worse performance of integer floor division compared to right shifts, because that fit your (and my) expectations. But dividing a small positive number by another small positive number is not as optimized as small multiplications, either. Every // computes both the quotient and the remainder using the function long_divrem(). This remainder is computed for a small divisor with a multiplication, and is stored in a newly-allocated integer object, which in this situation is immediately discarded.

这篇关于对于Python 3.x整数,比位移快两倍吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆