Python 2 vs Python 3 performance of random, particularly `random.sample` and `random.shuffle`

Question

The performance of Python's random module, in particular random.sample and random.shuffle, came up in this question. On my computer, I get the following results:

> python  -m timeit -s 'import random' 'random.randint(0,1000)'
1000000 loops, best of 3: 1.07 usec per loop
> python3 -m timeit -s 'import random' 'random.randint(0,1000)'
1000000 loops, best of 3: 1.3 usec per loop

That is more than a 20% slowdown in Python 3 relative to Python 2. It gets much worse.

> python  -m timeit -s 'import random' 'random.shuffle(list(range(10)))'
100000 loops, best of 3: 3.85 usec per loop
> python3 -m timeit -s 'import random' 'random.shuffle(list(range(10)))'
100000 loops, best of 3: 8.04 usec per loop

> python  -m timeit -s 'import random' 'random.sample(range(10),3)'
100000 loops, best of 3: 2.4 usec per loop
> python3 -m timeit -s 'import random' 'random.sample(range(10),3)'
100000 loops, best of 3: 6.49 usec per loop

That is roughly a 110% slowdown for random.shuffle and about a 170% slowdown for random.sample. That is quite severe.
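
The percentages can be worked out directly from the timings quoted above. The following is a small sketch of that arithmetic; the helper name `slowdown_percent` is made up for this illustration. On these numbers the random.sample slowdown comes out near 170%.

```python
# Timings in microseconds per loop, copied from the measurements above
# as (Python 2.7.9, Python 3.4.2) pairs.
timings = {
    "random.randint(0, 1000)": (1.07, 1.30),
    "random.shuffle(list(range(10)))": (3.85, 8.04),
    "random.sample(range(10), 3)": (2.40, 6.49),
}


def slowdown_percent(py2_usec, py3_usec):
    """Extra time taken by Python 3, as a percentage of the Python 2 time."""
    return (py3_usec - py2_usec) / py2_usec * 100.0


for statement, (py2, py3) in sorted(timings.items()):
    print("%-32s %6.1f%% slower" % (statement, slowdown_percent(py2, py3)))
```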


I used Python 2.7.9 and Python 3.4.2 in the above tests.

My question: What happened to the random module in Python 3?

Recommended answer

----------- What Changed -----------------------------------------------

Several things happened:

  • Integers became less efficient in the int/long unification. That is also why integers are 28 bytes wide now instead of 24 bytes on 64-bit Linux/MacOS builds.

  • Shuffle became more accurate by using _randbelow. This eliminated a subtle bias in the previous algorithm.

  • Indexing became slower because the special case for integer indices was removed from ceval.c, primarily because it was harder to do with the newer integers and because a couple of the core devs didn't think the optimization was worth it.

  • range() now behaves like Python 2's xrange() and returns a lazy sequence instead of a list. This is relevant because both of the OP's timings call range() in the inner loop.

The algorithms for shuffle() and sample() are otherwise unchanged.
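
Because range() sits inside the timed statement in the question's benchmarks, part of the measured difference belongs to building the list rather than to shuffle() itself. One way to separate the two is to move the list construction into the timeit setup. This is only a sketch: the helper `best_usec` and the loop counts are choices made for this illustration, and the numbers will differ from machine to machine.

```python
import timeit


def best_usec(stmt, setup, number=20000, repeat=3):
    """Best-of-`repeat` time per loop in microseconds, similar to `python -m timeit`."""
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e6


# The statement from the question: list(range(10)) is rebuilt on every loop.
with_range = best_usec("random.shuffle(list(range(10)))", "import random")

# The same shuffle with the list built once in setup, outside the timed loop.
without_range = best_usec(
    "random.shuffle(data)", "import random; data = list(range(10))"
)

print("shuffle + list(range(10)): %.2f usec per loop" % with_range)
print("shuffle only:              %.2f usec per loop" % without_range)
```

Running the same script under both interpreters gives a comparison of shuffle() alone, with the range() change factored out.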

Python 3 made a number of changes like unicode-everywhere that made the internals more complex, a little slower, and more memory intensive. In return, Python 3 makes it easier for users to write correct code.

Likewise, int/long unification made the language simpler but at a cost of speed and space. The switch to using _randbelow() in the random module had a runtime cost but benefited in terms of accuracy and correctness.

----------- Conclusion -------------------------------------------------

In short, Python 3 is better in some ways that matter to many users and worse in some ways that people rarely notice. Engineering is often about trade-offs.

----------- Details ----------------------------------------------------

Python2.7 code for shuffle():

def shuffle(self, x, random=None):
    if random is None:
        random = self.random
    _int = int
    for i in reversed(xrange(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = _int(random() * (i+1))
        x[i], x[j] = x[j], x[i]

Python3.6 code for shuffle():

def shuffle(self, x, random=None):
    if random is None:
        randbelow = self._randbelow
        for i in reversed(range(1, len(x))):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = randbelow(i+1)              # <-- This part changed
            x[i], x[j] = x[j], x[i]
    else:
        _int = int
        for i in reversed(range(1, len(x))):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = _int(random() * (i+1))
            x[i], x[j] = x[j], x[i]
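
To see why multiply-and-truncate can be biased, it helps to shrink the generator until every outcome can be enumerated. The sketch below assumes a hypothetical generator with only 3 bits of output (real floats carry far more, so the real bias is much smaller) and compares int(random() * n) with rejection sampling, which is the general idea behind _randbelow(). It is a toy model, not the library's code.

```python
from collections import Counter

BITS = 3  # hypothetical tiny generator: 2**3 equally likely outputs
N = 3     # number of choices we want to pick from uniformly

# Multiply-and-truncate, as in j = int(random() * n), with random()
# modelled as k / 2**BITS for k in 0 .. 2**BITS - 1.
scaled = Counter(int(k / 2.0 ** BITS * N) for k in range(2 ** BITS))

# Rejection sampling: draw just enough bits to cover N and discard any
# draw that is >= N, so each surviving value is equally likely.
need = N.bit_length()
accepted = Counter(k for k in range(2 ** need) if k < N)

print("multiply-and-truncate:", dict(scaled))
print("rejection sampling:   ", dict(accepted))
```

In this toy model, 8 equally likely inputs cannot be split evenly across 3 results, so one result is under-represented. Rejection sampling avoids that by throwing away the leftover input, at the cost of an occasional extra draw.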

Python 2.7 integer size:

>>> import sys
>>> sys.getsizeof(1)
24

Python 3.6 integer size:

>>> import sys
>>> sys.getsizeof(1)
28
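
After the int/long unification every integer is a variable-length object, so its size grows with its magnitude. A quick way to observe this is to print sys.getsizeof() for progressively larger values. The exact byte counts depend on the interpreter version and build, so none are shown here. The chosen values assume 30-bit internal digits, which is typical of 64-bit CPython 3 builds but is an assumption.

```python
import sys

# Values chosen to need one, two, three and four internal "digits",
# assuming 30-bit digits (an assumption about the build).
values = [1, 2 ** 30, 2 ** 60, 2 ** 90]
sizes = [sys.getsizeof(v) for v in values]

for value, size in zip(values, sizes):
    print("%3d-bit integer -> %d bytes" % (value.bit_length(), size))
```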

Relative speed of indexed lookups (binary subscriptions with integer arguments indexing into a list):

$ python2.7 -m timeit -s 'a=[0]' 'a[0]'
10000000 loops, best of 3: 0.0253 usec per loop
$ python3.6 -m timeit -s 'a=[0]' 'a[0]'
10000000 loops, best of 3: 0.0313 usec per loop

Python 2.7 code in ceval.c with an optimization for indexed lookups:

    TARGET_NOARG(BINARY_SUBSCR)
    {
        w = POP();
        v = TOP();
        if (PyList_CheckExact(v) && PyInt_CheckExact(w)) {
            /* INLINE: list[int] */
            Py_ssize_t i = PyInt_AsSsize_t(w);
            if (i < 0)
                i += PyList_GET_SIZE(v);
            if (i >= 0 && i < PyList_GET_SIZE(v)) {
                x = PyList_GET_ITEM(v, i);
                Py_INCREF(x);
            }
            else
                goto slow_get;
        }
        else
          slow_get:
            x = PyObject_GetItem(v, w);
        Py_DECREF(v);
        Py_DECREF(w);
        SET_TOP(x);
        if (x != NULL) DISPATCH();
        break;
    }
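
For readers less familiar with the C API, the control flow of the fast path above can be modelled in Python. This is only an illustration: the function name `binary_subscr_model` is made up, and the real saving comes from skipping the generic dispatch in C, which a Python model cannot reproduce.

```python
import operator


def binary_subscr_model(container, sub):
    """Rough Python model of the Python 2.7 BINARY_SUBSCR fast path."""
    # Mirrors PyList_CheckExact(v) && PyInt_CheckExact(w): exact types only.
    if type(container) is list and type(sub) is int:
        i = sub
        if i < 0:
            i += len(container)  # negative indices count from the end
        if 0 <= i < len(container):
            # Stands in for PyList_GET_ITEM: a direct fetch, no dispatch.
            return list.__getitem__(container, i)
    # Everything else, including out-of-range indices, takes the generic
    # path, which stands in for PyObject_GetItem.
    return operator.getitem(container, sub)


print(binary_subscr_model([10, 20, 30], -1))   # fast path
print(binary_subscr_model({"a": 1}, "a"))      # generic path
```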

Python 3.6 code in ceval.c without the optimization for indexed lookups:

    TARGET(BINARY_SUBSCR) {
        PyObject *sub = POP();
        PyObject *container = TOP();
        PyObject *res = PyObject_GetItem(container, sub);
        Py_DECREF(container);
        Py_DECREF(sub);
        SET_TOP(res);
        if (res == NULL)
            goto error;
        DISPATCH();
    }
