为什么使用列表理解而不是生成器表达式时,更新列表的速度更快? [英] Why is updating a list faster when using a list comprehension as opposed to a generator expression?

查看:65
本文介绍了为什么使用列表理解而不是生成器表达式时,更新列表的速度更快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

根据此答案列表在许多情况下(例如,当与(因为该算法需要将数据传递两次).

According to this answer lists perform better than generators in a number of cases, for example when used together with str.join (since the algorithm needs to pass over the data twice).

在下面的示例中,使用 list comprehension 似乎比使用相应的生成器表达式产生更好的性能,尽管直观上,list comprehension带有分配和复制到生成器回避的其他内存的开销.

In the following example using a list comprehension seems to yield better performance than using a corresponding generator expression though intuitively the list comprehension comes with an overhead of allocating and copying to additional memory which the generator sidesteps.

In [1]: l = list(range(2_000_000))

In [2]: %timeit l[:] = [i*3 for i in range(len(l))]
190 ms ± 4.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit l[:] = (i*3 for i in range(len(l)))
261 ms ± 7.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit l[::2] = [i*3 for i in range(len(l)//2)]
97.1 ms ± 2.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit l[::2] = (i*3 for i in range(len(l)//2))
129 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: %timeit l[:len(l)//2] = [i*3 for i in range(len(l)//2)]
92.6 ms ± 2.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit l[:len(l)//2] = (i*3 for i in range(len(l)//2))
118 ms ± 2.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

为什么在这些情况下列表理解会产生更好的性能?

Why does a list comprehension yield better performance in these cases?

推荐答案

此答案仅涉及CPython实现. 使用列表推导速度更快,因为生成器还是会先转换为列表.之所以这样做,是因为应该在 之前确定序列的长度,然后再替换数据,发电机无法告诉您其长度.

This answer concerns CPython implementation only. Using a list comprehension is faster, since the generator is first converted into a list anyway. This is done because the length of the sequence should be determined before proceeding to replace data, and a generator can't tell you its length.

对于列表切片分配,此操作由名称有趣的 list_ass_slice .分配列表或元组有一种特殊情况的处理,

For list slice assignment, this operation is handled by the amusingly named list_ass_slice. There is a special-case handling for assigning a list or tuple, here - they can use PySequence_Fast ops.

是v3. 7.4 PySequence_Fast的实现,您可以清楚地看到列表或元组的类型检查:

This is the v3.7.4 implementation of PySequence_Fast, where you can clearly see a type-check for list or tuples:

PyObject *
PySequence_Fast(PyObject *v, const char *m)
{
    PyObject *it;

    if (v == NULL) {
        return null_error();
    }

    if (PyList_CheckExact(v) || PyTuple_CheckExact(v)) {
        Py_INCREF(v);
        return v;
    }

    it = PyObject_GetIter(v);
    if (it == NULL) {
        if (PyErr_ExceptionMatches(PyExc_TypeError))
            PyErr_SetString(PyExc_TypeError, m);
        return NULL;
    }

    v = PySequence_List(it);
    Py_DECREF(it);

    return v;
}

生成器表达式将无法通过此类型检查,并继续执行后备代码,在该代码中将其转换为列表对象,以便

A generator expression will fail this type check and continue to the fallback code, where it is converted into a list object, so that the length can be predetermined.

在一般情况下,需要预定长度以便有效分配列表存储空间,并且

In the general case, a predetermined length is desirable in order to allow efficient allocation of list storage, and also to provide useful error messages with extended slice assignment:

>>> vals = (x for x in 'abc')
>>> L = [1,2,3]
>>> L[::2] = vals  # attempt assigning 3 values into 2 positions
---------------------------------------------------------------------------
                                          Traceback (most recent call last)
...
ValueError: attempt to assign sequence of size 3 to extended slice of size 2
>>> L  # data unchanged
[1, 2, 3]
>>> list(vals)  # generator was fully consumed
[]

这篇关于为什么使用列表理解而不是生成器表达式时,更新列表的速度更快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆