为什么当用于展平列表列表时,python 的内置 sum 函数很慢? [英] Why is python's built in sum function slow when used to flatten a list of lists?

查看:60
本文介绍了为什么当用于展平列表列表时,python 的内置 sum 函数很慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当尝试使用 python 2.7 的内置 sum 函数扁平化列表列表时,我遇到了一些性能问题 - 不仅计算速度慢,而且迭代方法产生了更快的结果.

When trying to flatten a list of lists using python 2.7's built-in sum function, I've ran across some performance issues - not only was the computation slow, but the iterative approach yielded much faster results.

下面的简短代码似乎说明了这种性能差距:

The short code below seems to illustrate this performance gap:

import timeit

def sum1(arrs):
    return sum(arrs, [])

def sum2(arrs):
    s = []
    for arr in arrs:
        s += arr
    return s

def main():
    array_of_arrays = [[0] for _ in range(1000)]
    print timeit.timeit(lambda: sum1(array_of_arrays), number=100)
    print timeit.timeit(lambda: sum2(array_of_arrays), number=100)

if __name__=='__main__':
    main()

在我的笔记本电脑上,我得到了输出:

On my laptop, I get as output:

>> 0.247241020203
>> 0.0043830871582

谁能给我解释一下为什么会这样?

Could anyone explain to me why is it so?

推荐答案

你的 sum2 使用 +=:

    for arr in arrs:
        s += arr

sum 不使用 +=.sum 被定义为使用 +.不同之处在于允许s += arr通过改变现有的s列表来执行操作,而s = s +arr 必须构造一个新列表,复制旧列表的缓冲区.

sum does not use +=. sum is defined to use +. The difference is that s += arr is allowed to perform the operation by mutating the existing s list, while s = s + arr must construct a new list, copying the buffers of the old lists.

使用 +=,Python 可以使用有效的列表调整大小策略,该策略需要与最终列表的大小成比例的复制量.对于 N 个长度为 K 的列表,这需要与 N*K 成正比的时间.

With +=, Python can use an efficient list resizing strategy that requires an amount of copying proportional to the size of the final list. For N lists of length K each, this takes time proportional to N*K.

使用 +,Python 无法做到这一点.对于每个 s = s + arr,Python 必须复制整个 sarr 列表来构造新的 s.对于 N 个大小为 K 的列表,复制花费的总时间与 N**2 * K 成正比,更糟.

With +, Python cannot do that. For every s = s + arr, Python must copy the entire s and arr lists to construct the new s. For N lists of size K each, the total time spent copying is proportional to N**2 * K, much worse.

因此,您几乎不应该使用 sum 来连接序列.

Because of this, you should pretty much never use sum to concatenate sequences.

这篇关于为什么当用于展平列表列表时,python 的内置 sum 函数很慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆