为什么当用于展平列表列表时,python 的内置 sum 函数很慢? [英] Why is python's built in sum function slow when used to flatten a list of lists?
问题描述
当尝试使用 python 2.7 的内置 sum
函数扁平化列表列表时,我遇到了一些性能问题 - 不仅计算速度慢,而且迭代方法产生了更快的结果.
When trying to flatten a list of lists using python 2.7's built-in sum
function, I've ran across some performance issues - not only was the computation slow, but the iterative approach yielded much faster results.
下面的简短代码似乎说明了这种性能差距:
The short code below seems to illustrate this performance gap:
import timeit
def sum1(arrs):
return sum(arrs, [])
def sum2(arrs):
s = []
for arr in arrs:
s += arr
return s
def main():
array_of_arrays = [[0] for _ in range(1000)]
print timeit.timeit(lambda: sum1(array_of_arrays), number=100)
print timeit.timeit(lambda: sum2(array_of_arrays), number=100)
if __name__=='__main__':
main()
在我的笔记本电脑上,我得到了输出:
On my laptop, I get as output:
>> 0.247241020203
>> 0.0043830871582
谁能给我解释一下为什么会这样?
Could anyone explain to me why is it so?
推荐答案
你的 sum2
使用 +=
:
for arr in arrs:
s += arr
sum
不使用 +=
.sum
被定义为使用 +
.不同之处在于允许s += arr
通过改变现有的s
列表来执行操作,而s = s +arr
必须构造一个新列表,复制旧列表的缓冲区.
sum
does not use +=
. sum
is defined to use +
. The difference is that s += arr
is allowed to perform the operation by mutating the existing s
list, while s = s + arr
must construct a new list, copying the buffers of the old lists.
使用 +=
,Python 可以使用有效的列表调整大小策略,该策略需要与最终列表的大小成比例的复制量.对于 N
个长度为 K
的列表,这需要与 N*K
成正比的时间.
With +=
, Python can use an efficient list resizing strategy that requires an amount of copying proportional to the size of the final list. For N
lists of length K
each, this takes time proportional to N*K
.
使用 +
,Python 无法做到这一点.对于每个 s = s + arr
,Python 必须复制整个 s
和 arr
列表来构造新的 s
.对于 N
个大小为 K
的列表,复制花费的总时间与 N**2 * K
成正比,更糟.
With +
, Python cannot do that. For every s = s + arr
, Python must copy the entire s
and arr
lists to construct the new s
. For N
lists of size K
each, the total time spent copying is proportional to N**2 * K
, much worse.
因此,您几乎不应该使用 sum
来连接序列.
Because of this, you should pretty much never use sum
to concatenate sequences.
这篇关于为什么当用于展平列表列表时,python 的内置 sum 函数很慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!