Why is sum on lists (sometimes) faster than itertools.chain?

Problem description

I answered several questions here by using this to "flatten" a list of lists:

>>> l = [[1,2,3],[4,5,6],[7,8,9]]
>>> sum(l,[])

it works fine and yields:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

although I was told that the sum operator does a = a + b, which is not as performant as itertools.chain.

My planned question was "why is it possible on lists where it is prevented on strings", but I made a quick benchmark on my machine comparing sum and itertools.chain.from_iterable on the same data:

import itertools,timeit

print(timeit.timeit("sum(l,[])",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))

I did that several times and always got about the same figures as below:

0.7155522836070246
0.9883352857722025

To my surprise, chain - recommended over sum for lists by everyone in several comments on my answers - is much slower.

chain is still interesting when iterating in a for loop, because it doesn't actually create the list; but when the goal is to build the list, sum wins.
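
For instance, a minimal sketch of that lazy use (the loop body is just illustrative):

import itertools

l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# chain.from_iterable yields the items one by one;
# no intermediate flattened list is ever built
for x in itertools.chain.from_iterable(l):
    print(x)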

So should we drop itertools.chain and use sum when the expected result is a list?

EDIT: thanks to some comments, I made another test by increasing the number of lists:

s = 'l = [[4,5,6] for _ in range(20)]'
print(timeit.timeit("sum(l,[])",setup=s))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup=s))

now I get the opposite:

6.479897810702537
3.793455760814343

Solution

Your test inputs are tiny. At those scales, the horrific O(n^2) asymptotic runtime of the sum version isn't visible. The timings are dominated by constant factors, and sum has a better constant factor, since it doesn't have to work through iterators.
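
Roughly speaking, sum(l, []) behaves like the sketch below. This is a simplified model, not CPython's actual implementation, but it shows where the quadratic cost comes from: every + allocates a brand-new list and copies everything accumulated so far.

# simplified model of sum(lists, start) for lists; illustrative only
def sum_flatten(lists, start):
    result = start
    for sublist in lists:
        # list + list builds a new list and copies both operands,
        # so each step costs as much as the whole running result
        result = result + sublist
    return result

print(sum_flatten([[1, 2, 3], [4, 5, 6], [7, 8, 9]], []))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

Flattening n total elements this way copies on the order of n^2 items overall, whereas chain.from_iterable touches each element exactly once.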

With bigger lists, it becomes clear that sum is not at all designed for this kind of thing:

>>> timeit.timeit('list(itertools.chain.from_iterable(l))',
...               'l = [[i] for i in xrange(5000)]; import itertools',
...               number=1000)
0.20425895931668947
>>> timeit.timeit('sum(l, [])', 'l = [[i] for i in xrange(5000)]', number=1000)
49.55303902059097
