List comprehension vs generator expression's weird timeit results?


Question

I was answering this question; I preferred a generator expression here and used this, which I thought would be faster since the generator doesn't need to create the whole list first:

>>> lis=[['a','b','c'],['d','e','f']]
>>> 'd' in (y for x in lis for y in x)
True

And Levon used list comprehension in his solution,

>>> lis = [['a','b','c'],['d','e','f']]
>>> 'd' in [j for i in lis for j in i]
True

But when I ran timeit on these, the LC was faster than the generator:

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f']]" "'d' in (y for x in lis for y in x)"
    100000 loops, best of 3: 2.36 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f']]" "'d' in [y for x in lis for y in x]"
    100000 loops, best of 3: 1.51 usec per loop

Then I increased the size of the list and timed it again:

lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]

This time, when searching for 'd', the generator was faster than the LC; but when I searched for a middle element (11) and for the last element (18), the LC again beat the generator expression, and I can't understand why.

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in (y for x in lis for y in x)"
    100000 loops, best of 3: 2.96 usec per loop

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in [y for x in lis for y in x]"
    100000 loops, best of 3: 7.4 usec per loop

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "11 in [y for x in lis for y in x]"
    100000 loops, best of 3: 5.61 usec per loop

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "11 in (y for x in lis for y in x)"
    100000 loops, best of 3: 9.76 usec per loop

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "18 in (y for x in lis for y in x)"
    100000 loops, best of 3: 8.94 usec per loop

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "18 in [y for x in lis for y in x]"
    100000 loops, best of 3: 7.13 usec per loop

Answer

Expanding on Paulo's answer, generator expressions are often slower than list comprehensions because of the overhead of function calls. In this case, the short-circuiting behavior of in offsets that slowness if the item is found fairly early, but otherwise, the pattern holds.
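To make that short-circuiting concrete, here is a minimal sketch (Python 3; the recording wrapper is my own illustration, not part of the original benchmarks) that logs how far `in` actually iterates a generator:

```python
# `in` pulls items from the generator one at a time and stops at the
# first match, so an early hit means only a few next() calls.
lis = [['a', 'b', 'c'], ['d', 'e', 'f']]

pulled = []  # records every item the generator actually yields

def flatten():
    for x in lis:
        for y in x:
            pulled.append(y)
            yield y

print('d' in flatten())  # True
print(pulled)            # ['a', 'b', 'c', 'd'] -- iteration stopped at the match
```

The list comprehension, by contrast, would have appended all six items before `in` even started scanning.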

I ran a simple script through the profiler for a more detailed analysis. Here's the script:

lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],
     [7,8,9],[10,11,12],[13,14,15],[16,17,18]]

def ge_d():
    return 'd' in (y for x in lis for y in x)
def lc_d():
    return 'd' in [y for x in lis for y in x]

def ge_11():
    return 11 in (y for x in lis for y in x)
def lc_11():
    return 11 in [y for x in lis for y in x]

def ge_18():
    return 18 in (y for x in lis for y in x)
def lc_18():
    return 18 in [y for x in lis for y in x]

for i in xrange(100000):
    ge_d()
    lc_d()
    ge_11()
    lc_11()
    ge_18()
    lc_18()

Here are the relevant results, reordered to make the patterns clearer.

         5400002 function calls in 2.830 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100000    0.158    0.000    0.251    0.000 fop.py:3(ge_d)
   500000    0.092    0.000    0.092    0.000 fop.py:4(<genexpr>)
   100000    0.285    0.000    0.285    0.000 fop.py:5(lc_d)

   100000    0.356    0.000    0.634    0.000 fop.py:8(ge_11)
  1800000    0.278    0.000    0.278    0.000 fop.py:9(<genexpr>)
   100000    0.333    0.000    0.333    0.000 fop.py:10(lc_11)

   100000    0.435    0.000    0.806    0.000 fop.py:13(ge_18)
  2500000    0.371    0.000    0.371    0.000 fop.py:14(<genexpr>)
   100000    0.344    0.000    0.344    0.000 fop.py:15(lc_18)

Creating a generator expression is equivalent to creating a generator function and calling it. That accounts for one call to <genexpr>. Then, in the first case, next is called 4 times, until d is reached, for a total of 5 calls (times 100000 iterations = ncalls = 500000). In the second case, it is called 17 times, for a total of 18 calls; and in the third, 24 times, for a total of 25 calls.
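Those counts can be checked directly. Here is a small sketch (Python 3, so it relies on the `for` loop calling `next()` once per item, rather than Python 2's `gen.next()`; the counting helper is mine) that counts how many times `next` runs before each target is found:

```python
lis = [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3], [4, 5, 6],
       [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]]

def nexts_until(target):
    """Count next() calls on the genexpr until `target` is yielded.

    Each iteration of the for loop is one next() call on the generator.
    """
    gen = (y for x in lis for y in x)
    calls = 0
    for item in gen:
        calls += 1
        if item == target:
            break
    return calls

print(nexts_until('d'))  # 4
print(nexts_until(11))   # 17
print(nexts_until(18))   # 24
```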

The genex outperforms the list comprehension in the first case, but the extra calls to next account for most of the difference between the speed of the list comprehension and the speed of the generator expression in the second and third cases.

>>> .634 - .278 - .333
0.023
>>> .806 - .371 - .344
0.091

I'm not sure what accounts for the remaining time; it seems that generator expressions would be a hair slower even without the additional function calls. I suppose this confirms inspectorG4dget's assertion that "creating a generator comprehension has more native overhead than does a list comprehension." But in any case, this shows pretty clearly that generator expressions are slower mostly because of calls to next.

I'll add that when short-circuiting doesn't help, list comprehensions are still faster, even for very large lists. For example:

>>> counter = itertools.count()
>>> lol = [[counter.next(), counter.next(), counter.next()] 
           for _ in range(1000000)]
>>> 2999999 in (i for sublist in lol for i in sublist)
True
>>> 3000000 in (i for sublist in lol for i in sublist)
False
>>> %timeit 2999999 in [i for sublist in lol for i in sublist]
1 loops, best of 3: 312 ms per loop
>>> %timeit 2999999 in (i for sublist in lol for i in sublist)
1 loops, best of 3: 351 ms per loop
>>> %timeit any([2999999 in sublist for sublist in lol])
10 loops, best of 3: 161 ms per loop
>>> %timeit any(2999999 in sublist for sublist in lol)
10 loops, best of 3: 163 ms per loop
>>> %timeit for i in [2999999 in sublist for sublist in lol]: pass
1 loops, best of 3: 171 ms per loop
>>> %timeit for i in (2999999 in sublist for sublist in lol): pass
1 loops, best of 3: 183 ms per loop

As you can see, when short circuiting is irrelevant, list comprehensions are consistently faster even for a million-item-long list of lists. Obviously for actual uses of in at these scales, generators will be faster because of short-circuiting. But for other kinds of iterative tasks that are truly linear in the number of items, list comprehensions are pretty much always faster. This is especially true if you need to perform multiple tests on a list; you can iterate over an already-built list comprehension very quickly:

>>> incache = [2999999 in sublist for sublist in lol]
>>> get_list = lambda: incache
>>> get_gen = lambda: (2999999 in sublist for sublist in lol)
>>> %timeit for i in get_list(): pass
100 loops, best of 3: 18.6 ms per loop
>>> %timeit for i in get_gen(): pass
1 loops, best of 3: 187 ms per loop

In this case, the list comprehension is an order of magnitude faster!

Of course, this only remains true until you run out of memory. Which brings me to my final point. There are two main reasons to use a generator: to take advantage of short circuiting, and to save memory. For very large sequences/iterables, generators are the obvious way to go, because they save memory. But if short-circuiting is not an option, you pretty much never choose generators over lists for speed. You choose them to save memory, and it's always a trade-off.
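The memory side of that trade-off is easy to see with `sys.getsizeof` (a rough sketch in Python 3; note that `getsizeof` measures only the container object itself, not the elements it references):

```python
import sys

# 100,000 sublists of three ints each
lol = [[i, i + 1, i + 2] for i in range(0, 300000, 3)]

flat_list = [y for x in lol for y in x]  # materializes all 300,000 items
flat_gen = (y for x in lol for y in x)   # holds only iteration state

print(sys.getsizeof(flat_list))  # megabytes: grows with the element count
print(sys.getsizeof(flat_gen))   # a couple hundred bytes, regardless of size
```

The generator's footprint stays constant no matter how large `lol` grows, which is exactly why it wins once the materialized list would no longer fit in memory.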
