Is it faster to iterate a small list within an any() statement?


Question

Consider the following operation in the limit of low-length iterables:

d = (3, slice(None, None, None), slice(None, None, None))

In [215]: %timeit any([type(i) == slice for i in d])
1000000 loops, best of 3: 695 ns per loop

In [214]: %timeit any(type(i) == slice for i in d)
1000000 loops, best of 3: 929 ns per loop

Setting up a list is 25% faster than using a generator expression?

Why is this the case, given that setting up a list is an extra operation?

Note: In both runs I obtained the warning: The slowest run took 6.42 times longer than the fastest. This could mean that an intermediate result is being cached.
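
To reproduce the comparison outside IPython's %timeit magic, a minimal sketch using only the standard-library timeit module could look like the following (the absolute numbers depend on machine and interpreter version, so treat the output as illustrative rather than definitive):

import timeit

setup = "d = (3, slice(None, None, None), slice(None, None, None))"

# Time 1,000,000 calls of each form over the same 3-element tuple.
list_time = timeit.timeit("any([type(i) == slice for i in d])", setup=setup, number=1000000)
gen_time = timeit.timeit("any(type(i) == slice for i in d)", setup=setup, number=1000000)

print("list comprehension:   {:.3f} s".format(list_time))
print("generator expression: {:.3f} s".format(gen_time))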

In this particular test, list() structures are faster up to a length of 4, after which the generator expression has better performance.

The red line shows where this crossover occurs and the black line shows where the two are equal in performance.


The code takes about 1 min to run on my MacBook Pro, utilising all the cores:

import timeit, pylab, multiprocessing
import numpy as np

manager = multiprocessing.Manager()
g = manager.list([])
l = manager.list([])

rng = range(1,16) # list lengths
max_series = [3,slice(None, None, None)]*rng[-1] # alternate array types
series = [max_series[:n] for n in rng]

number, reps = 1000000, 5
def func_l(d):
    # time the list-comprehension form for this iterable length
    l.append(timeit.repeat("any([type(i) == slice for i in {}])".format(d), repeat=reps, number=number))
    print("done List, len:{}".format(len(d)))
def func_g(d):
    # time the generator-expression form for this iterable length
    g.append(timeit.repeat("any(type(i) == slice for i in {})".format(d), repeat=reps, number=number))
    print("done Generator, len:{}".format(len(d)))

p = multiprocessing.Pool(processes=min(16,rng[-1])) # optimize for 16 processors
p.map(func_l, series) # pool list
p.map(func_g, series) # pool gens

ratio = np.asarray(g).mean(axis=1) / np.asarray(l).mean(axis=1)
pylab.plot(rng, ratio, label='av. generator time / av. list time')
pylab.title("{} iterations, averaged over {} runs".format(number,reps))
pylab.xlabel("length of iterable")
pylab.ylabel("Time Ratio (Higher is worse)")
pylab.legend()
lt_zero = np.argmax(ratio<1.)
pylab.axhline(y=1, color='k')
pylab.axvline(x=lt_zero+1, color='r')
pylab.ion() ; pylab.show()


Answer

The catch is the number of items you are applying any() to. Repeat the same process on a larger dataset:

In [2]: d = ([3] * 1000) + [slice(None, None, None), slice(None, None, None)]*1000

In [3]: %timeit any([type(i) == slice for i in d])
1000 loops, best of 3: 736 µs per loop

In [4]: %timeit any(type(i) == slice for i in d)
1000 loops, best of 3: 285 µs per loop

Then, using a list (which loads all the items into memory first) becomes much slower, and the generator expression performs better.
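
A related point, not spelled out in the answer above but worth noting: with a generator expression, any() can short-circuit and stop consuming items as soon as it sees the first slice, whereas the list-comprehension form evaluates every element before any() even starts. The sketch below uses a small counting wrapper (an illustrative helper, not part of the original post) to show how many elements each form actually touches:

def counting(iterable, counter):
    # Yield items unchanged while recording how many were actually consumed.
    for item in iterable:
        counter[0] += 1
        yield item

d = [3] * 1000 + [slice(None, None, None), slice(None, None, None)] * 1000

consumed = [0]
any(type(i) == slice for i in counting(d, consumed))
print(consumed[0])  # 1001 -- stops right after the first slice is reached

consumed = [0]
any([type(i) == slice for i in counting(d, consumed)])
print(consumed[0])  # 3000 -- the whole list is built before any() runs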
