Is it faster to iterate a small list within an any() statement?

Question

Consider the following operation in the limit of short iterables:

d = (3, slice(None, None, None), slice(None, None, None))

In [215]: %timeit any([type(i) == slice for i in d])
1000000 loops, best of 3: 695 ns per loop

In [214]: %timeit any(type(i) == slice for i in d)
1000000 loops, best of 3: 929 ns per loop
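
Outside IPython, the same comparison can be reproduced with the standard timeit module. This is a minimal sketch under a few assumptions: the helper name `compare` is hypothetical, `d` is passed through the `globals` argument so that only the any() call is timed, and the fastest of the repeats is reported, which is what %timeit's "best of 3" shows. Absolute numbers will vary by machine and Python version.

```python
import timeit

# The tuple from the question: the first slice sits at index 1.
d = (3, slice(None, None, None), slice(None, None, None))


def compare(d, number=100_000, repeat=3):
    """Hypothetical helper: return (list_time, gen_time) in seconds per loop."""
    ns = {"d": d}  # pass d in, so building it is not part of the timed statement
    list_runs = timeit.repeat("any([type(i) == slice for i in d])",
                              globals=ns, number=number, repeat=repeat)
    gen_runs = timeit.repeat("any(type(i) == slice for i in d)",
                             globals=ns, number=number, repeat=repeat)
    # Like %timeit's "best of N", keep the fastest run of each variant.
    return min(list_runs) / number, min(gen_runs) / number


if __name__ == "__main__":
    list_t, gen_t = compare(d)
    print("list: {:.0f} ns, generator: {:.0f} ns".format(list_t * 1e9, gen_t * 1e9))
```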

So building a list first is about 25% faster than using a generator expression?

Why is this the case, given that building the list is an extra operation?

Note: in both runs I got the warning "The slowest run took 6.42 times longer than the fastest. This could mean that an intermediate result is being cached."

In this particular test, the list version is faster up to a length of 4, from which point the generator performs better.

In the resulting plot, the red line marks where this crossover occurs and the black line marks where both perform equally.

The following code takes about 1 min to run on my MacBook Pro, using all cores:

import timeit
import multiprocessing

import numpy as np
import matplotlib.pyplot as plt

rng = range(1, 16)  # list lengths
max_series = [3, slice(None, None, None)] * rng[-1]  # alternate element types
series = [max_series[:n] for n in rng]

number, reps = 1000000, 5


def func_l(d):
    # d's repr is pasted into the timed statement as a literal
    t = timeit.repeat("any([type(i) == slice for i in {}])".format(d), repeat=reps, number=number)
    print("done List, len:{}".format(len(d)))
    return t


def func_g(d):
    t = timeit.repeat("any(type(i) == slice for i in {})".format(d), repeat=reps, number=number)
    print("done Generator, len:{}".format(len(d)))
    return t


if __name__ == "__main__":
    # Pool.map returns results in the order of `series`; appending to a shared
    # Manager list from the workers would record them in completion order instead.
    with multiprocessing.Pool(processes=min(16, rng[-1])) as p:  # up to 16 processes
        l = p.map(func_l, series)  # list comprehension timings
        g = p.map(func_g, series)  # generator expression timings

    ratio = np.asarray(g).mean(axis=1) / np.asarray(l).mean(axis=1)
    plt.plot(rng, ratio, label='av. generator time / av. list time')
    plt.title("{} iterations, averaged over {} runs".format(number, reps))
    plt.xlabel("length of iterable")
    plt.ylabel("Time Ratio (Higher is worse)")
    plt.legend()
    lt_zero = np.argmax(ratio < 1.)  # index of the first length where the generator wins
    plt.axhline(y=1, color='k')  # equal performance
    plt.axvline(x=lt_zero + 1, color='r')  # rng starts at 1, so length = index + 1
    plt.show()
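
One caveat with the `np.argmax` call that computes `lt_zero`: if the generator never wins, the boolean array is all False and `argmax` returns 0, so the red line would be drawn at length 1. Below is a minimal pure-Python sketch of a guarded version, assuming the ratios are ordered the same way as the lengths in `rng`; the name `crossover_length` is hypothetical.

```python
def crossover_length(lengths, ratio):
    """Return the first length where generator time / list time drops below 1.

    Returns None if the generator never wins, instead of silently
    pointing at the first length as a bare argmax would.
    """
    for n, r in zip(lengths, ratio):
        if r < 1.0:
            return n
    return None
```

For example, with made-up ratios `[1.3, 1.2, 1.05, 0.97, 0.9]` over `range(1, 6)`, it returns 4; with `[1.2, 1.1, 1.0]` over `range(1, 4)` it returns None. Returning the length directly also removes the need for the `+ 1` offset tied to `rng` starting at 1.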

Answer

The catch is the size of the iterable you are applying any() to. Repeat the same process on a larger dataset:

In [2]: d = ([3] * 1000) + [slice(None, None, None), slice(None, None, None)]*1000

In [3]: %timeit any([type(i) == slice for i in d])
1000 loops, best of 3: 736 µs per loop

In [4]: %timeit any(type(i) == slice for i in d)
1000 loops, best of 3: 285 µs per loop

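Then the list version becomes much slower. The list comprehension evaluates type(i) == slice for every item and stores all the results in memory before any() sees the first one, whereas any() consumes the generator lazily and stops at the first True. Here the first slice is at index 1000, so the generator checks 1001 items against the list's 3000. With the question's three-element tuple the first match is at index 1, so short-circuiting saves only one check and the fixed cost of creating and resuming the generator dominates, which is why the list wins for very short iterables.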
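
The short-circuiting difference is easy to see by counting how many elements each form actually inspects. In this sketch the helper `count_checks` and the counting wrapper `is_slice` are hypothetical names introduced for illustration; the datasets are the answer's `d` and the question's tuple.

```python
def count_checks(d):
    """Return (list_checks, gen_checks): predicate calls made by each form."""
    counter = {"n": 0}

    def is_slice(x):
        # Hypothetical predicate wrapper that counts its own calls.
        counter["n"] += 1
        return type(x) == slice

    any([is_slice(i) for i in d])  # the whole list is built before any() runs
    list_checks = counter["n"]

    counter["n"] = 0
    any(is_slice(i) for i in d)    # any() pulls items lazily, stops at first True
    return list_checks, counter["n"]


big = ([3] * 1000) + [slice(None, None, None), slice(None, None, None)] * 1000
small = (3, slice(None, None, None), slice(None, None, None))

print(count_checks(big))    # (3000, 1001)
print(count_checks(small))  # (3, 2)
```

On the large input the generator skips two thirds of the work; on the question's tuple it saves a single check, which is not enough to pay for the generator's overhead.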
