Built-in range or numpy.arange: which is more efficient?

Question

When iterating over a large array with a range expression, should I use Python's built-in range function or numpy's arange to get the best performance?

My reasoning so far:

arange probably resorts to a native implementation and might therefore be faster. On the other hand, arange returns a full array, which occupies memory, so there might be an overhead. Python 3's range is lazy, like a generator, and does not hold all the values in memory.
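The memory side of this reasoning can be checked directly. The following is a minimal sketch, assuming Python 3 and an installed numpy; the variable names are illustrative only. It compares the size of a range object with the buffer allocated by np.arange.

```python
import sys
import numpy as np

size = 1_000_000

lazy = range(size)        # Python 3: values are computed on demand
eager = np.arange(size)   # numpy: one array element is stored per value

range_bytes = sys.getsizeof(lazy)   # small, and independent of `size`
array_bytes = eager.nbytes          # size * itemsize bytes of element data

print("range object:", range_bytes, "bytes")
print("arange array:", array_bytes, "bytes")
```

The exact byte counts depend on the platform and the default integer dtype, so none are quoted here.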

Recommended answer

For large arrays, a vectorised numpy operation is the fastest. If you must loop, prefer xrange (range on Python 3) and avoid np.arange.

In numpy you should use combinations of vectorized calculations, ufuncs and indexing to solve your problems, because they run at C speed. Looping over numpy arrays is inefficient by comparison.
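As a small illustration of the three tools named above, here is a sketch (not taken from the original answer) that computes the same squares once with an index loop and once with a vectorized expression, then applies a ufunc and a boolean index. The array contents and variable names are assumptions chosen for the example.

```python
import numpy as np

values = np.arange(10)

# Slow pattern: a Python-level loop that indexes the array element by element.
squares_loop = np.empty_like(values)
for i in range(len(values)):
    squares_loop[i] = values[i] ** 2

# Vectorized calculation: the loop runs inside numpy's compiled code.
squares_vec = values ** 2

# A ufunc applied to the whole array at once.
roots = np.sqrt(values)

# Boolean indexing: select elements without writing a loop.
evens = values[values % 2 == 0]

print(np.array_equal(squares_loop, squares_vec))
print(evens)
```

Both approaches produce the same squares; only the loop pays Python interpreter overhead per element.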

(About the worst thing you could do is iterate over the array with an index created by range or np.arange, as the first sentence of your question suggests, but I'm not sure that is what you really mean.)

import numpy as np
import sys

sys.version
# out: '2.7.3rc2 (default, Mar 22 2012, 04:35:15) \n[GCC 4.6.3]'
np.version.version
# out: '1.6.2'

size = int(1E6)

%timeit for x in range(size): x ** 2
# out: 10 loops, best of 3: 136 ms per loop

%timeit for x in xrange(size): x ** 2
# out: 10 loops, best of 3: 88.9 ms per loop

# avoid this
%timeit for x in np.arange(size): x ** 2
# out: 1 loops, best of 3: 1.16 s per loop

# use this
%timeit np.arange(size) ** 2
# out: 100 loops, best of 3: 19.5 ms per loop

So for this case numpy is roughly 4.5 times faster than looping with xrange (19.5 ms versus 88.9 ms), if you do it right. Depending on your problem, the speed-up from numpy can be much larger than 4 or 5 times.
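The session above was recorded on Python 2.7 with the IPython %timeit magic. A rough way to repeat the comparison on Python 3 as a plain script is sketched below, using the standard timeit module. The function names and the smaller size are assumptions made for this sketch, and the timings will differ from machine to machine, so no expected numbers are given.

```python
import timeit
import numpy as np

size = 100_000  # smaller than the 1E6 used above, to keep the run short


def loop_range():
    # Pure Python loop over a lazy range.
    for x in range(size):
        x ** 2


def loop_arange():
    # Python loop over a numpy array: each element is boxed as a numpy scalar.
    for x in np.arange(size):
        x ** 2


def vectorized():
    # The whole computation happens inside numpy.
    return np.arange(size) ** 2


timings = {}
for func in (loop_range, loop_arange, vectorized):
    # Best of 3 repeats, 5 calls each, reported as seconds per call.
    timings[func.__name__] = min(timeit.repeat(func, number=5, repeat=3)) / 5

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per loop")
```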

The answers to this question explain some more advantages of using numpy arrays instead of Python lists for large data sets.
