Built-in range or numpy.arange: which is more efficient?
Question
When iterating over a large array with a range expression, should I use Python's built-in range function or numpy's arange to get the best performance?
My reasoning so far:
arange probably resorts to a native implementation and might therefore be faster. On the other hand, arange returns a full array, which occupies memory, so there might be an overhead. Python 3's range is evaluated lazily and does not hold all the values in memory.
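The memory side of this reasoning can be checked directly. The following is a minimal sketch, assuming Python 3 and an installed numpy; the size n and the variable names are illustrative and not part of the original question.

```python
import sys
import numpy as np

n = 1_000_000  # illustrative size, not taken from the question

lazy = range(n)        # Python 3 range: stores only start, stop and step
eager = np.arange(n)   # allocates storage for every one of the n values

range_bytes = sys.getsizeof(lazy)   # size of the range object itself
array_bytes = eager.nbytes          # n * itemsize bytes of element data

print("range object:", range_bytes, "bytes")
print("arange data: ", array_bytes, "bytes")
```

The range object stays the same size whatever n is, while the array's data buffer grows linearly with n.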
Recommended answer
For large arrays, a vectorised numpy operation is the fastest. If you must loop, prefer xrange (Python 2) or range (Python 3) and avoid np.arange.
In numpy you should use combinations of vectorized calculations, ufuncs and indexing to solve your problems, because these run at C speed. Looping over numpy arrays is inefficient by comparison.
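As a rough illustration of combining indexing with a ufunc, here is a minimal sketch. The task (square roots of the even values) and all variable names are made up for the example and do not come from the original answer.

```python
import numpy as np

values = np.arange(10)   # small illustrative input

# Loop version: every element passes through the Python interpreter.
looped = []
for v in values:
    if v % 2 == 0:
        looped.append(np.sqrt(v))

# Vectorized version: a boolean mask selects the even values and the
# np.sqrt ufunc is applied to the whole selection at once.
even = values[values % 2 == 0]
vectorized = np.sqrt(even)

print(np.allclose(looped, vectorized))
```

Both versions compute the same numbers; the second one keeps the per-element work inside numpy instead of the interpreter.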
(Roughly the worst thing you could do is iterate over the array with an index created with range or np.arange, as the first sentence of your question suggests, but I'm not sure whether you really mean that.)
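For concreteness, the index-driven pattern and a whole-array alternative might look like this. It is only a sketch with made-up names, assuming the goal is to square every element.

```python
import numpy as np

data = np.arange(5, dtype=np.int64)   # small illustrative array

# Pattern to avoid: index into the array from a Python-level loop.
squares_indexed = np.empty_like(data)
for i in range(len(data)):
    squares_indexed[i] = data[i] ** 2

# Whole-array expression that produces the same result.
squares_vectorized = data ** 2

print(np.array_equal(squares_indexed, squares_vectorized))
```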
import numpy as np
import sys
sys.version
# out: '2.7.3rc2 (default, Mar 22 2012, 04:35:15) \n[GCC 4.6.3]'
np.version.version
# out: '1.6.2'
size = int(1E6)
%timeit for x in range(size): x ** 2
# out: 10 loops, best of 3: 136 ms per loop
%timeit for x in xrange(size): x ** 2
# out: 10 loops, best of 3: 88.9 ms per loop
# avoid this
%timeit for x in np.arange(size): x ** 2
# out: 1 loops, best of 3: 1.16 s per loop
# use this
%timeit np.arange(size) ** 2
# out: 100 loops, best of 3: 19.5 ms per loop
So in this case numpy, used correctly, is roughly 4.5 times faster than looping with xrange (19.5 ms versus 88.9 ms). Depending on your problem, the speed-up from numpy can be much larger than a factor of 4 or 5.
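The session above uses Python 2.7 and IPython's %timeit magic. A comparable measurement in plain Python 3 could be sketched with the standard timeit module, as below. The function names, the reduced size and the repeat counts are assumptions chosen for the example, and the timings will differ from the figures quoted above.

```python
import timeit
import numpy as np

size = 100_000  # smaller than the 1E6 used above, to keep the run short

def loop_range():
    # Python-level loop over a lazy range
    for x in range(size):
        x ** 2

def loop_arange():
    # Python-level loop over a numpy array (the pattern to avoid)
    for x in np.arange(size, dtype=np.int64):
        x ** 2

def vectorized():
    # single whole-array operation
    np.arange(size, dtype=np.int64) ** 2

# Best of 3 repeats, 5 calls per repeat, reported per call.
timings = {
    f.__name__: min(timeit.repeat(f, number=5, repeat=3)) / 5
    for f in (loop_range, loop_arange, vectorized)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per loop")
```

In Python 3 there is no xrange; range itself is lazy, so only three variants are compared here.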
The answers to this question explain some more advantages of using numpy arrays instead of Python lists for large data sets.