Why the performance difference between numpy.zeros and numpy.zeros_like?
Question
I finally found a performance bottleneck in my code, but I am confused about the cause. To work around it, I changed all my calls to numpy.zeros_like to use numpy.zeros instead. But why is zeros_like so much slower?
For example (note the e-05 on the zeros call):
>>> timeit.timeit('np.zeros((12488, 7588, 3), np.uint8)', 'import numpy as np', number = 10)
5.2928924560546875e-05
>>> timeit.timeit('np.zeros_like(x)', 'import numpy as np; x = np.zeros((12488, 7588, 3), np.uint8)', number = 10)
1.4402990341186523
Strangely, though, writing to an array created with zeros is noticeably slower than writing to one created with zeros_like:
>>> timeit.timeit('x[100:-100, 100:-100] = 1', 'import numpy as np; x = np.zeros((12488, 7588, 3), np.uint8)', number = 10)
0.4310588836669922
>>> timeit.timeit('x[100:-100, 100:-100] = 1', 'import numpy as np; x = np.zeros_like(np.zeros((12488, 7588, 3), np.uint8))', number = 10)
0.33325695991516113
My guess is that zeros is using some CPU trick and not actually writing to the memory when it allocates it; the zeroing is done on the fly when the array is written to. But that still doesn't explain the massive discrepancy in array creation times.
I'm running Mac OS X Yosemite with the current numpy version:
>>> numpy.__version__
'1.9.1'
Recommended answer
My timings in IPython are (using the simpler timeit interface):
In [57]: timeit np.zeros_like(x)
1 loops, best of 3: 420 ms per loop
In [58]: timeit np.zeros((12488, 7588, 3), np.uint8)
100000 loops, best of 3: 15.1 µs per loop
When I look at the code with IPython (np.zeros_like??), I see:
res = empty_like(a, dtype=dtype, order=order, subok=subok)
multiarray.copyto(res, 0, casting='unsafe')
while np.zeros is a black box: pure compiled code.
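To make the two-step pattern concrete, here is a minimal sketch that mimics the quoted lines using only public NumPy calls. The name zeros_like_sketch is hypothetical and this is not NumPy's actual source; it only illustrates that the zeroing happens as an explicit write after allocation.

```python
import numpy as np


def zeros_like_sketch(a, dtype=None, order="K", subok=True):
    """Hypothetical stand-in for the quoted pattern: allocate, then fill with 0."""
    # Step 1: allocate an uninitialized array with the same shape and dtype.
    res = np.empty_like(a, dtype=dtype, order=order, subok=subok)
    # Step 2: explicitly write 0 into every element of the new buffer.
    np.copyto(res, 0, casting="unsafe")
    return res


x = np.arange(12, dtype=np.uint8).reshape(3, 4)
z = zeros_like_sketch(x)
print(z.shape, z.dtype, z.any())
```

Step 2 touches the whole buffer, which is where the extra time would be expected to go for a large array.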
Timings for empty are:
In [63]: timeit np.empty_like(x)
100000 loops, best of 3: 13.6 µs per loop
In [64]: timeit np.empty((12488, 7588, 3), np.uint8)
100000 loops, best of 3: 14.9 µs per loop
So the extra time in zeros_like is in that copyto step.
In my tests, the difference in assignment times (x[]=1) is negligible.
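If you want to check both numbers on your own machine, a rough sketch like the following times creation and the first full write separately. SHAPE and time_create_then_write are made-up names for illustration, the array is much smaller than the one in the question, and the results will depend on the OS, the allocator, and the NumPy version.

```python
import time

import numpy as np

# Hypothetical size, far smaller than the (12488, 7588, 3) array in the question.
SHAPE = (2000, 2000, 3)


def time_create_then_write(make_array):
    """Return (creation_seconds, first_write_seconds, array) for one factory call."""
    t0 = time.perf_counter()
    arr = make_array()
    t1 = time.perf_counter()
    arr[...] = 1  # the first write touches every element of the buffer
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, arr


template = np.zeros(SHAPE, np.uint8)

for name, factory in [
    ("zeros", lambda: np.zeros(SHAPE, np.uint8)),
    ("zeros_like", lambda: np.zeros_like(template)),
]:
    create_s, write_s, _ = time_create_then_write(factory)
    print(f"{name:>10}: create {create_s:.6f} s, first write {write_s:.6f} s")
```

A single run is noisy; repeating the loop a few times, or using timeit as in the question, gives steadier numbers.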
My guess is that zeros, ones, and empty are all early compiled creations. empty_like was added as a convenience, just drawing shape and type info from its input. zeros_like was written with more of an eye toward easy program maintenance (reusing empty_like) than toward speed.
np.ones and np.full also use the np.empty ... copyto sequence, and show similar timings.
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/array_assign_scalar.c appears to be the file that copies a scalar (such as 0) to an array. I don't see a use of memset.
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/alloc.c has calls to malloc and calloc.
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c is the source for zeros and empty. Both call PyArray_NewFromDescr_int, but one ends up using npy_alloc_cache_zero and the other npy_alloc_cache.
npy_alloc_cache in alloc.c calls alloc. npy_alloc_cache_zero calls npy_alloc_cache followed by a memset. The code in alloc.c is further complicated by a THREAD option.
More on the calloc vs. malloc+memset difference in the question "Why malloc+memset is slower than calloc?"
But with caching and garbage collection, I wonder whether the calloc/memset distinction applies.
This simple test with the memory_profiler package supports the claim that zeros and empty allocate memory 'on-the-fly', while zeros_like allocates everything up front:
N = (1000, 1000)
M = (slice(None, 500, None), slice(500, None, None))

Line #    Mem usage    Increment   Line Contents
================================================
     2   17.699 MiB    0.000 MiB   @profile
     3                             def test1(N, M):
     4   17.699 MiB    0.000 MiB       print(N, M)
     5   17.699 MiB    0.000 MiB       x = np.zeros(N)   # no memory jump
     6   17.699 MiB    0.000 MiB       y = np.empty(N)
     7   25.230 MiB    7.531 MiB       z = np.zeros_like(x)   # initial jump
     8   29.098 MiB    3.867 MiB       x[M] = 1   # jump on usage
     9   32.965 MiB    3.867 MiB       y[M] = 1
    10   32.965 MiB    0.000 MiB       z[M] = 1
    11   32.965 MiB    0.000 MiB       return x,y,z