Why the performance difference between numpy.zeros and numpy.zeros_like?


Question

I finally found a performance bottleneck in my code, but I am confused about the cause. To fix it, I changed all my calls to numpy.zeros_like to use numpy.zeros instead. But why is zeros_like so much slower?

For example (note the e-05 on the zeros call):

>>> timeit.timeit('np.zeros((12488, 7588, 3), np.uint8)', 'import numpy as np', number = 10)
5.2928924560546875e-05
>>> timeit.timeit('np.zeros_like(x)', 'import numpy as np; x = np.zeros((12488, 7588, 3), np.uint8)', number = 10)
1.4402990341186523

Strangely, though, writing to an array created with zeros is noticeably slower than writing to one created with zeros_like:

>>> timeit.timeit('x[100:-100, 100:-100] = 1', 'import numpy as np; x = np.zeros((12488, 7588, 3), np.uint8)', number = 10)
0.4310588836669922
>>> timeit.timeit('x[100:-100, 100:-100] = 1', 'import numpy as np; x = np.zeros_like(np.zeros((12488, 7588, 3), np.uint8))', number = 10)
0.33325695991516113

My guess is that zeros uses some CPU trick and does not actually write to the memory when allocating it; the zeroing is done on the fly when the array is first written to. But that still doesn't explain the massive discrepancy in array creation times.

I'm running Mac OS X Yosemite with the current numpy version:

>>> numpy.__version__
'1.9.1'

Recommended answer

My timings in IPython are (with a simpler timeit interface):

In [57]: timeit np.zeros_like(x)
1 loops, best of 3: 420 ms per loop

In [58]: timeit np.zeros((12488, 7588, 3), np.uint8)
100000 loops, best of 3: 15.1 µs per loop

When I look at the code with IPython (np.zeros_like??), I see:

res = empty_like(a, dtype=dtype, order=order, subok=subok)
multiarray.copyto(res, 0, casting='unsafe')

np.zeros, by contrast, is a black box: pure compiled code.

Timings for empty are:

In [63]: timeit np.empty_like(x)
100000 loops, best of 3: 13.6 µs per loop

In [64]: timeit np.empty((12488, 7588, 3), np.uint8)
100000 loops, best of 3: 14.9 µs per loop

So the extra time in zeros_like is spent in that copyto call.
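
To make that split visible, here is a minimal sketch that mirrors the two lines shown above and times each stage. The name zeros_like_sketch is hypothetical, np.copyto is the public counterpart of multiarray.copyto, and the array is smaller than the one in the question; absolute numbers will vary with NumPy version, OS, and hardware.

```python
import timeit
import numpy as np

def zeros_like_sketch(a):
    """Hypothetical stand-in for the two lines quoted from np.zeros_like."""
    res = np.empty_like(a)               # allocate, leave contents uninitialized
    np.copyto(res, 0, casting='unsafe')  # write 0 into every element
    return res

x = np.zeros((2000, 2000, 3), np.uint8)  # smaller than the question's array
out = zeros_like_sketch(x)

candidates = {
    'empty_like only': lambda: np.empty_like(x),
    'empty_like + copyto': lambda: zeros_like_sketch(x),
    'np.zeros': lambda: np.zeros(x.shape, x.dtype),
}
for label, fn in candidates.items():
    seconds = timeit.timeit(fn, number=10) / 10
    print(f'{label:22s} {seconds * 1e6:12.1f} µs per call')
```

If the copy really is the cost, the second row should dominate the other two.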

In my tests, the difference in assignment times (x[]=1) is negligible.

My guess is that zeros, ones, and empty are all early compiled creations. empty_like was added as a convenience, just drawing shape and type info from its input. zeros_like was written with more of an eye toward easy program maintenance (reusing empty_like) than toward speed.

np.ones and np.full also use the np.empty ... copyto sequence, and show similar timings.
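
A rough way to check this on your own machine: build a fill function from np.empty plus np.copyto and time it next to the built-ins. full_sketch is a hypothetical name used only for illustration, and the fill value 7 is arbitrary.

```python
import timeit
import numpy as np

def full_sketch(shape, fill_value, dtype):
    """Hypothetical illustration of the np.empty ... copyto sequence."""
    a = np.empty(shape, dtype)
    np.copyto(a, fill_value, casting='unsafe')
    return a

shape, dtype = (2000, 2000, 3), np.uint8
candidates = {
    'np.zeros': lambda: np.zeros(shape, dtype),
    'np.ones': lambda: np.ones(shape, dtype),
    'np.full': lambda: np.full(shape, 7, dtype),
    'full_sketch': lambda: full_sketch(shape, 7, dtype),
}
results = {name: timeit.timeit(fn, number=10) / 10
           for name, fn in candidates.items()}
for name, seconds in results.items():
    print(f'{name:12s} {seconds * 1e6:12.1f} µs per call')
```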

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/array_assign_scalar.c appears to be the file that copies a scalar (such as 0) to an array. I don't see any use of memset there.

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/alloc.c has the calls to malloc and calloc.

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c is the source for zeros and empty. Both call PyArray_NewFromDescr_int, but one ends up using npy_alloc_cache_zero and the other npy_alloc_cache.

npy_alloc_cache in alloc.c calls alloc. npy_alloc_cache_zero calls npy_alloc_cache followed by a memset. The code in alloc.c is further complicated by a THREAD option.

More on the calloc vs. malloc+memset difference at: Why malloc+memset is slower than calloc?

But with caching and garbage collection, I wonder whether the calloc/memset distinction applies.
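
One way to probe this empirically, without reading the C source, is to time the first and second full write to a freshly created array. This is only a sketch: first_and_second_write is a hypothetical helper, the shape is an arbitrary choice, and whether the first write is visibly slower depends on the OS, the allocator, and the NumPy version.

```python
import time
import numpy as np

def first_and_second_write(make):
    """Return (first, second) write times in seconds for a fresh array."""
    a = make()
    t0 = time.perf_counter()
    a[...] = 1   # first touch: may include page faults if allocation was lazy
    t1 = time.perf_counter()
    a[...] = 1   # second write: the pages have already been touched
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

shape, dtype = (2000, 2000, 3), np.uint8
makers = {
    'zeros': lambda: np.zeros(shape, dtype),
    'empty': lambda: np.empty(shape, dtype),
    'zeros_like': lambda: np.zeros_like(np.empty(shape, dtype)),
}
timings = {name: first_and_second_write(make) for name, make in makers.items()}
for name, (first, second) in timings.items():
    print(f'{name:10s} first write {first * 1e3:8.3f} ms, '
          f'second write {second * 1e3:8.3f} ms')
```

If zeros and empty defer the real allocation, their first write should cost more than their second, while zeros_like should show little difference because its pages were already written during creation.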

This simple test with the memory_profiler package supports the claim that zeros and empty allocate memory 'on the fly', while zeros_like allocates everything up front:

N = (1000, 1000) 
M = (slice(None, 500, None), slice(500, None, None))

Line #    Mem usage    Increment   Line Contents
================================================
     2   17.699 MiB    0.000 MiB   @profile
     3                             def test1(N, M):
     4   17.699 MiB    0.000 MiB       print(N, M)
     5   17.699 MiB    0.000 MiB       x = np.zeros(N)   # no memory jump
     6   17.699 MiB    0.000 MiB       y = np.empty(N)
     7   25.230 MiB    7.531 MiB       z = np.zeros_like(x) # initial jump
     8   29.098 MiB    3.867 MiB       x[M] = 1     # jump on usage
     9   32.965 MiB    3.867 MiB       y[M] = 1
    10   32.965 MiB    0.000 MiB       z[M] = 1
    11   32.965 MiB    0.000 MiB       return x,y,z
