Why is np.compress faster than boolean indexing?

Question

What is np.compress doing internally that makes it faster than boolean indexing?
In this example, compress is ~20% faster. The time savings varies with the size of a and the number of True values in the boolean array b, but on my machine compress is always faster.
import numpy as np
a = np.random.rand(1000000,4)
b = (a[:,0]>0.5)
%timeit a[b]
#>>> 10 loops, best of 3: 24.7 ms per loop
%timeit a.compress(b, axis=0)
#>>> 10 loops, best of 3: 20 ms per loop
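For reference, the two expressions select exactly the same rows, so the comparison is purely about speed. A minimal self-contained check (with a smaller array than in the question, just to keep it quick):

```python
import numpy as np

# Smaller than the question's 1000000x4 array, purely to keep the check fast.
a = np.random.rand(10000, 4)
b = a[:, 0] > 0.5

# Both select the rows where b is True; only the internal machinery differs.
assert np.array_equal(a[b], a.compress(b, axis=0))
```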
The documentation for boolean indexing says that what is returned is a copy of the data, not a view as one gets with slices. In contrast, the compress docs say "Return selected slices of an array along given axis". However, using the method provided here for determining whether two arrays share the same data buffer shows that neither method shares data with its parent a, which I take to mean neither method returns an actual slice.

def get_data_base(arr):
    base = arr
    while isinstance(base.base, np.ndarray):
        base = base.base
    return base

def arrays_share_data(x, y):
    return get_data_base(x) is get_data_base(y)

arrays_share_data(a, a.compress(b, axis=0))
#>>> False
arrays_share_data(a, a[b])
#>>> False

I am simply curious because I perform these operations frequently in my work. I run python 3.5.2, numpy v 1.11.1, installed via Anaconda.
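As a side note, numpy also ships a built-in check for this, np.shares_memory (available in the numpy versions mentioned here); a quick sketch contrasting a basic slice, which is a view, with the two copying methods:

```python
import numpy as np

a = np.random.rand(1000, 4)
b = a[:, 0] > 0.5

# A basic slice is a view: it shares the parent's data buffer.
print(np.shares_memory(a, a[10:20]))               # True
# Boolean indexing and compress both return copies.
print(np.shares_memory(a, a[b]))                   # False
print(np.shares_memory(a, a.compress(b, axis=0)))  # False
```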
Answer

In /numpy/core/src/multiarray/item_selection.c, found on the numpy github:
PyArray_Compress(PyArrayObject *self, PyObject *condition, int axis,
PyArrayObject *out)
# various checks
res = PyArray_Nonzero(cond);
ret = PyArray_TakeFrom(self, PyTuple_GET_ITEM(res, 0), axis,
out, NPY_RAISE);
So compress is the same as doing where to get an index array, and then take. With your sample arrays:

In [135]: a.shape
Out[135]: (1000000, 4)
In [136]: b.shape
Out[136]: (1000000,)
In [137]: a.compress(b, axis=0).shape
Out[137]: (499780, 4)
In [138]: a.take(np.nonzero(b)[0], axis=0).shape
Out[138]: (499780, 4)
In [139]: timeit a.compress(b, axis=0).shape
100 loops, best of 3: 14.3 ms per loop
In [140]: timeit a.take(np.nonzero(b)[0], axis=0).shape
100 loops, best of 3: 14.3 ms per loop
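The C routine above can be sketched in pure Python, assuming only the nonzero-then-take behavior it shows (my_compress is a hypothetical name for illustration):

```python
import numpy as np

def my_compress(arr, condition, axis=None):
    """Sketch of PyArray_Compress: find the True indices, then take them."""
    idx = np.nonzero(np.asarray(condition))[0]
    return arr.take(idx, axis=axis)

a = np.random.rand(1000, 4)
b = a[:, 0] > 0.5

# Matches the built-in compress row for row.
assert np.array_equal(my_compress(a, b, axis=0), a.compress(b, axis=0))
```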
In fact if I use this index array in the [] indexing I get comparable times:

In [141]: idx=np.where(b)[0]
In [142]: idx.shape
Out[142]: (499780,)
In [143]: timeit a[idx,:].shape
100 loops, best of 3: 14.6 ms per loop
In [144]: timeit np.take(a,idx, axis=0).shape
100 loops, best of 3: 9.9 ms per loop
The np.take code is more involved since it includes clip and wrap modes.

[] indexing gets translated into a __getitem__ call, and goes through various layers. I haven't traced that code very far, but I think it's safe to say that compress (or rather take) just takes a more direct route to the task, and thus gets a modest speed increase. A speed difference of 30-50% suggests differences in compiled code details, not something major like views vs copies, or interpreted vs compiled.
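To reproduce the comparison outside IPython's %timeit, the three paths can be wrapped in a small stdlib timeit harness (a sketch; absolute numbers will differ by machine and numpy version):

```python
import timeit
import numpy as np

a = np.random.rand(100000, 4)
b = a[:, 0] > 0.5
idx = np.nonzero(b)[0]

for label, fn in [
    ("a[b]              ", lambda: a[b]),
    ("a.compress(b, 0)  ", lambda: a.compress(b, axis=0)),
    ("a.take(idx, 0)    ", lambda: a.take(idx, axis=0)),
]:
    # timeit accepts a callable; total seconds for 100 runs.
    t = timeit.timeit(fn, number=100)
    print(f"{label}: {t:.4f} s / 100 runs")
```

All three produce identical arrays, so any timing gap is overhead in the selection machinery itself.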