Avoiding np.where when assigning to a NumPy array
Question
I would like the following (or something similar) to work, without using np.where:
>>> A = np.arange(0,10)
>>> ind = np.logical_and(A>4, A%2)
>>> k = np.array([0,1,0],dtype=bool)
>>> A[ind][k] = np.pi # Doesn't actually assign to A
That is, I want k to be an additional boolean mask applied to the entries of ind that are True.
I know that I can use np.where(ind)[0][k], but this is more expensive than logical indexing.
Is there a way to reference A[ind] that refers to the underlying memory of A?
Recommended answer
From the oft-referenced numpy indexing page:
.... A single boolean index array is practically identical to x[obj.nonzero()] .... However, it is faster when obj.shape == x.shape.
np.where(cond), called with a single argument, is the same as np.nonzero(cond).
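As a quick check of that equivalence, here is a minimal sketch. The array `cond` and the variable names are made up for illustration and are not part of the original answer.

```python
import numpy as np

# Hypothetical example condition, not taken from the original post.
cond = np.array([False, True, False, True, True])

# With a single argument, np.where returns a tuple of index arrays,
# one per dimension, just like np.nonzero.
idx_where = np.where(cond)
idx_nonzero = np.nonzero(cond)

# Compare the two tuples element by element.
same = all(np.array_equal(a, b) for a, b in zip(idx_where, idx_nonzero))
print(idx_where, idx_nonzero, same)
```

Because the result is a tuple, the `[0]` in `np.where(ind)[0]` is what extracts the actual index array for a 1-D input.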
But let's do some simple timing:
In [239]: x = np.arange(10000)
In [240]: y = (x%2).astype(bool)
In [241]: x[y].shape
Out[241]: (5000,)
In [242]: idx = np.nonzero(y)
In [243]: x[idx].shape
Out[243]: (5000,)
In [244]: timeit x[y].shape
89.9 µs ± 726 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [245]: timeit x[idx].shape
13.3 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [246]: timeit x[np.nonzero(y)].shape
34.2 µs ± 893 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
So in this test, integer array indexing is faster than boolean indexing, even when we include an explicit where (here, nonzero) call.
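To rerun this comparison outside IPython, something like the following sketch with the standard `timeit` module should work. The repeat count `n` is an arbitrary choice, and the absolute numbers (and possibly the ranking) will depend on the NumPy version and hardware, so no expected timings are given.

```python
import timeit
import numpy as np

x = np.arange(10000)
y = (x % 2).astype(bool)   # boolean mask selecting the odd values
idx = np.nonzero(y)        # precomputed integer indices for the same selection

n = 2000  # arbitrary number of repetitions per measurement

# Average seconds per call for each indexing style.
t_bool = timeit.timeit(lambda: x[y], number=n) / n
t_idx = timeit.timeit(lambda: x[idx], number=n) / n
t_nonzero = timeit.timeit(lambda: x[np.nonzero(y)], number=n) / n

print(f"boolean mask:        {t_bool * 1e6:.1f} us")
print(f"precomputed indices: {t_idx * 1e6:.1f} us")
print(f"nonzero + indexing:  {t_nonzero * 1e6:.1f} us")
```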
A[ind][k] = ... does not work because A[ind] is a copy, not a view, so the assignment modifies a temporary array rather than A:
In [251]: A = np.arange(100,110)
In [252]: ind = np.logical_and(A>104, A%2)
In [253]: ind
Out[253]:
array([False, False, False, False, False, True, False, True, False,
True])
In [254]: k = np.array([0,1,0], dtype=bool)
In [255]: A[ind]
Out[255]: array([105, 107, 109])
In [256]: A[ind][k]
Out[256]: array([107])
In [257]: A[ind][k] = 12
In [258]: A
Out[258]: array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])
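The copy-versus-view distinction can also be checked directly with `np.shares_memory`. This is a small sketch reusing the same A and ind as above; the slice `A[5:8]` is added here only as a contrast and is not from the original answer.

```python
import numpy as np

A = np.arange(100, 110)
ind = np.logical_and(A > 104, A % 2 == 1)

masked = A[ind]   # boolean (advanced) indexing returns a new array
sliced = A[5:8]   # basic slicing returns a view onto A's buffer

shares_masked = np.shares_memory(A, masked)
shares_sliced = np.shares_memory(A, sliced)
print(shares_masked, shares_sliced)

masked[1] = 12    # changes only the copy; A[7] is untouched
sliced[0] = -1    # writes through the view and changes A[5]
print(A)
```

In other words, A[ind][k] = 12 first builds the copy A[ind] and then assigns into that temporary, which is discarded immediately.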
But using k to select indices from np.where(ind) works:
In [262]: A[np.where(ind)[0][k]]=12
In [263]: A
Out[263]: array([100, 101, 102, 103, 104, 105, 106, 12, 108, 109])
Timings for a fetch rather than a set:
In [264]: timeit A[np.where(ind)[0][k]]
1.94 µs ± 75.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [265]: timeit A[ind][k]
1.34 µs ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
So yes, the double masking is a bit faster in this case, but that doesn't matter if it doesn't work. Don't sweat the small time improvements.
A boolean-only alternative is to write k into a copy of ind at its True positions and index A with the combined mask:
In [345]: ind1=ind.copy()
In [346]: ind1[ind] = k
In [348]: A[ind1]=3
In [349]: A
Out[349]: array([100, 101, 102, 103, 104, 105, 106, 3, 108, 109])
In this small example, the timing is basically the same as for A[np.where(ind)[0][k]]=12.
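The combined-mask approach can be wrapped in a small helper. This is only a sketch: the function name `refine_mask` and its shape check are illustrative additions, not part of NumPy or of the original answer. A float array is used so that np.pi is stored without truncation.

```python
import numpy as np

def refine_mask(ind, k):
    """Return a copy of boolean mask `ind` whose True entries are replaced by `k`.

    Hypothetical helper for illustration; `k` needs one entry per True in `ind`.
    """
    ind = np.asarray(ind, dtype=bool)
    k = np.asarray(k, dtype=bool)
    if k.shape != (np.count_nonzero(ind),):
        raise ValueError("k must have one entry per True value in ind")
    combined = ind.copy()
    combined[ind] = k   # overwrite only the positions where ind is True
    return combined

A = np.arange(10, dtype=float)
ind = np.logical_and(A > 4, A % 2 == 1)   # True at positions 5, 7, 9
k = np.array([False, True, False])        # keep only the middle match

mask = refine_mask(ind, k)
A[mask] = np.pi                           # assigns into A itself
print(A)
```

Since the result is a single full-size boolean mask, the assignment goes through one indexing operation on A and therefore modifies A in place.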