Rearrange NumPy Array Efficiently
Question
Let's say I have a simple 1D NumPy array:
x = np.random.rand(1000)
I then retrieve the sorted indices:
idx = np.argsort(x)
However, I need to move a list of indices to the front of idx. So, let's say indices = [10, 20, 30, 40, 50] must always be the first 5 entries, and the rest then follow in the order given by idx (minus the values already present in indices).
A naive way to accomplish this would be:
indices = np.array([10, 20, 30, 40, 50])
out = np.empty(idx.shape[0], dtype=np.int64)
out[:indices.shape[0]] = indices
n = indices.shape[0]
for i in range(idx.shape[0]):
    if idx[i] not in indices:
        out[n] = idx[i]
        n += 1
Is there a way to do this efficiently and, possibly, in-place?
Recommended answer
Approach #1
One approach is to build a mask with np.isin:
mask = np.isin(idx, indices, invert=True)
out = np.r_[indices, idx[mask]]
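As a self-contained illustration of the masking approach, here is a minimal sketch. The helper name isin_front and the sample data are assumptions made for this example and are not part of the original answer.

```python
import numpy as np

def isin_front(idx, indices):
    """Hypothetical helper: put `indices` first, then the rest of `idx` in order."""
    # True for every entry of idx that is NOT one of the forced indices
    mask = np.isin(idx, indices, invert=True)
    # Forced indices first, then the remaining sorted indices
    return np.r_[indices, idx[mask]]

rng = np.random.default_rng(0)
x = rng.permutation(1000).astype(float)  # distinct values, so the sort order is unambiguous
idx = np.argsort(x)
indices = np.array([10, 20, 30, 40, 50])

out = isin_front(idx, indices)
print(out[:8])
```

The result has the same length as idx, because each forced index is removed from the tail exactly once and added to the front exactly once.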
Approach #2: Skipping the first argsort
Another approach is to make the values at the given indices smaller than every other value, which forces them to the start when argsorting. We don't need idx for this method, as we are argsorting inside the solution anyway:
def argsort_constrained(x, indices):
    xc = x.copy()
    xc[indices] = x.min()-np.arange(len(indices),0,-1)
    return xc.argsort()
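To see why this works, here is a small worked example on a five-element array. The sample values are chosen for illustration only. The sketch assumes that indices contains no duplicates and that x has a floating-point or signed dtype, so subtracting from the minimum cannot wrap around.

```python
import numpy as np

def argsort_constrained(x, indices):
    xc = x.copy()
    # Give the forced positions values below the global minimum:
    # min-k, ..., min-2, min-1, so they sort first and keep their given order
    xc[indices] = x.min() - np.arange(len(indices), 0, -1)
    return xc.argsort()

x = np.array([0.5, 0.1, 0.9, 0.3, 0.7])
indices = np.array([2, 0])

# The copy becomes [-0.9, 0.1, -1.9, 0.3, 0.7] before sorting
out = argsort_constrained(x, indices)
print(out)  # [2 0 1 3 4]
```

The plain np.argsort(x) here is [1, 3, 0, 4, 2]; dropping 2 and 0 leaves [1, 3, 4], which is exactly the tail of the result.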
Benchmarking: a closer look
Let's study how skipping the computation of the initial argsort, idx, helps the second approach.
We will start off with the given sample:
In [206]: x = np.random.rand(1000)
In [207]: indices = np.array([10, 20, 30, 40, 50])
In [208]: %timeit argsort_constrained(x, indices)
38.6 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [209]: idx = np.argsort(x)
In [211]: %timeit np.argsort(x)
27.7 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [212]: %timeit in1d_masking(x, idx, indices)
...: %timeit isin_masking(x, idx, indices)
44.4 µs ± 421 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
50.7 µs ± 303 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note that with these small datasets you could do better by using np.concatenate in place of np.r_.
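A minimal sketch of that substitution follows. The function name isin_masking_concat is hypothetical, and the benchmarked functions in1d_masking and isin_masking are not shown in this text, so this is not necessarily how they were written.

```python
import numpy as np

def isin_masking_concat(idx, indices):
    """Hypothetical variant of approach #1 using np.concatenate instead of np.r_."""
    mask = np.isin(idx, indices, invert=True)
    # np.concatenate takes a sequence of arrays and joins them along axis 0
    return np.concatenate((indices, idx[mask]))

rng = np.random.default_rng(0)
x = rng.permutation(1000).astype(float)
idx = np.argsort(x)
indices = np.array([10, 20, 30, 40, 50])

out_concat = isin_masking_concat(idx, indices)
out_r = np.r_[indices, idx[np.isin(idx, indices, invert=True)]]
print(np.array_equal(out_concat, out_r))
```

Both calls build the same array. np.r_ works through Python-level indexing syntax before concatenating, which is a plausible source of extra overhead on small inputs.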
So, argsort_constrained has a total runtime cost of around 38.6 µs, whereas the two masking approaches need idx to be computed first, which adds around 27.7 µs for np.argsort on top of their individual timings (44.4 µs and 50.7 µs).
Let's scale everything up by 10x and run the same experiments:
In [213]: x = np.random.rand(10000)
In [214]: indices = np.sort(np.random.choice(len(x), 50, replace=False))
In [215]: %timeit argsort_constrained(x, indices)
740 µs ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [216]: idx = np.argsort(x)
In [217]: %timeit np.argsort(x)
731 µs ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [218]: %timeit in1d_masking(x, idx, indices)
...: %timeit isin_masking(x, idx, indices)
1.07 ms ± 47.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.02 ms ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Again, the masking approaches cost more on their own than argsort_constrained does, even before adding the cost of the initial argsort. This trend should continue as we scale up further.
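To check that trend on your own machine, here is a rough timing sketch using the standard timeit module rather than IPython's %timeit. The function argsort_then_mask is a hypothetical end-to-end version of approach #1 that includes the initial argsort. The sizes are arbitrary, and absolute numbers will differ by hardware and NumPy version.

```python
import timeit
import numpy as np

def argsort_constrained(x, indices):
    xc = x.copy()
    xc[indices] = x.min() - np.arange(len(indices), 0, -1)
    return xc.argsort()

def argsort_then_mask(x, indices):
    # Hypothetical end-to-end approach #1: initial argsort plus masking
    idx = np.argsort(x)
    mask = np.isin(idx, indices, invert=True)
    return np.concatenate((indices, idx[mask]))

rng = np.random.default_rng(0)
timings = {}
for n, k in [(1_000, 5), (10_000, 50), (100_000, 500)]:
    x = rng.random(n)
    indices = np.sort(rng.choice(n, k, replace=False))
    for func in (argsort_constrained, argsort_then_mask):
        # Best of 3 repeats, 10 calls each, reported per call
        best = min(timeit.repeat(lambda: func(x, indices), number=10, repeat=3)) / 10
        timings[(func.__name__, n)] = best
        print(f"n={n:>7} {func.__name__:<20} {best * 1e6:10.1f} µs")
```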