Rearrange NumPy Array Efficiently

Question

Let's say I have a simple 1D NumPy array:

import numpy as np

x = np.random.rand(1000)

I then retrieve the sorted indices:

idx = np.argsort(x)

However, I need to move a list of indices to the front of idx. Say indices = [10, 20, 30, 40, 50] must always be the first 5 entries; the rest then follow in the order given by idx (minus the entries already found in indices).

A naive way to accomplish this would be:

indices = np.array([10, 20, 30, 40, 50])
out = np.empty(idx.shape[0], dtype=np.int64)
out[:indices.shape[0]] = indices

# Append every sorted index that is not already in `indices`.
n = indices.shape[0]
for i in range(idx.shape[0]):
    if idx[i] not in indices:
        out[n] = idx[i]
        n += 1

Is there a way to do this efficiently and, possibly, in-place?

Recommended answer

Approach #1

One way would be with np.isin masking:

mask = np.isin(idx, indices, invert=True)
out = np.r_[indices, idx[mask]]
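As a quick sanity check, the mask-based result can be compared against the naive loop from the question. This is only a minimal sketch: the helper names `naive_rearrange` and `rearrange_with_isin` and the fixed random seed are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np

def naive_rearrange(idx, indices):
    # Loop from the question: `indices` first, then the rest of `idx` in order.
    out = np.empty(idx.shape[0], dtype=idx.dtype)
    n = indices.shape[0]
    out[:n] = indices
    for i in idx:
        if i not in indices:
            out[n] = i
            n += 1
    return out

def rearrange_with_isin(idx, indices):
    # True where an entry of `idx` is NOT one of `indices`; order is preserved.
    mask = np.isin(idx, indices, invert=True)
    return np.r_[indices, idx[mask]]

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
x = rng.random(1000)
idx = np.argsort(x)
indices = np.array([10, 20, 30, 40, 50])

out = rearrange_with_isin(idx, indices)
print(np.array_equal(out, naive_rearrange(idx, indices)))
```

The mask keeps the relative order of idx, so the tail of out is still sorted by x.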

Approach #2: Skipping the first argsort

Another approach is to make the values at the given indices smaller than the minimum of x, which forces them to the front when argsorting. We don't need idx for this method, as we are argsorting inside the solution anyway:

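def argsort_constrained(x, indices):
    xc = x.copy()
    # Overwrite the chosen positions with values below x.min(), increasing
    # in the order given, so argsort places them first and in that order.
    xc[indices] = x.min() - np.arange(len(indices), 0, -1)
    return xc.argsort()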
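A short usage sketch may help show that this produces the same ordering as the masking approach. It assumes x is a floating-point array with no tied values and that indices contains no duplicates; the seed and the variable name `expected` are illustrative choices only.

```python
import numpy as np

def argsort_constrained(x, indices):
    xc = x.copy()
    # Values strictly below x.min(), increasing in the order of `indices`.
    xc[indices] = x.min() - np.arange(len(indices), 0, -1)
    return xc.argsort()

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
x = rng.random(1000)
indices = np.array([10, 20, 30, 40, 50])

out = argsort_constrained(x, indices)

# Reference: full argsort followed by masking out the chosen indices.
idx = np.argsort(x)
expected = np.r_[indices, idx[np.isin(idx, indices, invert=True)]]
print(np.array_equal(out, expected))
```

Because the function works on a copy, x itself is left unchanged.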

Benchmarking - Closer look

Let's study how skipping the computation of the initial argsort idx helps us with the second approach.

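We will start off with the given sample. The helpers in1d_masking and isin_masking are not defined in this text; they presumably wrap the masking of Approach #1 using np.in1d and np.isin respectively: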

In [206]: x = np.random.rand(1000)

In [207]: indices = np.array([10, 20, 30, 40, 50])

In [208]: %timeit argsort_constrained(x, indices)
38.6 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [209]: idx = np.argsort(x)

In [211]: %timeit np.argsort(x)
27.7 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [212]: %timeit in1d_masking(x, idx, indices)
     ...: %timeit isin_masking(x, idx, indices)
44.4 µs ± 421 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
50.7 µs ± 303 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Note that if you use np.concatenate in place of np.r_ with these small datasets, you could do better.
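The swap mentioned in this note could look like the sketch below. Both calls build the same array; np.r_ is an indexing helper that does some extra argument handling before concatenating, which is the likely source of its overhead on small inputs. The variable names `out_r` and `out_cat` and the seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
x = rng.random(1000)
idx = np.argsort(x)
indices = np.array([10, 20, 30, 40, 50])

mask = np.isin(idx, indices, invert=True)

out_r = np.r_[indices, idx[mask]]                # as written in Approach #1
out_cat = np.concatenate((indices, idx[mask]))   # direct concatenation

print(np.array_equal(out_r, out_cat))
```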

So, argsort_constrained has a total runtime cost of around 38.6 µs, whereas the two masking versions must add the roughly 27.7 µs argsort cost on top of their individual timings (44.4 µs and 50.7 µs), for totals of about 72 µs and 78 µs.

Let's scale up everything by 10x and do the same experiments:

In [213]: x = np.random.rand(10000)

In [214]: indices = np.sort(np.random.choice(len(x), 50, replace=False))

In [215]: %timeit argsort_constrained(x, indices)
740 µs ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [216]: idx = np.argsort(x)

In [217]: %timeit np.argsort(x)
731 µs ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [218]: %timeit in1d_masking(x, idx, indices)
     ...: %timeit isin_masking(x, idx, indices)
1.07 ms ± 47.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.02 ms ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Again, the individual runtime costs of the masking versions (1.07 ms and 1.02 ms, before adding the 731 µs argsort) are higher than that of argsort_constrained (740 µs). This trend should continue as we scale up further.
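To repeat this comparison outside IPython, a small script using the standard timeit module is one option. This is a rough sketch rather than the setup used for the numbers above: the function `argsort_then_mask` (which includes the argsort cost in its timing), the seed, and the repeat counts are all assumptions, and results will vary by machine and NumPy version.

```python
import timeit
import numpy as np

def argsort_constrained(x, indices):
    xc = x.copy()
    xc[indices] = x.min() - np.arange(len(indices), 0, -1)
    return xc.argsort()

def argsort_then_mask(x, indices):
    # Hypothetical end-to-end version of Approach #1, argsort included.
    idx = np.argsort(x)
    mask = np.isin(idx, indices, invert=True)
    return np.concatenate((indices, idx[mask]))

rng = np.random.default_rng(0)  # arbitrary seed
for n, k in [(1_000, 5), (10_000, 50)]:
    x = rng.random(n)
    indices = np.sort(rng.choice(n, k, replace=False))
    for fn in (argsort_constrained, argsort_then_mask):
        # Best of 3 repeats, 20 calls each, reported per call.
        best = min(timeit.repeat(lambda: fn(x, indices), number=20, repeat=3)) / 20
        print(f"n={n:>6}  {fn.__name__:<20} {best * 1e6:9.1f} µs")
```

Larger sizes can be added to the list to see how the gap changes as the input grows.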
