Cannot understand numpy argpartition output


Question

I am trying to use argpartition from numpy, but something seems to be going wrong and I cannot figure out what. Here is what's happening:

These are the first values of the sorted array norms:

np.sort(norms)[:5]
array([ 53.64759445,  54.91434479,  60.11617279,  64.09630585,  64.75318909], dtype=float32)

But when I use indices_sorted = np.argpartition(norms, 5)[:5]

norms[indices_sorted]
array([ 60.11617279,  64.09630585,  53.64759445,  54.91434479,  64.75318909], dtype=float32)

I thought I should get the same result as with the sorted array.

When I use 3 as the argument, indices_sorted = np.argpartition(norms, 3)[:3]

norms[indices_sorted]
array([ 53.64759445,  54.91434479,  60.11617279], dtype=float32)

This isn't making much sense to me. Can someone offer some insight?

It makes more sense to rephrase the question as: does argpartition preserve the order of the k partitioned elements?

Recommended answer

Instead of passing the kth parameter as a scalar, we need to pass the list of positions that should be kept in sorted order. So, to get the first 5 elements in sorted order, instead of np.argpartition(a,5)[:5], simply do:

np.argpartition(a,range(5))[:5]

Here's a sample run to make things clear:

In [84]: a = np.random.rand(10)

In [85]: a
Out[85]: 
array([ 0.85017222,  0.19406266,  0.7879974 ,  0.40444978,  0.46057793,
        0.51428578,  0.03419694,  0.47708   ,  0.73924536,  0.14437159])

In [86]: a[np.argpartition(a,5)[:5]]
Out[86]: array([ 0.19406266,  0.14437159,  0.03419694,  0.40444978,  0.46057793])

In [87]: a[np.argpartition(a,range(5))[:5]]
Out[87]: array([ 0.03419694,  0.14437159,  0.19406266,  0.40444978,  0.46057793])
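The difference between the two calls above can also be checked with assertions. The following is a minimal sketch, assuming a small seeded random array (the seed and the variable names are illustrative, not from the original post): with a scalar kth, only position k is guaranteed to hold its final sorted value, while a sequence of positions places each listed position.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
a = rng.random(10)
k = 5

# Scalar kth: position k holds the value it would have in the sorted array;
# everything before it is <= that value, everything after it is >=.
idx = np.argpartition(a, k)
assert a[idx[k]] == np.sort(a)[k]
assert a[idx[:k]].max() <= a[idx[k]] <= a[idx[k + 1:]].min()

# The first k entries are the k smallest values, but in unspecified order.
assert sorted(a[idx[:k]].tolist()) == np.sort(a)[:k].tolist()

# Sequence kth: every listed position holds its final sorted value,
# so the first k entries come out in ascending order.
idx_sorted = np.argpartition(a, range(k))
assert np.array_equal(a[idx_sorted[:k]], np.sort(a)[:k])
```

This is why np.argpartition(norms, 3)[:3] in the question happened to look sorted while the call with 5 did not: the order within the partition is simply not guaranteed either way.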

Please note that argpartition makes sense performance-wise when we want sorted indices for only a small subset of the elements, say k elements where k is a small fraction of the total number of elements.

To make that point clear, let's use a bigger dataset and try to get sorted indices for all elements:

In [51]: a = np.random.rand(10000)*100

In [52]: %timeit np.argpartition(a,range(a.size-1))[:5]
10 loops, best of 3: 105 ms per loop

In [53]: %timeit a.argsort()
1000 loops, best of 3: 893 µs per loop

Thus, to sort all elements, np.argpartition isn't the way to go.

Now, let's say I want sorted indices for only the first 5 elements of that big dataset, keeping them in order:

In [68]: a = np.random.rand(10000)*100

In [69]: np.argpartition(a,range(5))[:5]
Out[69]: array([1647,  942, 2167, 1371, 2571])

In [70]: a.argsort()[:5]
Out[70]: array([1647,  942, 2167, 1371, 2571])

In [71]: %timeit np.argpartition(a,range(5))[:5]
10000 loops, best of 3: 112 µs per loop

In [72]: %timeit a.argsort()[:5]
1000 loops, best of 3: 888 µs per loop

Much more useful here!
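A related idiom, not part of the answer above, is to partition with a scalar kth and then sort only the k selected indices. The sketch below assumes a 1-D array; the helper name smallest_k_sorted is hypothetical. No timing is claimed for it here, so it may be worth measuring against np.argpartition(a, range(k))[:k] on your own data.

```python
import numpy as np

def smallest_k_sorted(a, k):
    """Indices of the k smallest values of 1-D array `a`, ordered by value.

    Hypothetical helper for illustration; not a NumPy function.
    """
    if k >= a.size:
        # argpartition rejects kth == a.size, so fall back to a full sort.
        return np.argsort(a)[:k]
    # Step 1: select the k smallest values in unspecified order.
    part = np.argpartition(a, k)[:k]
    # Step 2: sort only those k candidates by their values.
    return part[np.argsort(a[part])]

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
a = rng.random(10000) * 100
top5 = smallest_k_sorted(a, 5)

# The selected values match the first 5 values of a full sort.
assert np.array_equal(a[top5], np.sort(a)[:5])
```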
