可以按降序使用argsort吗? [英] Is it possible to use argsort in descending order?

查看:143
本文介绍了可以按降序使用argsort吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑以下代码:

avgDists = np.array([1, 8, 6, 9, 4])
ids = avgDists.argsort()[:n]

这给了我n个最小元素的索引.是否可以按降序使用相同的argsort来获取n最高元素的索引?

This gives me indices of the n smallest elements. Is it possible to use this same argsort in descending order to get the indices of n highest elements?

推荐答案

如果对数组求反,则最低的元素将成为最高的元素,反之亦然.因此,n最高元素的索引为:

If you negate an array, the lowest elements become the highest elements and vice-versa. Therefore, the indices of the n highest elements are:

(-avgDists).argsort()[:n]

对此的另一种解释方法,如

Another way to reason about this, as mentioned in the comments, is to observe that the big elements are coming last in the argsort. So, you can read from the tail of the argsort to find the n highest elements:

avgDists.argsort()[::-1][:n]

这两个方法的时间复杂度均为 O(n log n),因为argsort调用是此处的主要术语.但是第二种方法有一个很好的优点:用 O(1)切片替换了数组的 O(n)取反.如果在循环中使用小型数组,则避免这种求反可能会获得一些性能提升;如果使用大型数组,则可以节省内存使用量,因为这种否定会创建整个阵列的副本.

Both methods are O(n log n) in time complexity, because the argsort call is the dominant term here. But the second approach has a nice advantage: it replaces an O(n) negation of the array with an O(1) slice. If you're working with small arrays inside loops then you may get some performance gains from avoiding that negation, and if you're working with huge arrays then you can save on memory usage because the negation creates a copy of the entire array.

请注意,这些方法并不总是会给出等效的结果:如果argsort请求了稳定的排序实现,例如通过传递关键字参数kind='mergesort',第一种策略将保留排序稳定性,而第二种策略将破坏稳定性(即,相等项目的位置将颠倒).

Note that these methods do not always give equivalent results: if a stable sort implementation is requested to argsort, e.g. by passing the keyword argument kind='mergesort', then the first strategy will preserve the sorting stability, but the second strategy will break stability (i.e. the positions of equal items will get reversed).

计时示例:

Example timings:

使用100个浮点数和30个尾巴长度的小数组,查看方法快大约15%

Using a small array of 100 floats and a length 30 tail, the view method was about 15% faster

>>> avgDists = np.random.rand(100)
>>> n = 30
>>> timeit (-avgDists).argsort()[:n]
1.93 µs ± 6.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
1.64 µs ± 3.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
1.64 µs ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

对于较大的数组,argsort占主导地位,并且没有明显的时序差异

For larger arrays, the argsort is dominant and there is no significant timing difference

>>> avgDists = np.random.rand(1000)
>>> n = 300
>>> timeit (-avgDists).argsort()[:n]
21.9 µs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
21.7 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
21.9 µs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

请注意, nedim的评论以下是不正确的.在反向操作之前还是在反向操作之后截断都不会影响效率,因为这两种操作都只是以不同的方式遍历数组的视图,而实际上并未复制数据.

Please note that the comment from nedim below is incorrect. Whether to truncate before or after reversing makes no difference in efficiency, since both of these operations are only striding a view of the array differently and not actually copying data.

这篇关于可以按降序使用argsort吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆