Is it possible to use argsort in descending order?


Problem description

Consider the following code:

avgDists = np.array([1, 8, 6, 9, 4])
ids = avgDists.argsort()[:n]

This gives me the indices of the n smallest elements. Is it possible to use this same argsort in descending order to get the indices of the n highest elements?

Solution

If you negate an array, the lowest elements become the highest elements and vice-versa. Therefore, the indices of the n highest elements are:

(-avgDists).argsort()[:n]

Another way to reason about this, as mentioned in the comments, is to observe that the big elements are coming last in the argsort. So, you can read from the tail of the argsort to find the n highest elements:

avgDists.argsort()[::-1][:n]
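
As a quick check, here is a minimal sketch that runs both expressions on the array from the question. The value n = 3 is an assumption made for illustration, since the question leaves n unspecified.

```python
import numpy as np

avgDists = np.array([1, 8, 6, 9, 4])
n = 3  # illustrative value; the question does not define n

# Approach 1: negate, then take the first n positions of the ascending argsort.
by_negation = (-avgDists).argsort()[:n]

# Approach 2: reverse the ascending argsort, then take the first n positions.
by_reversal = avgDists.argsort()[::-1][:n]

print(by_negation)            # [3 1 2]
print(by_reversal)            # [3 1 2]
print(avgDists[by_reversal])  # [9 8 6]
```

The array has no repeated values, so both approaches return the same indices here.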

Both methods are O(n log n) in time complexity, because the argsort call is the dominant term here. But the second approach has a nice advantage: it replaces an O(n) negation of the array with an O(1) slice. If you're working with small arrays inside loops then you may get some performance gains from avoiding that negation, and if you're working with huge arrays then you can save on memory usage because the negation creates a copy of the entire array.
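
The copy-versus-view distinction can be inspected directly. The sketch below uses np.shares_memory and the strides attribute; the variable names are chosen only for illustration.

```python
import numpy as np

avgDists = np.random.rand(1000)

# Negation allocates a new array the same size as the input.
negated = -avgDists
print(np.shares_memory(negated, avgDists))      # False

# Reversing with [::-1] only creates a view with a negative stride.
order = avgDists.argsort()
reversed_order = order[::-1]
print(np.shares_memory(reversed_order, order))  # True
print(order.strides, reversed_order.strides)    # same magnitude, opposite sign
```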

Note that these methods do not always give equivalent results: if a stable sort is requested from argsort, e.g. by passing the keyword argument kind='mergesort', then the first strategy preserves sorting stability, but the second breaks it (i.e. the positions of equal items get reversed).
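
A small array with repeated values makes the difference visible. This is only an illustrative sketch; the array `values` is made up for the purpose.

```python
import numpy as np

# Ties: the value 2 sits at indices 0, 2, 4 and the value 1 at indices 1, 3.
values = np.array([2, 1, 2, 1, 2])

# Negate, then stable sort: equal items keep their original left-to-right order.
stable_neg = (-values).argsort(kind='mergesort')

# Stable sort, then reverse: equal items come out in reversed order.
stable_rev = values.argsort(kind='mergesort')[::-1]

print(stable_neg)  # [0 2 4 1 3]
print(stable_rev)  # [4 2 0 3 1]
```

Both results are valid descending orders of the values; they differ only in how ties are arranged.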

Example timings:

With a small array of 100 floats and a tail of length 30, the view method was about 15% faster:

>>> avgDists = np.random.rand(100)
>>> n = 30
>>> timeit (-avgDists).argsort()[:n]
1.93 µs ± 6.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
1.64 µs ± 3.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
1.64 µs ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For larger arrays, the argsort dominates and there is no significant timing difference:

>>> avgDists = np.random.rand(1000)
>>> n = 300
>>> timeit (-avgDists).argsort()[:n]
21.9 µs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
21.7 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
21.9 µs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Please note that the comment from nedim below is incorrect. Whether to truncate before or after reversing makes no difference in efficiency, since both of these operations are only striding a view of the array differently and not actually copying data.
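
The equivalence of the two slicing orders can be checked with a short sketch. The array size and n below are arbitrary choices for illustration.

```python
import numpy as np

avgDists = np.random.rand(100)
n = 30
order = avgDists.argsort()

reverse_then_truncate = order[::-1][:n]
truncate_then_reverse = order[-n:][::-1]

# Both expressions select the same indices in the same order...
print(np.array_equal(reverse_then_truncate, truncate_then_reverse))  # True

# ...and both are views onto the argsort result rather than copies.
print(np.shares_memory(reverse_then_truncate, order))  # True
print(np.shares_memory(truncate_then_reverse, order))  # True
```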
