从长度为 N 的数组中返回前 k 个值的最佳算法 [英] Optimal algorithm for returning top k values from an array of length N

查看:25
本文介绍了从长度为 N 的数组中返回前 k 个值的最佳算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含 n 个浮点数的数组,我希望返回前 k 个(在我的情况下 n ~ 100, k ~ 10)

I have an array of n floats, and I wish to return the top k (in my case n ~ 100, k ~ 10)

这个问题是否有已知的最优解路径?

Is there a known optimal solution path for this problem?

谁能提供一个C算法?

实际上这里有两个问题:排序和未排序.我对 unsorted 感兴趣,应该更快!

actually there are two problems here: sorted and unsorted. I am interested in unsorted, which should be faster!

推荐答案

方法一

由于k很小,你可以用锦标赛的方法找到第k大的.这种方法在 Knuth 的编程艺术,第 3 卷,第 212 页中有所描述.

Since k is small, you can use the tournament method to find the kth largest. This method is described in Knuth's Art of Programming, Volume 3, Page 212.

首先在 n-k+2 个元素上创建一个锦标赛.类似于淘汰赛网球锦标赛.首先,您分成两对并比较对中的成员(就好像这两个人打了一场比赛,一个输了一样).然后获胜者,你再次分成对,依此类推,直到你有一个获胜者.您可以将其视为一棵树,获胜者在顶部.

First create a tournament on n-k+2 elements. Something like a knockout tennis tournament. First you split into pairs and compare the members of the pairs (as if those two played a match and one lost). Then the winners, you split in to pairs again and so on, till you have a winner. You can view it as a tree, with the winner at the top.

这需要 n-k+1 次准确比较.

This takes n-k+1 compares exactly.

现在这些 n-k+2 的赢家不能是你的第 k 大元素.考虑它在锦标赛中的路径 P.

Now the winner of these n-k+2 cannot be your kth largest element. Consider its path P up the tournament.

现在从剩余的 k-2 中选择一个,然后沿着路径 P 继续前进,它将为您提供一个新的最大值.基本上,您可以用 k-2 元素之一替换之前的获胜者来重做锦标赛.让 P 成为新赢家的路径.现在从 k-3 中选择另一个,然后沿着新路径向上走.

Of the remaining k-2 now pick one, and follow that up the path P which will give you a new largest. Basically you sort of redo the tournament with the previous winner being replaced by one of the k-2 elements. Let P be the path of the new winner. Now pick another from the k-3 and follow the new path up and so on.

最后用完 k-2 后,将最大的替换为 -infinity,锦标赛的最大将是第 k 大.你扔掉的元素是前k-1个元素.

At the end after you exhaust the k-2, replace the largest with -infinity and the largest of the tournament will be the kth largest. The elements you have thrown away are the top k-1 elements.

这最多需要 n - k + (k-1) [log (n-k+2)] 次比较才能找到前 k.虽然它使用 O(n) 内存.

This takes at most n - k + (k-1) [log (n-k+2)] comparisons to find the top k. It uses O(n) memory though.

就比较次数而言,这可能胜过任何选择算法.

In terms of number of compares this should likely beat any selection algorithms.

方法二

作为替代方案,您可以维护一个包含 k 个元素的最小堆.

As an alternative, you can maintain a min-heap of k elements.

先插入k个元素.然后对于数组的每个元素,如果小于堆的最小元素,就扔掉.否则,删除堆的最小并从数组中插入元素.

First insert k elements. Then for each element of array, if it is less than the min element of the heap, throw it away. Otherwise, delete-min of the heap and insert the element from the array.

最后,堆将包含前 k 个元素.这将需要 O(n log k) 比较.

At the end, the heap will contain the top k elements. This will take O(n log k) compares.

当然,如果n很小,只要对数组进行排序应该就够了.代码也会更简单.

Of course, if n is small, just sorting the array should be good enough. The code will be simpler too.

这篇关于从长度为 N 的数组中返回前 k 个值的最佳算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆