查找第K人数最多的单次而不存储整个数组 [英] Find Kth largest number in single pass without storing the whole array

查看:104
本文介绍了查找第K人数最多的单次而不存储整个数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

该算法中我已经记

  • 在保持规模的MaxHeap氏/ LI>
  • 插入每个元素
  • 退出较小的值,如果堆是满
  • 在结束时,第K max是MaxHeap
  • 的小
  • keep an MaxHeap of size K
  • insert each element
  • drop out smaller value if heap is full
  • In the end, Kth max is the smaller of MaxHeap

这会给我O(NlogK)。有没有更好的算法?我不能做快速选择,因为数组不能被存储在内存中。

That will give me O(NlogK). Is there a better algorithm? I can't do quick selection, because the array can't be stored in memory.

推荐答案

根据你的内存限制,你可以用中位数的中位数算法的修改版本,以解决O(n)时间和O的问题( k)的空间。

Depending on your memory restrictions, you can use a modified version of the median-of-medians algorithm to solve the problem in O(n) time and O(k) space.

的思想如下。保持大小2K的内存中的数组。填充这个缓冲器与来自阵列的前2k元件,然后在其上运行中位数的位数的算法把最大k个元素中的第一个阵列的一半,并在第二半个的最小k个元素。然后,丢弃最小的k个元素。现在,加载下k个元素到阵列的第二半部分,使用中位数的位数的算法,以再次把在左侧顶部k个元素,右边底部k个元素。如果遍历这个过程在阵列 - 替换缓冲区的后半部分与从阵列下一k个元素,然后使用中值 - 的中位数的那些的前k个移动到左半 - 然后在结束你会具有在所述阵列的左半顶端k个元素。寻找最小的那些(在O(k)的时间),然后给你的第k个最大的元素。

The idea is as follows. Maintain an array of size 2k in memory. Fill this buffer with the first 2k elements from the array, then run the median-of-medians algorithm on it to put the largest k elements in the first half of the array and the smallest k elements in the second half. Then, discard the smallest k elements. Now, load the next k elements into the second half of the array, use the median-of-medians algorithm to again put the top k elements in the left side and the bottom k elements on the right. If you iterate this process across the array - replacing the second half of the buffer with the next k elements from the array, then using median-of-medians to move the top k of those to the left half - then at the end you will have the top k elements in the left half of the array. Finding the smallest of those (in O(k) time) will then give you the kth largest element.

总之,你最终使O(N / K)调用中位数的中位数的算法的大小O(K)的阵列,这是O(n / k)的调用一个算法,需要O( k)的时间,为O(n)的净运行时间。这一点,再加上最后一步,运行在O(N + K)= O(n)的时间。此外,由于中值,中位数的步骤内存使用量为O(K),因为你有大小O(K)躺在附近的缓冲区,这个仅使用O(k)的内存。换句话说,它是渐近比最小堆溶液更快,在存储器渐近等效

Overall, you end up making O(n / k) calls to the median-of-medians algorithm with an array of size O(k), which is O(n / k) calls to an algorithm that takes O(k) time, for a net runtime of O(n). This, combined with the final step, runs in O(n + k) = O(n) time. Moreover, since the memory usage for the median-of-medians step is O(k), and since you have a buffer of size O(k) lying around, this uses only O(k) memory. In other words, it's asymptotically faster than the min-heap solution and asymptotically equivalent in memory.

希望这有助于!

这篇关于查找第K人数最多的单次而不存储整个数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆