查找大量数字的中位数太大而无法容纳到内存中 [英] Finding median of large set of numbers too big to fit into memory
本文介绍了查找大量数字的中位数太大而无法容纳到内存中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
最近在一次采访中有人问我这个问题。
I was asked this question in an interview recently.
有N个数字,太多了,无法容纳。它们分为k个数据库表(未排序),每个表都可以放入内存。找出所有数字的中位数。
There are N numbers, too many to fit into memory. They are split across k database tables (unsorted), each of which can fit into memory. Find the median of all the numbers.
不确定这个答案。
推荐答案
有一些潜在的解决方案:
There's a few potential solutions:
- 外部合并排序-O(n log n)
您基本上在第一遍上对数字进行排序,然后在第二遍上找到中位数。 - 订单统计分布式选择算法-O(n)
将问题简化为在未排序的数组中找到第k个数字的原始问题。 - 计数排序直方图O(n)
您必须假设一些关于数字范围的属性-该范围是否适合内存? - 如果知道数字的分布情况,可以生成其他
算法。
- External merge sort - O(n log n)
You basically sort the numbers on the first pass, then find the median on the second. - Order statistics distributed selection algorithm - O(n)
Simplify the problem to the original problem of finding the kth number in an unsorted array. - Counting sort histogram O(n)
You have to assume some properties about the range of the numbers - can the range fit in the memory? - If anything is known about the distribution of the numbers other algorithms can be produced.
更多信息和实现请参见:
http://www.fusu.us/2013/07/median-in-large-set-across-1000-servers.html
For more details and implementation see:
http://www.fusu.us/2013/07/median-in-large-set-across-1000-servers.html
这篇关于查找大量数字的中位数太大而无法容纳到内存中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文