数据排序比RAM大小 [英] Sorting Data larger than the RAM size

查看:110
本文介绍了数据排序比RAM大小的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是一个谷歌面试问题: 由于2台机器,每个都具有64 GB的RAM,包含所有整数(8字节),排序整个128 GB的数据。你可以采取少量的额外的RAM。该扩展存储在1000台机器的数据进行排序。

This is a Google interview question: Given 2 machines, each having 64 GB RAM, containing all integers (8 byte), sort the entire 128 GB data. You may assume a small amount of additional RAM. Extend this to sort data stored in 1000 machines.

我凸轮与外部排序,也就是我们把整个数据分成块,并使用归并在他们身上,这是第一次排序的块,并把它们放回去,并让他们重新拼凑明智和合并。有没有更好的办法?什么是复杂性?

I cam up with external sort, that is we divide the entire data into chunks and use mergesort on them, that is first sort the chunks and put them back and get them again piece wise and merge them. is there a better way ? What would be the complexity ?

推荐答案

ChingPing提出了一个为O(n log n)的排序为每个子集,后跟一个线性合并(通过交换的元素)。与快速排序(和问题大部分在n为log N种,是他们需要N个内存。我建议,而不是使用的 SmoothSort 它采用恒定的内存,仍然为O(N log n)的。

ChingPing proposes a O(n log n) sort for each subset, followed by a linear merge (by swapping the elements). The problem with Quicksort (and most of the n log n sorts, is that they require n memory. I'd recommend instead using a SmoothSort which uses constant memory, still runs in O(n log n).

最糟糕的情况是,你有这样的:

The worst case scenario is where you have something like:

setA = [maxInt .. 1]
setB = [0..minInt]

,其中这两组是有序的相反,但随后的合并是在相反的顺序

where both sets are ordered in reverse, but then the merger is in the reverse order.

本 - 对ChingPing的解决方案(IMO更清晰)的解释是:

The (IMO - more clear) explanation of ChingPing's solution is:

Have a pointers 'pointerA', 'pointerB' initialized at the beginning of each array
While setA's pointer is not at the end
  if (setA[pointerA] < setB[pointerB])
    then { pointerA++; }
    else { swap(setA[pointerA], setB[pointerB]); pointerB++; }

该套都应该现在进行排序。

The sets should both now be sorted.

这篇关于数据排序比RAM大小的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆