如何排序100GB的字符串 [英] How to sort 100GB worth of strings

查看:108
本文介绍了如何排序100GB的字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

鉴于120GB的硬盘驱动器,其中100个填充了长度为256和2 GB的字符串,我如何最有效地对Java中的字符串进行排序?
需要多长时间?

Given a harddrive with 120GB, 100 of which are filled with the strings of length 256 and 2 GB Ram how do I sort those strings in Java most efficiently? How long will it take?

推荐答案

我基本上重复 Krystian的答案,但详细说明:

I am basically repeating Krystian's answer, but elaborating:

是的,您需要或多或少地执行此操作,因为您可用的RAM很少。但是天真的就地排序将是一场灾难,仅仅是因为移动字符串的成本。

Yes you need to do this more-or-less in place, since you have little RAM available. But naive in-place sorts would be a disaster here just due to the cost of moving strings around.

而不是实际移动字符串,只需跟踪哪些字符串应该与其他人交换并实际移动它们,最后一次到达最终位置。也就是说,如果您有1000个字符串,请创建一个1000个整数的数组。 array [i]是字符串i应该结束的位置。如果array [17] == 133在结尾,则意味着字符串17应该在字符串133的最后位置.array [i] == i表示所有i开始。然后交换字符串只是交换两个整数的问题。

Rather than actually move strings around, just keep track of which strings should swap with which others and actually move them, once, at the end, to their final spot. That is, if you had 1000 strings, make an array of 1000 ints. array[i] is the location where string i should end up. If array[17] == 133 at the end, it means string 17 should end up in the spot for string 133. array[i] == i for all i to start. Swapping strings, then, is just a matter of swapping two ints.

然后,像quicksort这样的任何就地算法都能很好地工作。

Then, any in-place algorithm like quicksort works pretty well.

运行时间肯定取决于琴弦的最终移动。假设每一个都移动,你就会以合理大小的写入移动大约100GB的数据。我可能会认为驱动器/控制器/操作系统可以为您移动大约100MB /秒。那么,1000秒左右? 20分钟?

The running time is surely dominated by the final move of the strings. Assuming each one moves, you're moving around about 100GB of data in reasonably-sized writes. I might assume the drive / controller / OS can move about 100MB/sec for you. So, 1000 seconds or so? 20 minutes?

但是它适合记忆吗?你有100GB的字符串,每个字符串是256个字节。多少串? 100 * 2 ^ 30/2 ^ 8,或约419M字符串。您需要419M整数,每个是4个字节,或大约1.7GB。瞧,适合你的2GB。

But does it fit in memory? You have 100GB of strings, each of which is 256 bytes. How many strings? 100 * 2^30 / 2^8, or about 419M strings. You need 419M ints, each is 4 bytes, or about 1.7GB. Voila, fits in your 2GB.

这篇关于如何排序100GB的字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆