How do I sort very large files?
Problem description
I have some files that need to be sorted by the id at the beginning of each line. The files are about 2-3 GB.
I tried to read all the data into an ArrayList and sort it, but there is not enough memory to hold everything at once, so that does not work.
The lines look like this:
0052304 0000004000000000000000000000000000000041 John Teddy 000023
0022024 0000004000000000000000000000000000000041 George Clan 00013
How can I sort the files?
This isn't really a Java problem. You need an algorithm that can sort data without reading all of it into memory at once. A few adaptations to merge sort can achieve this.
Take a look at this: http://en.wikipedia.org/wiki/Merge_sort
and: http://en.wikipedia.org/wiki/External_sorting
Basically, the idea is to break the file into smaller pieces that fit in memory, sort each piece (with merge sort or any other method), and then use the merge step of merge sort to combine the sorted pieces into the new, sorted file.
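As a rough illustration of that idea, here is a minimal Java sketch of an external merge sort. It is not a tuned implementation. The class and method names (`ExternalSort`, `splitIntoSortedChunks`, `mergeChunks`) are made up for this example, and it assumes that the id is everything before the first space, that ids are zero-padded to the same width (so comparing them as strings gives the same order as comparing them as numbers), and that the file is UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSort {

    // Assumption: the id is the text before the first space and is zero-padded,
    // so plain string comparison orders the lines correctly.
    static final Comparator<String> BY_ID = Comparator.comparing(ExternalSort::idOf);

    static String idOf(String line) {
        int space = line.indexOf(' ');
        return space < 0 ? line : line.substring(0, space);
    }

    // Phase 1: read at most maxLines lines at a time, sort them in memory,
    // and write each sorted chunk to its own temporary file.
    static List<Path> splitIntoSortedChunks(Path input, int maxLines) throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> buffer = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == maxLines) {
                    chunks.add(writeSortedChunk(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {
                chunks.add(writeSortedChunk(buffer));
            }
        }
        return chunks;
    }

    static Path writeSortedChunk(List<String> lines) throws IOException {
        lines.sort(BY_ID);
        Path chunk = Files.createTempFile("sort-chunk-", ".txt");
        Files.write(chunk, lines, StandardCharsets.UTF_8);
        return chunk;
    }

    // One open chunk file plus the line currently at its front.
    static final class Cursor {
        final BufferedReader reader;
        String line;

        Cursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.line = reader.readLine();
        }
    }

    // Phase 2: k-way merge. Only one line per chunk is held in memory;
    // the heap always yields the chunk whose front line has the smallest id.
    static void mergeChunks(List<Path> chunks, Path output) throws IOException {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> BY_ID.compare(a.line, b.line));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path chunk : chunks) {
                BufferedReader reader = Files.newBufferedReader(chunk, StandardCharsets.UTF_8);
                readers.add(reader);
                Cursor cursor = new Cursor(reader);
                if (cursor.line != null) heap.add(cursor);
            }
            while (!heap.isEmpty()) {
                Cursor smallest = heap.poll();
                out.write(smallest.line);
                out.newLine();
                smallest.line = smallest.reader.readLine();
                if (smallest.line != null) heap.add(smallest);
            }
        } finally {
            for (BufferedReader reader : readers) reader.close();
            for (Path chunk : chunks) Files.deleteIfExists(chunk);
        }
    }

    public static void sort(Path input, Path output, int maxLinesPerChunk) throws IOException {
        mergeChunks(splitIntoSortedChunks(input, maxLinesPerChunk), output);
    }
}
```

A call might look like `ExternalSort.sort(Paths.get("input.txt"), Paths.get("sorted.txt"), 1_000_000)`, where the file names are placeholders and the chunk size is whatever comfortably fits in your heap. With a very large number of chunks you may hit the operating system's limit on open files; in that case the merge can be done in several passes.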