Sorting the lines of an enormous file.txt in Java


Question

I'm working with a very big text file (755 MB). I need to sort its lines (about 1,890,000) and then write them to another file.

I have already seen a discussion whose starting file is very similar to mine: Sorting Lines Based on words in them as keys

The problem is that I cannot store the lines in an in-memory collection, because I get a Java heap space error (`OutOfMemoryError`), even after raising the heap size to the maximum (already tried!).

I can't open it in Excel and use the sorting feature either, because the file is too large to be loaded completely.

I thought about using a database, but I suspect that writing all the lines to it and then reading them back with a SELECT query would take too long. Am I wrong?

Any hints are appreciated. Thanks in advance.

Recommended answer

I think the solution here is to do a merge sort using temporary files:


  1. Read the first n lines of the original file (n being the number of lines you can afford to store and sort in memory), sort them, and write them to a file 1.tmp (or whatever you want to call it). Do the same with the next n lines and store them in 2.tmp. Repeat until all lines of the original file have been processed.

  2. Read the first line of each temporary file. Determine the smallest one (according to your sort order), write it to the destination file, and read the next line from the corresponding temporary file. Repeat until all lines have been processed.

  3. Delete all temporary files.

This works with arbitrarily large files, as long as you have enough disk space.
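
The three steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the answerer's own code: the class name `ExternalSort`, the method `sort`, and the `maxLines` parameter are hypothetical names chosen here. It also assumes UTF-8 input, natural `String` ordering, and that the number of chunk files stays below the operating system's open-file limit.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper illustrating a merge sort over temporary files.
final class ExternalSort {

    // One open chunk file plus the line currently at its head.
    private static final class Run implements Comparable<Run> {
        final BufferedReader reader;
        String head;

        Run(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }

        // Moves to the next line; returns false when the chunk is exhausted.
        boolean advance() throws IOException {
            head = reader.readLine();
            return head != null;
        }

        @Override
        public int compareTo(Run other) {
            return head.compareTo(other.head);
        }
    }

    // maxLines is the "n" of step 1: how many lines fit in memory at once.
    static void sort(Path input, Path output, int maxLines) throws IOException {
        List<Path> chunks = splitIntoSortedChunks(input, maxLines);
        try {
            merge(chunks, output);
        } finally {
            for (Path chunk : chunks) {
                Files.deleteIfExists(chunk); // step 3: remove temporary files
            }
        }
    }

    // Step 1: read up to maxLines lines, sort them, write them to a temp file.
    private static List<Path> splitIntoSortedChunks(Path input, int maxLines) throws IOException {
        List<Path> chunks = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() >= maxLines) {
                    chunks.add(writeSortedChunk(buffer));
                    buffer.clear();
                }
            }
        }
        if (!buffer.isEmpty()) {
            chunks.add(writeSortedChunk(buffer));
        }
        return chunks;
    }

    private static Path writeSortedChunk(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path chunk = Files.createTempFile("sort-chunk-", ".tmp");
        Files.write(chunk, lines, StandardCharsets.UTF_8);
        return chunk;
    }

    // Step 2: repeatedly emit the smallest head line among all chunks.
    private static void merge(List<Path> chunks, Path output) throws IOException {
        PriorityQueue<Run> queue = new PriorityQueue<>();
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path chunk : chunks) {
                BufferedReader reader = Files.newBufferedReader(chunk, StandardCharsets.UTF_8);
                readers.add(reader);
                Run run = new Run(reader);
                if (run.head != null) {
                    queue.add(run);
                }
            }
            while (!queue.isEmpty()) {
                Run smallest = queue.poll();
                out.write(smallest.head);
                out.newLine();
                if (smallest.advance()) {
                    queue.add(smallest); // re-insert with its new head line
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

A call such as `ExternalSort.sort(Paths.get("file.txt"), Paths.get("sorted.txt"), 100_000)` would split a file of about 1,890,000 lines into roughly 19 chunks. The priority queue replaces a linear scan for the smallest head line, which matters only when there are many chunks; choose `maxLines` so that one chunk fits comfortably in the heap.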
