Find duplicates in large file
Question
I have a really large file with approximately 15 million entries. Each line in the file contains a single string (call it a key).
I need to find the duplicate entries in the file using Java. I tried to use a HashMap to detect the duplicate entries, but that approach throws a "java.lang.OutOfMemoryError: Java heap space" error.
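For reference, the in-memory approach described above looks roughly like this (a sketch using a `HashSet` rather than a `HashMap`, since only key membership matters; the class and method names are illustrative). Holding all 15 million keys in memory at once is what exhausts the heap:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

public class NaiveDedup {
    // Works for small files, but keeps every distinct key in memory,
    // which is what triggers the OutOfMemoryError on a large input.
    static Set<String> findDuplicates(Path input) throws IOException {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new HashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Set.add returns false if the key was already present.
                if (!seen.add(line)) duplicates.add(line);
            }
        }
        return duplicates;
    }
}
```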
How can I solve this problem?
I think I could increase the heap space and try it, but I wanted to know whether there is a more efficient solution that does not require tweaking the heap space.
Answer
The key point is that your data will not fit into memory. You can use an external merge sort for this:
Partition the file into multiple smaller chunks that fit into memory. Sort each chunk and eliminate the duplicates (which are now neighboring elements).
Merge the chunks, eliminating duplicates again as you merge. Since this is an n-way merge, you can keep just the next k elements from each chunk in memory; once the buffered items for a chunk are depleted (they have already been merged), fetch more from disk.
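The two phases above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the chunk size, file handling, and class names (`ExternalDedup`, `Head`) are all assumptions, and for simplicity the merge buffers one line per chunk (k = 1) in a min-heap rather than k elements. Because each spilled chunk is sorted and internally duplicate-free, any key that appears twice in a row in the merged stream is a duplicate in the original file:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalDedup {

    // Phase 1: read the input in fixed-size chunks, sort each chunk in memory,
    // drop duplicates inside the chunk, and spill it to a sorted temp file.
    static List<Path> sortChunks(Path input, int chunkSize) throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            List<String> buffer = new ArrayList<>(chunkSize);
            String line;
            while ((line = reader.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == chunkSize) {
                    chunks.add(writeSortedChunk(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) chunks.add(writeSortedChunk(buffer));
        }
        return chunks;
    }

    static Path writeSortedChunk(List<String> buffer) throws IOException {
        Collections.sort(buffer);
        Path chunk = Files.createTempFile("chunk", ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(chunk)) {
            String prev = null;
            for (String s : buffer) {
                // After sorting, equal keys are adjacent, so skipping
                // repeats of the previous line removes in-chunk duplicates.
                if (!s.equals(prev)) {
                    w.write(s);
                    w.newLine();
                }
                prev = s;
            }
        }
        return chunk;
    }

    // One open chunk file plus its current (smallest unconsumed) line.
    static class Head {
        String line;
        final BufferedReader reader;
        Head(String line, BufferedReader reader) { this.line = line; this.reader = reader; }
    }

    // Phase 2: n-way merge via a min-heap of chunk heads; equal consecutive
    // keys in the merged stream are the duplicates.
    static List<String> mergeAndFindDuplicates(List<Path> chunks) throws IOException {
        PriorityQueue<Head> heap = new PriorityQueue<>(Comparator.comparing((Head h) -> h.line));
        for (Path p : chunks) {
            BufferedReader r = Files.newBufferedReader(p);
            String first = r.readLine();
            if (first != null) heap.add(new Head(first, r)); else r.close();
        }
        List<String> duplicates = new ArrayList<>();
        String prev = null;
        while (!heap.isEmpty()) {
            Head h = heap.poll();
            if (h.line.equals(prev)
                    && (duplicates.isEmpty() || !duplicates.get(duplicates.size() - 1).equals(h.line))) {
                duplicates.add(h.line);         // record each duplicated key once
            }
            prev = h.line;
            String next = h.reader.readLine();  // refill this chunk's head from disk
            if (next != null) { h.line = next; heap.add(h); } else h.reader.close();
        }
        return duplicates;
    }

    public static void main(String[] args) throws IOException {
        Path input = Files.createTempFile("input", ".txt");
        Files.write(input, List.of("banana", "apple", "cherry", "apple", "date", "banana"));
        List<Path> chunks = sortChunks(input, 3);  // tiny chunk size to force several chunks
        System.out.println(mergeAndFindDuplicates(chunks)); // prints "[apple, banana]"
    }
}
```

Memory use is bounded by the chunk size plus one buffered line per chunk, so the heap never has to hold all 15 million keys at once.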