Read big text file to HashMap - heap overflow

Views: 285 | Published: 2018/6/4 13:50:14 | Tags: java hashmap text-files heap-memory


Problem description


I'm trying to load the data from a text file into a HashMap. The text file has the following format:

It has about 7 million lines (size: 700 MB).

What I do is this: I read each line, take the fields in green, and concatenate them into a string that becomes the HashMap key. The value is the field in red.

Every time I read a line, I check whether the HashMap already has an entry with that key. If so, I update it by adding the red value to the existing one; if not, I add a new entry to the HashMap.
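The procedure described above can be sketched roughly as follows. The actual file format was only shown in an image, so everything about the layout here is a hypothetical assumption: tab-separated fields, the "green" key fields in columns 0 and 1, the "red" value field in column 2, and an integer value. The class and method names are also hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class LineAggregator {

    // Hypothetical layout: tab-separated, key fields in columns 0 and 1, value in column 2.
    private static final int[] KEY_COLUMNS = {0, 1};
    private static final int VALUE_COLUMN = 2;

    /** Reads the input line by line and sums the value column per concatenated key. */
    public static Map<String, Long> aggregate(BufferedReader reader) throws IOException {
        Map<String, Long> totals = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\t");
            // Concatenate the key fields; the '|' separator keeps "ab"+"c" distinct from "a"+"bc".
            StringBuilder key = new StringBuilder();
            for (int col : KEY_COLUMNS) {
                if (key.length() > 0) {
                    key.append('|');
                }
                key.append(fields[col]);
            }
            long value = Long.parseLong(fields[VALUE_COLUMN].trim());
            // merge() inserts the value for a new key, or adds it to the existing total.
            totals.merge(key.toString(), value, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        String sample = "a\tx\t1\na\tx\t2\nb\ty\t5\n";
        Map<String, Long> totals = aggregate(new BufferedReader(new StringReader(sample)));
        System.out.println(totals.get("a|x")); // 3
        System.out.println(totals.get("b|y")); // 5
    }
}
```

For the real file, a reader from `java.nio.file.Files.newBufferedReader(Paths.get("data.txt"))` (file name hypothetical) can be passed instead. Because the file is streamed line by line, only the map, not the whole 700 MB file, has to fit in the heap.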

I tried this with text files of 70,000 lines, and it works quite well.

But with the 7-million-line text file I get a "java heap space" error, as shown in the image:

Is this due to the HashMap? Is it possible to optimize my algorithm?
Solution
You should increase your heap space:

    -Xms<size>   set initial Java heap size
    -Xmx<size>   set maximum Java heap size

    java -Xms1024m -Xmx2048m
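To check that the -Xms/-Xmx flags actually reached the JVM, the program can report the maximum heap it was given. This is a minimal sketch using the standard `Runtime` API; the class name `HeapCheck` is hypothetical.

```java
public class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB (roughly the -Xmx value). */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Currently used heap = heap reserved so far minus its free part.
        long usedMiB = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println("max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("used heap: " + usedMiB + " MiB");
    }
}
```

Run as `java -Xms1024m -Xmx2048m HeapCheck`, the reported maximum should be close to 2048 MiB; depending on the garbage collector it is often somewhat lower.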
A nice read: "From Java code to Java heap".
Table 3. Attributes of a HashMap
Default capacity: 16 entries
Empty size: 128 bytes
Overhead: 64 bytes plus 36 bytes per entry
Overhead for a 10K collection: ~360K
Search/insert/delete performance: O(1), i.e. constant time regardless of the number of elements (assuming no hash collisions)
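Using the figures in the table above, the overhead for 7 million records comes to about 7,000,000 × 36 bytes ≈ 252 MB (roughly 240 MiB). That covers only the HashMap's own bookkeeping, not the key Strings and value objects it holds, so your minimum heap size should be around 1000 MB.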
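The overhead arithmetic can be reproduced from the Table 3 figures (64 bytes per map plus 36 bytes per entry). This is only a sketch: those constants come from the cited article and vary with the JVM, and the estimate deliberately excludes the keys and values themselves. The class and method names are hypothetical.

```java
public class HashMapOverheadEstimate {

    // Figures from Table 3; actual sizes depend on the JVM and its settings.
    private static final long MAP_OVERHEAD_BYTES = 64;
    private static final long PER_ENTRY_BYTES = 36;

    /** Estimated HashMap bookkeeping overhead in bytes, excluding keys and values. */
    public static long overheadBytes(long entries) {
        return MAP_OVERHEAD_BYTES + PER_ENTRY_BYTES * entries;
    }

    public static void main(String[] args) {
        long bytes = overheadBytes(7_000_000L);
        System.out.println(bytes + " bytes");               // 252000064 bytes
        System.out.println(bytes / (1024 * 1024) + " MiB"); // 240 MiB
    }
}
```

The same formula gives 360,064 bytes for 10,000 entries, matching the "~360K" row of the table.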
