Read big text file to HashMap - heap overflow
I'm trying to get the data from a text file into a HashMap. The text file has the following format:
It has something like 7 million lines... (size: 700 MB)
So what I do is: I read each line, then I take the fields in green and concatenate them into a string, which will be the HashMap key. The value will be the field in red.
Every time I read a line, I have to check the HashMap for an existing entry with that key. If there is one, I just update the value by adding the red field to it; if not, a new entry is added to the HashMap.
I tried this with a text file of 70,000 lines, and it works quite well.
But now, with the 7-million-line text file, I get a "java heap space" error, as in the image:
Is this due to the HashMap? Is it possible to optimize my algorithm?
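The read-and-aggregate loop described above can be sketched as follows. The actual field positions come from an image that is not reproduced here, so the column indices (key = columns 0 and 1, value = column 2, tab-separated) are assumptions:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class Aggregate {

    // Hypothetical layout: the "green" key fields are columns 0 and 1,
    // the "red" value field is column 2; adjust indices to the real format.
    static void addLine(Map<String, Long> totals, String line) {
        String[] f = line.split("\t");
        String key = f[0] + "|" + f[1];       // concatenated key fields
        long value = Long.parseLong(f[2]);    // the "red" value field
        // merge() sums into an existing entry, or inserts a new one
        totals.merge(key, value, Long::sum);
    }

    public static void main(String[] args) throws IOException {
        Map<String, Long> totals = new HashMap<>();
        // Stream the file line by line; only the aggregated map stays in memory
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                addLine(totals, line);
            }
        }
        System.out.println(totals.size() + " distinct keys");
    }
}
```

Note that the file itself is never held in memory; the heap pressure comes entirely from the HashMap of distinct keys.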
You should increase your heap space:
-Xms<size> set initial Java heap size
-Xmx<size> set maximum Java heap size
java -Xms1024m -Xmx2048m
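To confirm the `-Xmx` setting actually took effect, you can print the JVM's maximum heap from inside the program (a small check, not part of the original answer):

```java
public class HeapInfo {
    public static void main(String[] args) {
        // maxMemory() reports the heap limit the JVM will attempt to use,
        // i.e. roughly the -Xmx value, in bytes
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMb + " MB");
    }
}
```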
A nice read: From Java code to Java heap (IBM developerWorks).
Table 3. Attributes of a HashMap
Default capacity: 16 entries
Empty size: 128 bytes
Overhead: 64 bytes plus 36 bytes per entry
Overhead for a 10K collection: ~360K
Search/insert/delete performance: O(1), i.e. constant time regardless of the number of elements (assuming no hash collisions)
If you apply the overhead figures from the table above, 7 million records come to around 246 MB of HashMap overhead alone, before counting the keys and values themselves,
so your minimum heap size must be around 1000 MB.
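As a back-of-envelope check, the table's figures (64 bytes fixed plus 36 bytes per entry) can be plugged in directly; the result is in the same ballpark as the ~246 MB quoted above, and it covers only the map's own bookkeeping, not the key strings or boxed values:

```java
public class HashMapOverhead {
    public static void main(String[] args) {
        // Figures from Table 3: 64 bytes fixed + 36 bytes per entry
        long entries = 7_000_000L;
        long overheadBytes = 64L + 36L * entries;
        System.out.println(overheadBytes / (1024 * 1024) + " MB"); // prints "240 MB"
    }
}
```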