Java heap space: HashMap, ArrayList

Question

I would like to process a text file (about 400 MB) to build a recursive parent-child structure from the data given in each line. The data have to be prepared for top-down navigation (input: a parent; output: all of its children and sub-children). Example lines to be read, in the form child;id1;id2;parent;id3:

132142086;1;2;132528589;132528599
132142087;1;3;132528589;132528599
132142088;1;0;132528589;132528599
323442444;1;0;132142088;132528599
454345434;1;0;323442444;132528599

132528589: is parent of 132142086,132142087,132142088
132142088: is parent of 323442444
323442444: is parent of 454345434

Given: Windows XP, 32-bit, 2 GB of available memory, and -Xmx1024m. Here is how I prepare the data:

HashMap<String,ArrayList<String>> hMap = new HashMap<String,ArrayList<String>>();
while ((myReader = bReader.readLine()) != null)
{
    String[] tmpObj = myReader.split(delimiter);
    String valuesArrayS = tmpObj[0]+";"+tmpObj[1]+";"+tmpObj[2]+";"+tmpObj[3]+";"+tmpObj[4];
    ArrayList<String> valuesArray = new ArrayList<String>();
    //case of same key
    if (hMap.containsKey(tmpObj[3]))
    {
        valuesArray = (ArrayList<String>)(hMap.get(tmpObj[3])).clone();
    }

    valuesArray.add(valuesArrayS);
    hMap.put(tmpObj[3], valuesArray);
    tmpObj = null;
    valuesArray = null;
}

return hMap;

After that I use a recursive function:

HashMap<String,ArrayList<String>> getChildren(input parent)

to create the data structure I need. The plan is to make hMap available (read-only) to more than one thread through the getChildren function.
I tested this program with an input file of 90 MB and it seemed to work properly. However, running it with the real file of more than 380 MB leads to:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I need some help with memory resource management.

Recommended answer

Do look into increasing your heap size, as others suggested. You can also store your data in the table more efficiently, as Sbodd and others suggested.
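As a rough illustration of what more compact storage could look like (my reading of that suggestion, not anyone's actual code): the sketch below keeps only the child id per entry and appends to the existing list in place rather than cloning it for every line. The class name ParentChildIndex and the methods addLine and descendants are hypothetical, and it assumes Java 8 or later for computeIfAbsent and that the data contain no cycles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; class and method names are illustrative, not from the question.
final class ParentChildIndex {

    // Adds one input line (child;id1;id2;parent;id3) to the parent -> children index.
    static void addLine(Map<String, List<String>> childrenByParent, String line, String delimiter) {
        String[] fields = line.split(delimiter);
        String child = fields[0];
        String parent = fields[3];
        // Append in place: no clone of the existing list and no second put of the key.
        childrenByParent.computeIfAbsent(parent, k -> new ArrayList<String>()).add(child);
    }

    // Returns all children and sub-children of parent, depth first.
    // Assumes the data contain no cycles; a cycle would recurse forever.
    static List<String> descendants(Map<String, List<String>> childrenByParent, String parent) {
        List<String> result = new ArrayList<>();
        collect(childrenByParent, parent, result);
        return result;
    }

    private static void collect(Map<String, List<String>> childrenByParent, String parent, List<String> out) {
        List<String> children = childrenByParent.get(parent);
        if (children == null) {
            return; // leaf: no line names this id as a parent
        }
        for (String child : children) {
            out.add(child);
            collect(childrenByParent, child, out);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new HashMap<>();
        String[] lines = {
            "132142086;1;2;132528589;132528599",
            "132142088;1;0;132528589;132528599",
            "323442444;1;0;132142088;132528599"
        };
        for (String line : lines) {
            addLine(index, line, ";");
        }
        System.out.println(descendants(index, "132528589"));
    }
}
```

In the real program addLine would be called from the readLine loop. If the other fields of each line are needed later, storing them once per child (rather than rebuilding the whole line as a new string) would be the next thing to try.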

However, you may be running afoul of memory fragmentation. Hash maps use arrays, and big hash maps use big arrays. You are not specifying the initial capacity of your HashMap, so every time it decides it needs to grow, it allocates a new, larger array and discards the old one. After a while, your memory fills up with discarded hash table arrays and you get an OutOfMemoryError even though you technically have plenty of free memory. (90% of your memory could be available, but in pieces too small to use.)
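To get a feel for how many arrays are thrown away, here is a small model of the growth rule documented for java.util.HashMap: default capacity 16, default load factor 0.75, and a table that doubles once the entry count exceeds capacity times load factor. It is a back-of-the-envelope calculation rather than a measurement of a real map, and the class name HashMapGrowth and its methods are hypothetical.

```java
// Hypothetical model of HashMap growth with the documented defaults
// (initial capacity 16, load factor 0.75, table doubles on each resize).
final class HashMapGrowth {

    private static final long DEFAULT_CAPACITY = 16;
    private static final double LOAD_FACTOR = 0.75;

    // Table capacity after inserting the given number of entries into a default-sized map.
    static long finalCapacity(long entries) {
        long capacity = DEFAULT_CAPACITY;
        while (entries > capacity * LOAD_FACTOR) {
            capacity *= 2;
        }
        return capacity;
    }

    // Number of times the table is reallocated on the way there.
    static int resizeCount(long entries) {
        int resizes = 0;
        long capacity = DEFAULT_CAPACITY;
        while (entries > capacity * LOAD_FACTOR) {
            capacity *= 2;
            resizes++;
        }
        return resizes;
    }

    // Total slots in all the discarded (smaller) tables.
    static long discardedSlots(long entries) {
        long discarded = 0;
        long capacity = DEFAULT_CAPACITY;
        while (entries > capacity * LOAD_FACTOR) {
            discarded += capacity;
            capacity *= 2;
        }
        return discarded;
    }

    public static void main(String[] args) {
        long entries = 10_000_000L;
        System.out.println("resizes:         " + resizeCount(entries));
        System.out.println("final capacity:  " + finalCapacity(entries));
        System.out.println("discarded slots: " + discardedSlots(entries));
    }
}
```

Under this model, filling a default-sized map with ten million entries reallocates the table 20 times, and the discarded tables together hold almost as many slots as the final one.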

The garbage collector (GC) will work continuously to combine all these free bits into blocks big enough to use. If your program ran slowly enough, you would not have a problem, but your program is running full tilt and the GC is going to fall behind. The JVM will throw the error if the GC cannot assemble a big enough free block fast enough; the mere fact that the memory exists will not stop it. (This means that a program that could run won't, but it keeps the JVM from running really slowly and looking really bad to users.)

Given that you know how big your hash map has to be, I'd set the size up front. Even if the size isn't precisely right, it may solve your memory problem without increasing the heap size and will definitely make your program run faster (or as fast as your file read lets it--use big file buffers).

If you have no real idea how big your table might be, use a TreeMap. It's a bit slower, but it does not allocate huge arrays and is hence a lot kinder to the GC. I find them a lot more flexible and useful. You might even look at ConcurrentSkipListMap, which is slower than TreeMap but lets you add, read, and delete from multiple threads simultaneously.
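A minimal sketch of the concurrent variant, assuming Java 8 or later: two threads add children under the same parent key in a ConcurrentSkipListMap. The class name SkipListIndexDemo, the method buildConcurrently, and the key "sharedParent" are made up for illustration. The value lists are wrapped with Collections.synchronizedList because the map only protects its own structure, not the lists stored in it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical demo; names are illustrative only.
final class SkipListIndexDemo {

    // Fills a parent -> children index from two threads at once.
    static ConcurrentNavigableMap<String, List<String>> buildConcurrently(final int childrenPerThread) {
        final ConcurrentNavigableMap<String, List<String>> index = new ConcurrentSkipListMap<>();
        Thread[] workers = new Thread[2];
        for (int t = 0; t < workers.length; t++) {
            final int workerId = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < childrenPerThread; i++) {
                    String child = "w" + workerId + "-" + i;
                    // computeIfAbsent returns the list that is actually in the map,
                    // even if another thread created it first.
                    index.computeIfAbsent("sharedParent",
                            k -> Collections.synchronizedList(new ArrayList<String>()))
                         .add(child);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(buildConcurrently(1000).get("sharedParent").size());
    }
}
```

If the map is built once by a single thread and only read afterwards, a plain TreeMap avoids the synchronization overhead, provided it is fully populated and safely handed over before the reader threads start.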

But your best bet is:

hMap = new HashMap<String,ArrayList<String>>( 10000000 );
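If the expected number of keys is only roughly known, one common way to pick the constructor argument is to divide the estimate by the load factor so the map never has to resize while it fills. The sketch below assumes the default load factor of 0.75; PresizedMaps, initialCapacityFor, and newIndex are hypothetical names, and HashMap itself rounds the requested capacity up to a power of two.

```java
import java.util.ArrayList;
import java.util.HashMap;

// Hypothetical helper; names are illustrative only.
final class PresizedMaps {

    private static final double DEFAULT_LOAD_FACTOR = 0.75;

    // Smallest requested capacity that holds expectedEntries without a resize,
    // assuming the default load factor.
    static int initialCapacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / DEFAULT_LOAD_FACTOR);
    }

    // Creates an empty parent -> children map sized for the expected number of parents.
    static HashMap<String, ArrayList<String>> newIndex(int expectedParents) {
        return new HashMap<String, ArrayList<String>>(initialCapacityFor(expectedParents));
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(10_000_000));
        HashMap<String, ArrayList<String>> hMap = newIndex(10_000_000);
        System.out.println(hMap.isEmpty());
    }
}
```

For ten million keys, passing 10000000 directly happens to be enough as well, because the capacity is rounded up to 16,777,216 and the resize threshold at that size is above ten million.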
