Java heap space: HashMap, ArrayList

Problem Description

I would like to process a text file (about 400 MB) in order to create a recursive parent-child structure from the data given in each line. The data have to be prepared for top-down navigation (input: a parent, output: all its children and sub-children). Example lines to be read (fields: child, id1, id2, parent, id3):

132142086;1;2;132528589;132528599
132142087;1;3;132528589;132528599
132142088;1;0;132528589;132528599
323442444;1;0;132142088;132528599
454345434;1;0;323442444;132528599

132528589: is parent of 132142086,132142087,132142088
132142088: is parent of 323442444
323442444: is parent of 454345434

Given: Windows XP, 32-bit, 2 GB available memory, and -Xmx1024m. Here is how I prepare the data:

HashMap<String,ArrayList<String>> hMap=new HashMap<String,ArrayList<String>>();
  while ((myReader = bReader.readLine()) != null) 
          {
             String [] tmpObj=myReader.split(delimiter);
                   String valuesArrayS=tmpObj[0]+";"+tmpObj[1]+";"+tmpObj[2]+";"+tmpObj[3]+";"+tmpObj[4];
                        ArrayList<String> valuesArray=new ArrayList<String>();
                        //case of same key
                        if(hMap.containsKey(tmpObj[3]))
                            {
                            valuesArray=(ArrayList<String>)(hMap.get(tmpObj[3])).clone();
                            }

                        valuesArray.add(valuesArrayS);
                        hMap.put(tmpObj[3],valuesArray);
                        tmpObj=null;
                        valuesArray=null;
                        }

return hMap;

After that I use a recursive function:

HashMap<String,ArrayList<String>> getChildren(input parent)

for creating the required data structure. The plan is to make hMap available (read-only) to multiple threads that use the getChildren function.
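
A minimal sketch of how such a recursive lookup could be implemented is shown below. The `collect` helper, the visited-check, and the assumption that the first field of each stored line is the child id are illustrative additions, not part of the original question:

import java.util.ArrayList;
import java.util.HashMap;

class ChildNavigator {
    private final HashMap<String, ArrayList<String>> hMap;

    ChildNavigator(HashMap<String, ArrayList<String>> hMap) {
        this.hMap = hMap; // the map built while reading the file
    }

    // Returns the entries for the given parent and all of its sub-children.
    HashMap<String, ArrayList<String>> getChildren(String parent) {
        HashMap<String, ArrayList<String>> result = new HashMap<String, ArrayList<String>>();
        collect(parent, result);
        return result;
    }

    private void collect(String parent, HashMap<String, ArrayList<String>> result) {
        if (result.containsKey(parent)) {
            return; // already visited; also guards against cycles in the data
        }
        ArrayList<String> children = hMap.get(parent);
        if (children == null) {
            return; // this id has no children
        }
        result.put(parent, children);
        for (String line : children) {
            String childId = line.split(";")[0]; // first field of a stored line is the child id
            collect(childId, result);            // descend into sub-children
        }
    }
}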

I tested this approach with a 90 MB input file and it seemed to work fine. However, running it with the real file of more than 380 MB leads to:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

I need some help with managing memory resources.

Recommended Answer

Do check out increasing your memory, as suggested by others. Also, you can store your data within the table better as suggested by Sbodd and others.

However, you may be running afoul of memory fragmentation. Hash maps use arrays. Big hash maps use big arrays. You are not specifying the size of your hashmap, so every time it decides it needs to be bigger, it discards its old array and allocates a new one. After a while, your memory will fill up with discarded hash table arrays and you get an OutOfMemoryError even though you technically have plenty of free memory. (90% of your memory could be available, but in pieces too small to use.)

The garbage collector (GC) will work continuously to combine all these free bits into blocks big enough to use. If your program ran slowly enough, you would not have a problem, but your program is running full tilt and the GC is going to get behind. The GC will throw the exception if it cannot assemble a free block big enough fast enough; the mere fact that the memory exists will not stop it. (This means that a program that could run won't, but it keeps the JVM from running real slow and looking real bad to users.)

Given that you know how big your hash map has to be, I'd set the size up front. Even if the size isn't precisely right, it may solve your memory problem without increasing the heap size and will definitely make your program run faster (or as fast as your file read lets it--use big file buffers).
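
As a sketch of both suggestions combined (the class and method names, the 1 MB buffer size, and reusing one list per parent instead of cloning it are assumptions added here; the initial capacity of 10,000,000 is the value suggested below):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;

class ParentChildLoader {

    static HashMap<String, ArrayList<String>> load(String path) throws IOException {
        // Pre-size the map so it never has to discard and reallocate its backing array.
        HashMap<String, ArrayList<String>> hMap =
                new HashMap<String, ArrayList<String>>(10000000);
        // Read through a large buffer (1 MB here) so file I/O is not the bottleneck.
        BufferedReader bReader = new BufferedReader(new FileReader(path), 1 << 20);
        try {
            String line;
            while ((line = bReader.readLine()) != null) {
                String[] fields = line.split(";");
                ArrayList<String> values = hMap.get(fields[3]);
                if (values == null) {
                    values = new ArrayList<String>();
                    hMap.put(fields[3], values); // one list per parent id
                }
                values.add(line); // store the raw line under its parent
            }
        } finally {
            bReader.close();
        }
        return hMap;
    }
}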

If you have no real idea how big your table might be, use a TreeMap. It's a bit slower but does not allocate huge arrays and is hence a lot kinder to the GC. I find them a lot more flexible and useful. You might even look at the ConcurrentSkipListMap, which is slower than the TreeMap, but lets you add, read, and delete from multiple threads simultaneously.
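
For example (the variable names are placeholders; TreeMap lives in java.util and ConcurrentSkipListMap in java.util.concurrent):

// TreeMap grows node by node, so there is no large backing array to reallocate.
TreeMap<String, ArrayList<String>> treeMap =
        new TreeMap<String, ArrayList<String>>();

// ConcurrentSkipListMap: same idea, and it can additionally be read and
// modified from several threads at the same time.
ConcurrentSkipListMap<String, ArrayList<String>> skipListMap =
        new ConcurrentSkipListMap<String, ArrayList<String>>();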

But your best bet is something like:

hMap = new HashMap<String,ArrayList<String>>( 10000000 );
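
A note on that capacity: a HashMap only rehashes once its size exceeds capacity times the load factor (0.75 by default), so an initial capacity of 10,000,000 accommodates roughly 7.5 million distinct parent keys before the first resize.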

