Huge LinkedList is causing GC overhead limit, is there another solution?


Problem Description


Here is my code:

    public void mapTrace(String path) throws FileNotFoundException, IOException {
        FileReader arq = new FileReader(new File(path));
        BufferedReader leitor = new BufferedReader(arq, 41943040); // 40 MB read buffer
        Integer page;
        String std;
        Integer position = 0;

        // Each line of the trace file is a page number in hexadecimal;
        // record the (1-based) line positions at which each page is referenced.
        while ((std = leitor.readLine()) != null) {
            position++;
            page = Integer.parseInt(std, 16);
            LinkedList<Integer> values = map.get(page);
            if (values == null) {
                values = new LinkedList<>();
                map.put(page, values);
            }
            values.add(position);
        }
        leitor.close();

        // Reverse each list so the latest references come first.
        for (LinkedList<Integer> referenceList : map.values()) {
            Collections.reverse(referenceList);
        }
    }
    

This is the HashMap structure:

    Map<Integer, LinkedList<Integer>> map = new HashMap<>();
    

For 50 MB - 100 MB trace files I don't have any problem, but for bigger files I get:

    Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
    

I don't know if the reverse method is increasing the memory use, if the LinkedList is using more space than other List structures, or if the way I'm adding the lists to the map is taking more space than it should. Can anyone tell me what's using so much space?

Solution

Can anyone tell me what's using so much space?

The short answer is that the space is probably going to the overheads of the data structures you have chosen.

1. By my reckoning, a LinkedList<Integer> on a 64-bit JVM uses about 48 bytes of storage per integer in the list, including the integers themselves.

2. By my reckoning, a Map<?, ?> on a 64-bit machine will use in the region of 48 bytes of storage per entry, excluding the space needed to represent the key and the value objects.

Now, your trace size estimates are rather too vague for me to plug the numbers in, but I'd expect a 1.5 GB trace file to need a LOT more than 2 GB of heap.


Given the numbers you've provided, a reasonable rule of thumb is that a trace file will occupy roughly 10 times its file size in heap memory ... using the data structures that you are currently using.
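
To make that rule of thumb concrete, here is a minimal back-of-envelope sketch, not from the original answer: the 48 bytes/element figure is the estimate above, and the 5 bytes/line figure is my assumption of roughly 4 hex digits plus a newline per trace line (map entry overhead is ignored, since there are far fewer distinct pages than trace lines):

    public class TraceHeapEstimate {
        public static void main(String[] args) {
            long fileSize = 1_500_000_000L;        // 1.5 GB trace file
            long bytesPerLine = 5;                 // assumed: ~4 hex digits + newline
            long lines = fileSize / bytesPerLine;  // ~300 million trace lines
            long heapNeeded = lines * 48;          // ~48 bytes per LinkedList<Integer> element
            System.out.printf("estimated heap: %.1f GB (%.1fx the file size)%n",
                    heapNeeded / 1e9, (double) heapNeeded / fileSize);
        }
    }

For a 1.5 GB file this prints an estimate of about 14.4 GB of heap, roughly 10x the file size.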

You don't want to configure a JVM to try to use more memory than the physical RAM available. Otherwise, you are liable to push the machine into thrashing ... and the operating system is liable to start killing processes. So for an 8 GB machine, I wouldn't advise going over -Xmx8g.
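
For example, on an 8 GB machine a launch that leaves headroom for the OS might look like this (TraceAnalyzer and trace.log are placeholder names, not from the question):

    java -Xmx6g TraceAnalyzer trace.log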

Putting that together, with an 8 GB machine you should be able to cope with a 600 MB trace file (assuming my estimates are correct), but a 1.5 GB trace file is not feasible. If you really need to handle trace files that big, my advice would be to:

• design and implement custom collection types for your specific use-case that use memory more efficiently (see the sketch after this list),

• rethink your algorithms so that you don't need to hold the entire trace file in memory, or

• get a bigger machine.
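
As an illustration of the first option, here is a minimal sketch (my own, not code from the original answer) of a growable array of primitive ints that could stand in for each LinkedList<Integer>. It avoids one node object and one boxed Integer per element, cutting the ~48 bytes/element down to roughly 4-8 bytes:

    import java.util.Arrays;

    // Growable list of primitive ints (illustrative sketch).
    public class IntList {
        private int[] data = new int[8];
        private int size = 0;

        // Append a value, doubling the backing array when it fills up.
        public void add(int value) {
            if (size == data.length) {
                data = Arrays.copyOf(data, data.length * 2);
            }
            data[size++] = value;
        }

        public int get(int index) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException(String.valueOf(index));
            }
            return data[index];
        }

        public int size() {
            return size;
        }

        // In-place reverse, replacing the Collections.reverse() pass.
        public void reverse() {
            for (int i = 0, j = size - 1; i < j; i++, j--) {
                int tmp = data[i];
                data[i] = data[j];
                data[j] = tmp;
            }
        }
    }

The map would then become Map<Integer, IntList>. A primitive-keyed map from a third-party library such as fastutil or Trove would remove the remaining per-entry boxing as well.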


I did some tests before reading your comment. I put -Xmx14g and processed the 600 MB file; it took some minutes (about 10) but it did fine.

The -Xmx14g option sets the maximum heap size. Based on the observed behaviour, I expect that the JVM didn't need anywhere near that much memory ... and didn't request it from the OS. And if you'd looked at memory usage in the task manager, I expect you'd have seen numbers consistent with that.
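
If you want to confirm this from inside the program, the standard Runtime API reports the configured ceiling and the heap actually claimed so far; a minimal sketch:

    Runtime rt = Runtime.getRuntime();
    System.out.println("max heap (-Xmx) : " + rt.maxMemory() / (1024 * 1024) + " MB");
    System.out.println("committed heap  : " + rt.totalMemory() / (1024 * 1024) + " MB");
    System.out.println("used heap       : "
            + (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024) + " MB");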

Then I put -Xmx18g and tried to process the 1.5 GB file, and it's been running for about 20 minutes. My memory in the task manager is going from 7.80 to 7.90. I wonder if this will finish; how could I use MORE memory than I have? Does it use the HD as virtual memory?

Yes, that is what it does.

Each page of your process's virtual address space corresponds to a page on the hard disc.

If you've got more virtual pages than physical memory pages, at any given time some of those virtual memory pages will live on disc only. When your application tries to use one of those non-resident pages, the VM hardware generates an interrupt, and the operating system finds an unused page, populates it from the disc copy, and then hands control back to your program. But if your application is busy, the OS may have had to make that physical memory page available by evicting another page, and that may have involved writing the evicted page's contents to disc.

The net result is that when you try to use significantly more virtual address pages than you have physical memory, the application generates lots of interrupts that result in lots of disc reads and writes. This is known as thrashing. If your system thrashes too badly, it will spend most of its time waiting for disc reads and writes to finish, and performance will drop dramatically. And on some operating systems, the OS will attempt to "fix" the problem by killing processes.
