Multimap Space Issue: Guava


Problem description

In my Java code, I am using Guava's Multimap (com.google.common.collect.Multimap) like this:

     Multimap<Integer, Integer> Index = HashMultimap.create();
    

Here, the Multimap key is one portion of a URL and the value is another portion of the URL (converted into an integer). I give my JVM 2560 MB (2.5 GB) of heap space (using -Xmx and -Xms). However, it can only store about 9 million (roughly 10 million) such (key, value) pairs of integers. In theory, given the memory an int occupies, it should be able to store more.

Can anybody help me with the following:

1. Why is Multimap using so much memory? I checked my code, and without inserting pairs into the Multimap it only uses 1/2 MB of memory.
2. Is there another way, or a home-baked solution, to solve this memory issue? That is, is there any way to reduce the object overhead, given that I only want to store int-int pairs? In any other language? Or is there any other solution (home-baked preferred) to the issue I face, such as a DB-based approach or something similar?

Solution

There's a huge amount of overhead associated with Multimap. At a minimum:

• Each key and value is an Integer object, which (at a minimum) doubles the storage requirements of each int value.
• Each unique key value in the HashMultimap is associated with a Collection of values (according to the source, the Collection is a HashSet).
• Each HashSet is created with default space for 8 values.

So each key/value pair requires (at a minimum) perhaps an order of magnitude more space than you might expect for two int values. (Somewhat less when multiple values are stored under a single key.) I would expect 10 million key/value pairs to take perhaps 400MB.
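To put the figures from the question and the answer side by side, the arithmetic can be sketched as below. Only the numbers come from the text (10 million pairs, the 400MB estimate, the 2560 MB heap, roughly 9 million pairs stored). The class and method names are illustrative, and treating 1 MB as 2^20 bytes is an assumption.

```java
// Back-of-the-envelope arithmetic; names are hypothetical, figures are from the thread.
final class MultimapFootprint {
    // Assumption: "MB" in the thread means 2^20 bytes.
    static final long MIB = 1024L * 1024L;

    /** Bytes needed if every pair were stored as just two 4-byte ints. */
    static long rawPairBytes(long pairs) {
        return pairs * 2L * Integer.BYTES;
    }

    /** Average bytes per pair when heapBytes of memory holds the given number of pairs. */
    static double bytesPerPair(long heapBytes, long pairs) {
        return (double) heapBytes / pairs;
    }

    public static void main(String[] args) {
        long tenMillion = 10_000_000L;
        long nineMillion = 9_000_000L;

        System.out.printf("Raw ints, 10M pairs: %.1f MB%n",
                (double) rawPairBytes(tenMillion) / MIB);
        System.out.printf("Answer's estimate (400MB / 10M pairs): %.1f bytes per pair%n",
                bytesPerPair(400L * MIB, tenMillion));
        System.out.printf("Question's observation (2560 MB / 9M pairs): %.1f bytes per pair%n",
                bytesPerPair(2560L * MIB, nineMillion));
    }
}
```

Under these assumptions, two raw ints cost 8 bytes per pair, the 400MB estimate corresponds to roughly 40 bytes per pair, and the heap in the question is exhausted at roughly 300 bytes per pair. The last figure is an upper bound on the Multimap's share, since the heap also holds everything else the program allocates.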

Although you have 2.5GB of heap space, I wouldn't be all that surprised if that's not enough. The above estimate is, I think, on the low side. Plus, it only accounts for how much space is needed to store the map once it is built. As the map grows, its hash table needs to be reallocated and rehashed, which temporarily at least doubles the amount of space used. Finally, all this assumes that int values and object references require 4 bytes each. If the JVM is using 64-bit addressing, the byte count probably doubles.
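The question also asks for a home-baked way to store only int-int pairs. The answer does not give one, so the following is only a minimal sketch of one possible approach, not the answer's method: pack each (key, value) pair into a single long, sort the array, and find a key's values by binary search. The class name PackedIntMultimap and its methods are hypothetical. It assumes all pairs are known before the structure is built (it is read-only afterwards), and it drops duplicate pairs as a set-based multimap would.

```java
import java.util.Arrays;

// Hypothetical read-only int-to-int multimap backed by one sorted long[]:
// 8 bytes per pair, no Integer boxing and no per-key collection objects.
final class PackedIntMultimap {
    private final long[] packed; // sorted ascending, duplicate-free

    PackedIntMultimap(int[] keys, int[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have equal length");
        }
        long[] all = new long[keys.length];
        for (int i = 0; i < keys.length; i++) {
            all[i] = pack(keys[i], values[i]);
        }
        Arrays.sort(all);
        // Drop duplicate pairs in place.
        int n = 0;
        for (int i = 0; i < all.length; i++) {
            if (n == 0 || all[i] != all[n - 1]) {
                all[n++] = all[i];
            }
        }
        packed = Arrays.copyOf(all, n);
    }

    // Key in the high 32 bits, value (as unsigned bits) in the low 32 bits.
    private static long pack(int key, int value) {
        return ((long) key << 32) | (value & 0xFFFFFFFFL);
    }

    int size() {
        return packed.length;
    }

    boolean containsEntry(int key, int value) {
        return Arrays.binarySearch(packed, pack(key, value)) >= 0;
    }

    /** Values stored under key, in unsigned ascending order (negative values come last). */
    int[] get(int key) {
        int from = lowerBound(pack(key, 0));
        int to = from;
        while (to < packed.length && (int) (packed[to] >> 32) == key) {
            to++;
        }
        int[] out = new int[to - from];
        for (int i = 0; i < out.length; i++) {
            out[i] = (int) packed[from + i]; // the low 32 bits are the value
        }
        return out;
    }

    // Index of the first element >= target.
    private int lowerBound(long target) {
        int lo = 0, hi = packed.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (packed[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        PackedIntMultimap m = new PackedIntMultimap(
                new int[] {5, 1, 5, 5}, new int[] {7, 2, 9, 7});
        System.out.println(m.size() + " pairs; values for key 5: " + Arrays.toString(m.get(5)));
    }
}
```

The trade-off in this sketch is that inserting after construction would mean rebuilding or merging the array, so it suits a load-once, query-many workload rather than incremental updates.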
