Multimap Space Issue: Guava

Question

In my Java code, I am using Guava's Multimap (com.google.common.collect.Multimap), created like this:

 Multimap<Integer, Integer> Index = HashMultimap.create();

Here, the Multimap key is one portion of a URL and the value is another portion of the URL (both converted into integers). I give my JVM 2560 MB (2.5 GB) of heap space (using -Xmx and -Xms). However, it can only store about 9 to 10 million such (key, value) pairs of integers. Theoretically (going by the memory an int occupies), it should be able to store more.

Can anyone help me with the following?

  1. Why is Multimap using so much memory? I checked my code: without inserting pairs into the Multimap, it uses only about 1/2 MB of memory.
  2. Is there another way, or a home-baked solution, to solve this memory issue? That is, is there any way to reduce the object overhead, given that I only want to store int-int pairs? Would another language help? Or is there any other solution (home-baked preferred), such as a DB-based one?

Recommended answer

There's a huge amount of overhead associated with Multimap. At a minimum:

  • Each key and value is an Integer object, which (at a minimum) doubles the storage requirements of each int value.
  • Each unique key in the HashMultimap is associated with a Collection of values (according to the source, the Collection is a HashSet).
  • Each HashSet is created with default space for 8 values.

So each key/value pair requires (at a minimum) perhaps an order of magnitude more space than you might expect for two int values. (Somewhat less when multiple values are stored under a single key.) I would expect 10 million key/value pairs to take perhaps 400MB.
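The overhead items listed above can be added up in a rough back-of-the-envelope calculation. The sketch below is not from the answer: the byte sizes are assumptions typical of a 64-bit HotSpot JVM with compressed references, the class and method names are hypothetical, and it models the worst case in which every key is unique so that each pair gets its own value set. It ignores the headers of the set and map objects themselves, so it is only a lower bound under those assumptions.

```java
final class MultimapFootprintEstimate {
    // Assumed object sizes (64-bit HotSpot, compressed references); not measured.
    static final long INTEGER_BYTES = 16;      // boxed Integer: object header + int, padded
    static final long HASH_NODE_BYTES = 32;    // hash table entry: header + hash + three references, padded
    static final long REF_BYTES = 4;           // one slot in a hash table array
    static final long ARRAY_HEADER_BYTES = 16; // header of the per-key table array

    /** Rough bytes for `pairs` entries when each key is unique and owns its own value set. */
    static long estimateBytes(long pairs, int slotsPerValueSet) {
        long boxed = 2 * INTEGER_BYTES;                  // boxed key + boxed value
        long outerEntry = HASH_NODE_BYTES + REF_BYTES;   // key -> collection entry and its table slot
        long innerEntry = HASH_NODE_BYTES;               // the value's entry inside the per-key set
        long innerTable = ARRAY_HEADER_BYTES + REF_BYTES * slotsPerValueSet; // preallocated slots
        return pairs * (boxed + outerEntry + innerEntry + innerTable);
    }

    /** Bytes needed for the same pairs stored as raw ints. */
    static long rawIntBytes(long pairs) {
        return pairs * 2 * Integer.BYTES;
    }

    public static void main(String[] args) {
        long pairs = 10_000_000L;
        System.out.println("raw ints:  " + rawIntBytes(pairs) / (1024 * 1024) + " MB");
        System.out.println("estimated: " + estimateBytes(pairs, 8) / (1024 * 1024) + " MB");
    }
}
```

Changing `slotsPerValueSet` or the assumed sizes shows how sensitive the total is to per-key preallocation; when many values share a key, the per-pair cost drops because the outer entry and the table are amortized.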

Although you have 2.5GB of heap space, I wouldn't be all that surprised if that's not enough. The above estimate is, I think, on the low side. Plus, it only accounts for how much is needed to store the map once it is built. As the map grows, the table needs to be reallocated and rehashed, which temporarily at least doubles the amount of space used. Finally, all this assumes that int values and object references require 4 bytes. If the JVM is using 64-bit addressing, the byte count probably doubles.
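The question also asks for a home-baked alternative for int-int pairs, which the answer does not provide. One possible sketch, offered here as an assumption rather than as the answer's recommendation, packs each (key, value) pair into a single long and keeps the pairs in a sorted primitive array, which costs 8 bytes per pair and avoids boxing entirely. The class name `PackedIntMultimap` is hypothetical. Unlike HashMultimap, this version keeps duplicate pairs, sorts lazily on the first lookup after an insert, and still doubles its array temporarily when it grows.

```java
import java.util.Arrays;

final class PackedIntMultimap {
    private long[] packed = new long[16]; // key in the high 32 bits, value in the low 32 bits
    private int size = 0;
    private boolean sorted = true;

    /** Appends a pair; sorting is deferred until the next lookup. */
    void put(int key, int value) {
        if (size == packed.length) {
            packed = Arrays.copyOf(packed, size * 2); // growth temporarily needs old + new array
        }
        packed[size++] = ((long) key << 32) | (value & 0xFFFFFFFFL);
        sorted = false;
    }

    /** Returns all values stored under the key (ordered by their unsigned bit pattern). */
    int[] get(int key) {
        if (!sorted) {
            Arrays.sort(packed, 0, size);
            sorted = true;
        }
        long lo = (long) key << 32;   // smallest packed value for this key
        long hi = lo | 0xFFFFFFFFL;   // largest packed value for this key
        int from = lowerBound(lo);
        int to = from;
        while (to < size && packed[to] <= hi) {
            to++;
        }
        int[] values = new int[to - from];
        for (int i = from; i < to; i++) {
            values[i - from] = (int) packed[i]; // the low 32 bits are the value
        }
        return values;
    }

    int size() {
        return size;
    }

    /** Binary search for the first index whose element is >= target. */
    private int lowerBound(long target) {
        int lo = 0, hi = size;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (packed[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        PackedIntMultimap index = new PackedIntMultimap();
        index.put(7, 3);
        index.put(7, 1);
        index.put(2, 9);
        System.out.println(Arrays.toString(index.get(7)));
    }
}
```

This layout suits a build-then-query workload; if inserts and lookups are heavily interleaved, the repeated re-sorting would dominate and a primitive-specialized hash map library may be a better fit.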
