How much extra memory does garbage collection require?
I once heard that, for a language to implement and run garbage collection correctly, on average 3x more memory is required. I am not sure whether this assumes the application is small, large, or either.
So I wanted to know if there is any research or actual numbers on garbage collection overhead. Also, I want to say GC is a very nice feature.
The amount of memory headroom you need depends on the allocation rate within your program. If you have a high allocation rate, you need more room for growth while the GC works.
The other factor is object lifetime. If your objects typically have a very short lifetime, then you may be able to manage with slightly less headroom with a generational collector.
There are plenty of research papers that may interest you. I'll edit a bit later to reference some.
Edit (January 2011):
I was thinking of a specific paper that I can't seem to find right now. The ones below are interesting and contain some relevant performance data. As a rule of thumb, you are usually OK with about twice as much memory available as your program residency. Some programs need more, but other programs will perform very well even in constrained environments. There are lots of variables that influence this, but allocation rate is the most important one.
- Myths and realities: the performance impact of garbage collection
Edit (February 2013): This edit adds a balanced perspective on a cited paper, and also addresses objections raised by Tim Cooper.
"Quantifying the Performance of Garbage Collection vs. Explicit Memory Management", as noted by Natan Yellin, is actually the reference I was first trying to remember back in January 2011. However, I don't think the interpretation Natan has offered is correct. That study does not compare GC against conventional manual memory management. Rather, it compares GC against an oracle that performs perfect explicit releases. In other words, it leaves us not knowing how well conventional manual memory management compares to the magic oracle. This is also very hard to find out, because the source programs are written either with GC in mind or with manual memory management in mind, so any benchmark retains an inherent bias.
Following Tim Cooper's objections, I'd like to clarify my position on the topic of memory headroom. I do this mainly for posterity, as I believe Stack Overflow answers should serve as a long-term resource for many people.
There are many memory regions in a typical GC system, but three abstract kinds are:
- Allocated space (contains live, dead, and untraced objects)
- Reserved space (from which new objects are allocated)
- Working region (long-term and short-term GC data structures)
What is headroom anyway? Headroom is the minimum amount of reserved space needed to maintain a desired level of performance. I believe that is what the OP was asking about. You can also think of the headroom as the memory, in addition to the actual program residency (maximum live memory), that is necessary for good performance.
Yes, increasing the headroom can delay garbage collection and increase throughput. That is important for offline, non-critical operations.
In reality, most problem domains require a realtime solution. There are two kinds of realtime, and they are very different:
- Hard realtime concerns worst-case delay (for mission-critical systems): a late response from the allocator is an error.
- Soft realtime concerns either average or median delay: a late response from the allocator is OK, but shouldn't happen often.
Most state-of-the-art garbage collectors aim for soft realtime, which is good for desktop applications as well as for servers that deliver services on demand. If one eliminates realtime as a requirement, one might as well use a stop-the-world garbage collector, in which headroom begins to lose meaning. (Note: applications with predominantly short-lived objects and a high allocation rate may be an exception, because the survival rate is low.)
Now suppose that we are writing an application that has soft-realtime requirements. For simplicity, let's suppose that the GC runs concurrently on a dedicated processor. Suppose the program has the following artificial properties:
- mean residency: 1000 KB
- reserved headroom: 100 KB
- GC cycle duration: 1000 ms
And:
- allocation rate A: 100 KB/s
- allocation rate B: 200 KB/s
Now we might see the following timeline of events with allocation rate A:
- T+0000 ms: GC cycle starts, 100 KB available for allocations, 1000 KB already allocated
- T+1000 ms:
  - 0 KB free in reserved space, 1100 KB allocated
  - GC cycle ends, 100 KB released
  - 100 KB free in reserve, 1000 KB allocated
- T+2000 ms: same as above
The timeline of events with allocation rate B is different:
- T+0000 ms: GC cycle starts, 100 KB available for allocations, 1000 KB already allocated
- T+0500 ms:
  - 0 KB free in reserved space, 1100 KB allocated
  - either:
    - delay until end of GC cycle (bad, but sometimes mandatory), or
    - increase reserved size to 200 KB, with 100 KB free (assumed here)
- T+1000 ms:
  - 0 KB free in reserved space, 1200 KB allocated
  - GC cycle ends, 200 KB released
  - 200 KB free in reserve, 1000 KB allocated
- T+2000 ms:
  - 0 KB free in reserved space, 1200 KB allocated
  - GC cycle ends, 200 KB released
  - 200 KB free in reserve, 1000 KB allocated
Notice how the allocation rate directly impacts the size of the headroom required? With allocation rate B, we require twice the headroom to prevent pauses and maintain the same level of performance.
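The two timelines can be replayed with a small simulation. This is a minimal sketch under the same artificial assumptions as the example:

- GC cycles run back to back on a dedicated processor.
- The mutator never pauses, so the reserve grows instead.
- Everything allocated during a cycle is garbage by the time that cycle ends.

The helper name `simulate` and the 100 ms time step are hypothetical choices made for illustration.

```python
def simulate(rate_kb_per_s, residency_kb=1000, reserve_kb=100,
             cycle_ms=1000, step_ms=100, duration_ms=3000):
    """Toy replay of the soft-realtime example (hypothetical helper).

    Assumptions, as in the example: GC cycles run back to back on a
    dedicated processor, the mutator never pauses (the reserve grows
    instead), and everything allocated during a cycle is garbage by the
    time that cycle ends. Returns (headroom_kb, peak_allocated_kb).
    """
    allocated = residency_kb      # live data plus not-yet-reclaimed garbage
    used = 0                      # reserve consumed during the current cycle
    peak = allocated
    for t in range(step_ms, duration_ms + 1, step_ms):
        need = rate_kb_per_s * step_ms / 1000   # KB allocated in this step
        if used + need > reserve_kb:
            reserve_kb = used + need            # grow the reserve rather than pause
        used += need
        allocated += need
        peak = max(peak, allocated)
        if t % cycle_ms == 0:
            allocated -= used                   # cycle ends: garbage released
            used = 0
    return reserve_kb, peak


for name, rate in (("A", 100), ("B", 200)):
    headroom, peak = simulate(rate)
    print(f"rate {name}: headroom {headroom:.0f} KB, peak {peak:.0f} KB allocated")
```

Running it prints a headroom of 100 KB with a peak of 1100 KB for rate A, and 200 KB with a peak of 1200 KB for rate B. This matches the timelines: doubling the allocation rate doubles the reserve needed to avoid pauses.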
This was a very simplified example designed to illustrate only one idea. There are plenty of other factors, but it does show what was intended. Keep in mind the other major factor I mentioned: average object lifetime. Short lifetimes cause low survival rates, which work together with the allocation rate to influence the amount of memory required to maintain a given level of performance.
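The interaction between survival rate and allocation rate can be made a little more concrete with a toy first-order model of a copying nursery in a generational collector. This sketch is full of assumptions and does not describe any particular collector:

- A constant fraction `survival` of each full nursery is still live when it is collected, independent of nursery size.
- Survivors are copied into a reserve that must be large enough to hold them.

The function name `nursery_costs` and its parameters are hypothetical.

```python
def nursery_costs(alloc_kb_per_s, survival, nursery_kb):
    """Toy first-order model of a copying nursery (hypothetical, for illustration).

    Assumes a constant fraction `survival` of each full nursery is still live
    when it is collected, independent of nursery size.
    """
    minor_gcs_per_s = alloc_kb_per_s / nursery_kb   # how often the nursery fills up
    copy_reserve_kb = survival * nursery_kb          # space needed to hold survivors
    copied_kb_per_s = alloc_kb_per_s * survival      # survivor copying work per second
    return minor_gcs_per_s, copy_reserve_kb, copied_kb_per_s


for rate in (100, 200):                  # allocation rates A and B, in KB/s
    for survival in (0.05, 0.5):         # mostly short-lived vs. longer-lived objects
        gcs, reserve, copied = nursery_costs(rate, survival, nursery_kb=100)
        print(f"{rate} KB/s, survival {survival}: {gcs:.1f} minor GCs/s, "
              f"{reserve:.0f} KB copy reserve, {copied:.0f} KB/s copied")
```

In this model the copying work is the product of allocation rate and survival rate. Short-lived objects therefore keep both the copy reserve and the copying work small, even as the allocation rate grows.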
In short, one cannot make general claims about the headroom required without knowing and understanding the characteristics of the application.
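A quick calculation shows why no single multiplier (2x, 3x) holds in general. Generalize the simplified soft-realtime example by assuming that the headroom needed is roughly the allocation rate multiplied by the GC cycle duration, as in the rate A and rate B timelines. Then compute the ratio of total memory to residency. Both the formula and the name `memory_ratio` are assumptions extrapolated from that toy example, not a general law.

```python
def memory_ratio(residency_kb, alloc_kb_per_s, gc_cycle_s):
    """Total memory / residency, assuming headroom = allocation rate x GC cycle time.

    Extrapolated from the simplified soft-realtime example (concurrent GC,
    no pauses allowed); it ignores the GC's own working region and object lifetimes.
    """
    headroom_kb = alloc_kb_per_s * gc_cycle_s
    return (residency_kb + headroom_kb) / residency_kb


# Same residency and cycle time as the example, varying only the allocation rate.
for rate in (100, 200, 1000, 2000):
    print(f"{rate:>5} KB/s -> {memory_ratio(1000, rate, 1.0):.1f}x residency")
```

Residency and cycle time stay fixed, yet the required multiplier ranges from 1.1x to 3.0x purely because of the allocation rate.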