How does the Garbage Collection mechanism work?

Question

In layman's terms, how does the garbage collection mechanism work?

How is an object identified as eligible for garbage collection?

Also, what do Reference Counting, Mark and Sweep, Copying, and Train mean in GC algorithms?

Answer

When you use a language with garbage collection, you don't get direct access to memory. Instead, you are given an abstraction on top of it. One of the things that is abstracted away is the actual location in memory of each data block, as well as the pointers to other data blocks. When the garbage collector runs (this happens occasionally), it checks whether you still hold a reference to each of the memory blocks it has allocated for you. If you don't, it frees that memory.

The main difference between the different types of garbage collectors is their efficiency as well as any limitations on what kind of allocation schemes they can handle.

The simplest is probably reference counting. Whenever you create a reference to an object, an internal counter on that object is incremented; when you change the reference or it goes out of scope, the counter on the (former) target object is decremented. When this counter reaches zero, the object is no longer referenced at all and can be freed.
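The counting rule described above can be sketched in a few lines of Python. This is a toy model, not how any particular runtime implements it: the names Counted, Slot and release are hypothetical, and "freeing" is only simulated by setting a flag.

```python
class Counted:
    """A toy heap object carrying an explicit reference count (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.freed = False


def release(obj):
    """Drop one reference; 'free' the object when the count reaches zero."""
    obj.refcount -= 1
    if obj.refcount == 0:
        obj.freed = True  # a real collector would return the memory here


class Slot:
    """A variable or field that refers to at most one Counted object."""

    def __init__(self):
        self.target = None

    def assign(self, obj):
        # Increment the new target first so that re-assigning the same
        # object never lets its count touch zero in between.
        if obj is not None:
            obj.refcount += 1
        old = self.target
        self.target = obj
        if old is not None:
            release(old)


a = Counted("a")
x, y = Slot(), Slot()
x.assign(a)      # count goes to 1
y.assign(a)      # count goes to 2
x.assign(None)   # count goes back to 1, object still alive
y.assign(None)   # count reaches 0, object is freed
print(a.refcount, a.freed)  # prints: 0 True
```

The point of the sketch is that the decision to free is made locally, at the moment the last reference disappears, without scanning anything else.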

The problem with reference counting garbage collectors is that they cannot deal with circular data. If object A has a reference to object B, and B in turn has some (direct or indirect) reference back to object A, their counters never reach zero, so they can never be freed, even if none of the objects in the chain are referenced from outside the chain (and are therefore not accessible to the program at all).
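The cycle problem can be observed directly in CPython, which combines reference counting with a separate cycle detector exposed through the standard gc module. The sketch below assumes CPython; Node is a hypothetical class, and the weak references are only used as probes to see whether an object is still alive without keeping it alive.

```python
import gc
import weakref


class Node:
    """Hypothetical object with a single outgoing reference."""

    def __init__(self):
        self.other = None


gc.disable()  # switch off the cycle detector; only reference counting remains

# An object with no cycle is freed as soon as its last reference goes away.
solo = Node()
solo_probe = weakref.ref(solo)
del solo
solo_freed_immediately = solo_probe() is None

# Two objects that refer to each other keep each other's count above zero.
a, b = Node(), Node()
a.other, b.other = b, a
cycle_probe = weakref.ref(a)
del a, b
cycle_alive_after_del = cycle_probe() is not None

# A tracing pass finds that nothing reachable points into the cycle.
gc.collect()
cycle_alive_after_collect = cycle_probe() is not None

gc.enable()
print(solo_freed_immediately, cycle_alive_after_del, cycle_alive_after_collect)
```

On CPython the cycle survives the del statements and only disappears after gc.collect(), which is the behavior the paragraph above describes.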

The mark and sweep algorithm, on the other hand, can handle this. It works by periodically stopping the execution of the program and marking each item the program has allocated as unreachable. The collector then runs through all the variables the program has and marks what they point to as reachable. If any of these allocations contain references to other data in the program, that data is likewise marked as reachable, and so on.

This is the mark part of the algorithm. At this point everything the program can access, no matter how indirectly, is marked as reachable and everything the program can't reach is marked as unreachable. The garbage collector can now safely reclaim the memory associated with the objects marked as unreachable.
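As a minimal sketch of the two phases, assuming a toy heap in which every object lists its outgoing references explicitly (Obj and mark_and_sweep are hypothetical names, and "reclaiming" just means returning the unreachable objects):

```python
class Obj:
    """Toy heap object; refs holds its outgoing references."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark_and_sweep(heap, roots):
    """Return (live, garbage) for a toy heap, given the program's root variables."""
    # Start by marking every allocation as unreachable.
    for obj in heap:
        obj.marked = False

    # Mark phase: follow references from the roots, directly or indirectly.
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue  # already visited; this also stops cycles from looping forever
        obj.marked = True
        stack.extend(obj.refs)

    # Sweep phase: whatever is still unmarked can be reclaimed.
    live = [o for o in heap if o.marked]
    garbage = [o for o in heap if not o.marked]
    return live, garbage


a, b, c, d = (Obj(n) for n in "abcd")
a.refs.append(b)   # a -> b, and a is held by a program variable
c.refs.append(d)   # c and d refer to each other ...
d.refs.append(c)   # ... but nothing reachable refers to them

live, garbage = mark_and_sweep([a, b, c, d], roots=[a])
print([o.name for o in live])     # prints: ['a', 'b']
print([o.name for o in garbage])  # prints: ['c', 'd']
```

Because reachability is computed from the roots rather than from per-object counters, the c/d cycle is collected even though each of the two objects is still referenced by the other.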

The problem with the mark and sweep algorithm is that it isn't that efficient: the entire program has to be stopped while it runs, and most of the object references it re-examines will not have changed since the previous run.

To improve on this, the mark and sweep algorithm can be extended with so-called "generational garbage collection". In this mode, objects that have survived some number of garbage collections are promoted to the old generation, which is checked less often.

This improves efficiency because objects tend to die young (think of a string being changed inside a loop, resulting in perhaps a lifetime of a few hundred cycles) or live very long (the objects used to represent the main window of an application, or the database connection of a servlet).
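The promotion idea can be sketched as a toy two-generation heap. Everything here is an assumption made for illustration: the names Heap, minor_collect and PROMOTE_AFTER, the threshold of two collections, and the shortcut of treating every old object as alive during a young-generation pass (real collectors track old-to-young references more precisely and also run occasional full collections, which this sketch omits).

```python
PROMOTE_AFTER = 2  # hypothetical threshold: minor collections survived before promotion


class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.survived = 0


class Heap:
    def __init__(self):
        self.young = []
        self.old = []

    def allocate(self, name):
        obj = Obj(name)
        self.young.append(obj)  # new objects always start in the young generation
        return obj

    def minor_collect(self, roots):
        """Collect only the young generation."""
        old = set(self.old)
        # Simplification: old objects are assumed alive, so whatever they
        # point to directly is treated as an additional root.
        stack = list(roots)
        for o in self.old:
            stack.extend(o.refs)

        live = set()
        while stack:
            obj = stack.pop()
            if obj in old or obj in live:
                continue  # never trace through the old generation
            live.add(obj)
            stack.extend(obj.refs)

        survivors = []
        for obj in self.young:
            if obj not in live:
                continue  # unreachable young object: reclaimed
            obj.survived += 1
            if obj.survived >= PROMOTE_AFTER:
                self.old.append(obj)   # promoted; later minor passes skip it
            else:
                survivors.append(obj)
        self.young = survivors


heap = Heap()
window = heap.allocate("main_window")      # long-lived, held by the program
for i in range(3):
    heap.allocate(f"temp_string_{i}")      # short-lived, never referenced again
    heap.minor_collect(roots=[window])

print([o.name for o in heap.old])    # prints: ['main_window']
print([o.name for o in heap.young])  # prints: []
```

After two collections the long-lived object sits in the old generation and is no longer re-examined, while each temporary string is reclaimed by the first pass that runs after it was allocated.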

Much more detailed information can be found on Wikipedia.

Added based on comments:

With the mark and sweep algorithm (as well as any other garbage collection algorithm except reference counting), the garbage collector does not run in the context of your program, since it has to be able to access things that your program cannot access directly. It is therefore not correct to say that the garbage collector runs on the stack.
