Managed/Native Circular Reference Clean-up Problem


Problem description



Hi,

I did a Google search and couldn't find anything quite like this, so I thought I'd post here.

I've got a mature, high-performance native library with lots of optimizations, including zone allocators to increase performance. It's used by a variety of legacy software, and I'm building a managed wrapper to expose that functionality to (e.g.) C#. Moving all the code to managed is not an option for a variety of reasons, including performance. I've had to build a circular reference between the native and managed objects:
* the managed class needs a pointer to the native class for business logic; the wrapper does nothing other than call into the native code
* the native code stores native objects in ordered sets, so I retain a reference from the native class back to the managed class for iterators, etc. I don't want to duplicate list and set management

Illustrative example code:

#include <msclr/auto_gcroot.h>

class Native_eh;                                // forward declaration

ref class Managed_eh {
    Native_eh *n_this;                          // this is always valid
};

class Native_eh {                               // handled by a zone allocator
    Native_eh *prev, *next;                     // doubly-linked list
    msclr::auto_gcroot<Managed_eh^> gc_this;    // lazy assignment
};




* If client code creates the managed object, then the native object is necessarily created.
* If the native object is created due to internal logic, then we don't create the managed wrapper until we need to, in order to avoid unnecessary allocations, etc.; otherwise the benefits of the zone allocators, etc. are lost. Also, there are many times when the managed wrapper is never actually needed.
* gc_this must not be cleared (to maintain consistency) so long as there's client managed code with a handle to the Managed_eh object.

Clean-up is the concern. Neither the wrapper class nor the native class knows when it is no longer interesting to client code and can be cleaned up. Sometimes I can solve this problem by cleaning up the container objects that manage the sets of Native_eh, but that's not a generic solution.
* In the strictly native world, this would be a trivial fix by changing the hierarchy.
* In the COM reference-counter world, this is equally easy with a different solution:
  - if gc_this is not set, then clean up when the reference counter on Managed_eh reaches 0
  - if gc_this is set, then when the reference counter on the managed pointer drops to 1, we know we can clean up because we know exactly what that '1' is (gc_this).
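
For readers who want the COM-style rule spelled out, here is a minimal sketch in standard C++, where std::shared_ptr::use_count() plays the role of the reference counter. It is only a native-code analogy: Wrapper, Native, wrapper() and try_release() are hypothetical names, the code assumes a single thread, and .NET exposes no comparable counter for managed handles, which is the difficulty the question is about.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the managed wrapper (Managed_eh).
struct Wrapper {};

// Hypothetical stand-in for Native_eh with a counted back-reference.
struct Native {
    std::shared_ptr<Wrapper> gc_this;   // lazily assigned, like gc_this

    // Create the wrapper on first request and hand out a counted reference.
    std::shared_ptr<Wrapper> wrapper() {
        if (!gc_this) gc_this = std::make_shared<Wrapper>();
        assert(gc_this.use_count() >= 1);   // gc_this itself counts as one owner
        return gc_this;
    }

    // Release the back-reference only if nobody else holds the wrapper.
    // A count of exactly 1 means the sole remaining owner is gc_this.
    bool try_release() {
        if (gc_this && gc_this.use_count() == 1) {
            gc_this.reset();
            return true;
        }
        return false;
    }
};
```

While a client still holds the pointer returned by wrapper(), try_release() refuses to clear gc_this; once the client lets go, the count falls back to 1 and the release succeeds.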

What is the appropriate ".Net" approach to solving this problem?

Thanks,

—Rob



Interesting discussion, thanks, though I don't think it helps with my particular problem.

I'm not particularly concerned about precisely when the clean-up of the garbage-collected objects occurs; I can easily work around that. I'm more concerned that it can occur at all.
* So long as the native object retains the handle to the managed wrapper, the garbage collector will not perform clean-up. Tests show this, and it makes sense because the garbage collector cannot track references on the native side.
* But the native object cannot release its handle until it knows that nobody else on the managed side has a handle, or it risks invalidating consistency.

Said another way, if the native object has the only reference to the managed wrapper, then we (it) can perform clean-up. But how can the native object determine that nobody else is interested in its wrapper?

In the COM world, we could examine reference counters to conclude "I'm the only one left with a handle to this object, so I can release it and we can clean up". But I don't see that available in the .NET world.

Is there a way to (efficiently) query the dependency graph used by the garbage collector?

Is there an event that tells you when the state of the dependency graph for a given object (the managed wrapper) changes?

—Rob

Recommended answer

Let's start with pure managed code.

This is simple: when you lose the last reference to an object, it becomes eligible for destruction by the Garbage Collector (GC). The actual destruction happens some time later; exactly when depends on both the GC design and your code. It is not recommended to assume any particular moment when it happens, and it is not recommended to attempt to change GC behavior (which is possible, so this is only a rule of thumb). For this reason, it is also not recommended to write destructors. However, I think that in mixed mode managed-code destructors may be very useful; please see below.

Now, this mechanism always works correctly even if two objects reference each other or several references form a cycle. You can easily check this.

This does not mean a memory leak is impossible. It is quite possible, but not due to some random bug as in C++. Rather, it can be due to design. Also, the definition of a leak is not as simple as one might think. Basically, you have a memory leak when your finite state machine comes back to a state which is supposed to be the same as before, but the number of accessible objects is not the same (strictly speaking, you cannot speak of the amount of memory, because the GC is an independent actor, see above). Imagine you have something like a multi-buffer editor. If you load N documents, change their data and eventually close them all, you're supposed to get rid of all the objects you had in the middle. Imagine, however, that you also collected references to some objects in a Dictionary used, say, for quick search. Now imagine this: when you close the Tab object with the document, every instance of every object created inside becomes unreachable along with it, but you forget to find and remove some of those objects from the Dictionary. In this way, you get a memory leak. This is trivial enough.
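
The Dictionary scenario can be reproduced in a few lines. The sketch below is a standard C++ analogy, not managed code: std::shared_ptr ownership stands in for GC reachability, and Document, Editor, close_leaky() and close() are hypothetical names. The point carries over to .NET unchanged: an object stays alive as long as any container still references it.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Document { std::string text; };

struct Editor {
    std::map<int, std::shared_ptr<Document>> tabs;           // open tabs
    std::map<std::string, std::shared_ptr<Document>> index;  // quick-search index

    void open(int tab, const std::string& name) {
        assert(tabs.find(tab) == tabs.end());   // tab id must not be in use
        auto doc = std::make_shared<Document>();
        tabs[tab] = doc;
        index[name] = doc;   // second reference, easy to forget later
    }

    // Leaky version: the index entry keeps the document reachable.
    void close_leaky(int tab) { tabs.erase(tab); }

    // Correct version: drop every index entry that points at the document.
    void close(int tab) {
        auto it = tabs.find(tab);
        if (it == tabs.end()) return;
        for (auto i = index.begin(); i != index.end(); ) {
            if (i->second == it->second) i = index.erase(i);
            else ++i;
        }
        tabs.erase(it);
    }
};
```

After close_leaky() the document is still reachable through index, so it is never reclaimed; after close() nothing refers to it any more.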

To finish with managed-only C++/CLI code, let's note that this language is unique in allowing value semantics even for reference types. This is a great feature. Again, there is nothing special to do about it: stack objects are removed as the code leaves the stack frame, as in C++. If such an object holds some references, the GC will take care of them later.

As you perfectly understand, it's not so easy with mixed mode. Different approaches can be applied. I would classify them based on tight or loose coupling between managed and unmanaged types. Loose coupling is in general a very good thing, but not always so in the sense in which I'm using the term here.

Let's consider very loose coupling first: you have separate C++ (unmanaged) code and C++/CLI managed code. Collaboration between these parts is very rare, so you can isolate all cases of collaboration and pay special attention to memory issues. In this case, you use your C++ techniques for the unmanaged part and C++/CLI for the .NET part. This is really easy but not very realistic for many applications. There are many cases when you need tight collaboration.

In the case of tight collaboration, I would suggest taking the collaboration to the extreme: wrap every semantic unit of C++ code (presumably, any relatively independent C++ class) in a C++/CLI wrapper. The deletion of C++ heap objects (in the unmanaged sense) should be well manageable in each such unit. Now, you should make sure that such unmanaged deletion is completed in the destructor of the wrapper class. You should also take care of processing exceptions, which is not very difficult either. This technique has one important limitation: as the moment at which managed memory is reclaimed is unknown (due to the nature of the GC), the unmanaged memory clean-up should have no side effects, just deletion. This would be a reasonable design principle for a purely unmanaged C++ project as well.

Also, there are techniques based on IDisposable. There are a lot of controversies and even holy wars about the different approaches. I think it depends. The first thing to understand is that IDisposable is nothing more than a way to guarantee a call to the method IDisposable::Dispose. Even though this method is mostly used to reclaim unmanaged resources, it can do anything, anything at all.

I would combine the two approaches without big problems. I would prefer using IDisposable objects for short-lived objects (typically within a single stack frame). In this way, such managed objects would be similar (in terms of lifetime) to instances of C++ classes used on the stack and destroyed at the moment of leaving the current stack frame. Such managed objects are well suited to controlling the lifetime of C++ unmanaged heap objects. For them, wrapping the unmanaged destruction in IDisposable::Dispose should be preferred.
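
The native behaviour this paragraph uses as its model can be shown with a small standard C++ sketch. NativeThing, ScopedWrapper and live_inside_scope() are hypothetical names, and the live counter exists only to make the lifetime observable. A disposable managed wrapper used within one stack frame is meant to behave like ScopedWrapper here: the heap object dies exactly when the scope ends, and the clean-up does nothing but delete.

```cpp
#include <cassert>

// Hypothetical native heap object; 'live' counts instances currently alive.
struct NativeThing {
    static int live;
    NativeThing()  { ++live; }
    ~NativeThing() { --live; }
};
int NativeThing::live = 0;

// Hypothetical wrapper that owns one NativeThing for the length of a scope.
class ScopedWrapper {
public:
    ScopedWrapper() : native_(new NativeThing) {}
    ~ScopedWrapper() { delete native_; }   // deletion only, no other side effects
    ScopedWrapper(const ScopedWrapper&) = delete;
    ScopedWrapper& operator=(const ScopedWrapper&) = delete;
private:
    NativeThing* native_;
};

// Returns how many native objects are alive while the wrapper is in scope.
int live_inside_scope() {
    ScopedWrapper w;
    assert(NativeThing::live >= 1);   // w owns at least one live object
    return NativeThing::live;         // w is destroyed after this value is read
}
```

The count rises while the wrapper is in scope and returns to its previous value as soon as the function exits, with no dependence on when a collector runs.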

Another special case for wrapping unmanaged heap objects in IDisposable, with destruction in IDisposable::Dispose rather than in destructors, is this: unmanaged objects that claim a lot of heap, especially with the possibility of fragmentation. Leaving such an object to a destructor-based managed wrapper can be uncomfortable: who knows when the GC "decides" to reclaim this wrapper?

So, such programming needs good strategic planning and very accurate adherence to the strategy. At the same time, I think the .NET part of a mixed-mode process can only improve the robustness of the unmanaged code in terms of memory management.


In response to the follow-up (which I moved into the text of the original question afterwards):
There is no such event as the one you are looking for, and there never was. For things like that, people use the well-known technique of reference counting, possibly with the "smart pointers" typical of C++. Did you use this approach in pure native code? One variant is to keep using it, following the loosely coupled model.

Your further steps depend on your general design (object model or the like), on how you decide to design or redesign it, on how much you're willing to redesign, and on the trade-offs. (Please also see my comment on the answer by Olivier.) You should be careful not to mix the approaches in a way that makes them fight each other :-).

What I actually advised was a managed equivalent of smart pointers: wrapping unmanaged objects in managed ones. This is not so simple, because the relationships between your referenced and referencing objects lie inside the unmanaged world. You need to change this so that these relationships are driven by managed references.

One more idea is to remove all unmanaged code except some performance-critical calculations and the interfacing with unmanaged libraries. Your whole object model can become purely managed. That could nearly eliminate your native memory allocation problem.

At this point, I don't think an abstract discussion (not concerning your concrete architecture) could bring much more. You can either explain your project and try to get advice applicable to your concrete situation, or work out the new architecture all by yourself.

Good luck,
—SA


If you are willing to allow the managed clients to dictate when they no longer need the wrapper, you can implement the Dispose pattern. In C++/CLI, disposing is the same as calling delete on the reference handle. At that point the wrapper can call a method in Native_eh (something like CleanUpRequested), and Native_eh::CleanUpRequested can reset the member handle to nullptr. This frees up the managed wrapper for collection during the next GC cycle (and at this point the native object does not care about it anymore; it's up to the CLR/GC).
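
The control flow of that suggestion can be sketched in standard C++. This is an analogy under stated assumptions, not C++/CLI: a raw Wrapper* stands in for the gcroot member handle, the Wrapper destructor stands in for Dispose (delete on the handle), and Wrapper and Native are hypothetical names; only CleanUpRequested comes from the answer itself.

```cpp
#include <cassert>

struct Wrapper;

// Hypothetical stand-in for Native_eh.
struct Native {
    Wrapper* gc_this = nullptr;   // stand-in for the gcroot back-reference

    // Called by the wrapper when the client disposes it.
    void CleanUpRequested() { gc_this = nullptr; }
};

// Hypothetical stand-in for Managed_eh.
struct Wrapper {
    Native* n_this;

    explicit Wrapper(Native* n) : n_this(n) {
        assert(n != nullptr && n->gc_this == nullptr);   // one wrapper per native object
        n->gc_this = this;
    }

    // Plays the role of Dispose: tell the native side to drop its reference.
    ~Wrapper() { n_this->CleanUpRequested(); }

    Wrapper(const Wrapper&) = delete;
    Wrapper& operator=(const Wrapper&) = delete;
};
```

In the real C++/CLI code, clearing the member handle would be the step that removes the last native-side root, leaving the wrapper collectable once no managed client references it.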

[Response 1]

The Dispose pattern is for .NET; it's not language specific. C++/CLI internally uses the Dispose pattern (the IL is nearly identical to what the C# compiler generates).

I am curious why you don't want the caller to decide when to dispose of the object, because that's what is always done in the native world. Your managed code merely wraps native code and should really follow the native world's deterministic disposal semantics.


I think there is a problem in your design. I don't understand the need to hold references to managed objects inside your native code: the so-called gc_this.

It is not a wrapper anymore. A managed wrapper is just some code you put "around" your native code without changing anything in that native code.

You should rather add native callbacks or something similar to give some feedback to your wrapper, and let the wrapper do the cleaning with what Nishant and SA explained: Dispose, deterministic cleaning, ...
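
One way to read the callback suggestion, sketched in standard C++: the native class knows nothing about the wrapper's type and only exposes a notification hook. Native, Wrapper, set_on_destroy() and native_alive() are hypothetical names, and the sketch assumes the wrapper outlives the native object (otherwise the callback would have to be unregistered first).

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical native class: it exposes a hook instead of holding a wrapper reference.
class Native {
public:
    using Callback = std::function<void()>;

    void set_on_destroy(Callback cb) {
        assert(cb);                      // an empty callback is not expected
        on_destroy_ = std::move(cb);
    }

    ~Native() { if (on_destroy_) on_destroy_(); }   // feedback to whoever registered

private:
    Callback on_destroy_;
};

// Hypothetical wrapper: registers for feedback and tracks the native lifetime.
class Wrapper {
public:
    explicit Wrapper(Native& n) {
        n.set_on_destroy([this] { native_alive_ = false; });
    }
    Wrapper(const Wrapper&) = delete;
    Wrapper& operator=(const Wrapper&) = delete;

    bool native_alive() const { return native_alive_; }

private:
    bool native_alive_ = true;
};
```

With this shape the native code stays unchanged apart from the hook, and the decision about when to clean up moves into the wrapper.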

