`std :: shared_ptr`的自动循环断路器的可行性 [英] Feasibility of automatic cycle breaker for `std::shared_ptr`

查看:146
本文介绍了`std :: shared_ptr`的自动循环断路器的可行性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

C ++ 11引入了引用计数智能指针, std :: shared_ptr 。被引用计数,这些指针不能自动回收循环数据结构。然而,自动收集参考周期是可能的,例如通过 Python 和< href =http://php.net/manual/en/features.gc.collecting-cycles.php =nofollow noreferrer> PHP 。为了将这种技术与垃圾收集区分开来,问题的其余部分将它称为 cycle break

鉴于似乎没有为C ++添加等效功能的建议是否有一个根本原因,这就是为什么类似于已经用其他语言部署的自行车断路器不适用于 std :: shared_ptr



请注意,这个问题并不归结为为什么C ++没有GC,它之前已被问过。 C ++ GC通常指的是自动管理所有动态分配对象的系统,通常使用某种形式的Boehm的保守收集器来实现。已经指出这样的收藏家与RAII不太匹配。由于垃圾收集器主要管理内存,甚至可能在内存不足时才被调用,而C ++析构函数管理其他资源,依靠GC运行析构函数会导致非确定性和最坏情况下的资源匮乏。它还指出,在更明确和可预测的智能指针存在的情况下,完全不需要全面的GC。

然而,基于库的循环断路器对于智能指针(类似于引用计数的解释器所使用的指针),与通用GC有重要区别:


  • 关心通过 shared_ptr 管理的对象。这些对象已经参与共享所有权,因此必须处理延迟的析构函数调用,其确切时间取决于所有权结构。
  • 由于其范围有限,循环断路器不关心Boehm GC中断或放慢的模式,如指针掩码或包含偶然指针的巨大不透明堆块。
  • 它可以选择加入,如的std :: enable_shared_from_this 。不使用它的对象不必支付控制块中的额外空间来保存周期断路器元数据。

  • 循环断路器不需要完整的根对象,这在C ++中很难获得。与标记扫描GC不同,该标记扫描GC可以找到所有活物并丢弃剩余的物体,循环断路器只会遍历可以形成循环的物体。在现有的实现中,类型需要以一种函数的形式提供帮助,该函数枚举引用(直接或间接)给其他可以参与循环的对象。

  • 它依赖于常规的当引用计数下降到零时销毁语义来销毁循环垃圾。一旦确定了一个循环,参与它的对象就被要求清除它们的强保持引用,例如通过调用 reset()。这足以打破周期并自动销毁物体。要求对象提供并清除其强有力的引用(根据请求)确保循环断路器不会破坏封装。



缺乏自动循环中断的建议表明,这个想法因实际或哲学原因而被拒绝。我很好奇,原因是什么。为了完整性,这里有一些可能的反对意见:


  • 它会引入对循环的非确定性破坏, shared_ptr 对象。如果程序员在控制循环断路器的调用,它不会是非确定性的。而且,一旦被调用,断路器的行为将是可预测的 - 它会破坏所有当前已知的周期。这与类似于 shared_ptr 析构函数在其引用计数下降到零之后销毁底层对象的情况类似,尽管这可能导致进一步破坏的非确定性级联。 / b>


  • 循环断点程序,就像任何其他形式的垃圾收集一样,会在程序执行过程中引入暂停。实现此功能的运行时的经验表明,由于GC仅处理循环垃圾,所以暂停很少,并且所有其他对象都通过引用计数回收。如果周期检测器从未被自动调用,那么断路器的暂停可能是运行它的可预见结果,类似于如何销毁大型 std :: vector 可能运行大量的析构函数。 (在Python中,循环gc会自动运行,但有API可用于在不需要它的代码段中暂时禁用它。稍后重新启用GC将收集在此期间创建的所有循环垃圾。)


  • < 一个循环断路器是不必要的,因为循环不是那么频繁,并且可以使用 std :: weak_ptr 轻松避免。事实上,循环很容易以许多简单的数据结构 - 例如孩子们有一个指向父母的反向指针的树,或者一个双向链表。在某些情况下,复杂系统中异构对象之间的循环仅偶尔形成某些数据模式,难以预测和避免。在某些情况下,用弱变量替换哪个指针是远远不够的。
  • >

    这里有很多问题需要讨论,所以我重写了我的文章以更好地压缩这些信息。



    自动循环检测



    你的想法是有一个 circle_ptr 智能指针(我知道你想把它添加到 shared_ptr ,但谈论一个新类型比较两者会更容易)。这个想法是,如果智能指针所绑定的类型来自某些 cycle_detector_mixin ,这将激活自动循环检测。



    该mixin还要求该类型实现一个接口。它必须提供枚举该实例直接拥有的所有 circle_ptr 实例的功能。而且它必须提供使其中一个无效的手段。



    我认为这是对这个问题非常不切实际的解决方案。它过于脆弱,需要用户进行大量的手动工作。因此,将其纳入标准库是不合适的。这就是为什么。



    决定论和成本




    它会引入对循环 shared_ptr 对象的非确定性破坏。仅当 shared_ptr 的引用计数降至零时才会发生循环检测,因此程序员可以控制何时发生。因此它不会是非确定性的。它的行为是可预测的 - 它会从该指针中销毁所有当前已知的周期。这与类似于 shared_ptr 析构函数在其引用计数下降到零之后销毁底层对象的情况类似,尽管这可能导致进一步破坏的非确定性级联。 / p>

    这是真的,但并不是有帮助的。



    常规 shared_ptr 破坏的决定性与你所建议的确定性之间存在实质性差异。即: shared_ptr 便宜。



    shared_ptr 的s析构函数执行一次原子递减,然后进行条件测试以查看该值是否递减为零。如果是这样,一个析构函数被调用并释放内存。就是这样。



    你的建议让这变得更加复杂。最坏的情况下,每当> circle_ptr 被销毁时,代码将不得不遍历数据结构以确定是否有循环。大多数时候,周期不会存在。但它仍然需要寻找它们,只是为了确保。它必须每隔一次你销毁一个 circle_ptr



    Python et et 。人。解决这个问题是因为它们被嵌入到语言中。他们能够看到所发生的一切。因此,他们可以检测指针在分配时的赋值时间。通过这种方式,这些系统一直在做少量的工作来构建循环链。一旦引用消失,它可以查看其数据结构,并在创建循环链时采取行动。



    但是,您建议的是一个库功能,而不是语言功能。和库类型不能真的做到这一点。或者说,他们可以,但只能在帮助下。



    请记住: circle_ptr 的一个实例无法知道子对象是的成员。它不能自动将指向自己的指针转换为指向其拥有的类的指针。如果没有这种能力,它就不能更新拥有它的 cycle_detector_mixin 中的数据结构,如果它被重新分配的话。



    现在,它可以手动做到这一点,但只有在拥有它的实例的帮助下。这意味着 circle_ptr 将需要一组被赋予一个指针到它所属的实例,该实例从 cycle_detector_mixin 导出的构造的。然后,它的 operator = 将能够通知其拥有者它已被更新。显然,复制/移动任务不会复制/移动拥有的实例指针。

    当然,这需要拥有实例给出指向它自己的每个 circle_ptr 它自己创建的指针。在创建 circle_ptr 实例的每个构造函数和函数中。在它本身以及它拥有的任何类中,它们也不由 cycle_detection_mixin 管理。没有失败。这会在系统中造成一定程度的脆弱性;必须为每个类型拥有的 circle_ptr 实例花费人力。



    这还需要 circle_ptr 包含3种指针类型:指向您从 operator * 获得的对象的指针,指向实际托管存储的指针以及指针对该实例的所有者。实例必须包含指向其所有者的指针的原因是它是每个实例的数据,而不是与块本身相关的信息。它是 circle_ptr 的实例,它需要能够在它的所有者被反弹时告诉它的所有者,所以实例需要这些数据。



    这必须是静态开销。您无法知道何时 circle_ptr 实例处于另一种类型中,何时不是。因此,每一个 circle_ptr ,即使那些不使用循环检测功能的人也必须承担这3个指针的成本。



    因此,这不仅需要很大程度的脆弱性,而且还很昂贵,将类型的尺寸膨胀了50%。用这种类型替换 shared_ptr (或者更重要的一点,用这个功能增加 shared_ptr )是不可行的。 / p>

    另一方面,您不再需要从 cycle_detector_mixin 派生的用户实现一种获取 circle_ptr 个实例。相反,您可以使用 circle_ptr 实例注册自己的类。这允许可以循环的 circle_ptr 实例直接与它们自己的 cycle_detector_mixin 进行对话。



    所以有一些东西。



    封装和不变量



    需要能够告诉一个类使其中一个 circle_ptr 对象无效从根本上改变了类可以与其任何 circle_ptr 成员进行交互的方式。

    不变量是一段代码假设为真的状态,因为它在逻辑上不可能是假的。如果你检查一个 const int 变量> 0,那么你为后面的代码建立了一个不变量,这个值是正值。



    存在封装以允许您能够在类中构建不变量。仅靠构造函数是不行的,因为外部代码可以修改类存储的任何值。封装允许您防止外部代码进行此类修改。因此,您可以为类存储的各种数据开发不变量。



    封装是



    使用 shared_ptr ,可以围绕这样一个指针的存在构建一个不变量。你可以设计你的类,使指针永远不为空。因此,没有人必须检查它是否为空。



    circle_ptr 不是这种情况。如果您实现 cycle_detector_mixin ,那么您的代码必须能够处理任何这些 circle_ptr 变为null的实例。因此,你的析构函数不能认为它们是有效的,你的析构函数调用的任何代码都不会做出这样的假设。

    因此,你的类不能建立与 circle_ptr 指向的对象不变。至少,如果它是 cycle_detector_mixin 及其关联注册和whatnot的一部分的话。



    您可以争辩说,您的设计不会在技术上破坏封装,因为 circle_ptr 实例仍然可以是私有的。但是班级愿意放弃封装到周期检测系统。因此,这个类不能再保证某些类型的不变量。



    这听起来像破封装给我。



    < h3>线程安全性

    为了访问 weak_ptr ,用户必须 lock 它。这会返回一个 shared_ptr ,这可以确保对象保持活动状态(如果仍然存在的话)。锁定是一个原子操作,就像引用递增/递减一样。所以这是所有线程安全的。



    circle_ptr s可能不是线程安全的。如果另一个线程释放了最后一个非循环引用,那么 circle_ptr 可能会从另一个线程变为无效。



    我对此并不完全确定。可能是因为这种情况只会在对象已销毁或正在使用非拥有引用的情况下进行数据竞争时出现。但我不确定你的设计是否可以线程安全。

    毒力因素



    这个想法令人难以置信病毒。循环引用可能发生的其他类型必须实现此接口。这不是你可以放在一个类型上的东西。为了获得好处,每一种可以参与循环参考的类型都必须使用它。一致且正确。

    如果您试图让 circle_ptr 要求它管理的对象实现 cycle_detector_mixin ,那么你就不可能将这样的指针与任何其他类型一起使用。它不会替代(或扩充) shared_ptr 。因此,有没有办法让编译器来帮助检测意外的误操作。



    当然,也有 make_shared_from_this 的意外误用这是编译器无法检测到的。但是,这不是一个病毒式的构造。因此,对于那些需要这个功能的人来说,这只是一个问题。相比之下,从 cycle_detector_mixin 获得收益的唯一方法是尽可能全面地使用它。



    同样地重要的是,因为这个想法如此病毒,你会使用它很多。因此,与 make_shared_from_this 的用户相比,您很可能会遇到多重继承问题。这不是一个小问题。特别是因为 cycle_detector_mixin 可能会使用 static_cast 来访问派生类,因此您将无法使用虚拟继承。

    Summation



    所以这里是你必须做的,没有失败,为了检测周期,没有其中编译器将验证:


    1. 参与循环的每个类都必须从 cycle_detector_mixin


    2. 无论何时, cycle_detector_mixin - 派生类都会构造 circle_ptr 实例(直接或间接,但不在自身派生自 cycle_detector_mixin 的类中),传递一个指向自己的指针 cycle_ptr


    3. 不要认为任何 cycle_ptr 类的子对象是有效的。可能甚至会由于线程问题而在成员函数中变得无效。


    以下是成本:
    $ b


    1. 循环检测 cycle_detector_mixin 中的数据结构。


    2. 每个 cycle_ptr 必须大50%,即使是没有用于循环检测的大小也不例外。

      >



    关于所有权的误解



    最终,直到对 shared_ptr 实际存在的误解。


    循环检测器是不必要的,因为循环不是那么频繁,并且可以使用 std :: weak_ptr 来轻松避免。事实上,循环很容易以许多简单的数据结构 - 例如孩子们有一个指向父母的反向指针的树,或者一个双向链表。在某些情况下,复杂系统中异构对象之间的循环仅偶尔形成某些数据模式,难以预测和避免。在某些情况下,用弱变量替换哪个指针是远远不明显的。


    这是一个非常普遍的通用参数GC。这个观点的问题是,它通常会假设智能指针的使用是无效的。



    要使用 shared_ptr 的意思是。如果一个类存储 shared_ptr ,表示该类拥有该对象的所有权



    所以解释一下:为什么链表中的节点需要拥有下一个和前一个节点?为什么树中的子节点需要拥有其父节点?哦,他们需要能够引用其他节点。但是他们不需要控制它们的生命周期。例如,我将以<数组code> unique_ptr 给他们的孩子,用一个指向父母的指针。一个常规指针,而不是一个智能指针。毕竟,如果树构建正确,父母将拥有自己的孩子。所以如果一个子节点存在,它就是父节点必须存在;如果没有一个有效的父项,该子项就不能存在。


    使用双链表时,我可能使左指针为 unique_ptr ,右边是常规指针。或相反亦然;一种方式并不比其他方式好。



    您的想法似乎是我们应该始终使用 shared_ptr 事情,并让自动系统找出如何处理这些问题。无论是循环引用还是其他,只要让系统找出它就行。



    这不是 shared_ptr 的用处。智能指针的目标是不是,您不再考虑所有权;这就是说,您可以直接在代码中 表示拥有关系。 这是否改善了使用 weak_ptr 打破循环?无论什么时候周期可能发生和做额外的工作,你现在都会做一堆额外的工作 。非常脆弱的工作;如果你做错了,你最好不要错过一个你应该使用 weak_ptr 的地方。只有它更糟,因为你可能认为你的代码是安全的。



    安全性的错觉比没有安全性更糟糕。至少后者让你小心。



    你能否实现这样的事情?有可能。它是标准库的适当类型吗?不,它太脆弱了。你必须在任何时候,任何时候,任何时候,任何地方都可以正确地实施它,或者你什么都得不到。



    权威参考



    对于从未提出过的内容,不存在权威性引用,为了标准化,建议甚至想象。升压有没有这样的类型,这样的结构被甚至从来没有考虑升压:: shared_ptr的。即使是第一个智能指针文件(PDF)也从未考虑过可能性。将 shared_ptr 扩展为能够通过手动操作自动处理周期的主题为甚至没有在标准提案论坛上讨论过 far stupider想法已被审议过。



    最接近我可以提供的参考是 shared_ptr 和 weak_ptr 部分相同(这是他们甚至不认为有可能写一个 shared_ptr 来允许一个 shared_ptr< T> U T 的基数时,c $ c>到a shared_ptr< U> )。但即便如此,它明确表示不会收集周期。它没有花太多时间为什么不,但它确实表明了这一点:


    但是,收集的对象的周期清理
    功能有问题。如果A和B可以从
    彼此到达,那么首先销毁任何一个将违反
    的订单保证,留下一个悬挂指针。如果
    收集器任意打破了这个循环,程序员将
    没有真正的订购保证,并且可能导致微妙的,与时间相关的
    错误。迄今为止,没有人为此问题设计出安全的
    通用解决方案[Hayes 92]。

    我指出的封装/不变问题:使一个无效类型的指针成员打破一个不变。



    基本上,很少有人甚至考虑过这种可能性,谁迅速抛弃它是不切实际的。如果你真的相信他们错了,证明它的最好方法就是自己实施它。然后提出它用于标准化。


    C++11 introduced reference-counted smart pointers, std::shared_ptr. Being reference counted, these pointers are unable to automatically reclaim cyclic data structures. However, automatic collection of reference cycles was shown to be possible, for example by Python and PHP. To distinguish this technique from garbage collection, the rest of the question will refer to it as cycle breaking.

    Given that there seem to be no proposals to add equivalent functionality to C++, is there a fundamental reason why a cycle breaker similar to the ones already deployed in other languages wouldn't work for std::shared_ptr?

    Note that this question doesn't boil down to "why isn't there a GC for C++", which has been asked before. A C++ GC normally refers to a system that automatically manages all dynamically allocated objects, typically implemented using some form of Boehm's conservative collector. It has been pointed out that such a collector is not a good match for RAII. Since a garbage collector primarily manages memory, and might not even be called until there is a memory shortage, and C++ destructors manage other resources, relying on the GC to run destructors would introduce non-determinism at best and resource starvation at worst. It has also bee pointed out that a full-blown GC is largely unnecessary in the presence of the more explicit and predictable smart pointers.

    However, a library-based cycle breaker for smart pointers (analogous to the one used by reference-counted interpreters) would have important differences from a general-purpose GC:

    • It only cares about objects managed through shared_ptr. Such objects already participate in shared ownership, and thus have to handle delayed destructor invocation, whose exact timing depends on ownership structure.
    • Due to its limited scope, a cycle breaker is unconcerned with patterns that break or slow down Boehm GC, such as pointer masking or huge opaque heap blocks that contain an occasional pointer.
    • It can be opt-in, like std::enable_shared_from_this. Objects that don't use it don't have to pay for the additional space in the control block to hold the cycle breaker metadata.
    • A cycle breaker doesn't require a comprehensive list of "root" objects, which is hard to obtain in C++. Unlike a mark-sweep GC which finds all live objects and discards the rest, a cycle breaker only traverses objects that can form cycles. In existing implementations, the type needs to provide help in the form of a function that enumerates references (direct or indirect) to other objects that can participate in a cycle.
    • It relies on regular "destroy when reference count drops to zero" semantics to destroy cyclic garbage. Once a cycle is identified, the objects that participate in it are requested to clear their strongly-held references, for example by calling reset(). This is enough to break the cycle and would automatically destroy the objects. Asking the objects to provide and clear its strongly-held references (on request) makes sure that the cycle breaker does not break encapsulation.

    Lack of proposals for automatic cycle breaking indicates that the idea was rejected for practical or philosophical reasons. I am curious as what the reasons are. For completeness, here are some possible objections:

    • "It would introduce non-deterministic destruction of cyclic shared_ptr objects." If the programmer were in control of the cycle breaker's invocation, it would not be non-deterministic. Also, once invoked, the cycle breaker's behavior would be predictable - it would destroy all currently known cycles. This is akin to how shared_ptr destructor destroys the underlying object once its reference count drops to zero, despite the possibility of this causing a "non-deterministic" cascade of further destructions.

    • "A cycle breaker, just like any other form of garbage collection, would introduce pauses in program execution." Experience with runtimes that implement this feature shows that the pauses are minimal because the GC only handles cyclic garbage, and all other objects are reclaimed by reference counting. If the cycle detector is never invoked automatically, the cycle breaker's "pause" could be a predictable consequence of running it, similar to how destroying a large std::vector might run a large number of destructors. (In Python, the cyclic gc is run automatically, but there is API to disable it temporarily in code sections where it is not needed. Re-enabling the GC later will pick up all cyclic garbage created in the meantime.)

    • "A cycle breaker is unnecessary because cycles are not that frequent and they can be easily avoided using std::weak_ptr." Cycles in fact turn up easily in many simple data structures - e.g. a tree where children have a back-pointer to the parent, or a doubly-linked list. In some cases, cycles between heterogenous objects in complex systems are formed only occasionally with certain patterns of data and are hard to predict and avoid. In some cases it is far from obvious which pointer to replace with the weak variant.

    解决方案

    There are a number of issues to be discussed here, so I've rewritten my post to better condense this information.

    Automatic cycle detection

    Your idea is to have a circle_ptr smart pointer (I know you want to add it to shared_ptr, but it's easier to talk about a new type to compare the two). The idea is that, if the type that the smart pointer is bound to derives from some cycle_detector_mixin, this activates automatic cycle detection.

    This mixin also requires that the type implement an interface. It must provide the ability to enumerate all of the circle_ptr instances directly owned by that instance. And it must provide the means to invalidate one of them.

    I submit that this is a highly impractical solution to this problem. It is excessively fragile and requires immense amounts of manual work from the user. And therefore, it is not appropriate for inclusion in the standard library. And here are some reasons why.

    Determinism and cost

    "It would introduce non-deterministic destruction of cyclic shared_ptr objects." Cycle detection only happens when a shared_ptr's reference count drops to zero, so the programmer is in control of when it happens. It would therefore not be non-deterministic. Its behavior would be predictable - it would destroy all currently known cycles from that pointer. This is akin to how shared_ptr destructor destroys the underlying object once its reference count drops to zero, despite the possibility of this causing a "non-deterministic" cascade of further destructions.

    This is true, but not in a helpful way.

    There is a substantial difference between the determinism of regular shared_ptr destruction and the determinism of what you suggest. Namely: shared_ptr is cheap.

    shared_ptr's destructor does an atomic decrement, followed by a conditional test to see if the value was decremented to zero. If so, a destructor is called and memory is freed. That's it.

    What you suggest makes this more complicated. Worst-case, every time a circle_ptr is destroyed, the code will have to walk through data structures to determine if there's a cycle. Most of the time, cycles won't exist. But it still has to look for them, just to make sure. And it must do so every single time you destroy a circle_ptr.

    Python et. al. get around this problem because they are built into the language. They are able to see everything that's going on. And therefore, they can detect when a pointer is assigned at the time those assignments are made. In this way, such systems are constantly doing small amounts of work to build up cyclic chains. Once a reference goes away, it can look at its data structures and take action if that creates a cyclical chain.

    But what you're suggesting is a library feature, not a language feature. And library types can't really do that. Or rather, they can, but only with help.

    Remember: an instance of circle_ptr cannot know the subobject it is a member of. It cannot automatically transform a pointer to itself into a pointer to its owning class. And without that ability, it cannot update the data structures in the cycle_detector_mixin that owns it if it is reassigned.

    Now, it could manually do this, but only with help from its owning instance. Which means that circle_ptr would need a set of constructors that are given a pointer to its owning instance, which derives from cycle_detector_mixin. And then, its operator= would be able to inform its owner that it has been updated. Obviously, the copy/move assignment would not copy/move the owning instance pointer.

    Of course, this requires the owning instance to give a pointer to itself to every circle_ptr that it creates. In every constructor&function that creates circle_ptr instances. Within itself and any classes it owns which are not also managed by cycle_detection_mixin. Without fail. This creates a degree of fragility in the system; manual effort must be expended for each circle_ptr instance owned by a type.

    This also requires that circle_ptr contain 3 pointer types: a pointer to the object you get from operator*, a pointer to the actual managed storage, and a pointer to that instance's owner. The reason that the instance must contain a pointer to its owner is that it is per-instance data, not information associated with the block itself. It is the instance of circle_ptr that needs to be able to tell its owner when it is rebound, so the instance needs that data.

    And this must be static overhead. You can't know when a circle_ptr instance is within another type and when it isn't. So every circle_ptr, even those that don't use the cycle detection features, must bear this 3 pointer cost.

    So not only does this require a large degree of fragility, it's also expensive, bloating the type's size by 50%. Replacing shared_ptr with this type (or more to the point, augmenting shared_ptr with this functionality) is just not viable.

    On the plus side, you no longer need users who derive from cycle_detector_mixin to implement a way to fetch the list of circle_ptr instances. Instead, you have the class register itself with the circle_ptr instances. This allows circle_ptr instances that could be cyclic to talk directly to their owning cycle_detector_mixin.

    So there's something.

    Encapsulation and invariants

    The need to be able to tell a class to invalidate one of its circle_ptr objects fundamentally changes the way the class can interact with any of its circle_ptr members.

    An invariant is some state that a piece of code assumes is true because it should be logically impossible for it to be false. If you check that a const int variable is > 0, then you have established an invariant for later code that this value is positive.

    Encapsulation exists to allow you to be able to build invariants within a class. Constructors alone can't do it, because external code could modify any values that the class stores. Encapsulation allows you to prevent external code from making such modifications. And therefore, you can develop invariants for various data stored by the class.

    This is what encapsulation is for.

    With a shared_ptr, it is possible to build an invariant around the existence of such a pointer. You can design your class so that the pointer is never null. And therefore, nobody has to check for it being null.

    That's not the case with circle_ptr. If you implement the cycle_detector_mixin, then your code must be able to handle the case of any of those circle_ptr instances becoming null. Your destructor therefore cannot assume that they are valid, nor can any code that your destructor calls make that assumption.

    Your class therefore cannot establish an invariant with the object pointed to by circle_ptr. At least, not if it's part of a cycle_detector_mixin with its associated registration and whatnot.

    You can argue that your design does not technically break encapsulation, since the circle_ptr instances can still be private. But the class is willingly giving up encapsulation to the cycle detection system. And therefore, the class can no longer ensure certain kinds of invariants.

    That sounds like breaking encapsulation to me.

    Thread safety

    In order to access a weak_ptr, the user must lock it. This returns a shared_ptr, which ensures that the object will remain alive (if it still was). Locking is an atomic operation, just like reference incrementing/decrementing. So this is all thread-safe.

    circle_ptrs may not be very thread safe. It may be possible for a circle_ptr to become invalid from another thread, if the other thread released the last non-circular reference to it.

    I'm not entirely sure about this. It may be that such circumstances only appear if you've already had a data race on the object's destruction, or are using a non-owning reference. But I'm not sure that your design can be thread safe.

    Virulence factors

    This idea is incredibly viral. Every other type where cyclic references can happen must implement this interface. It's not something you can put on one type. In order to get the benefits, every type that could participate in a cyclical reference must use it. Consistently and correctly.

    If you try to make circle_ptr require that the object it manages implement cycle_detector_mixin, then you make it impossible to use such a pointer with any other type. It wouldn't be a replacement of (or augmentation for) shared_ptr. So there is no way for a compiler to help detect accidental misuse.

    Sure, there are accidental misuses of make_shared_from_this that cannot be detected by compilers. However, that is not a viral construct. It is therefore only a problem for those who need this feature. By contrast, the only way to get a benefit from cycle_detector_mixin is to use it as comprehensively as possible.

    Equally importantly, because this idea is so viral, you will be using it a lot. And therefore, you are far more likely to encounter the multiple-inheritance problem than users of make_shared_from_this. And that's not a minor issue. Especially since cycle_detector_mixin will likely use static_cast to access the derived class, so you won't be able to use virtual inheritance.

    Summation

    So here is what you must do, without fail, in order to detect cycles, none of which the compiler will verify:

    1. Every class participating in a cycle must be derived from cycle_detector_mixin.

    2. Anytime a cycle_detector_mixin-derived class constructs a circle_ptr instance within itself (either directly or indirectly, but not within a class that itself derives from cycle_detector_mixin), pass a pointer to yourself to that cycle_ptr.

    3. Don't assume that any cycle_ptr subobject of a class is valid. Possibly even to the extent of becoming invalid within a member function thanks to threading issues.

    And here are the costs:

    1. Cycle-detecting data structures within cycle_detector_mixin.

    2. Every cycle_ptr must be 50% bigger, even the ones that aren't used for cycle detection.

    Misconceptions about ownership

    Ultimately, I think this whole idea comes down to a misconception about what shared_ptr is actually for.

    "A cycle detector is unnecessary because cycles are not that frequent and they can be easily avoided using std::weak_ptr." Cycles in fact turn up easily in many simple data structures - e.g. a tree where children have a back-pointer to the parent, or a doubly-linked list. In some cases, cycles between heterogenous objects in complex systems are formed only occasionally with certain patterns of data and are hard to predict and avoid. In some cases it is far from obvious which pointer to replace with the weak variant.

    This is a very common argument for general-purpose GC. The problem with this argument is that it usually makes an assumption about the use of smart pointers that just isn't valid.

    To use a shared_ptr means something. If a class stores a shared_ptr, that represents that the class has ownership of that object.

    So explain this: why does a node in a linked list need to own both the next and previous nodes? Why does a child node in a tree need to own its parent node? Oh, they need to be able to reference the other nodes. But they do not need to control the lifetime of them.

    For example, I would implement a tree node as an array of unique_ptr to their children, with a single pointer to the parent. A regular pointer, not a smart pointer. After all, if the tree is constructed correctly, the parent will own its children. So if a child node exists, it's parent node must exist; the child cannot exist without having a valid parent.

    With a double linked list, I might have the left pointer be a unique_ptr, with the right being a regular pointer. Or vice-versa; one way is no better than the other.

    Your mentality seems to be that we should always be using shared_ptr for things, and just let the automatic system work out how to deal with the problems. Whether it's circular references or whatever, just let the system figure it out.

    That's not what shared_ptr is for. The goal of smart pointers is not that you don't think about ownership anymore; it's that you can express ownership relationships directly in code.

    Overall

    How is any of this an improvement over using weak_ptr to break cycles? Instead of recognizing when cycles might happen and doing extra work, you now do a bunch of extra work everywhere. Work that is exceedingly fraglile; if you do it wrong, you're no better off than if you missed a place where you should have used weak_ptr. Only it's worse, because you probably think your code is safe.

    The illusion of safety is worse than no safety at all. At least the latter makes you careful.

    Could you implement something like this? Possibly. Is it an appropriate type for the standard library? No. It's just too fragile. You must implement it correctly, at all times, in all ways, everywhere that cycles might appear... or you get nothing.

    Authoritative references

    There can be no authoritative references for something that was never proposed, suggested, or even imagined for standardization. Boost has no such type, and such constructs were never even considered for boost::shared_ptr. Even the very first smart pointer paper (PDF) never considered the possibility. The subject of expanding shared_ptr to automatically be able to handle cycles through some manual effort has never been discussed even on the standard proposal forums where far stupider ideas have been deliberated.

    The closest to a reference I can provide is this paper from 1994 about a reference-counted smart pointer. This paper basically talks about making the equivalent of shared_ptr and weak_ptr part of the language (this was in the early days; they didn't even think it was possible to write a shared_ptr that allowed casting a shared_ptr<T> to a shared_ptr<U> when U is a base of T). But even so, it specifically says that cycles would not be collected. It doesn't spend much time on why not, but it does state this:

    However, cycles of collected objects with clean-up functions are problematic. If A and B are reachable from each other, then destroying either one first will violate the ordering guarantee, leaving a dangling pointer. If the collector breaks the cycle arbitrarily, programmers would have no real ordering guarantee, and subtle, time-dependent bugs could result. To date, no one has devised a safe, general solution to this problem [Hayes 92].

    This is essentially the encapsulation/invariant issue I pointed out: making a pointer member of a type invalid breaks an invariant.

    So basically, few people have even considered the possibility, and those few who did quickly discarded it as being impractical. If you truly believe that they're wrong, the single best way to prove it is by implementing it yourself. Then propose it for standardization.

    这篇关于`std :: shared_ptr`的自动循环断路器的可行性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆