Contiguous storage of polymorphic types

Problem description

I'm interested to know if there is any viable way to contiguously store an array of polymorphic objects, such that virtual methods on a common base can be legally called (and would dispatch to the correct overridden method in a subclass).

For example, consider the following classes:

struct B {
  int common;
  int getCommon() { return common; }
  virtual int getVirtual() const = 0;
};

struct D1 : B {
  virtual int getVirtual() const final { return 5; }
};

struct D2 : B {
  int d2int;
  virtual int getVirtual() const final { return d2int; }
};

I would like to allocate a contiguous array of D1 and D2 objects, and treat them as B objects, including calling getVirtual(), which will delegate to the appropriate method depending on the object type. Conceptually this seems possible: each object knows its type, typically via an embedded vtable pointer, so you could imagine storing n objects in an array of n * max(sizeof(D1), sizeof(D2)) unsigned char, using placement new and delete to initialize the objects, and casting the unsigned char pointer to B*. I'm pretty sure a cast is not legal, however.

One could also imagine creating a union like:

union Both {
  D1 d1;
  D2 d2;
};

and then creating an array of Both, and using placement new to create the objects of the appropriate type. This again doesn't seem to offer a way to actually call B::getVirtual() safely, however. You don't know the last stored type for the elements, so how are you going to get your B*? You need to use either &u.d1 or &u.d2 but you don't know which! There are actually special rules about "initial common subsequences" which let you do some things on unions where the elements share some common traits, but this only applies to standard layout types. Classes with virtual methods are not standard layout types.

Is there any way to proceed? Ideally, a solution would look something like a non-slicing std::vector<B> that can actually contain polymorphic subclasses of B. Yes, if required one might stipulate that all possible subclasses are known up front, but a better solution would only need to know the maximum likely size of any subclass (and fail at compile time if someone tries to add a "too big" object).
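One way to picture the "fixed maximum size, fail at compile time" idea is the sketch below. It is only an illustration under stated assumptions, not a vetted container: the Slot helper and its member names are hypothetical, B is given a virtual destructor (the question's B has none), and each slot keeps the B* produced by a derived-to-base conversion of the placement-new result rather than casting the byte buffer, at the cost of one extra pointer per slot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

// Assumption: a virtual destructor is added so objects can be destroyed via B*.
struct B {
    int common = 0;
    virtual ~B() = default;
    virtual int getVirtual() const = 0;
};
struct D1 final : B {
    int getVirtual() const override { return 5; }
};
struct D2 final : B {
    int d2int;
    explicit D2(int v) : d2int(v) {}
    int getVirtual() const override { return d2int; }
};

// Hypothetical helper: a fixed-size slot owning at most one object derived from Base.
template <typename Base, std::size_t MaxSize>
class Slot {
public:
    Slot() = default;
    Slot(const Slot&) = delete;             // copying would need per-type logic
    Slot& operator=(const Slot&) = delete;
    ~Slot() { reset(); }

    template <typename T, typename... Args>
    T& emplace(Args&&... args) {
        static_assert(std::is_base_of<Base, T>::value, "T must derive from Base");
        static_assert(sizeof(T) <= MaxSize, "object is too big for this slot");
        static_assert(alignof(T) <= alignof(std::max_align_t), "over-aligned type");
        reset();
        T* obj = ::new (static_cast<void*>(storage_)) T(std::forward<Args>(args)...);
        ptr_ = obj;  // ordinary derived-to-base conversion, no cast of the buffer
        return *obj;
    }

    void reset() {
        if (ptr_ != nullptr) {
            ptr_->~Base();  // virtual destructor runs the derived destructor
            ptr_ = nullptr;
        }
    }

    Base& get() {
        assert(ptr_ != nullptr && "slot is empty");
        return *ptr_;
    }

private:
    alignas(std::max_align_t) unsigned char storage_[MaxSize];
    Base* ptr_ = nullptr;
};

constexpr std::size_t kMaxSize =
    sizeof(D1) > sizeof(D2) ? sizeof(D1) : sizeof(D2);

// Builds a contiguous array of mixed objects and sums the virtual results.
int sumVirtual() {
    std::array<Slot<B, kMaxSize>, 3> slots;
    slots[0].emplace<D1>();
    slots[1].emplace<D2>(7);
    slots[2].emplace<D1>();
    int sum = 0;
    for (auto& s : slots) sum += s.get().getVirtual();
    return sum;
}
```

Calling sumVirtual() dispatches through the normal virtual mechanism while the objects themselves live inline in the array. Trying to emplace a type larger than kMaxSize trips the static_assert at compile time.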

If it isn't possible to do with the built-in virtual mechanism, other alternatives that offer similar functionality would also be interesting.

No doubt someone will ask "why", so here's a bit of motivation:

It seems generally well-known that using virtual functions to implement runtime polymorphism comes at a moderate overhead when actually calling virtual methods.

Not as often discussed, however, is the fact that using classes with virtual methods to implement polymorphism usually implies a totally different way of managing the memory for the underlying objects. You cannot just add objects of different types (but a common base) to a standard container: if you have subclasses D1 and D2, both derived from base B, a std::vector<B> would slice any D1 or D2 objects added. Similarly for arrays of such objects.
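The slicing behaviour described above can be seen in a small sketch. The Base and Derived types here are hypothetical stand-ins (the question's B is abstract, so a std::vector<B> of it would not even compile), and the function names are made up for illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical non-abstract base, so that std::vector<Base> compiles.
struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 0; }
};
struct Derived : Base {
    int extra = 42;
    int id() const override { return 1; }
};

// Storing by value keeps only the Base subobject: the element is a plain Base.
int slicedId() {
    std::vector<Base> v;
    v.push_back(Derived{});
    return v.front().id();
}

// Storing pointers keeps the dynamic type, at the cost of one heap allocation
// and one extra indirection per element.
int pointerId() {
    std::vector<std::unique_ptr<Base>> v;
    v.push_back(std::make_unique<Derived>());
    return v.front()->id();
}

void checkSlicing() {
    assert(slicedId() == 0);   // Base::id ran: the Derived part was sliced away
    assert(pointerId() == 1);  // Derived::id ran through the pointer
}
```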

The usual solution is to instead use containers or arrays of pointers to the base class, like std::vector<B*> or perhaps std::vector<unique_ptr<B>> or std::vector<shared_ptr<B>>. At a minimum, this adds an extra indirection when accessing each element[1], and in the case of the smart pointers, it breaks common container optimizations. If you are actually allocating each object via new and delete (including indirectly), then the time and memory cost of storing your objects just increased by a large amount.

Conceptually it seems like various subclasses of a common base can be stored consecutively (each object would consume the same amount of space: that of the largest supported object), and that a pointer to an object could be treated as a base-class pointer. In some cases, this could greatly simplify and speed up the use of such polymorphic objects. Of course, in general, it's probably a terrible idea, but for the purposes of this question let's assume it has some niche application.

[1] Among other things, this indirection pretty much prevents any vectorization of the same operation applied to all elements and harms locality of reference, with implications for both caching and prefetching.

Recommended answer

You were almost there with your union. You can use either a tagged union (add an if to discriminate in your loop) or a std::variant (it introduces a kind of double dispatching through std::visit to get the object out of it) to do that. In neither case do you have allocations on the dynamic storage, so data locality is guaranteed.
Anyway, as you can see, in any case you can replace an extra level of indirection (the virtual call) with a plain direct call. You need to erase the type somehow (polymorphism is nothing more than a kind of type erasure, if you think about it), and you cannot get directly out of an erased object with a simple call. ifs or extra calls are required to fill the gap left by the extra level of indirection.

Here is an example that uses std::variant and std::visit:

#include<vector>
#include<variant>

struct B { virtual void f() = 0; };
struct D1: B { void f() override {} };
struct D2: B { void f() override {} };

void f(std::vector<std::variant<D1, D2>> &vec) {
    for(auto &&v: vec) {
        std::visit([](B &b) { b.f(); }, v);
    }
}

int main() {
    std::vector<std::variant<D1, D2>> vec;
    vec.push_back(D1{});
    vec.push_back(D2{});
    f(vec);
}

Since it is really close to the above, it isn't worth also posting an example that uses tagged unions.
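For readers who would still like to see the shape of the tagged-union variant, here is a minimal sketch. It is an assumption-laden illustration rather than the answerer's code: the Either wrapper, its Kind tag and the getValue name are hypothetical, and D1 and D2 are deliberately written without virtual functions so that they stay trivial union members and the "if" on the tag replaces the virtual call.

```cpp
#include <cassert>
#include <vector>

// Assumption: plain (non-virtual) types, so they can sit in a union directly.
struct D1 {
    int getValue() const { return 5; }
};
struct D2 {
    int d2int;
    int getValue() const { return d2int; }
};

// Hypothetical tagged union: the tag records which member is active.
class Either {
    enum class Kind { IsD1, IsD2 };

public:
    Either(D1 v) : kind_(Kind::IsD1), d1_(v) {}
    Either(D2 v) : kind_(Kind::IsD2), d2_(v) {}

    // Manual dispatch on the tag; each branch is a direct, non-virtual call.
    int getValue() const {
        switch (kind_) {
            case Kind::IsD1: return d1_.getValue();
            case Kind::IsD2: return d2_.getValue();
        }
        assert(false && "unknown tag");
        return 0;
    }

private:
    Kind kind_;
    union {
        D1 d1_;
        D2 d2_;
    };
};

// All elements live inline in the vector's single buffer.
int sumValues() {
    std::vector<Either> vec;
    vec.emplace_back(D1{});
    vec.emplace_back(D2{7});
    int sum = 0;
    for (const Either& e : vec) sum += e.getValue();
    return sum;
}
```

Compared with std::variant, the tag handling is written by hand, which is easy to get wrong once the member types have non-trivial constructors or destructors.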

Another way to do that is by means of separate vectors for the derived classes and a support vector to iterate them in the right order.
Here is a minimal example that shows it:

#include<vector>
#include<functional>

struct B { virtual void f() = 0; };
struct D1: B { void f() override {} };
struct D2: B { void f() override {} };

void f(std::vector<std::reference_wrapper<B>> &vec) {
    for(auto &w: vec) {
        w.get().f();
    }
}

int main() {
    std::vector<std::reference_wrapper<B>> vec;
    std::vector<D1> d1;
    std::vector<D2> d2;

    d1.push_back({});
    vec.push_back(d1.back());
    d2.push_back({});
    vec.push_back(d2.back());

    f(vec);
}
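One caveat with the example above is that the stored references point into d1 and d2, so a later push_back that reallocates either vector would leave them dangling. A possible variation, sketched here under assumptions, records a (kind, index) pair in the support vector instead of a reference. The Mixed class, its member names and the value() method are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct B {
    virtual ~B() = default;
    virtual int value() const = 0;
};
struct D1 : B { int value() const override { return 1; } };
struct D2 : B { int value() const override { return 2; } };

// Hypothetical container: one dense vector per derived type plus an index
// vector that remembers the insertion order.
class Mixed {
    enum class Kind { IsD1, IsD2 };
    struct Entry {
        Kind kind;
        std::size_t index;
    };

public:
    void add(const D1& d) {
        order_.push_back({Kind::IsD1, d1_.size()});
        d1_.push_back(d);
    }
    void add(const D2& d) {
        order_.push_back({Kind::IsD2, d2_.size()});
        d2_.push_back(d);
    }

    // Visits every element in insertion order as a const B&.
    template <typename Fn>
    void forEach(Fn fn) const {
        for (const Entry& e : order_) {
            if (e.kind == Kind::IsD1) {
                assert(e.index < d1_.size());
                fn(static_cast<const B&>(d1_[e.index]));
            } else {
                assert(e.index < d2_.size());
                fn(static_cast<const B&>(d2_[e.index]));
            }
        }
    }

private:
    std::vector<D1> d1_;
    std::vector<D2> d2_;
    std::vector<Entry> order_;
};

// Encodes the visiting order as decimal digits so it is easy to check.
int orderedDigits() {
    Mixed m;
    m.add(D1{});
    m.add(D2{});
    m.add(D2{});
    m.add(D1{});
    int digits = 0;
    m.forEach([&](const B& b) { digits = digits * 10 + b.value(); });
    return digits;
}
```

Indices stay valid when the per-type vectors grow, but every access now pays for a branch on the kind in addition to the virtual call, so whether this beats a vector of pointers would have to be measured.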
