集合迭代器中的类型不完整 [英] Incomplete types in collection iterators

查看:98
本文介绍了集合迭代器中的类型不完整的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我为自己编写了一个自定义的STL样式的容器,该容器内部使用AVL树来组织数据.现在,在一个项目中,我希望有一个迭代器作为其成员:

class vertex {
    ...
    avl_tree<vertex>::iterator partner;
    ...
}

但是,我得到了错误:

error: ‘avl_tree<T, A>::node::data’ has incomplete type
         T data;
           ^

根据我在SO和其他网站上所读到的内容,vertex是一个不完整的类型,直到它被完全定义为止. avl_tree<T,A>::node是我用来管理树的私有结构,它的成员之间有T data;,但是如果T不完整,则是非法的.奇怪的是,当我改用std::list时,没有这样的问题,我知道这是未定义的行为.

是否有解决此问题的简便方法? avl_tree<T,A>::iterator在内部仅维护一个指针node *ptr,这对于不完整的类型应该不成问题,因为指针具有固定的大小.但是我不想将node类公开给public,我想使用iterator s.不管参数如何,iterator总是具有相同的大小,那么有没有办法强迫编译器确认这一事实呢?


结构概述:

template <typename T, typename A = std::allocator<T> >
class avl_tree {
private:
    class node {
    public:
        T data;
        avl_tree *tree;
        short depth;
        size_type n;
        node *parent;
        node *left_child;
        node *right_child;
    };
public:
    class iterator {
    private:
        node *ptr;
    };
private:
    using NodeAlloc = typename std::allocator_traits<A>::template rebind_alloc<node>;
    NodeAlloc alloc;
    node root;
};

完整代码可在 GitHub 上找到.

解决方案

完整类型

首先,让我们看一个简单的示例,该示例说明了定义UDT(用户定义类型)所需的条件.

struct Foo
{
     struct Bar bar;
};

鉴于以上代码,只有理解了struct Bar的定义,编译器才能正确构建它.否则,它将不知道如何对齐此bar数据成员以及结构实际的大小(以及要添加多少填充以确保其正确对齐).

因此,要以这种方式定义Foo,就需要同样定义Bar.类型依赖关系如下所示:

Foo->Bar

如果我们将上面的代码更改为此:

struct Foo
{
     struct Bar* bar;
};

...突然之间,我们正在寻找一种截然不同的方案.在这种情况下,允许Bar是不完整的类型(已声明但未定义),因为Foo仅存储指向它的指针.指针实际上是POD(普通旧数据类型).它指向Bar还是Baz的大小和对齐要求均不变.结果,这里的类型依赖关系基本上是:

Foo->POD

因此,即使Bar的定义未知,我们也可以编译此代码.当然,如果编译器遇到试图访问Bar成员的代码或构造该代码或进行任何需要有关Bar信息的操作,特别是它将产生错误,除非在该位置可用Bar的定义时间.

循环/递归类型依赖项

让我们看一个简单的递归类型依赖示例:

struct Foo
{
    struct Foo next;
};

在这种情况下,为了正确定义Foo,我们必须正确定义Foo.糟糕-无限递归.即使以某种方式允许这样做,系统也会希望为Foo分配无限量的内存.在这种情况下,类型依赖关系如下所示:

Foo->Foo->Foo->...

即使我们在中间引入了新类型,仍然存在相同的问题:

struct Foo
{
     struct Node next;
};

struct Node
{
     struct Foo element;
};

由于类型依赖关系的周期性,在这里我们仍然会遇到编译器错误:

Foo->Node->Foo->Node->Foo->...

除此之外,我们还有鸡还是鸡蛋的问题.如果FooNode之前,则不能在Foo定义时定义Node;如果NodeFoo之前,则在Foo定义时不能定义Node./p>

要打破循环,我们可以添加一个间接寻址:

struct Foo
{
    struct Node* next;
};

struct Node
{
    struct Foo element;
};

现在我们有:

Foo->POD
Node->Foo->POD

...这是有效的,可以避免循环类型依赖性,并且可以正常编译.

树示例

让您更接近树的示例,让我们看一下这样的情况:

template <class T>
struct Tree
{
     struct Node
     {
         T element;
     };
     Node root;
};

在这种情况下,类型依赖关系如下所示:

Tree->Node->T->...

如果T不依赖于TreeNode的定义,则可以很好地编译.

但是,在您的情况下,

Tvertex,它取决于存储有存储顶点的节点的树的类型定义.结果,我们遇到了这种情况:

avl_tree<vertex>->node->vertex->avl_tree<vertex>->node->vertex->...

...,因此我们再次具有该循环类型依赖项.切断这种依赖关系的最简单方法之一(也许是这种链接结构的最常用方法)是将root/head/tail作为指针存储.

template <class T>
struct Tree
{
     struct Node
     {
         T element;
     };
     Node* root;
};

这样,我们就切断了类型依赖关系,如下所示:

Tree->POD
Node->T->...

...,或者根据您的示例进行修改:

avl_tree<vertex>->POD
node->vertex->avl_tree<vertex>->POD

...完全可以,并且可以中断循环.

您可能想知道为什么从概念上讲,这需要完整的avl_tree类型定义:

avl_tree<vertex>::iterator partner;

这里的迭代器很好,因为它存储了指向作为POD的节点的指针.但是这里的问题是,即使它只是一个类型名,我们也要尝试访问avl_tree的成员,这要求编译器具有完整的avl_tree类型定义(在我们可能想要的一种理想的粒度级别).递归以要求完整的node定义,然后要求完整的vertex定义.

奇怪的是,当我改用std::list时,没有这样的问题

这是因为std::list通常看起来像这样(给予或进行一些细微的改动):

template <class T, ...>
class list
{
public:
    ...

private:
    struct node
    {
         node* next;
         node* prev;
         T element;
    };
    ...
    node* head;
    node* tail;
};

这里相关的类型依存关系如下:

list<T>->POD
node->T->...

打破类型依赖性

从上面我们可以看到,我们可以通过指针引入间接来切断/破坏类型依赖.这样,我们不再需要用户定义的类型定义,而是可以将UDT依赖项更改为简单的POD依赖项.

可以将间接寻址放置在您喜欢的任何位置,但是对于链接结构,通常最方便的位置是该结构的root/head/tail.这样可以避免使用链接结构的客户端不必担心这些递归/循环类型依赖项.

间接成本

我经常听到的一件事是间接成本",好像这是非常昂贵的.这完全取决于内存访问模式,以及它们与从寄存器到从二级的内存分页一直到内存层次结构的关系.虽然将其视为间接成本"是一种简单而通用的查看方式,因为指针可以指向内存中的所有位置,但真正使我们与众不同的是,我们如何访问内存递归那些指针.

如果我们要在连续的内存空间中顺序遍历链表,则该链表可能会非常高效,在该连续的内存空间中,多个节点可放入缓存行并在逐出之前对其进行访问.它们通常不那么快的地方是由于这样的事实,即节点通常是由通用分配器分配的,而不是一次分配的,因此它们的内容在内存空间中分散和碎片化,并导致遍历期间发生高速缓存未命中.此处最大的区别是内存布局.

因此,如果您担心破坏类型依赖项所需的间接调用的开销,请不必担心.除非只是在非常精细的情况下仅考虑指针的内存大小,否则通常担心的事情是错误的.而是着眼于内存分配的方式,寻找引用的局部性.使用正确的内存分配策略,即使不可避免地依赖大量间接访问的链接结构也可以变得非常有效.

I wrote myself a custom STL-style container which internally uses an AVL tree to organize data. Now, in a project, I want to have an iterator for it as a member:

class vertex {
    ...
    avl_tree<vertex>::iterator partner;
    ...
}

However, I get the error:

error: ‘avl_tree<T, A>::node::data’ has incomplete type
         T data;
           ^

From what I've read on SO and other websites, vertex is an incomplete type until it is completely defined. avl_tree<T,A>::node is a private struct I use to manage the tree and it has T data; amongst its members which is however illegal if T is incomplete. Oddly enough, when I use std::list instead, there is no such problem, which I understand is undefined behavior.

Is there an easy way around this problem? The avl_tree<T,A>::iterator internally maintains just a pointer node *ptr which should not be a problem for incomplete types as pointers have fixed size. But I don't want to expose the node class to the public, I want to use iterators. Still, the iterator will always have the same size, regardless of the templete parameter, so is there a way to force the compiler to acknowledge that fact?


Struct overview:

template <typename T, typename A = std::allocator<T> >
class avl_tree {
private:
    class node {
    public:
        T data;
        avl_tree *tree;
        short depth;
        size_type n;
        node *parent;
        node *left_child;
        node *right_child;
    };
public:
    class iterator {
    private:
        node *ptr;
    };
private:
    using NodeAlloc = typename std::allocator_traits<A>::template rebind_alloc<node>;
    NodeAlloc alloc;
    node root;
};

The full code is available on GitHub.

解决方案

Complete Types

First, let's look at a simple example of what is required in order to define a UDT (user-defined type).

struct Foo
{
     struct Bar bar;
};

Given the above code, the compiler can only properly build it if the definition of struct Bar is understood. Otherwise it cannot know how to align this bar data member along with how large the structure actually is (and how much padding to add to ensure its proper alignment).

As a result, to be able to define Foo in this way requires Bar to likewise be defined. The type dependencies look like this:

Foo->Bar

If we change the code above to this:

struct Foo
{
     struct Bar* bar;
};

... suddenly we're looking at a very different scenario. In this case, Bar is allowed to be an incomplete type (declared but not defined) as Foo only stores a pointer to it. A pointer is effectively a POD (plain old data type). Its size and alignment requirements do not vary whether it's pointing to a Bar or a Baz. As a result, the type dependency here is basically:

Foo->POD

Therefore we can compile this code even though the definition of Bar is unknown. Of course if the compiler encounters code which is trying to access the members of Bar or constructing it or doing anything which requires information about Bar, specifically, it would generate an error unless the definition of Bar is available at that time.

Circular/Recursive Type Dependencies

Let's look at a simple example of a recursive type dependency:

struct Foo
{
    struct Foo next;
};

For this case, in order to properly define Foo, we must properly define Foo. Whoops -- infinite recursion. Even if this was somehow allowed, the system would want to allocate an infinite amount of memory for Foo. In this case, the type dependencies look like this:

Foo->Foo->Foo->...

The same problem persists even if we introduce a new type in the middle:

struct Foo
{
     struct Node next;
};

struct Node
{
     struct Foo element;
};

Here we still get a compiler error due to the cyclical nature of the type dependencies which look like this:

Foo->Node->Foo->Node->Foo->...

Besides that, we have a chicken or the egg kind of problem. Node cannot possibly be defined at the time Foo is defined if Foo precedes Node, and Node cannot possibly be defined at the time Foo is defined if Node precedes Foo.

To break the cycle, we can add an indirection:

struct Foo
{
    struct Node* next;
};

struct Node
{
    struct Foo element;
};

Now we have:

Foo->POD
Node->Foo->POD

... which is valid, avoids circular type dependencies, and compiles just fine.

Tree Example

Getting closer to your tree example, let's look at a case like this:

template <class T>
struct Tree
{
     struct Node
     {
         T element;
     };
     Node root;
};

In this case, the type dependencies look like this:

Tree->Node->T->...

This will compile fine provided that T does not depend on the definition of Tree or Node.

Nevertheless, in your case, T is a vertex which depends on the type definition of a tree which stores a node which stores a vertex. As a result, we have this kind of scenario:

avl_tree<vertex>->node->vertex->avl_tree<vertex>->node->vertex->...

... and thus we have that circular type dependency again. One of the easiest ways to sever this dependency, and perhaps the most commonly-employed for a linked structure like this, is to store the root/head/tail as a pointer.

template <class T>
struct Tree
{
     struct Node
     {
         T element;
     };
     Node* root;
};

With this, we've severed the type dependencies like so:

Tree->POD
Node->T->...

... or, adapted to your example:

avl_tree<vertex>->POD
node->vertex->avl_tree<vertex>->POD

... which is completely fine and breaks the cycle.

You might be wondering why, conceptually, this needs the full avl_tree type definition:

avl_tree<vertex>::iterator partner;

The iterator here is fine since it stores a pointer to a node which is a POD. Yet the problem here is that we're trying to access a member of avl_tree, even though it's just a typename, and this requires the compiler to have a complete type definition of avl_tree (it doesn't work quite at the ideal kind of granular level we might like). This recurses to require the complete definition of node which then requires the complete definition of vertex.

Oddly enough, when I use std::list instead, there is no such problem

This is because std::list typically looks like this (give or take some minor variations):

template <class T, ...>
class list
{
public:
    ...

private:
    struct node
    {
         node* next;
         node* prev;
         T element;
    };
    ...
    node* head;
    node* tail;
};

The relevant type dependencies here look like this:

list<T>->POD
node->T->...

Breaking Type Dependencies

From the above we can see that we can sever/break a type dependency by introducing an indirection through a pointer. In doing so, we no longer require a user-defined type definition and can instead change a UDT dependency to a simple POD dependency.

The indirection can be placed anywhere you like, but for a linked structure, typically the most convenient place is at the root/head/tail of the structure. This prevents the clients using your linked structure from having to worry about these recursive/circular type dependencies.

Cost of Indirection

One of the things I hear a lot is "the cost of indirection", as though this is very expensive. It all depends on memory access patterns and how they relate to the memory hierarchy going all the way from a register to paging in memory from secondary stage. While thinking about this as the "cost of indirection" is a simple and general kind of way to look at it since pointers can point all over the place in memory, what really makes the difference here is how we're accessing memory as a result of deferencing those pointers.

Even a linked list can be quite efficient if we're traversing it sequentially across a contiguous memory space where multiple nodes fit into a cache line and are accessed prior to eviction. Where they're often not so fast is due to the fact that often the nodes are allocated by a general-purpose allocator and not all at once, scattering and fragmenting their contents across the memory space and leading to cache misses during traversal. It's the memory layout that makes the biggest difference here.

So if you're ever worried about the cost of an indirection required to break a type dependency, don't. It's often the wrong thing to worry about unless it's merely about the memory size of a pointer in a very granular case. Instead look to the way the memory is allocated, seek locality of reference. With the right kind of memory allocation strategy, even linked structures which inevitably rely on a lot of indirection can become very efficient.

这篇关于集合迭代器中的类型不完整的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆