集合迭代器中的类型不完整 [英] Incomplete types in collection iterators
问题描述
我为自己编写了一个自定义的STL样式的容器,该容器内部使用AVL树来组织数据.现在,在一个项目中,我希望有一个迭代器作为其成员:
class vertex {
...
avl_tree<vertex>::iterator partner;
...
}
但是,我得到了错误:
error: ‘avl_tree<T, A>::node::data’ has incomplete type
T data;
^
根据我在SO和其他网站上所读到的内容,vertex
是一个不完整的类型,直到它被完全定义为止. avl_tree<T,A>::node
是我用来管理树的私有结构,它的成员之间有T data;
,但是如果T
不完整,则是非法的.奇怪的是,当我改用std::list
时,没有这样的问题,我知道这是未定义的行为.
是否有解决此问题的简便方法? avl_tree<T,A>::iterator
在内部仅维护一个指针node *ptr
,这对于不完整的类型应该不成问题,因为指针具有固定的大小.但是我不想将node
类公开给public
,我想使用iterator
s.不管iterator
总是具有相同的大小,那么有没有办法强迫编译器确认这一事实呢?
结构概述:
template <typename T, typename A = std::allocator<T> >
class avl_tree {
private:
class node {
public:
T data;
avl_tree *tree;
short depth;
size_type n;
node *parent;
node *left_child;
node *right_child;
};
public:
class iterator {
private:
node *ptr;
};
private:
using NodeAlloc = typename std::allocator_traits<A>::template rebind_alloc<node>;
NodeAlloc alloc;
node root;
};
完整代码可在 GitHub 上找到.
完整类型
首先,让我们看一个简单的示例,该示例说明了定义UDT(用户定义类型)所需的条件.
struct Foo
{
struct Bar bar;
};
鉴于以上代码,只有理解了struct Bar
的定义,编译器才能正确构建它.否则,它将不知道如何对齐此bar
数据成员以及结构实际的大小(以及要添加多少填充以确保其正确对齐).
因此,要以这种方式定义Foo
,就需要同样定义Bar
.类型依赖关系如下所示:
Foo->Bar
如果我们将上面的代码更改为此:
struct Foo
{
struct Bar* bar;
};
...突然之间,我们正在寻找一种截然不同的方案.在这种情况下,允许Bar
是不完整的类型(已声明但未定义),因为Foo
仅存储指向它的指针.指针实际上是POD(普通旧数据类型).它指向Bar
还是Baz
的大小和对齐要求均不变.结果,这里的类型依赖关系基本上是:
Foo->POD
因此,即使Bar
的定义未知,我们也可以编译此代码.当然,如果编译器遇到试图访问Bar
成员的代码或构造该代码或进行任何需要有关Bar
信息的操作,特别是它将产生错误,除非在该位置可用Bar
的定义时间.
循环/递归类型依赖项
让我们看一个简单的递归类型依赖示例:
struct Foo
{
struct Foo next;
};
在这种情况下,为了正确定义Foo
,我们必须正确定义Foo
.糟糕-无限递归.即使以某种方式允许这样做,系统也会希望为Foo
分配无限量的内存.在这种情况下,类型依赖关系如下所示:
Foo->Foo->Foo->...
即使我们在中间引入了新类型,仍然存在相同的问题:
struct Foo
{
struct Node next;
};
struct Node
{
struct Foo element;
};
由于类型依赖关系的周期性,在这里我们仍然会遇到编译器错误:
Foo->Node->Foo->Node->Foo->...
除此之外,我们还有鸡还是鸡蛋的问题.如果Foo
在Node
之前,则不能在Foo
定义时定义Node
;如果Node
在Foo
之前,则在Foo
定义时不能定义Node
./p>
要打破循环,我们可以添加一个间接寻址:
struct Foo
{
struct Node* next;
};
struct Node
{
struct Foo element;
};
现在我们有:
Foo->POD
Node->Foo->POD
...这是有效的,可以避免循环类型依赖性,并且可以正常编译.
树示例
让您更接近树的示例,让我们看一下这样的情况:
template <class T>
struct Tree
{
struct Node
{
T element;
};
Node root;
};
在这种情况下,类型依赖关系如下所示:
Tree->Node->T->...
如果T
不依赖于Tree
或Node
的定义,则可以很好地编译.
T
是vertex
,它取决于存储有存储顶点的节点的树的类型定义.结果,我们遇到了这种情况:
avl_tree<vertex>->node->vertex->avl_tree<vertex>->node->vertex->...
...,因此我们再次具有该循环类型依赖项.切断这种依赖关系的最简单方法之一(也许是这种链接结构的最常用方法)是将root/head/tail
作为指针存储.
template <class T>
struct Tree
{
struct Node
{
T element;
};
Node* root;
};
这样,我们就切断了类型依赖关系,如下所示:
Tree->POD
Node->T->...
...,或者根据您的示例进行修改:
avl_tree<vertex>->POD
node->vertex->avl_tree<vertex>->POD
...完全可以,并且可以中断循环.
您可能想知道为什么从概念上讲,这需要完整的avl_tree
类型定义:
avl_tree<vertex>::iterator partner;
这里的迭代器很好,因为它存储了指向作为POD的节点的指针.但是这里的问题是,即使它只是一个类型名,我们也要尝试访问avl_tree
的成员,这要求编译器具有完整的avl_tree
类型定义(在我们可能想要的一种理想的粒度级别).递归以要求完整的node
定义,然后要求完整的vertex
定义.
奇怪的是,当我改用
std::list
时,没有这样的问题
这是因为std::list
通常看起来像这样(给予或进行一些细微的改动):
template <class T, ...>
class list
{
public:
...
private:
struct node
{
node* next;
node* prev;
T element;
};
...
node* head;
node* tail;
};
这里相关的类型依存关系如下:
list<T>->POD
node->T->...
打破类型依赖性
从上面我们可以看到,我们可以通过指针引入间接来切断/破坏类型依赖.这样,我们不再需要用户定义的类型定义,而是可以将UDT依赖项更改为简单的POD依赖项.
可以将间接寻址放置在您喜欢的任何位置,但是对于链接结构,通常最方便的位置是该结构的root/head/tail
.这样可以避免使用链接结构的客户端不必担心这些递归/循环类型依赖项.
间接成本
我经常听到的一件事是间接成本",好像这是非常昂贵的.这完全取决于内存访问模式,以及它们与从寄存器到从二级的内存分页一直到内存层次结构的关系.虽然将其视为间接成本"是一种简单而通用的查看方式,因为指针可以指向内存中的所有位置,但真正使我们与众不同的是,我们如何访问内存递归那些指针.
如果我们要在连续的内存空间中顺序遍历链表,则该链表可能会非常高效,在该连续的内存空间中,多个节点可放入缓存行并在逐出之前对其进行访问.它们通常不那么快的地方是由于这样的事实,即节点通常是由通用分配器分配的,而不是一次分配的,因此它们的内容在内存空间中分散和碎片化,并导致遍历期间发生高速缓存未命中.此处最大的区别是内存布局.
因此,如果您担心破坏类型依赖项所需的间接调用的开销,请不必担心.除非只是在非常精细的情况下仅考虑指针的内存大小,否则通常担心的事情是错误的.而是着眼于内存分配的方式,寻找引用的局部性.使用正确的内存分配策略,即使不可避免地依赖大量间接访问的链接结构也可以变得非常有效.
I wrote myself a custom STL-style container which internally uses an AVL tree to organize data. Now, in a project, I want to have an iterator for it as a member:
class vertex {
...
avl_tree<vertex>::iterator partner;
...
}
However, I get the error:
error: ‘avl_tree<T, A>::node::data’ has incomplete type
T data;
^
From what I've read on SO and other websites, vertex
is an incomplete type until it is completely defined. avl_tree<T,A>::node
is a private struct I use to manage the tree and it has T data;
amongst its members which is however illegal if T
is incomplete. Oddly enough, when I use std::list
instead, there is no such problem, which I understand is undefined behavior.
Is there an easy way around this problem? The avl_tree<T,A>::iterator
internally maintains just a pointer node *ptr
which should not be a problem for incomplete types as pointers have fixed size. But I don't want to expose the node
class to the public
, I want to use iterator
s. Still, the iterator
will always have the same size, regardless of the templete parameter, so is there a way to force the compiler to acknowledge that fact?
Struct overview:
template <typename T, typename A = std::allocator<T> >
class avl_tree {
private:
class node {
public:
T data;
avl_tree *tree;
short depth;
size_type n;
node *parent;
node *left_child;
node *right_child;
};
public:
class iterator {
private:
node *ptr;
};
private:
using NodeAlloc = typename std::allocator_traits<A>::template rebind_alloc<node>;
NodeAlloc alloc;
node root;
};
The full code is available on GitHub.
Complete Types
First, let's look at a simple example of what is required in order to define a UDT (user-defined type).
struct Foo
{
struct Bar bar;
};
Given the above code, the compiler can only properly build it if the definition of struct Bar
is understood. Otherwise it cannot know how to align this bar
data member along with how large the structure actually is (and how much padding to add to ensure its proper alignment).
As a result, to be able to define Foo
in this way requires Bar
to likewise be defined. The type dependencies look like this:
Foo->Bar
If we change the code above to this:
struct Foo
{
struct Bar* bar;
};
... suddenly we're looking at a very different scenario. In this case, Bar
is allowed to be an incomplete type (declared but not defined) as Foo
only stores a pointer to it. A pointer is effectively a POD (plain old data type). Its size and alignment requirements do not vary whether it's pointing to a Bar
or a Baz
. As a result, the type dependency here is basically:
Foo->POD
Therefore we can compile this code even though the definition of Bar
is unknown. Of course if the compiler encounters code which is trying to access the members of Bar
or constructing it or doing anything which requires information about Bar
, specifically, it would generate an error unless the definition of Bar
is available at that time.
Circular/Recursive Type Dependencies
Let's look at a simple example of a recursive type dependency:
struct Foo
{
struct Foo next;
};
For this case, in order to properly define Foo
, we must properly define Foo
. Whoops -- infinite recursion. Even if this was somehow allowed, the system would want to allocate an infinite amount of memory for Foo
. In this case, the type dependencies look like this:
Foo->Foo->Foo->...
The same problem persists even if we introduce a new type in the middle:
struct Foo
{
struct Node next;
};
struct Node
{
struct Foo element;
};
Here we still get a compiler error due to the cyclical nature of the type dependencies which look like this:
Foo->Node->Foo->Node->Foo->...
Besides that, we have a chicken or the egg kind of problem. Node
cannot possibly be defined at the time Foo
is defined if Foo
precedes Node
, and Node
cannot possibly be defined at the time Foo
is defined if Node
precedes Foo
.
To break the cycle, we can add an indirection:
struct Foo
{
struct Node* next;
};
struct Node
{
struct Foo element;
};
Now we have:
Foo->POD
Node->Foo->POD
... which is valid, avoids circular type dependencies, and compiles just fine.
Tree Example
Getting closer to your tree example, let's look at a case like this:
template <class T>
struct Tree
{
struct Node
{
T element;
};
Node root;
};
In this case, the type dependencies look like this:
Tree->Node->T->...
This will compile fine provided that T
does not depend on the definition of Tree
or Node
.
Nevertheless, in your case, T
is a vertex
which depends on the type definition of a tree which stores a node which stores a vertex. As a result, we have this kind of scenario:
avl_tree<vertex>->node->vertex->avl_tree<vertex>->node->vertex->...
... and thus we have that circular type dependency again. One of the easiest ways to sever this dependency, and perhaps the most commonly-employed for a linked structure like this, is to store the root/head/tail
as a pointer.
template <class T>
struct Tree
{
struct Node
{
T element;
};
Node* root;
};
With this, we've severed the type dependencies like so:
Tree->POD
Node->T->...
... or, adapted to your example:
avl_tree<vertex>->POD
node->vertex->avl_tree<vertex>->POD
... which is completely fine and breaks the cycle.
You might be wondering why, conceptually, this needs the full avl_tree
type definition:
avl_tree<vertex>::iterator partner;
The iterator here is fine since it stores a pointer to a node which is a POD. Yet the problem here is that we're trying to access a member of avl_tree
, even though it's just a typename, and this requires the compiler to have a complete type definition of avl_tree
(it doesn't work quite at the ideal kind of granular level we might like). This recurses to require the complete definition of node
which then requires the complete definition of vertex
.
Oddly enough, when I use
std::list
instead, there is no such problem
This is because std::list
typically looks like this (give or take some minor variations):
template <class T, ...>
class list
{
public:
...
private:
struct node
{
node* next;
node* prev;
T element;
};
...
node* head;
node* tail;
};
The relevant type dependencies here look like this:
list<T>->POD
node->T->...
Breaking Type Dependencies
From the above we can see that we can sever/break a type dependency by introducing an indirection through a pointer. In doing so, we no longer require a user-defined type definition and can instead change a UDT dependency to a simple POD dependency.
The indirection can be placed anywhere you like, but for a linked structure, typically the most convenient place is at the root/head/tail
of the structure. This prevents the clients using your linked structure from having to worry about these recursive/circular type dependencies.
Cost of Indirection
One of the things I hear a lot is "the cost of indirection", as though this is very expensive. It all depends on memory access patterns and how they relate to the memory hierarchy going all the way from a register to paging in memory from secondary stage. While thinking about this as the "cost of indirection" is a simple and general kind of way to look at it since pointers can point all over the place in memory, what really makes the difference here is how we're accessing memory as a result of deferencing those pointers.
Even a linked list can be quite efficient if we're traversing it sequentially across a contiguous memory space where multiple nodes fit into a cache line and are accessed prior to eviction. Where they're often not so fast is due to the fact that often the nodes are allocated by a general-purpose allocator and not all at once, scattering and fragmenting their contents across the memory space and leading to cache misses during traversal. It's the memory layout that makes the biggest difference here.
So if you're ever worried about the cost of an indirection required to break a type dependency, don't. It's often the wrong thing to worry about unless it's merely about the memory size of a pointer in a very granular case. Instead look to the way the memory is allocated, seek locality of reference. With the right kind of memory allocation strategy, even linked structures which inevitably rely on a lot of indirection can become very efficient.
这篇关于集合迭代器中的类型不完整的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!