Efficient linked list in C++?


Problem Description





This document says std::list is inefficient:

std::list is an extremely inefficient class that is rarely useful. It performs a heap allocation for every element inserted into it, thus having an extremely high constant factor, particularly for small data types.

Comment: that comes as a surprise to me. std::list is a doubly linked list, so despite its inefficiency in element construction, it supports insert/delete in O(1) time complexity, but this feature is completely ignored in this quoted paragraph.

My question: Say I need a sequential container for small-sized homogeneous elements, and this container should support element insert/delete in O(1) complexity and does not need random access (though supporting random access is nice, it is not a must here). I also don't want the high constant factor introduced by heap allocation for each element's construction, at least when the number of elements is small. Lastly, iterators should be invalidated only when the corresponding element is deleted. Apparently I need a custom container class, which might (or might not) be a variant of a doubly linked list. How should I design this container?

If the aforementioned specification cannot be achieved, then perhaps I should have a custom memory allocator, say, a bump-pointer allocator? I know std::list takes an allocator as its second template argument.
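For reference, a bump-pointer allocator along those lines might look like this minimal sketch (all names here are hypothetical; it hands out memory by advancing an offset through a fixed per-type arena and never reclaims individual nodes, so it only suits lists that grow and are then discarded wholesale):

#include <cstddef>
#include <list>
#include <new>

template <class T>
class BumpAllocator
{
public:
    using value_type = T;

    BumpAllocator() = default;
    template <class U>
    BumpAllocator(const BumpAllocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        // Round the offset up to T's alignment, then bump it past the request.
        const std::size_t a = alignof(T);
        const std::size_t aligned = (offset + a - 1) & ~(a - 1);
        const std::size_t bytes = n * sizeof(T);
        if (aligned + bytes > sizeof(arena))
            throw std::bad_alloc(); // arena exhausted; a real one would chain arenas
        offset = aligned + bytes;
        return reinterpret_cast<T*>(arena + aligned);
    }

    void deallocate(T*, std::size_t) noexcept
    {
        // Deliberately a no-op: bump allocators release everything at once.
    }

private:
    // One shared arena per type keeps all allocator copies interchangeable.
    alignas(std::max_align_t) static unsigned char arena[1 << 20]; // 1 MiB, arbitrary
    static std::size_t offset;
};

template <class T>
alignas(std::max_align_t) unsigned char BumpAllocator<T>::arena[1 << 20];
template <class T>
std::size_t BumpAllocator<T>::offset = 0;

template <class T, class U>
bool operator==(const BumpAllocator<T>&, const BumpAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const BumpAllocator<T>&, const BumpAllocator<U>&) noexcept { return false; }

// Usage: std::list<int, BumpAllocator<int>> list; // nodes come from the arena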

Edit: I know I shouldn't be too concerned with this issue, from an engineering standpoint - fast enough is good enough. It is just a hypothetical question so I don't have a more detailed use case. Feel free to relax some of the requirements!

Edit2: I understand two algorithms of O(1) complexity can have entirely different performance due to the difference in their constant factors.

Solution

The simplest way I see to fulfill all your requirements:

  1. Constant-time insertion/removal (hope amortized constant-time is okay for insertion).
  2. No heap allocation/deallocation per element.
  3. No iterator invalidation on removal.

... would be something like this, just making use of std::vector:

#include <type_traits> // std::aligned_storage
#include <vector>

template <class T>
struct Node
{
    // Stores the memory for an instance of 'T'.
    // Use placement new to construct the object and
    // manually invoke its dtor as necessary.
    typename std::aligned_storage<sizeof(T), alignof(T)>::type element;

    // Points to the next element or the next free
    // element if this node has been removed.
    int next;

    // Points to the previous element.
    int prev;
};

template <class T>
class NodeIterator
{
public:
    ...
private:
    std::vector<Node<T>>* nodes;
    int index;
};

template <class T>
class Nodes
{
public:
    ...
private:
    // Stores all the nodes.
    std::vector<Node<T>> nodes;

    // Points to the first free node or -1 if the free list
    // is empty. Initially this starts out as -1.
    int free_head;
};

... and hopefully with a better name than Nodes (I'm slightly tipsy and not so good at coming up with names at the moment). I'll leave the implementation up to you, but that's the general idea. When you remove an element, just do a doubly-linked list removal using the indices and push the node to the free head. Iterators aren't invalidated, since they store an index into the vector rather than a pointer. When you insert, check if the free head is -1. If not, overwrite the node at that position and pop the free head. Otherwise push_back to the vector.
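To make that concrete, here's a minimal sketch of the removal and slot-reuse logic (erase and allocate_node are hypothetical member names; a full version would also maintain the list's head/tail indices):

template <class T>
void Nodes<T>::erase(int index)
{
    Node<T>& node = nodes[index];

    // Destroy the stored element (it was constructed with placement new).
    reinterpret_cast<T*>(&node.element)->~T();

    // Standard doubly-linked removal, using indices instead of pointers.
    // (A full version would also update the list's head/tail indices here.)
    if (node.prev != -1) nodes[node.prev].next = node.next;
    if (node.next != -1) nodes[node.next].prev = node.prev;

    // Push the removed node onto the free list.
    node.next = free_head;
    free_head = index;
}

template <class T>
int Nodes<T>::allocate_node()
{
    if (free_head != -1)
    {
        // Pop the free head and reuse that slot.
        const int index = free_head;
        free_head = nodes[index].next;
        return index;
    }
    // Free list empty: grow the vector by one node.
    nodes.push_back(Node<T>());
    return static_cast<int>(nodes.size()) - 1;
}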

Illustration

Diagram (nodes are stored contiguously inside std::vector, we simply use index links to allow skipping over elements in a branchless way along with constant-time removals and insertions anywhere):

Let's say we want to remove a node. This is your standard doubly-linked list removal, except we use indices instead of pointers and you also push the node to the free list (which just involves manipulating integers):

Removal adjustment of links:

Pushing removed node to free list:

Now let's say you insert to this list. In that case, you pop off the free head and overwrite the node at that position.

After insertion:

Insertion to the middle in constant-time should likewise be easy to figure out. Basically you just pop the free head, or push_back to the vector if the free stack is empty. Then you do your standard doubly-linked list insertion. Logic for the free list (I made this diagram for someone else and it involves an SLL, but you should get the idea):

Make sure you properly construct and destroy the elements using placement new and manual calls to the dtor on insertion/removal, as in the sketch below. If you really want to generalize it, you'll also need to think about exception-safety, and we also need a read-only const iterator.
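For example, a middle insertion might look like this sketch (insert_before is a hypothetical name building on allocate_node above; it constructs the element with placement new, and a full version would update the head index when inserting at the front):

template <class T>
int Nodes<T>::insert_before(int pos, const T& value)
{
    const int index = allocate_node();
    Node<T>& node = nodes[index];

    // Construct the element in place inside the node's raw storage.
    new (&node.element) T(value);

    // Standard doubly-linked insertion, with indices instead of pointers.
    node.prev = nodes[pos].prev;
    node.next = pos;
    if (node.prev != -1) nodes[node.prev].next = index;
    nodes[pos].prev = index;
    return index;
}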

Pros and Cons

The benefit of such a structure is that it does allow very rapid insertions/removals from anywhere in the list (even for a gigantic list), insertion order is preserved for traversal, and it never invalidates the iterators to elements which aren't directly removed (though it will invalidate pointers to them; use deque if you don't want pointers to be invalidated). Personally I'd find more use for it than std::list (which I practically never use).

For large enough lists (say, larger than your entire L3 cache as a case where you should definitely expect a huge edge), this should vastly outperform std::vector for removals and insertions to/from the middle and front. Removing elements from a vector can be quite fast for small vectors, but try removing a million elements from a vector starting from the front and working towards the back. There things will start to crawl while this one will finish in the blink of an eye. std::vector is ever-so-slightly overhyped IMO when people start using its erase method to remove elements from the middle of a vector spanning 10k elements or more, though I suppose this is still preferable to people naively using linked lists everywhere in a way where each node is individually allocated against a general-purpose allocator, causing cache misses galore.

The downside is that it only supports sequential access, requires the overhead of two integers per element, and as you can see in the above diagram, its spatial locality degrades if you constantly remove things sporadically.

Spatial Locality Degradation

The loss of spatial locality as you start removing and inserting a lot from/to the middle will lead to zig-zagging memory access patterns, potentially evicting data from a cache line only to go back and reload it during a single sequential loop. This is generally inevitable with any data structure that allows removals from the middle in constant-time while likewise allowing that space to be reclaimed while preserving the order of insertion. However, you can restore spatial locality by offering a compaction method, or you can copy/swap the list: the copy constructor can copy the list in a way that iterates through the source list and inserts all the elements, which gives you back a perfectly contiguous, cache-friendly vector with no holes (though doing this will invalidate iterators).
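Such a compaction pass might look like the following sketch (begin, end, push_back, and swap are assumed members not shown in the outline above):

template <class T>
void Nodes<T>::compact()
{
    Nodes<T> packed;
    for (NodeIterator<T> it = begin(); it != end(); ++it)
        packed.push_back(*it); // re-insert in traversal order: no holes left
    swap(packed);              // note: invalidates outstanding iterators/indices
}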

Alternative: Free List Allocator

An alternative that meets your requirements is to implement a free list conforming to std::allocator and use it with std::list. I never liked reaching around data structures and messing around with custom allocators, though, and that one would double the memory use of the links on 64-bit by using pointers instead of 32-bit indices. So I'd personally prefer the above solution, using std::vector as basically your analogical memory allocator and indices instead of pointers (which both reduces size and becomes a requirement if we use std::vector, since pointers would be invalidated whenever the vector reserves a new capacity).
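If you did go that route, a minimal sketch might look like this (names are hypothetical, C++17 for the inline static, and it isn't thread-safe; single-node allocations, the pattern std::list produces, are recycled through a per-type free list instead of hitting the general-purpose heap every time):

#include <cstddef>
#include <list>
#include <new>

template <class T>
class FreeListAllocator
{
public:
    using value_type = T;

    FreeListAllocator() = default;
    template <class U>
    FreeListAllocator(const FreeListAllocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        if (n == 1)
        {
            if (free_head)
            {
                // Pop a recycled block off the free list.
                Block* b = free_head;
                free_head = b->next;
                return reinterpret_cast<T*>(b);
            }
            // Allocate a full block so it can later hold the 'next' link.
            return static_cast<T*>(::operator new(sizeof(Block)));
        }
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t n) noexcept
    {
        if (n == 1)
        {
            // Recycle the block instead of freeing it.
            Block* b = reinterpret_cast<Block*>(p);
            b->next = free_head;
            free_head = b;
            return;
        }
        ::operator delete(p);
    }

private:
    union Block
    {
        Block* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };

    // Shared per-type pool so all instances compare equal; in this sketch
    // the pool is held for the program's lifetime rather than released.
    inline static Block* free_head = nullptr;
};

template <class T, class U>
bool operator==(const FreeListAllocator<T>&, const FreeListAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const FreeListAllocator<T>&, const FreeListAllocator<U>&) noexcept { return false; }

// Usage: std::list<int, FreeListAllocator<int>> pooled_list;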

Indexed Linked Lists

I call this kind of thing an "indexed linked list" as the linked list isn't really a container so much as a way of linking together things already stored in an array. And I find these indexed linked lists exponentially more useful since you don't have to get knee-deep in memory pools to avoid heap allocations/deallocations per node and can still maintain reasonable locality of reference (great LOR if you can afford to post-process things here and there to restore spatial locality).

You can also make this singly-linked if you add one more integer to the node iterator to store the previous node index (comes free of memory charge on 64-bit assuming 32-bit alignment requirements for int and 64-bit for pointers). However, you then lose the ability to add a reverse iterator and make all iterators bidirectional.
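A sketch of that singly-linked layout (hypothetical names):

#include <type_traits>
#include <vector>

template <class T>
struct SNode
{
    typename std::aligned_storage<sizeof(T), alignof(T)>::type element;
    int next; // the only link stored per node
};

template <class T>
class SNodeIterator
{
    std::vector<SNode<T>>* nodes;
    int index; // current node
    int prev;  // previous node, carried by the iterator itself; with a
               // 64-bit pointer and 32-bit ints this still packs into
               // 16 bytes, the same as the two-field iterator after padding
};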

Benchmark

I whipped up a quick version of the above since you seem interested in 'em: release build, MSVC 2012, no checked iterators or anything like that:

--------------------------------------------
- test_vector_linked
--------------------------------------------
Inserting 200000 elements...
time passed for 'inserting': {0.000015 secs}

Erasing half the list...
time passed for 'erasing': {0.000021 secs}
time passed for 'iterating': {0.000002 secs}
time passed for 'copying': {0.000003 secs}

Results (up to 10 elements displayed):
[ 11 13 15 17 19 21 23 25 27 29 ]

finished test_vector_linked: {0.062000 secs}
--------------------------------------------
- test_vector
--------------------------------------------
Inserting 200000 elements...
time passed for 'inserting': {0.000012 secs}

Erasing half the vector...
time passed for 'erasing': {5.320000 secs}
time passed for 'iterating': {0.000000 secs}   
time passed for 'copying': {0.000000 secs}

Results (up to 10 elements displayed):
[ 11 13 15 17 19 21 23 25 27 29 ]

finished test_vector: {5.320000 secs}

I was too lazy to use a high-precision timer, but hopefully that gives an idea of why one shouldn't use vector's linear-time erase method in critical paths for non-trivial input sizes, with vector above taking ~86 times longer (and the gap grows roughly quadratically with input size -- I tried with 2 million elements originally but gave up after waiting almost 10 minutes), and why I think vector is ever-so-slightly overhyped for this kind of use. That said, we can turn removal from the middle into a very fast constant-time operation without shuffling the order of the elements, without invalidating indices and iterators storing them, and while still using vector... All we have to do is simply make it store a linked node with prev/next indices to allow skipping over removed elements.

For removal I used a randomly shuffled source vector of even-numbered indices to determine what elements to remove and in what order. That somewhat mimics a real-world use case where you're removing from the middle of these containers through indices/iterators you formerly obtained, like removing the elements the user previously selected with a marquee tool after he hits the delete button (and again, you really shouldn't use scalar vector::erase for this with non-trivial sizes; it'd be better to build a set of indices to remove and use remove_if, as sketched below -- still far better than vector::erase called for one iterator at a time).
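That batched alternative might look like this sketch (batch_erase is a hypothetical helper; it marks the doomed indices, then compacts the vector in a single remove_if pass):

#include <algorithm>
#include <vector>

void batch_erase(std::vector<int>& v, const std::vector<int>& to_remove)
{
    // Mark every index slated for removal.
    std::vector<char> dead(v.size(), 0);
    for (int idx : to_remove)
        dead[idx] = 1;

    // remove_if applies the predicate to each element in place before any
    // moves, so indexing by address recovers the element's original index.
    v.erase(std::remove_if(v.begin(), v.end(),
                           [&](const int& e) { return dead[&e - v.data()] != 0; }),
            v.end());
}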

Note that iteration does become slightly slower with the linked nodes, and that doesn't have to do with iteration logic so much as the fact that each entry in the vector is larger with the links added (more memory to sequentially process equates to more cache misses and page faults). Nevertheless, if you're doing things like removing elements from very large inputs, the performance skew is so epic for large containers between linear-time and constant-time removal that this tends to be a worthwhile exchange.
