set vs unordered_set用于最快的迭代 [英] set vs unordered_set for fastest iteration

查看:845
本文介绍了set vs unordered_set用于最快的迭代的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在我的应用程序中,我有以下要求 -


  1. 数据结构将只填充一些值键/值对)。
    这些值可能会重复,但我希望数据结构只存储一次。


  2. 我将通过所有元素重复100次的上面创建的数据结构。


约束1表明我将不得不使用set或unordered_set作为数据不是键值对的形式。



现在设置插入比unordered_set插入更昂贵,但是数据结构只在开始时填充一次我的程序。



我相信决定因素将是我能够多快地遍历数据结构的所有元素。我不知道set或unordered_set是否会更快为此目的。我相信标准没有提到这个事实,因为这个操作将为任一数据结构的O(n)。但我想知道iterator.next()的数据结构会更快些。

解决方案


  1. 您的问题的意见建议保留 std :: unordered_set 最快 O(1)查找/插入和 O(N)迭代(与每个容器一样)。如果你有数据改变很多,或需要大量的随机查找,这可能是最快的。 c> code>复制到 std :: vector 并从连续的内存布局获取100次。 测试这是否比常规 std :: unordered_set 更快。

  2. 的迭代之间的中间插入,它可以支付使用专用向量。如果您可以使用 Boost.Container ,请尝试 boost :: flat_set ,它提供 std :: set 接口与 std :: vector 存储后端(即连续的内存布局,非常cache和预取友好)。
  3. ,请参阅Boost文档中的一些折衷(很好地了解所有其他问题,如迭代器无效,移动语义和异常安全):


    Boost.Container flat_ [multi] map / set containers是基于Austern和Alexandrescu的
    指南的基于
    的关联容器。这些有序矢量容器最近也受益于
    ,向C ++添加了移动语义,从而大大加快了
    的插入和擦除时间。平面关联容器
    具有以下属性:




    • 比标准关联容器更快速的查找


    • 减少小对象(如果使用shrink_to_fit,则为大对象)的内存消耗

    • 改进的缓存性能(数据存储在连续内存中)

    • 非稳定迭代器(插入和删除元素时迭代器无效)

    • 不可复制和不可移动的值类型不能存储

    • 比标准关联容器更弱的异常安全性(复制/移动构造函数可以在删除值时引发
      和插入)

    • 比标准关联容器(特别是非移动类型)更快的插入和删除


    注意:查找速度更快,意味着 flat_set code> O(log N)在连续内存上,而不是 O(log N) > std :: set 。当然, std :: unordered_set 会执行 O(1)查找, c> N 。


    In my application I have the following requirements -

    1. The data structure will be populated just once with some values (not key/value pairs). The values may be repeated but I want the data structure to store them once only.

    2. I will be iterating 100s of times through all the elements of the data structure created above. The order in which the elements appear in the iteration is immaterial.

    Constraint 1 suggests that I will either have to use set or unordered_set as data is not in the form of key value pairs.

    Now set insertion is costlier than unordered_set insertion but the data structure is populated only once in the beginning of my program.

    I believe the deciding factor will be how fast I can iterate through all the elements of the data structure. I am not sure whether set or unordered_set will be faster for this purpose. I believe the standard makes no mention of this fact as this operation will be O(n) for either data structure. But I wonder for which data structure iterator.next() will be faster.

    解决方案

    There are several approaches.

    1. The comments to your question suggest keeping a std::unordered_set that has the fastest O(1) lookup/insertion and O(N) iteration (as has every container). If you have data that changes a lot, or requires a lot of random lookups, this is probably the fastest. But test.
    2. If you need to iterate 100s of times without intermediate insertions, you can do a single O(N) copy to a std::vector and gain from contiguous memory layout 100s of times. Test whether this is faster than a regular std::unordered_set.
    3. If you have a small number of intermediate insertions between iterations, it could pay to use a dedicated vector. If you can use Boost.Container, try boost::flat_set which offers a std::set interface with a std::vector storage back-end (i.e. a contiguous memory layout that is very cache- and prefetch friendly). Again, test whether this gives a speed-up to the other two solutions.

    For the last solution, see the Boost documentation for some of the tradeoffs (it's good to be aware of all the other issues like iterator invalidation, move semantics and exception safety as well):

    Boost.Container flat_[multi]map/set containers are ordered-vector based associative containers based on Austern's and Alexandrescu's guidelines. These ordered vector containers have also benefited recently with the addition of move semantics to C++, speeding up insertion and erasure times considerably. Flat associative containers have the following attributes:

    • Faster lookup than standard associative containers
    • Much faster iteration than standard associative containers
    • Less memory consumption for small objects (and for big objects if shrink_to_fit is used)
    • Improved cache performance (data is stored in contiguous memory)
    • Non-stable iterators (iterators are invalidated when inserting and erasing elements)
    • Non-copyable and non-movable values types can't be stored
    • Weaker exception safety than standard associative containers (copy/move constructors can throw when shifting values in erasures and insertions)
    • Slower insertion and erasure than standard associative containers (specially for non-movable types)

    NOTE: with faster lookup, it is meant that a flat_set does O(log N) on contiguous memory rather than O(log N) pointer chasing of a regular std::set. Of course, a std::unordered_set does O(1) lookup, which will faster for large N.

    这篇关于set vs unordered_set用于最快的迭代的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆