Efficient set intersection of a collection of sets in C++


Question

I have a collection of std::set objects. I want to find the intersection of all the sets in this collection as fast as possible. The number of sets in the collection is typically very small (~5-10), and the number of elements in each set is usually less than 1000, but can occasionally go up to around 10000. However, I need to compute these intersections tens of thousands of times, as fast as possible. I benchmarked a few methods:


  1. In-place intersection in a std::set object that starts as a copy of the first set. For each subsequent set, it iterates over its own elements and those of the ith set of the collection, removing items from itself as needed.
  2. Using std::set_intersection into a temporary std::set, swapping its contents into a current set, then intersecting the current set with the next set into the temporary set, and so on.
  3. Manually iterating over all the elements of all sets as in 1), but using a vector as the destination container instead of std::set.
  4. Same as in 3), but using a std::list instead of a vector, on the suspicion that a list provides faster deletions from the middle.
  5. Using hash sets (std::unordered_set) and checking for all items in all sets.

As it turned out, using a vector is marginally faster when the number of elements in each set is small, and a list is marginally faster for larger sets. The in-place std::set approach is substantially slower than both, followed by set_intersection and hash sets. Is there a faster algorithm, data structure, or trick to achieve this? I can post code snippets if required. Thanks!
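
For reference, a pairwise baseline along the lines of approach 2 might look like the sketch below. It is not the code that was benchmarked: the name intersect_pairwise is hypothetical, and it writes into std::vector buffers rather than a temporary std::set, which is an assumption made to keep the example short.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical baseline: intersect the sets one pair at a time.
// `current` holds the running intersection, `temp` receives each new result.
template <typename T>
std::vector<T> intersect_pairwise(const std::vector<std::set<T>>& sets) {
    if (sets.empty()) return {};

    // Start from a sorted copy of the first set.
    std::vector<T> current(sets[0].begin(), sets[0].end());
    std::vector<T> temp;

    // Stop early once the running intersection is empty.
    for (std::size_t i = 1; i < sets.size() && !current.empty(); ++i) {
        temp.clear();
        // Both inputs are sorted, as std::set_intersection requires.
        std::set_intersection(current.begin(), current.end(),
                              sets[i].begin(), sets[i].end(),
                              std::back_inserter(temp));
        current.swap(temp);  // reuse both buffers on the next round
    }
    return current;
}
```

Each round is linear in the sizes of the two inputs, so the whole loop is bounded by the total number of elements across all sets.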

Recommended answer

You might want to try a generalization of std::set_intersection(): the algorithm is to use iterators for all sets:


  1. If any iterator has reached the end() of its corresponding set, you are done. Thus, it can be assumed that all iterators are valid.
  2. Take the first iterator's value as the next candidate value x.
  3. Move through the list of iterators and, for each one, use std::find_if() to advance it to the first element at least as big as x.
  4. If that value is bigger than x, make it the new candidate value and search again through the sequence of iterators.
  5. If all iterators are on value x you found an element of the intersection: Record it, increment all iterators, start over.
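
The steps above can be sketched roughly as follows. This is one possible reading of the answer, not a definitive implementation: the name intersect_all, the std::vector result container, and the `agreeing` counter used to detect that every iterator sits on x are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: intersects all sets in a single pass,
// keeping one iterator per set.
template <typename T>
std::vector<T> intersect_all(const std::vector<std::set<T>>& sets) {
    std::vector<T> result;
    if (sets.empty()) return result;

    using Iter = typename std::set<T>::const_iterator;
    std::vector<Iter> its;
    its.reserve(sets.size());
    for (const auto& s : sets) {
        if (s.empty()) return result;  // step 1: an exhausted set ends the search
        its.push_back(s.begin());
    }

    const std::size_t n = sets.size();
    T candidate = *its[0];     // step 2: first iterator's value is the candidate x
    std::size_t agreeing = 0;  // consecutive iterators known to sit on x
    std::size_t i = 0;

    while (true) {
        // Step 3: advance iterator i to the first element not less than x.
        its[i] = std::find_if(its[i], sets[i].end(),
                              [&](const T& v) { return !(v < candidate); });
        if (its[i] == sets[i].end()) return result;  // step 1 again

        if (candidate < *its[i]) {
            candidate = *its[i];  // step 4: a bigger value becomes the new x
            agreeing = 1;         // only iterator i is known to sit on it
        } else {
            ++agreeing;
        }

        if (agreeing == n) {
            // Step 5: every iterator is on x, so x is in the intersection.
            result.push_back(candidate);
            for (std::size_t k = 0; k < n; ++k) {
                if (++its[k] == sets[k].end()) return result;
            }
            candidate = *its[0];
            agreeing = 0;
            i = 0;
            continue;
        }
        i = (i + 1) % n;  // move on to the next iterator, wrapping around
    }
}
```

Calling intersect_all on a std::vector<std::set<int>> returns the common elements in ascending order. Since std::find_if scans linearly, one variation worth benchmarking is replacing it with the sets' own lower_bound() member when the sets differ greatly in size.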
