用于查找严格子集的快速数据结构(来自给定列表) [英] Fast Data structure for finding strict subsets (from a given list)

查看:176
本文介绍了用于查找严格子集的快速数据结构(来自给定列表)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一大套套,例如 {{2,4,5},{4,5},...}。
给定这些子集之一,我想遍历所有其他子集,这是子集的严格子集。也就是说,如果我对set A 感兴趣,例如 {2,4,5} ,我想找到所有集合 B 其中相对补码 B / A = {},空集。一些可能性可能是 {2,4} {2,5} 而不是 {当然,我可以线性搜索并检查每一次,但正在寻找一个高效的数据结构,对于较大的集合和子集(如果重要)。子集的数量通常在几十万之间,但是如果它有所不同,那么对于可能在数亿的情况,我会感兴趣。子集的大小通常是10s。



我在C ++中编程



谢谢

解决方案

数学上,你应该构建 Hasse图表,这将是您的集合的部分有序集合,其顶点包含您的集合和箭头。基本上,您要创建一个有向,非循环图,箭头 A - > B 如果 A 严格包含 B ,并且没有 C ,使 A 严格包含 C C 严格包含 B



这实际上将是一个排名的poset,这意味着您可以根据集合的基数来跟踪图的级别。这就像创建一个哈希表来跳转到正确的集合。



A ,只需在图表中按BFS查找所有适当的一个



如何实现:(伪代码)


$ b $(code)C(code)C(code)
addArrow(C,B)
}
(A在HasseDiagram中排名(C)+1){
if(C contains A)
addArrow A,C)
}
addToDiagram(C)
}

为了使这个和所有子程序快速,你可以对每个集合进行编码,其中数字 i 1 if C 0 否则。这使得测试遏制和确定排名很小。



如果您有所有可能的子集,上述方法将工作。因为你可能会缺少一些,你必须检查更多的东西。对于伪代码,您需要将等级(C)-1 更改为最大整数 l < (C),使得HasseDiagram的某些元素的排名为 l ,而对于等级(C)+1 。然后,当您向图中添加 C


  1. 如果 A 涵盖 C ,那么您只需要检查较低排名的集合 B 也被 A 覆盖。


  2. 如果 C 涵盖 B ,那么你只需要检查更高排名的集合 A 覆盖 B


X 涵盖 Y 我的意思是有一个箭头 X - > Y ,而不仅仅是路径。



此外,当您插入 C code> A 和 B 使用上述检查之一,您将需要删除箭头 A - - > B 当您添加 A - > C C - > B


I have a large set of sets e.g. {{2,4,5} , {4,5}, ...}. Given one of these subsets, I would like to iterate through all other subsets which are strict subsets of this subset. That is, if I am interested in set A, e.g. {2,4,5}, I want to find all sets B where the relative complement B / A = {}, the empty set. Some possibilities could be {2,4}, {2,5} but not {2,3}

I could of course search linearly and check each time, but am looking for an efficient data structure both for the larger set and the subset (if it matters). The number of subsets is typically in the 10s of thousands, but if it makes a difference I would be interested in cases where it could be in the hundreds of millions. The size of the subsets is typically in 10s.

I am programming in C++

Thanks

解决方案

Mathematically, you should construct the Hasse diagram for your sets, which will be the partially ordered set with vertices your sets and arrows given by containment. Essentially, you want to create a directed, acyclic graph with an arrow A --> B if A strictly contains B and there is no C such that A strictly contains C and C strictly contains B.

This is actually going to be a ranked poset, meaning that you can keep track of "levels" of the digraph based on the cardinality of the sets. This is sort of like creating a hash table to jump to the right set.

From A, just do a BFS down the graph to find all proper subsets of A.

How to implement this: (in pseudocode)

for (C in sets) {
    for (B in HasseDiagram at rank rank(C)+1) {
      if (C contains B)
        addArrow(C,B)
    }
    for (A in HasseDiagram at rank rank(C)+1) {
      if (C contains A)
        addArrow(A,C)
    }
    addToDiagram(C)
}

To make this and all the subroutines fast, you can encode each set an a binary where digit i is 1 if i is in C and 0 otherwise. This makes testing containment and determining rank trivial.

The above method works if you have all possible subsets. Since you may be missing some, you'll have to check more things. For the pseudocode, you'll need to change rank(C)-1 to the largest integer l < rank(C) such that some element of the HasseDiagram has rank l, and similarly for rank(C)+1. Then, when you're adding the set C to the diagram:

  1. If A covers C, then you only need to check lower ranked sets B that are also covered by A.

  2. If C covers B, then you only need to check higher ranked sets A that also cover by B.

By "X covers Y" I mean there is an arrow X -> Y, not just a path.

Furthermore, when you insert C between A and B using one of the above checks, you will need to remove the arrow A --> B when you add A --> C and C --> B.

这篇关于用于查找严格子集的快速数据结构(来自给定列表)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆