快速检查集合是否是存储集合的超集 [英] Quickly checking if set is superset of stored sets

查看：31 发布时间：2021/12/22 8:30:27 algorithm data-structures set complexity-theory time-complexity

本文介绍了快速检查集合是否是存储集合的超集的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我得到了 N 个 C 布尔值数组.我想将这些组织成一个数据结构，使我能够尽快执行以下操作:给定一个新数组，如果该数组是任何存储数组的超集"，则返回 true.对于超集，我的意思是:A 是 B 的超集，如果 A[i] 对每个 i 都为真，其中 B[i] 为真.如果 B[i] 是假的，那么 A[i] 可以是任何东西.

I am given N arrays of C booleans. I want to organize these into a datastructure that allows me to do the following operation as fast as possible: Given a new array, return true if this array is a "superset" of any of the stored arrays. With superset I mean this: A is a superset of B if A[i] is true for every i where B[i] is true. If B[i] is false, then A[i] can be anything.

或者，就集合而不是数组而言:

Or, in terms of sets instead of arrays:

将 N 个集合(每个集合包含 C 个可能的元素)存储到一个数据结构中，以便您可以快速查找给定的集合是否是任何存储集合的超集.

构建数据结构可以花费尽可能长的时间，但查找应该尽可能高效，并且数据结构不能占用太多空间.

Building the datastructure can take as long as possible, but the lookup should be as efficient as possible, and the datastructure can't take too much space.

我认为这本身就是一个有趣的问题，但对于我真正想解决的问题，您可以假设如下:

I think this is an interesting problem on its own, but for the thing I'm really trying to solve, you can assume the following:

N = 10000
C = 1000
存储的数组是稀疏的
查找的数组是随机的(所以不是稀疏的)

对于 O(NC) 查找:只需迭代所有数组.不过这太慢了.

For O(NC) lookup: Just iterate all the arrays. This is just too slow though.

对于 O(C) 查找:我在这里有很长的描述，但正如 Amit 在评论中指出的那样，它基本上是一个 BDD.虽然这具有很高的查找速度，但它的节点数量呈指数级增长.由于 N 和 C 如此之大，这会占用太多空间.

For O(C) lookup: I had a long description here, but as Amit pointed out in the comments, it was basically a BDD. While this has great lookup speed, it has an exponential number of nodes. With N and C so large, this takes too much space.

我希望在这个 O(N*C) 和 O(C) 解决方案之间，可能有一个不需要指数级空间的 O(log(N)*C) 解决方案.

I hope that in between this O(N*C) and O(C) solution, there's maybe a O(log(N)*C) solution that doesn't require an exponential amount of space.

对于 O(sqrt(N)C) 查找:将数组存储为前缀试试.查找数组 A 时，如果 A[i]=0，则转到相应的子树，如果 A[i]=1，则访问两个子树.

For O(sqrt(N)C) lookup: Store the arrays as a prefix trie. When looking up an array A, go to the appropriate subtree if A[i]=0, but visit both subtrees if A[i]=1.

我的直觉告诉我，如果您假设存储的数组是随机的，那么这应该使查找的(平均)复杂度变为 O(sqrt(N)C).但是: 1. 他们不是，数组是稀疏的.2. 这只是直觉，我无法证明.

My intuition tells me that this should make the (average) complexity of the lookup O(sqrt(N)C), if you assume that the stored arrays are random. But: 1. they're not, the arrays are sparse. And 2. it's only intuition, I can't prove it.

我将同时尝试这个新想法和 BDD 方法，看看两者中哪一个最有效.

I will try out both this new idea and the BDD method, and see which of the 2 work out best.

但与此同时，这个问题不是经常出现吗?它没有名字吗?之前不是有研究吗?感觉就像我在这里重新发明轮子一样.

But in the meantime, doesn't this problem occur more often? Doesn't it have a name? Hasn't there been previous research? It really feels like I'm reinventing the wheel here.

快速检查集合是否是存储集合的超集 [英] Quickly checking if set is superset of stored sets

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

快速检查集合是否是存储集合的超集 [英] Quickly checking if set is superset of stored sets

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭