如果设置快速检查存储​​组的超集 [英] Quickly checking if set is superset of stored sets

查看:148
本文介绍了如果设置快速检查存储​​组的超集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我给Ç布尔N个阵列。我想这些组织成一个数据结构,它允许我做以下操作尽可能快:给定一个新的数组,返回true,如果这个数组是任何存储阵列的超集。随着超我的意思是:A是B的一个超集,如果A [1]是真正的每个i其中B [i]为真。如果乙方[i]为假,则A [1]可以是任何东西。

I am given N arrays of C booleans. I want to organize these into a datastructure that allows me to do the following operation as fast as possible: Given a new array, return true if this array is a "superset" of any of the stored arrays. With superset I mean this: A is a superset of B if A[i] is true for every i where B[i] is true. If B[i] is false, then A[i] can be anything.

或者,在套代替阵列方面:

Or, in terms of sets instead of arrays:

商品N组(每个用C可能的内容)到数据结构,因此您可以快速查找,如果一个给定的是任何存储组的一个超集。

Store N sets (each with C possible elements) into a datastructure so you can quickly look up if a given set is a superset of any of the stored sets.

构建数据结构可以采取尽可能长,但查找应尽可能高效,并且数据结构不能占用太多空间。

Building the datastructure can take as long as possible, but the lookup should be as efficient as possible, and the datastructure can't take too much space.

我觉得这是对自己的一个有趣的问题,但对于事情我真的想解决,你可以假设如下:

I think this is an interesting problem on its own, but for the thing I'm really trying to solve, you can assume the following:

  • N = 10000
  • C = 1000
  • 在存储阵列稀疏
  • 的抬起头阵列是随机的(所以不疏)
  1. 对于O(NC)查找:只需遍历所有的阵列。这实在是太慢了,虽然。

  1. For O(NC) lookup: Just iterate all the arrays. This is just too slow though.

对于O(C)查询:我有一个很长的描述,但作为阿米特指出的意见,这是一个基本的 BDD 。虽然这有很大的查找速度,它具有节点的指数数量。随着N和C如此之大,这需要太多的空间。

For O(C) lookup: I had a long description here, but as Amit pointed out in the comments, it was basically a BDD. While this has great lookup speed, it has an exponential number of nodes. With N and C so large, this takes too much space.

我希望在这个O(N * C)和O(C)的解决方案之间,有可能是O(日志(N)* C)的解决方案,不要求空间的指数数量。

I hope that in between this O(N*C) and O(C) solution, there's maybe a O(log(N)*C) solution that doesn't require an exponential amount of space.

  • 对于O(开方(N)C)查找:存储阵列作为 preFIX线索。当查找一个数组A,转到相应的子树,如果A [1] = 0,但访问两个子树,如果A [1] = 1。

  • For O(sqrt(N)C) lookup: Store the arrays as a prefix trie. When looking up an array A, go to the appropriate subtree if A[i]=0, but visit both subtrees if A[i]=1.

我的直觉告诉我,这应该使查找O(开方(N)C)(平均)的复杂性,如果你假设存储阵列是随机的。但是:1,他们不是,数组是稀疏。和2。这只是直觉,我不能证明这一点。

My intuition tells me that this should make the (average) complexity of the lookup O(sqrt(N)C), if you assume that the stored arrays are random. But: 1. they're not, the arrays are sparse. And 2. it's only intuition, I can't prove it.

我会尝试了这项新的理念和方法BDD,看看其中的2研究出最好的。

I will try out both this new idea and the BDD method, and see which of the 2 work out best.

但是,在此期间,没有此问题发生更经常?没有它有名字吗?一直没有出现过previous研究?这真的感觉就像我在这里重新发明轮子。

But in the meantime, doesn't this problem occur more often? Doesn't it have a name? Hasn't there been previous research? It really feels like I'm reinventing the wheel here.

推荐答案

我想补充一些背景信息到 preFIX线索的解决方案,最近我发现了以下文件:

Just to add some background information to the prefix trie solution, recently I found the following paper:

I.Savnik:索引数据结构快速的子集,超集查询的。 CD-ARES,IFIP LNCS,2013年

I.Savnik: Index data structure for fast subset and superset queries. CD-ARES, IFIP LNCS, 2013.

本文提出的设置特里数据结构(容器),它提供了有效的存储和使用的特里数据结构,配套操作,如找到的所有超集/子集台套查询支持一个给定集合的从集的集合

The paper proposes the set-trie data structure (container) which provides support for efficient storage and querying of sets of sets using the trie data structure, supporting operations like finding all the supersets/subsets of a given set from a collection of sets.

有关的任​​何蟒蛇有意在实际执行的用户,我想出了上述纸部分基于一个python3包。它包含一个套线索为基础的容器,也映射容器,其中键是集。你可以找到它的 github上

For any python users interested in an actual implementation, I came up with a python3 package based partly on the above paper. It contains a trie-based container of sets and also a mapping container where the keys are sets. You can find it on github.

这篇关于如果设置快速检查存储​​组的超集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆