具有多个条件的快速查找的数据结构 [英] Data structure for fast lookup with multiple criteria

查看:85
本文介绍了具有多个条件的快速查找的数据结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有多个包含n个属性的元素.我需要创建一个数据结构,该数据结构将返回给定条件集的所有匹配元素.查找需要快速.

I have multiple elements that contain n properties. I need to create a datastructure that will return all the matching elements for a given set of criteria. The lookup needs to be fast.

为了使它更加清晰,请想象以下类:

To make it a bit clearer, imagine the following class:

class Element {
    let a : Int? // criterion0
    let b : Int? // criterion1
    let c : Int? // criterion2

    let value : String
}

现在,我用3,000个这样的元素填充 dataStructure :

Now I populate a dataStructure with 3,000 of such elements:

dataStructure.add(Element(a:1,b:3,c:4, value:"val0")) // insert 1
dataStructure.add(Element(a:nil,b:1000,c:40, value:"val1")) // insert 2
dataStructure.add(Element(a:nil,b:3,c:4, value:"val2")) // insert 3
...
dataStructure.add(Element(a:10,b:3,c:23, value:"val2999")) // insert 3000

我将不得不查找与我插入的内容完全匹配的元素,但这也可能有所不同.由于通配符为nil,因此每次查找可能有许多匹配项.我需要快速进行这些查找:

And I will have to look up elements that could be exact matches to what I inserted, but that could also be different. Because of the wildcard nil, there could be many matches for each lookup. I need these look ups to be fast:

dataStructure.getAllMatching(a:4, b:30, c:4)

其中插入的元素中的 nil 参数表示任何值都是匹配项.

where a nil argument in the inserted element means that any value is a match.

例如 dataStructure.getAllMatching(a:1,b:3,c:4)至少在上面插入的内容会返回["val0","val1"].

For example dataStructure.getAllMatching(a:1, b:3, c:4) will return ["val0", "val1"] at least given what I inserted above.

请注意,我对内存的限制很小,我可以花一些时间在启动时进行设置.只有查找需要快速.

Note that I have little constraint on memory, and I can spend some time at launch time to set things up. Only the lookup needs to be fast.

推荐答案

对于固定数量的字段(在您的示例中为3),有一种简单的方法可以在O(1中全部添加,删除和查询操作) 时间.它使用O(n)内存,但具有较大的恒定因子;如果您对内存的限制很小,则可以.由于我不使用Swift,因此我将以非编程语言的特定方式编写.

For a fixed number of fields (3 in your example), there's a straightforward way to do it which allows adding, removing and querying operations all in O(1) time. It uses O(n) memory but with a relatively large constant factor; this is OK if you have little constraint on memory. I'll write in a non-programming-language specific way since I don't use Swift.

维护一个字典,其中每个字段组合(包括通配符)都映射到一组 Element 对象.这可能是最容易用示例描述的:假设元素是:

Maintain a dictionary where each combination of fields, including wildcards, maps to a set of Element objects. This is probably easiest to describe by an example: suppose the elements are:

X = (a=1, b=2, c=3)
Y = (a=1, b=3, c=4)
Z = (a=2, b=3, c=4)

然后您的字典就像:

{
    // just a
    (1, nil, nil): {X, Y}, (2, nil, nil): {Z},
    // just b
    (nil, 2, nil): {X}, (nil, 3, nil): {Y, Z},
    // just c
    (nil, nil, 3): {X}, (nil, nil, 4): {Y, Z}}`.
    // a and b
    (1, 2, nil): {X}, (1, 3, nil): {Y}, (2, 3, nil): {Z},
    // a and c
    (1, nil, 3): {X}, (1, nil, 4): {Y}, (2, nil, 4): {Z},
    // b and c
    (nil, 2, 3): {X}, (nil, 3, 4): {Y, Z},
    // a, b, c
    (1, 2, 3): {X}, (1, 3, 4): {Y}, (2, 3, 4): {Z}
}

本质上,数据结构仅存储每个可能查询的答案(没有结果的查询除外).

Essentially, the data structure just stores the answer to every possible query (except those with no results).

要添加或删除元素,需要更新属于2 ^ 3-1 = 7个键的集合,并可能添加具有新集合的新键或删除键.如果将哈希表用于字典和集合,则相关的字典/集合操作都需要O(1)时间,并且需要执行的此类操作数为O(1),因此可以从数据结构中添加或删除数据花费O(1)时间.

To add or remove an element requires updating the sets belonging to 2^3 - 1 = 7 keys, and possibly adding a new key with a new set, or removing a key. If hashtables are used for the dictionary and the sets, the relevant dictionary/set operations all take O(1) time, and the number of such operations which need to be done is O(1), so adding or removing from the data structure takes O(1) time.

要通过字段值的某种组合进行查询,只需在字典中查找查询,然后返回相应的集合(或返回不可修改的视图)即可.

To query by some combination of field values, simply look up the query in the dictionary, and return the corresponding set (or return an unmodifiable view).

所有这些都说明,对于仅300个元素,值得比较天真"的解决方案,在该解决方案中,您仅将元素存储在列表中,并检查每个查询的整个列表.300不是一个很大的数字,可能对于O(1)与O(n)而言并不足够大.根据编程语言的不同,分支预测和CPU缓存等其他详细信息可能会比理论上的复杂性产生更大的影响.我建议对这两个选项进行基准测试.

All of this said, for only 300 elements it is worth comparing the "naive" solution where you just store the elements in a list and check the whole list for each query. 300 is not a large number, perhaps not large enough for O(1) vs. O(n) to matter. Depending on programming language, other details like branch prediction and the CPU cache could have more effect than theoretical complexity. I'd recommend benchmarking both options.

这篇关于具有多个条件的快速查找的数据结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆