数据结构的连续整数大范​​围? [英] Data structure for large ranges of consecutive integers?

查看:161
本文介绍了数据结构的连续整数大范​​围?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设有一个大范围的连续的整数中存储器,其中每一个属于一个类别。两个操作必须是O(log n)的:移动一个范围从一类到另一个,并且寻找类别计数在给定范围

我是pretty的确定第二次手术就解决了平凡给出一个正确实施的第一次手术。

每个整数开始的一类,所以我开始了一组平衡BSTS的。移动子树从一个BST到另一个(例如,移动至不同的类别)拥有相当于一个运行时合并两个BSTS,这是O(N1 * N2)的 1

这是太慢了(在Python,C是不是一种选择),我不能想出一个办法来利用我的数据的内在结构,建立一个高效的BST合并操作。

我现在在看AVL,红黑,和区间树,二叉堆和treaps。比较它们的属性是压倒性的。其中结构我应该使用?

修改(问题澄清):我是灵活的我如何保存这些价值和创建我的数据结构。我不灵活我是如何得到我的投入,这是从另外一个应用程序,如下所示:类别(CAT3,I,J)。我目前的解决方案创建与范围内的每个整数节点树。这是我的数据集的大小太慢了,所以我很高兴重新架构如果给一个更好的方法。

任何给定的请求可以移动的整数的任何可能的范围到任何类别。换句话说,范围在类别(CAT1,1,10)然后按类别(CAT3,5,15)<感/重叠code>,但在这个意义上,每一个整数将在恰好一个类别,在任何给定的时间上不重叠。

解决方案

据我了解的问题,你有一个区间[A,B]和形式的查询 -

  1. 分配一个特定的范围内,以目录

例如。
R1 R2 C1
R3 R4 C2

  1. 查询一个范围,适用于特定类别的总项数。 例如。找到类别计数R1 R4

我描述这个例子使用字典上面给出一个简单的实现方式是行不通的 -

假设我们有一个区间[1000,5000]

和我们做的分配如下: -

1 2 C1
2 3 C2
3 4 C3
......
4999 5000 C4999

现在,我们做如下分配

1 5000 C5555

这将使updation /更改/删除以$ P $范围O(N),其中N是范围最大尺寸(B - A)的pviously分配范围

  

D ['类'] =设置(of_all_the_ranges_you_have_in_category)

在从C1,C2 ... C4999将用于最后分配相应的集类这种情况下删除previous范围(1 5000 C5555)

  

1:{停止:5,类别:C1},   6:{停止:19,类别:C23}

下面updation每个起始值类(1,2,3,4 ......,4999)将需要最后一次转让(1 5000 C5555)

有一个更好的选择,这将使范围updation在O(LG N)将段树 ( http://en.wikipedia.org/wiki/Segment_tree

对于上面的例子中,段树将是这个样子

  1000:5000
                      |
             ---------------------
             | |
           1000:3000 3001:5000
            | |
    ---------------- --------------------
   | | | |
 1000:2000 2001:3000 3001:4000 4001:5000
 

.............................. .................. .................................................. .............等等

叶节点将具有范围(1:2,2:3,...)

您可以分配一个值类到每个节点并给出一个区间遍历树分割的间隔适当地(例如,2500〜4500鸿沟2500:3000和3001:4500,并继续在两个方向,直到相匹配的范围节点被发现)。

现在一个有趣的事情是,当你需要的时候更新节点的孩子。例如不要继续做这样1 5000 C5555任务时立即更新儿童。这件事情被称为懒惰的传播,你可以了解更多有关在这里( http://www.spoj.pl/forum/viewtopic.php?f=27&t=8296 )。

现在的查询部分。如果类别的数量都非常小,可以保持每个节点的频率表,并在需要时和需要,否则,当懒传播更新的范围内,你将不得不从叶遍历整个树节点,计算和复杂性将成为Ø (N)。

我想查询一个更好的解决方案可能存在。它不来我的脑海里。

更新 让我们举一个小例子。

范围[1,8]

分类允许{C1,C2}

  1:8
     1:4 5:8
     1:2 3:4 5 6 7:8
 1:1的2:2 3:3 4:4 5:5 6:6 7:7 8:8
 

每个节点将有3场[类别,category_counts [],children_update_required =假]

1 5 C1

查询将得到分和节点1:4的类别将被设置为C1和children_update_required将被设置为true,其子女将不会被更新,现在(还记得仅在需要时更新或懒惰传播)。还节点5:5的类别将被设定为C1

3 4 C2

查询将沿着朝向3树传播:4(并在此过程达到3:4,1:2和3:4的类别将被更新,以C1,1:4的children_update_required将被设置为假,1: 2的和3:4的children_update_required将被设置为真),现在将更新的3类:根据电流要求4至C 2。接着,它会设置3条规定children_update:4是真的为它的孩子将来updation(这是已经在这样做了case..no伤害集)

Suppose you have a large range of consecutive integers in memory, each of which belongs to exactly one category. Two operations must be O(log n): moving a range from one category to another, and finding the category counts for a given range.

I'm pretty sure the second operation is solved trivially given a correct implementation for the first operation.

Each integer begins in a category, so I started with a set of balanced BSTs. Moving a subtree from one BST to another (eg, moving a range to a different category) has a runtime equivalent to merging two BSTs, which is O(n1 * n2)[1].

This is too slow (in python, and C is not an option), and I couldn't figure out a way to take advantage of my data's inherent structure to create an efficient BST merge operation.

I'm now looking at AVL, red-black, and interval trees, binary heaps, and treaps. Comparing their properties is overwhelming. Which structure should I use?

Edit (problem clarification): I am flexible on how I store these values and create my data structures. I am inflexible about how I receive my input, which comes from another application, and looks like the following: CATEGORY(cat3, I, J). My current solution creates a tree with a node for each integer in the range. This is too slow for the size of my dataset, so I'm happy to re-architect if given a better method.

Any given request can move any possible range of integers into any category. In other words, ranges are overlapping in the sense of CATEGORY(cat1, 1, 10) followed by CATEGORY(cat3, 5, 15), but non-overlapping in the sense that every integer will be in exactly one category at any given time.

解决方案

As far as I understood the question you have a range [A, B] and queries of the form -

  1. Assign a particular range to a category

E.g.
R1 R2 C1
R3 R4 C2

  1. Query a range for the total number of items in particular categories. E.g. find count of categories in R1 R4

A simple implementation using dictionaries as given above would not work as I describe by this example -

suppose we have a range [1000, 5000]

and we make assignment as follows -

1 2 C1
2 3 C2
3 4 C3
......
4999 5000 C4999

Now we make the following assignment

1 5000 C5555

This will make updation/changes/deletion to previously assigned ranges of the ranges O(N) where N is maximum size of range (B - A)

D['category'] = set(of_all_the_ranges_you_have_in_category)

In this case deletion of previous ranges from corresponding sets in categories C1,C2...C4999 will be needed for last assignment ( 1 5000 C5555 )

OR

1 : { "stop" : 5, "category" : "C1"}, 6 : { "stop" : 19, "category" : "C23"},

Here updation of category for each starting value (1,2,3,4...,4999) will be required for last assignment ( 1 5000 C5555 )

A better option that will make updation of ranges in O(lg n) would be segment trees (http://en.wikipedia.org/wiki/Segment_tree )

For the above example the segment tree will look something like this

                   1000:5000
                      |
             ---------------------
             |                   |
           1000:3000         3001:5000
            |                    |
    ----------------      --------------------
   |               |      |                  |
 1000:2000     2001:3000   3001:4000       4001:5000

................................................................. ............................................................... and so on

The leaf nodes will have ranges (1:2, 2:3,...)

You can assign a value "category" to each node and given an interval traverse the tree dividing the interval appropriately ( e.g for 2500 to 4500 divide in 2500:3000 and 3001:4500 and proceed in both directions until nodes with matching ranges are found).

Now an interesting thing is update the children of the nodes when you need them. For example do not proceed updating the children immediately when doing assignments like 1 5000 C5555. This thing is called lazy propagation and you can learn more about it here (http://www.spoj.pl/forum/viewtopic.php?f=27&t=8296 ).

Now for query part. If the number of categories are very small, you can maintain a frequency table at each node and update the ranges when required and lazy propagate when needed otherwise, you will have to traverse the whole tree from leaf to node, counting and complexity will become O(n).

I think a better solution to querying may exist. It's not coming to my mind.

UPDATE Lets take a small example.

Range [1,8]

Categories allowed {C1, C2}

        1:8
     1:4         5:8
     1:2  3:4      5:6    7:8
 1:1 2:2 3:3 4:4  5:5 6:6 7:7 8:8

Each node will have 3 fields [category, category_counts[], children_update_required = false]

1 5 C1

Query will get divided and nodes 1:4's category will be set to C1 and children_update_required will be set to true, its children will not be updated now (remember update only when required or lazy propagation). Also node 5:5's category will be set to C1

3 4 C2

Query will propagate along the tree towards 3:4 (and in the process to reach 3:4, 1:2 and 3:4's categories will be updated to C1, 1:4's children_update_required will be set to false, 1:2's and 3:4's children_update_required will be set to true) and now will update the category of 3:4 to C2 according to current requirement. Next it will set children_update required of 3:4 to be true for future updation of its children (which is already set in this case..no harm done).

这篇关于数据结构的连续整数大范​​围?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆