连接红黑树 [英] Concatenating red-black trees

查看:162
本文介绍了连接红黑树的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

OCaml标准库具有精美的 Set 实现,它使用非常有效的除法和算法来计算 union 两套。我相信它需要整个子树(不仅仅是单个元素),并将它们插入到另一个集合中,必要时重新平衡。



我想知道这是否需要保存在OCaml使用的AVL树中的高度信息,或者也可以使用红黑树。例如,是否可以更有效地连接一对红黑树,而不是简单地遍历第二个树,将其元素附加到第一个树的末尾?

解决方案

如果您对设置联合或级联或两者都感兴趣,或者您只对OCaml中常见的持久性数据结构感兴趣,则我不确定您的问题的措辞,也在短暂的结构中。



红黑树与手指是由Heather D. Booth在她的论文的一章中描述的。用手指,一个大小为n的红黑树可以分成两个大小为p和q的树,分摊的O(lg(min(p,q)))时间,两个大小为p和q的红黑树可以连接在同一个界限。另外,一个元素可以在摊销的O(1)时间的rb树的任一端添加或删除。通过这些操作,可以获得信息理论上最优的分摊O(p lg(q / p))时间集合(p

上面的界限在传统意义上是摊销的。对于像OCaml这样的功能语言,当持续使用数据结构时,可能希望有适用的界限。当树永久使用时,我不认为布斯的描述将会实现所有这些界限。例如,在手指上插入可以采用ω(1)重新着色。这可能会通过 Driscoll讨论的懒惰色彩来解决等等的使数据结构持续存在



另一方面,我认为Booth的分析可能表明,连接仍然是O(lg( max(p,q)))即使持续使用。我对设定的联合约束不太乐观。



在功能设置中设置具有渐近最优时间范围的操作。这些由Hinze& Paterson 实现了摊销(但持续)的界限,由Blandford& Blelloch以随机的方式实现界限,以及由Kaplan& Tarjan 在最坏的情况下实现它们。后者还提供O(lg lg(min(p,q)))级联,尽管Hinze&帕特森对这个说法是可疑的。这些树并不是对你的问题的直接的回答,它是针对红黑树的,但是他们希望给出一些可能的东西,H& P的论文包括代码,验证了rel =nofollow noreferrer> relam =nofollow noreferrer>,这可以提取到OCaml代码。



您可能还有两个可能感兴趣的指针: Brodal et al。使用O(lg n)查找,插入和删除和O(1)concat甚至在功能设置中呈现搜索树。另外, Atallah et al。声称描述已经摊销O(1)concat(大概只是短暂的))的红黑树,但是 Buchsbaum和Goodrich声称该结构中有几个缺陷



关于实用程序的最后一个注释红黑树:在对这个问题的答案的评论之一,你说:


唯一的优点是红黑树是辅助信息(红色或黑色)每个分支只有1位。通过添加高度,您已经失去了这个优势,也可以使用高度平衡的树。


还有其他优点以及。例如,在计算几何中使用的一些数据结构是基于二叉搜索树,但是具有很高的树形旋转成本。 红黑树可以重新平衡每次插入和删除最多3轮,而 AVL树可以采取Ω(lg n)轮播这些操作由于Ralf Hinze注意到冈崎红黑树的重新平衡计划(代码可用在 ML 中, Haskell Java和Ada )不提供相同的约束,最终可以做Ω(lg n)插入旋转。 (Okasaki不存在删除)



此外,高度平衡搜索树(甚至AVL树)可以存储,以便仅使用一位余额信息节点。一些树在每个节点上只有两个可能的平衡位置,如单侧高度平衡的树,但是每个节点最多有四个可能的平衡位置的树可以在每个小孩中存储一位平衡信息,如最初由布朗解释及更高版本由Haeupler等人扩展



编辑: b>



在问题结束时回答您的具体查询,以下是连接两个红黑树的算法的描述。它需要O(lg(max(| L |,| R |)))时间,这太长,无法获得上面我描述的渐近最优联合时间。为了比较,我希望 OCaml的stdlib中的AVL集合的join实现获得O(h1-h2)性能,其中h1是较高树的高度,尽管它实际上连接了两个适合的元素的AVL树在它们之间,而下面的算法必须从其中一个参数中找到并删除该迫击炮元素。你可以通过在叶子上存储元素来避免这种情况,就像在B +树中一样,但是它具有空间损失,它必须保留一堆指向非叶节点中的元素的指针来指导搜索。在任何情况下,不要像OCaml stdlib中的AVL连接代码一样,使同一高度的树连接不变,因为您仍然必须计算每个树的黑色高度,如下所述。



给定两个非空的红黑树L和R,我们将产生一个新的红黑树,它是L和R的并置。这将与O( lg
(max(| L |,| R |))),其中| L |表示L中的节点数。



首先,从L中删除最大的元素,c。接下来,找到L和R的黑色高度。黑色高度表示从
根到叶子的任何路径上的黑色节点数。通过红黑树不变量,这在任何给定的树的所有路径上是不变的。调用L's黑色高度p和R的黑色高度q,并假设
w.l.o.g. p&le q。



从R的根中,跟随左边的孩子,直到到达具有高度p的黑色节点R'。用根元素c,左子L和右
孩子R'创建一个新的红树C。因为L是一个红黑树,它的根是黑色的,颜色不变量不会在C以下或者C以下。另外,C
的黑色高度是p。



但是,我们不能简单地将C连接到R代替R'。首先,如果p = q,R'是R,而C具有红根。在这种情况下,只需重新着色C黑色的根。这是你的
新的连接树。



其次,如果R'不是根,它可能有一个红色的父母。红色的父母不得有红孩子,所以我们必须重新平衡。在这里,我们只是在R'(现在替换为C)和R的根之间应用冈崎的平衡
计划。



有两个可能的情况。如果C没有祖父母,C的父母是黑色的。该树现在有效。



如果C有一个祖父母,它必须是黑色,黑色高度p + 1,因为C的父母是红色的。用一棵新的红树替代C的祖父母,其根部是C的父母的根,其中的
的孩子是C,重新黑色,其右边的孩子是一个黑色的树,由C的兄弟姐妹组成,C的祖父母的根,C的叔叔,在
的订单。这不会增加C的祖父母的黑色高度,但它的颜色会变成红色,这可能会使它成为一个红色的
父母的根或红孩子,所以我们必须重新平衡,所有这一切寻找树的方式




  • 查找两棵树的黑色高度:O(lg | L |)+ O(lg | R |)

  • 将R追溯到正确的位置:O(lg | R | - lg | L |)

  • 一直向上旋转到root:O(lg | R | - lg | L |)



这些都不大于O(lg | R |为了使这个O(lg(min(| L |,l | | | L |)= O(lg(| L |,| R |)))



< | R |))),首先反转脊柱指针。然后,您不需要较大树的黑色高度,您只需要计数黑色脊柱节点,直到一棵树用完脊椎。然后,使用原始(不是Okasaki的)重新平衡方案来确保只重新平衡O(1)个节点。最后,如果有需要的话,标记不需要重新平衡的脊柱的其余部分。如果有需要的话。


The OCaml standard library has a wonderful Set implementation that uses a very efficient divide-and-conquer algorithm to compute the union of two sets. I believe it takes whole subtrees (not just single elements) from one set and inserts them into the other set, rebalancing when necessary.

I'm wondering if this requires the height information that is kept in the AVL tree that OCaml uses or if this is also possible with red-black trees. For example, is it possible to concatenate a pair of red-black trees more efficiently than simply iterating over the second tree appending its elements to the end of the first tree?

解决方案

I'm not sure from the wording of your question if you are interested in set union or concatenation or both, or if you're only interested in persistent data structures as are common in OCaml or also in ephemeral structures.

An implementation of red-black trees with fingers is described by Heather D. Booth in a chapter from her thesis. With fingers, a red-black tree of size n can be split into two trees of size p and q in amortized O(lg (min (p,q))) time and two red-black trees of size p and q can be concatenated in the same bound. Additionally, an element can be added or deleted at either end of an rb tree in amortized O(1) time. With these operations, it is possible to achieve amortized O(p lg(q/p)) time set union (for p < q), which is information-theoretically optimal. Perhaps the key idea to get these bounds is the reversal of the child pointers on the left and right spines.

The bounds above are amortized in the traditional sense. For a functional language like OCaml, one might wish to have bounds that apply when a data structure is used persistently. I do not think Booth's description will achieve all of those bounds when the trees are used persistently. For example, insertion at a finger can take ω(1) recolorings. This might be solved via the lazy recolorings discussed in Driscoll et al.'s "Making Data Structures Persistent".

On the other hand, I think Booth's analysis might show that concatenation is still O(lg (max (p,q))) even when used persistently. I'm less optimistic about the set union bound.

Set operations with asymptotically optimal time bounds are possible in a functional setting. Those described by Hinze & Paterson achieve the bounds in an amortized (but persistent) sense, the treaps described by Blandford & Blelloch achieve the bounds in a randomized sense, and those described by Kaplan & Tarjan achieve them in worst-case time. The latter also offer O(lg lg (min(p,q))) concatenation, though Hinze & Paterson are dubious of that claim. These trees are not a direct answer to your question, which is specific to red-black trees, but they hopefully give a flavor of what is possible, and the H&P paper includes code, and has been verified correct using Coq, which can extract to OCaml code.

Two more pointers you might be interested in: Brodal et al. presented search trees with O(lg n) find, insert, and delete and O(1) concat even in a functional setting. Additionally, Atallah et al. claim to describe a red-black tree that has amortized O(1) concat (presumably ephemerally only), but Buchsbaum and Goodrich claim that there are several flaws in that structure.

One final note about the utility of red-black trees: in one of the comments on one of the answers to this question, you say:

The only advantage of a red-black tree is that the auxiliary information (red or black) is only 1-bit per branch. By adding height, you've lost that advantage and might as well just use a height-balanced tree instead.

There are other advantages as well. For instance, some data structures used in computational geometry are based on binary search trees but have a high cost of tree rotation. Red-black trees can be rebalanced in at most 3 rotations per insert and delete, while AVL trees can take Ω(lg n) rotations for these operations. As Ralf Hinze noticed, Okasaki's rebalancing scheme for red-black trees (code available in ML, Haskell, Java, and Ada) does not offer the same bound, and can end up doing Ω(lg n) rotations on insertion. (Okasaki does not present deletion.)

Additionally, height-balanced search trees (and even AVL trees) can be stored so as to use only one bit of balance information per node. Some trees have only two possible balance positions at each node, like one-sided height-balanced trees, but trees with up to four possible balance positions per node can store one bit of balance information in each child, as initially explained by Brown and later expanded upon by Haeupler et al.

Edit:

In answer to your specific query at the end of your question, here is a description of an algorithm for concatenating two red-black trees. It takes O(lg(max(|L|,|R|))) time, which is too long to get the asymptotically optimal union time I describe above. For comparison, I expect that the "join" implementation for AVL sets in OCaml's stdlib gets O(h1-h2) performance, where h1 is the height of the taller tree, though it actually joins two AVL trees given an element that fits between them, while the algorithm below has to find and remove that mortar element from one of its arguments. You could avoid that by only storing elements at the leaves, as in a B+ tree, but that has a space penalty of having to keep a bunch of pointers to elements in the non-leaf nodes to guide search. In any case, it wouldn't make join constant time for trees of the same height like the AVL join code in the OCaml stdlib, since you would still have to calculate the black height of each tree, as explained below.

Given two non-empty red-black trees L and R, we will produce a new red-black tree that is the concatenation of L and R. This will take time proportional to O(lg (max(|L|,|R|))), where |L| denotes the number of nodes in L.

First, remove the largest element from L, c. Next, find the black height of L and R. By "black height", I mean the number of black nodes on any path from the root to a leaf. By the red-black tree invariants, this is constant on all paths of any given tree. Call L's black height p and R's black height q, and assume w.l.o.g. p ≤ q.

From the root of R, follow left children until arriving at a black node R' with height p. Make a new red tree C with root element c, left child L and right child R'. Since L is a red-black tree on its own, its root is black, and the color invariants are not violated at or below C. Furthermore, the black height of C is p.

However, we cannot simply splice C back into R in place of R'. First, if p = q, R' is R, yet C has a red root. In this case, simply recolor the root of C black. This is your new concatenated tree.

Second, if R' is not the root, it may have a red parent. Red parents are not permitted to have red children, so we must rebalance. Here we just apply Okasaki's rebalancing scheme all the way up the spine between R' (now replaced with C) and the root of R.

There are two possible cases. If C has no grandparent, color C's parent black. The tree is now valid.

If C has a grandparent, it must be black and of black height p+1, since C's parent is red. Replace C's grandparent with a new red tree, the root of which is the root of C's parent, the left child of which is C, recolored black, and the right child of which is a black tree that consists of C's sibling, C's grandparent's root, and C's uncle, in that order. This doesn't increase the black height of C's grandparent, but it changes its color to red, which might make it a root or a red child of a red parent, so we have to rebalance again, and so on all the way up the tree

  • Finding the black height of both trees : O(lg |L|) + O(lg |R|)
  • Tracing down R to the right spot: O(lg |R| - lg |L|)
  • Rotations all the way back up to the root: O(lg |R| - lg |L|)

None of these is greater than O(lg |R| + lg |L|) = O(lg (max(|L|,|R|)))

To make this O(lg (min(|L|,|R|))), first reverse the spine pointers. Then you don't need the black height of the larger tree, you only need to count black spine nodes until one tree runs out of spine. Then, use the original (not Okasaki's) rebalancing scheme to make sure you only rebalance O(1) nodes. Finally, mark the rest of the spine that doesn't need rebalancing for lazy recoloring if necessary later.

这篇关于连接红黑树的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆