What is a zip tree, and how does it work?


Question


I've heard of a new balanced BST data structure called a zip tree. What is the zip tree? How does it work?

Solution

At a high level, a zip tree is a

  • randomized balanced binary search tree,
  • that is a way of encoding a skiplist as a BST, and
  • that uses a pair of operations called zipping and unzipping rather than tree rotations.

The first bullet point - that zip trees are randomized, balanced BSTs - gives a feel for what a zip tree achieves at a high level. It's a type of balanced binary search tree that, like treaps and unlike red/black trees, uses randomization to balance the tree. In that sense, a zip tree isn't guaranteed to be a balanced tree, but rather has a very high probability of being balanced.

The second bullet point - that zip trees are encodings of skiplists - shows where zip trees come from and why, intuitively, they're balanced. You can think of a zip tree as a way of taking the randomized skiplist data structure, which supports all major operations in expected time O(log n), and representing it as a binary search tree. This provides the intuition for where zip trees come from and why we'd expect them to be so fast.

The third bullet point - zip trees use zipping and unzipping rather than tree rotations - accounts for the name of the zip tree and what it feels like to code one up. Zip trees differ from other types of balanced trees (say, red/black trees or AVL trees) in that nodes are moved around the tree not through rotations, but through a pair of operations that convert a larger chain of nodes into two smaller chains or vice-versa.

The rest of this answer dives deeper into where zip trees come from, how they work, and how they're structured.

Review: Skip Lists

To understand where zip trees come from, let's begin with a review of another data structure, the skiplist. A skiplist is a data structure that, like a binary search tree, stores a collection of elements in sorted order. Skiplists, however, aren't tree structures. Rather, a skiplist works by storing elements in sorted order through several layers of linked lists. A sample skiplist is shown here:

As you can see, the elements are represented in sorted order. Each element has an associated height, and is part of a number of linked lists equal to its height. All of the elements of the skiplist participate in the bottom layer. Ideally, roughly half of the nodes will be in the layer above that, roughly a quarter of the nodes will be in the layer above that, roughly an eighth of the nodes will be in the layer above that, etc. (More on how this works later on.)

To do a lookup in a skiplist, we begin in the topmost layer. We walk forward in the skiplist until either (1) we find the element we're looking for, (2) we find an element bigger than the one we're looking for, or (3) we hit the end of the list. In the first case, we uncork the champagne and celebrate because we discovered the item we were searching for and there's nothing more to do. In the second and third cases, we've "overshot" the element that we're looking for. But that's nothing to worry about - in fact, it's helpful, because it means that what we're looking for must lie between the node where we overshot and the node that comes before it. So we'll go to the previous node, drop down one layer, and pick up our search from there.

For example, here's how we'd do a search for 47:

Here, the blue edges indicate the links followed where we moved forward, and the red edges indicate where we overshot and decided to descend down a layer.

A powerful intuition for how skiplists work - which we'll need later on as we transition to zip trees - is that the topmost layer of the skiplist partitions the remaining elements of the skiplists into different ranges. You can see this here:

Intuitively, a skiplist search will be "fast" if we're able to skip looking at most of the elements. Imagine, for example, that the second-to-last layer of the skiplist only stores every other element of the skiplist. In that case, traversing the second-to-last layer is twice as fast as traversing the bottom layer, so we'd expect a lookup starting in the second-to-last layer to take half as much time as a lookup starting in the bottom layer. Similarly, imagine that the layer above that one only stores every other element from the layer below it. Then searching in that layer will take roughly half as much time as searching the layer below it. More generally, if each layer only stores roughly half the elements of the layer below it, then we could skip past huge amounts of the elements in the skiplist during a search, giving us good performance.

The skiplist accomplishes this by using the following rule: whenever we insert an element into the skiplist, we flip a coin until we get heads. We then set the height of the newly-inserted node to be the number of coins that we ended up tossing. This means it has a 50% chance to stay in its current layer and a 50% chance to move to the layer above it, which means, in aggregate, that roughly half the nodes will only be in the bottom layer, roughly half of what's left will be one layer above that, roughly half of what's left will be one layer above that, etc.

(For those of you with a math background, you could also say that the height of each node in the skiplist is a Geom(1/2) random variable.)
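Following that rule, a rank generator can be sketched like this (a minimal sketch; the function name is my own):

```python
import random

def random_rank():
    # Flip a fair coin until we see heads; the rank is the total
    # number of flips, so the smallest possible rank is 1 and each
    # extra layer is half as likely as the one below it.
    flips = 1
    while random.random() < 0.5:  # tails: flip again
        flips += 1
    return flips
```

Over many calls, the average rank comes out to about 2, matching the mean of a Geom(1/2) variable.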

Here's an example of inserting 42 into the skiplist shown above, using a height of 1:

Deletion from a skiplist is also a fairly simple operation: we simply splice it out of whatever linked lists it happens to be in. That means that if we were to delete the 42 we just inserted from the above list, we'd end up with the same skiplist that we started with.

It can be shown that the expected cost of an insertion, deletion, or lookup in a skiplist is O(log n), based on the fact that the number of items in each list is roughly half the number of items in the one below it. (That means we'd expect to see O(log n) layers, and only take a constant number of steps in each layer.)
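The search, insertion, and deletion behavior described above can be sketched as a small Python skiplist (a sketch only: the class and method names are my own, and node heights are passed in explicitly so the coin-flip rule stays separate):

```python
class SkipNode:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # one forward pointer per layer

class SkipList:
    def __init__(self, max_height=16):
        # Sentinel head that sits before every key and spans all layers.
        self.head = SkipNode(None, max_height)

    def _walk(self, key):
        """Trace the search path; return the rightmost node < key on each layer."""
        path = [None] * len(self.head.next)
        node = self.head
        for level in reversed(range(len(self.head.next))):
            # Move forward until we would overshoot, then drop a layer.
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            path[level] = node
        return path

    def contains(self, key):
        pred = self._walk(key)[0]  # predecessor on the bottom layer
        return pred.next[0] is not None and pred.next[0].key == key

    def insert(self, key, height):
        path = self._walk(key)
        new = SkipNode(key, height)
        for level in range(height):  # splice into the lowest `height` layers
            new.next[level] = path[level].next[level]
            path[level].next[level] = new

    def delete(self, key):
        path = self._walk(key)
        target = path[0].next[0]
        if target is not None and target.key == key:
            # Splice the node out of every list it participates in.
            for level in range(len(target.next)):
                path[level].next[level] = target.next[level]
```

Note how `delete` is exactly the "splice it out of whatever linked lists it happens to be in" operation described above.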

From Skiplists to Zip Trees

Now that we've reviewed skiplists, let's talk about where the zip tree comes from.

Let's imagine that you're looking at the skiplist data structure. You really like the expected O(log n) performance of each operation, and you like how conceptually simple it is. There's just one problem - you really don't like linked lists, and the idea of building something with layers upon layers of linked lists doesn't excite you. On the other hand, you really love binary search trees. They've got a really simple structure - each node has just two pointers leaving it, and there's a simple rule about where everything gets placed. This question then naturally arises: could you get all the benefits of a skiplist, except in BST form?

It turns out that there's a really nice way to do this. Let's imagine that you have the skiplist shown here:

Now, imagine you perform a lookup in this skiplist. How would that search work? Well, you'd always begin by scanning across the top layer of the skiplist, moving forward until you found a key that was bigger than the one you were looking for, or until you hit the end of the list and found that there were no more nodes at the top level. From there, you'd then "descend" one level into a sub-skiplist containing only the keys between the last node you visited and the one that overshot.

It's possible to model this exact same search as a BST traversal. Specifically, here's how we might represent the top layer of that skiplist as a BST:

Notice that all these nodes chain to the right, with the idea being that "scanning forward in the skiplist" corresponds to "visiting larger and larger keys." In a BST, moving from one node to a larger node corresponds to moving right, hence the chain of nodes to the right.

Now, each node in a BST can have up to two children, and in the picture shown above each node has either zero children or one child. If we fill in the missing children by marking what ranges they correspond to, we get this.

And hey, wait a minute! It sure looks like the BST is partitioning the space of keys the same way that the skiplist is. That's promising, since it suggests that we're on to something here. Plus, it gives us a way to fill in the rest of the tree: we can recursively convert the subranges of the skiplist into their own BSTs and glue the whole thing together. If we do that, we get this tree encoding the skiplist:

We now have a way of representing a skiplist as a binary search tree. Very cool!

Now, could we go the other way around? That is, could we go from a BST to a skiplist? In general, there's no one unique way to do this. After all, when we converted the skiplist to a BST, we did lose some information. Specifically, each node in the skiplist has an associated height, and while each node in our BST has a height as well it's not closely connected to the skiplist node heights. To address this, let's tag each BST node with the height of the skiplist node that it came from. This is shown here:

Now, some nice patterns emerge. For starters, notice that each node's associated number is bigger than its left child's number. That makes sense, since each step to the left corresponds to descending into a subrange of the skiplist, where nodes will have lower heights. Similarly, each node's associated number is greater than or equal to the number of its right child. And that again makes sense - moving to the right either means

  • continuing forward at the same level that we were already on, in which case the height remains the same, or
  • hitting the end of a range and descending into a subrange, in which case the height decreases.

Can we say more about the shape of the tree? Sure we can! For example, in a skiplist, each node's height is picked by flipping coins until we get heads, then counting how many total coins we flipped. (Or, as before, it's geometrically distributed with probability 1/2). So if we were to imagine building a BST that corresponded to a skiplist, we'd want the numbers assigned to the nodes to work out the same way.

Putting these three rules together, we get the following, which defines the shape of our tree, the zip tree!

A zip tree is a binary search tree where

  • Each node has an associated number called its rank. Ranks are assigned randomly to each node by flipping coins until heads is flipped, then counting how many total coins were tossed.
  • Each node's rank is strictly greater than its left child's rank.
  • Each node's rank is greater than or equal to its right child's rank.

It's amazing how something like a skiplist can be represented as a BST by writing out such simple rules!
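The three rules can be checked mechanically. Here is a minimal sketch (`Node` and `is_valid_zip_tree` are my own names, not part of any standard API):

```python
class Node:
    # Minimal zip-tree node: a key plus its randomly assigned rank.
    def __init__(self, key, rank):
        self.key, self.rank = key, rank
        self.left = None
        self.right = None

def is_valid_zip_tree(root):
    """Check the BST ordering plus the two rank rules from the definition."""
    if root is None:
        return True
    if root.left is not None:
        # Left child: strictly smaller rank, smaller key.
        if not (root.left.rank < root.rank and root.left.key < root.key):
            return False
    if root.right is not None:
        # Right child: rank at most the parent's, larger key.
        if not (root.right.rank <= root.rank and root.right.key > root.key):
            return False
    return is_valid_zip_tree(root.left) and is_valid_zip_tree(root.right)
```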

Inserting Elements: Unzipping

Let's suppose you have a zip tree. How would you insert a new element into it?

We could in principle answer this question by looking purely at the rules given above, but I think it's a lot easier to figure this out by remembering that zip trees are skiplists in disguise. For example, here's the above zip tree, with its associated skiplist:

Now, suppose we want to insert 18 into this zip tree. To see how this might play out, imagine that we decide to give 18 a rank of 2. Rather than looking at the zip tree, let's look at what would happen if we did the insertion into the skiplist. That would give rise to this skiplist:

If we were to take this skiplist and encode it as a zip tree, we'd get the following result:

What's interesting about this is that we can see what the tree needs to look like after the insertion, even if we don't know how to perform the insertion. We can then try to figure out what the insertion logic needs to look like by reverse-engineering it from these "before" and "after" pictures.

Let's think about what change this insertion made to our zip tree. To begin with, let's think back to our intuition for how we encode skiplists as zip trees. Specifically, chains of nodes at the same level in a skiplist with no intervening "higher" elements map to chains of nodes in the zip tree that lean to the right. Inserting an element into the skiplist corresponds to adding some new element into one of the levels, which has the effect of (1) adding in something new into some level of the skiplist, and (2) taking chains of elements in the skiplist that previously were adjacent at some level, then breaking those connections.

For example, when we inserted 18 into the skiplist shown here, we added something new into the blue chain highlighted here, and we broke all of the red chains shown here:

What is that going to translate into in our zip tree? Well, we can highlight the blue link where our item was inserted here, as well as the red links that were cut:

Let's see if we can work out what's going on here. The blue link here is, fortunately, pretty easy to find. Imagine we do a regular BST insertion to add 18 into our tree. As we're doing so, we'll pause when we reach this point:

Notice that we've hit a key with the same rank as us. That means that, if we were to keep moving to the right, we'd trace out this region of the skiplist:

To find the blue edge - the place where we go - we just need to walk down through this chain of nodes until we find one bigger than us. The blue edge - our insertion point - is then given by the edge between that node and the one above it.

We can identify this location in a different way: we've found the blue edge - our insertion point - when we've reached a point where the node to insert (1) has a bigger rank than the node to the left, (2) has a rank that's greater than or equal to the node on the right, and (3) if the node to the right has the same rank, our new item to insert is less than the item to the right. The first two rules ensure that we're inserting into the right level of the skiplist, and the last rule ensures that we insert into the right place in that level of the skiplist.

Now, where are our red edges? Intuitively, these are the edges that were "cut" because 18 has been added into the skiplist. Those would be items that previously were between the two nodes on opposite ends of the blue edge, but which now need to get partitioned into the new ranges defined by the split version of that blue edge.

Fortunately, those edges appear in really nice places. Here's where they map to:

(In this picture, I've placed the new node 18 in the middle of the blue edge that we identified in the skiplist. This causes the result not to remain a BST, but we'll fix that in a minute.)

Notice that these are the exact same edges that we'd encounter if we were to finish doing our regular BST insertion - it's the path traced out by looking for 18! And something really nice happens here. Notice that

  • each time we move to the right, the node, when cut, goes to the right of 18, and
  • each time we move to the left, the node, when cut, goes to the left of 18.

In other words, once we find the blue edge where we get inserted, we keep walking as though we were doing our insertion as usual, keeping track of the nodes where we went left and the nodes where we went right. We can then chain together all the nodes where we went left and chain together all the nodes where we went right, gluing the results together under our new node. That's shown here:

This operation is called unzipping, and it's where we get the name "zip tree" from. The name kinda makes sense - we're taking two interleaved structures (the left and right chains) and splitting them apart into two simpler linear chains.

To summarize:

Inserting x into a zip tree works as follows:

  1. Assign a random rank to x by flipping coins and counting how many flips were needed to get heads.
  2. Do a search for x. Stop the search once you reach a node where
    • the node's left child has a lower rank than x's,
    • the node's right child has a rank less than or equal to x's, and
    • the node's right child, if it has the same rank as x, has a larger key than x.
  3. Perform an unzip. Specifically:

    1. Continue the search for x as before, recording when we move left and when we move right.
    2. Chain all the nodes together where we went left by making each the left child of the previously-visited left-moving node.
    3. Chain all the nodes together where we went right by making each the right child of the previously-visited right-moving node.
    4. Make those two chains the children of the node x.
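The steps above can be sketched recursively in Python (a sketch under the assumption that ranks are supplied by the caller; `Node`, `unzip`, and `insert` are my own names). The recursion folds the search and the unzip together: we descend until we hit the insertion point, then unzip the subtree rooted there:

```python
class Node:
    def __init__(self, key, rank):
        self.key, self.rank = key, rank
        self.left = None
        self.right = None

def unzip(root, key):
    # Split root's subtree into two chains: keys < key and keys > key.
    if root is None:
        return None, None
    if root.key < key:
        # root and its left subtree all belong on the "smaller" side.
        smaller, bigger = unzip(root.right, key)
        root.right = smaller
        return root, bigger
    else:
        # root and its right subtree all belong on the "bigger" side.
        smaller, bigger = unzip(root.left, key)
        root.left = bigger
        return smaller, root

def insert(root, key, rank):
    if root is None or rank > root.rank or (rank == root.rank and key < root.key):
        # This is the insertion point: unzip the subtree here and hang
        # the two resulting chains under the new node.
        node = Node(key, rank)
        node.left, node.right = unzip(root, key)
        return node
    if key < root.key:
        root.left = insert(root.left, key, rank)
    else:
        root.right = insert(root.right, key, rank)
    return root
```

Because the unzip happens on the way down, this is a single top-down pass, in contrast to the two-pass rotation-based alternative discussed next.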

You might notice that this "unzipping" procedure is equivalent to what you'd get if you performed a different operation. You could achieve the same result by inserting x as usual, then using tree rotations to pull x higher and higher in the tree until it came to rest in the right place. This is a perfectly valid alternative strategy for doing insertions, though it's a bit slower because two passes over the tree are required (a top-down pass to insert at a leaf, then a bottom-up pass to do the rotations).

Removing Elements: Zipping

Now that we've seen how to insert elements, how do we remove them?

Let's begin with a helpful observation: if we insert an item into a zip tree and then remove it, we should end up with the exact same tree that we started with. To see why this is, we can point back to a skiplist. If you add and then remove something from a skiplist, then you end up with the same skiplist that you would have had before. So that means that the zip tree needs to end up looking identical to how it started after we add and then remove an element.

To see how to do this, we'd need to perform two steps:

  1. Undo the unzip operation, converting the two chains of nodes formed back into a linear chain of nodes.
  2. Undo the break of the blue edge, restoring the insertion point of x.

Let's begin with how to undo an unzip operation. This, fortunately, isn't too bad. We can identify the chains of nodes that we made with the unzip operation when we inserted x into the zip tree fairly easily - we simply look at the left and right children of x, then move, respectively, purely to the left and purely to the right.

Now, we know that these nodes used to be linked together in a chain. What order do we reassemble them into? As an example, take a look at this part of a zip tree, where we want to remove 53. The chains to the left and right of 53 are highlighted:

If we look at the nodes making up the left and right chains, we can see that there's only one way to reassemble them. The topmost node of the reassembled chain must be 67, since it has rank 3 and will outrank all other items. After that, the next node must be 41, because it's the smaller of the rank-2 elements and elements with the same rank have smaller items on top. By repeating this process, we can reconstruct the chain of nodes, as shown here, simply by using the rules for how zip trees have to be structured:

This operation, which interleaves two chains together into one, is called zipping.

To summarize, here's how a deletion works:

Deleting a node x from a zip tree works as follows:

  1. Find the node x in the tree.
  2. Perform a zip of its left and right subtrees. Specifically:

    1. Maintain "lhs" and "rhs" pointers, initially to the left and right subtrees.
    2. While neither of those pointers is null:

      1. If lhs has a rank greater than or equal to rhs's rank (ties go to lhs, since equal-rank nodes keep the smaller key on top), make rhs the right child of lhs, then advance lhs to what used to be lhs's right child.
      2. Otherwise, make lhs the left child of rhs, then advance rhs to what used to be rhs's left child.

  3. Rewire x's parent to point to the result of the zip operation rather than x.
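The deletion steps can be sketched in Python (a sketch; `Node`, `zip_together`, and `delete` are my own names, and the zip is written recursively rather than with explicit lhs/rhs pointers):

```python
class Node:
    def __init__(self, key, rank):
        self.key, self.rank = key, rank
        self.left = None
        self.right = None

def zip_together(lhs, rhs):
    # Interleave two subtrees where every key in lhs is smaller than
    # every key in rhs, ordering nodes by rank. Ties go to lhs, since
    # equal-rank nodes keep the smaller key on top.
    if lhs is None:
        return rhs
    if rhs is None:
        return lhs
    if lhs.rank >= rhs.rank:
        lhs.right = zip_together(lhs.right, rhs)
        return lhs
    else:
        rhs.left = zip_together(lhs, rhs.left)
        return rhs

def delete(root, key):
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Found the node: replace it with the zip of its two subtrees.
        root = zip_together(root.left, root.right)
    return root
```

The recursion's return value handles the "rewire the parent" step: each caller stores the result back into the child pointer it descended through.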

More to Explore

To recap our main points: we saw how to represent a skiplist as a BST by using the idea of ranks. That gave rise to the zip tree, which uses ranking rules to determine parent/child relationships. Those rules are maintained using the zip and unzip operations, hence the name.

Doing a full analysis of a zip tree basically comes down to reasoning by analogy to a skiplist. We can show, for example, that the expected runtime of an insertion or deletion is O(log n) by pointing at the equivalent skiplist and noting that the time complexity of the equivalent operation there is O(log n). And we can similarly show that these aren't just expected time bounds, but time bounds that hold with high probability.

There's a question of how to actually store the information needed to maintain a zip tree. One option would be to simply write the rank of each item down in the nodes themselves. That works, though since ranks are very unlikely to exceed O(log n) due to the nature of geometric random variables, dedicating a full machine word to each rank would waste a lot of space. Another alternative would be to use a hash function on node addresses to generate a random, uniformly-distributed integer in some range, then find the position of the least-significant 1 bit to simulate the coin tosses. That increases the costs of insertions and deletions due to the overhead of computing the hash codes, but also decreases the space usage.
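The bit trick in that second option can be sketched as follows (a hypothetical helper, not a standard API; note this convention counts the tails before the first heads, so ranks start at 0 rather than 1):

```python
def rank_from_hash(h, max_rank=32):
    # The trailing zero bits of a uniformly distributed integer behave
    # like a run of tails before the first heads, so the index of the
    # least-significant 1 bit simulates the geometric coin-flip rank.
    if h == 0:
        return max_rank  # all "tails" in the sampled bits: cap the rank
    rank = 0
    while h & 1 == 0:
        h >>= 1
        rank += 1
    return rank
```

Each bit is 0 with probability 1/2 for a uniform hash, so rank k occurs with probability about 2^-(k+1), exactly like the coin-flip rule.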

Zip trees aren't the first data structure to map skiplists and BSTs together. Dean and Jones developed an alternative presentation of this idea in 2007. There's also another way to exploit this connection. Here, we started with a randomized skiplist, and used it to derive a randomized BST. But we can run this in reverse as well - we can start with a deterministic balanced BST and use that to derive a deterministic skiplist. Munro, Papadakis, and Sedgewick found a way to do this by connecting 2-3-4 trees and skiplists.

And zip trees aren't the only randomized balanced BST. The treap was the first structure to do this, and with a little math you can show that treaps tend to have slightly lower expected heights than zip trees. The tradeoff, though, is that you need more random bits per node than in a zip tree.

Hope this helps!
