完善修改preorder树遍历算法的可扩展性 [英] Improving scalability of the modified preorder tree traversal algorithm

查看:120
本文介绍了完善修改preorder树遍历算法的可扩展性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在思考href="http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/">修改preorder树的遍历的算法存储一个平面表内的树木(如SQL)。

一个性质我不喜欢的标准方法是插入一个节点, 有触摸(平均)N / 2个节点(一切与左或右高于插入点)。

我见过的实现依赖于按顺序编号的值。这使得没有空间的更新。

这似乎是坏的并发性和缩放。想象一下,你有一个植根于包含用户组在一个大系统中的每个帐户的世界树,这是非常大的,到时候,你必须存储在不同服务器上的树子集。所有节点的感人半添加一个节点到树的底部是坏的。

下面是这个想法我正在考虑。基本上通过分割密钥空间,并在每个级别划分留有余地插入。

下面是用N <子>最大 = 64(这通常是你的数据库的MAX_INT)的一个例子

  0:64
              ________ | ________
             / \
         1:31 32:63
        / \ / \
    2:14 15-30 33:47 48:62
 

下面,将节点添加到树的左半

  0:64
              ________ | ________
             / \
         1:31 32:63
      / | \ / \
  2:11 11:20 21:30 33:47 48:62
 

该算法FFT必须延伸的插入和移除过程递归重新编号为子树的左/右索引。由于在查询某个节点的直接子是复杂的,我觉得很有道理也存储父ID在表中。然后,该算法可以选择子树(左使用> p.left和放大器;&安培;右LT; p.right),然后使用node.id和node.parent工作,通过列表,细分指标

这是不是只是递增各项指标,以腾出空间插入(或减少拆卸)更为复杂,但它有可能影响少得多节点的潜力(仅decendenants插入/删除节点的父)。

我的问题(S)基本上是:

  1. 的这个想法得到正式或实施?

  2. 这是一样区间套?

解决方案

我听说有人在此之前,出于同样的原因,是的。

请注意,你做这个输在一对夫妇的算法小的优点

  • 在通常情况下,你可以告诉一个节点的后代的数量((右 - 左+ 1)2区)。这有时是有用的,如果如你会显示在一个TreeView,其中应包括儿童的数量计数将进一步回落树
  • 发现
  • 从上面的流动,可以很容易地选择所有的叶节点 - WHERE(右=左+ 1)

这些都是相当轻微的优势,未必对你有用无论如何,虽然对于一些使用模式,它们显然得心应手。

这就是说,它听起来像物化路径可能更对你有用,按照上面的建议。

I've been thinking about the modified preorder tree traversal algorithm for storing trees within a flat table (such as SQL).

One property I dislike about the standard approach is that to insert a node you have to touch (on average) N/2 of the nodes (everything with left or right higher than the insert point).

The implementations I've seen rely on sequentially numbered values. This leaves no room for updates.

This seems bad for concurrency and scaling. Imagine you have a tree rooted at the world containing user groups for every account in a large system, it's extremely large, to the point you must store subsets of the tree on different servers. Touching half of all the nodes to add a node to the bottom of the tree is bad.

Here is the idea I was considering. Basically leave room for inserts by partitioning the keyspace and dividing at each level.

Here's an example with Nmax = 64 (this would normally be the MAX_INT of your DB)

                     0:64
              ________|________
             /                 \
         1:31                   32:63
        /    \                 /     \
    2:14    15-30         33:47       48:62

Here, a node is added to the left half of the tree.

                     0:64  
              ________|________
             /                 \
         1:31                  32:63
      /   |   \               /     \
  2:11  11:20  21:30     33:47       48:62

The alogorithm must be extended for the insert and removal process to recursively renumber to the left/right indexes for the subtree. Since querying for immediate children of a node is complicated, I think it makes sense to also store the parent id in the table. The algorithm can then select the sub tree (using left > p.left && right < p.right), then use node.id and node.parent to work through the list, subdividing the indexes.

This is more complex than just incrementing all the indexes to make room for the insert (or decrementing for removal), but it has the potential to affect far fewer nodes (only decendenants of the parent of the inserted/removed node).

My question(s) are basically:

  1. Has this idea been formalized or implemented?

  2. Is this the same as nested intervals?

解决方案

I have heard of people doing this before, for the same reasons, yes.

Note that you do lose at a couple of small advantages of the algorithm by doing this

  • normally, you can tell the number of descendants of a node by ((right - left + 1) div 2). This can occasionally be useful, if e.g. you'd displaying a count in a treeview which should include the number of children to be found further down in the tree
  • Flowing from the above, it's easy to select out all leaf nodes -- WHERE (right = left + 1).

These are fairly minor advantages and may not be useful to you anyway, though for some usage patterns they're obviously handy.

That said, it does sound like materialized paths may be more useful to you, as suggested above.

这篇关于完善修改preorder树遍历算法的可扩展性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆