在关系数据库中存储分层数据的选项有哪些? [英] What are the options for storing hierarchical data in a relational database?

查看:38
本文介绍了在关系数据库中存储分层数据的选项有哪些?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

良好的概述

一般来说,您是在快速读取时间(例如,嵌套集)或快速写入时间(邻接列表)之间做出决定.通常,您最终会得到最适合您需求的以下选项的组合.以下提供了一些深入阅读:

Generally speaking, you're making a decision between fast read times (for example, nested set) or fast write times (adjacency list). Usually, you end up with a combination of the options below that best fit your needs. The following provides some in-depth reading:

  • One more Nested Intervals vs. Adjacency List comparison: the best comparison of Adjacency List, Materialized Path, Nested Set, and Nested Interval I've found.
  • Models for hierarchical data: slides with good explanations of tradeoffs and example usage
  • Representing hierarchies in MySQL: very good overview of Nested Set in particular
  • Hierarchical data in RDBMSs: a most comprehensive and well-organized set of links I've seen, but not much in the way of explanation

选项

我知道的和一般特征:

  1. 邻接列表:

  • 列:ID、ParentID
  • 易于实施.
  • 廉价的节点移动、插入和删除.
  • 很难找到等级、血统和后代,路径
  • 通过支持它们的数据库中的通用表表达式避免 N+1
    • Columns: ID, ParentID
    • Easy to implement.
    • Cheap node moves, inserts, and deletes.
    • Expensive to find the level, ancestry & descendants, path
    • Avoid N+1 via Common Table Expressions in databases that support them
      1. 嵌套集(又名 修改前序树遍历)

      • 列:左、右
      • 廉价的祖先,后代
      • 非常昂贵的 O(n/2) 由于易失编码而导致移动、插入、删除
        • Columns: Left, Right
        • Cheap ancestry, descendants
        • Very expensive O(n/2) moves, inserts, deletes due to volatile encoding
          1. 桥接表(又名 关闭表/w 触发器)

          • 使用带有祖先、后代、深度的单独连接表(可选)
          • 廉价的祖先和后代
          • 插入、更新、删除的写入成本 O(log n)(子树的大小)
          • 规范化编码:适用于 RDBMS 统计和联接中的查询规划器
          • 每个节点需要多行
            1. 血统专栏(又名 物化路径,路径枚举)
            1. Lineage Column (a.k.a. Materialized Path, Path Enumeration)

            • 列:血统(例如/parent/child/grandchild/etc...)
            • 通过前缀查询的廉价后代(例如 LEFT(lineage, #) = '/enumerated/path')
            • 插入、更新、删除的写入成本 O(log n)(子树的大小)
            • 非关系型:依赖于 Array 数据类型或序列化字符串格式
              1. 嵌套区间

              • 类似于嵌套集,但具有实数/浮点数/十进制,因此编码不会易失(廉价的移动/插入/删除)
              • 存在实数/浮点数/十进制表示/精度问题
              • 矩阵编码变体为 "免费",但增加了线性代数的复杂性.
                • Like nested set, but with real/float/decimal so that the encoding isn't volatile (inexpensive move/insert/delete)
                • Has real/float/decimal representation/precision issues
                • Matrix encoding variant adds ancestor encoding (materialized path) for "free", but with the added trickiness of linear algebra.
                  1. 平表

                  • 修改后的邻接列表,为每条记录添加级别和排名(例如排序)列.
                  • 迭代/分页成本低
                  • 昂贵的移动和删除
                  • 良好用途:线程讨论 - 论坛/博客评论
                    1. 多个沿袭列

                    • 列:每个血统级别一个,指的是所有父级到根,从项目级别向下的级别设置为NULL
                    • 廉价的祖先、后代、等级
                    • 便宜的插入、删除、移动叶子
                    • 昂贵的内部节点插入、删除、移动
                    • 层次结构深度的硬性限制
                    • 数据库特定说明

                      MySQL

                      甲骨文

                      • Use CONNECT BY to traverse Adjacency Lists

                      PostgreSQL

                      • ltree datatype for Materialized Path

                      SQL 服务器

                      • 一般摘要
                      • 2008 提供了似乎有助于谱系的 HierarchyId 数据类型列接近并扩大可以表示的深度.
                      • General summary
                      • 2008 offers HierarchyId data type that appears to help with the Lineage Column approach and expand the depth that can be represented.

                      推荐答案

                      我最喜欢的答案是这个帖子的第一句话所建议的.使用邻接表维护层次结构,使用嵌套集查询层次结构.

                      My favorite answer is as what the first sentence in this thread suggested. Use an Adjacency List to maintain the hierarchy and use Nested Sets to query the hierarchy.

                      到目前为止的问题是从邻接列表到嵌套集的覆盖方法非常缓慢,因为大多数人使用称为推栈"的极端 RBAR 方法进行转换,并且被认为是通过邻接表维护简单性和嵌套集的出色性能达到涅槃的方法是昂贵的.结果,大多数人最终不得不满足于一个或另一个,特别是如果有超过,比如说,糟糕的 100,000 个左右的节点.使用推送堆栈方法可能需要一整天的时间来转换传销者认为的百万级节点层次结构.

                      The problem up until now has been that the coversion method from an Adjacecy List to Nested Sets has been frightfully slow because most people use the extreme RBAR method known as a "Push Stack" to do the conversion and has been considered to be way to expensive to reach the Nirvana of the simplicity of maintenance by the Adjacency List and the awesome performance of Nested Sets. As a result, most people end up having to settle for one or the other especially if there are more than, say, a lousy 100,000 nodes or so. Using the push stack method can take a whole day to do the conversion on what MLM'ers would consider to be a small million node hierarchy.

                      我想通过想出一种方法以似乎不可能的速度将邻接列表转换为嵌套集,我想给 Celko 带来一些竞争.这是我的 i5 笔记本电脑上推送堆栈方法的性能.

                      I thought I'd give Celko a bit of competition by coming up with a method to convert an Adjacency List to Nested sets at speeds that just seem impossible. Here's the performance of the push stack method on my i5 laptop.

                      Duration for     1,000 Nodes = 00:00:00:870 
                      Duration for    10,000 Nodes = 00:01:01:783 (70 times slower instead of just 10)
                      Duration for   100,000 Nodes = 00:49:59:730 (3,446 times slower instead of just 100) 
                      Duration for 1,000,000 Nodes = 'Didn't even try this'
                      

                      这是新方法的持续时间(括号中是推送堆栈方法).

                      And here's the duration for the new method (with the push stack method in parenthesis).

                      Duration for     1,000 Nodes = 00:00:00:053 (compared to 00:00:00:870)
                      Duration for    10,000 Nodes = 00:00:00:323 (compared to 00:01:01:783)
                      Duration for   100,000 Nodes = 00:00:03:867 (compared to 00:49:59:730)
                      Duration for 1,000,000 Nodes = 00:00:54:283 (compared to something like 2 days!!!)
                      

                      是的,没错.100 万个节点在不到 1 分钟的时间内完成转换,100,000 个节点在 4 秒内完成.

                      Yes, that's correct. 1 million nodes converted in less than a minute and 100,000 nodes in under 4 seconds.

                      您可以在以下 URL 阅读有关新方法的信息并获取代码副本.http://www.sqlservercentral.com/articles/Hierarchy/94040/

                      You can read about the new method and get a copy of the code at the following URL. http://www.sqlservercentral.com/articles/Hierarchy/94040/

                      我还使用类似的方法开发了预聚合"层次结构.传销者和制作物料清单的人会对本文特别感兴趣.http://www.sqlservercentral.com/articles/T-SQL/94570/

                      I also developed a "pre-aggregated" hierarchy using similar methods. MLM'ers and people making bills of materials will be particularly interested in this article. http://www.sqlservercentral.com/articles/T-SQL/94570/

                      如果您确实停下来看看任何一篇文章,请跳转到加入讨论"链接,让我知道您的想法.

                      If you do stop by to take a look at either article, jump into the "Join the discussion" link and let me know what you think.

                      这篇关于在关系数据库中存储分层数据的选项有哪些?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆