将平面表解析为树的最有效/最优雅的方法是什么? [英] What is the most efficient/elegant way to parse a flat table into a tree?

查看:35
本文介绍了将平面表解析为树的最有效/最优雅的方法是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设您有一个存储有序树层次结构的平面表:

Id Name ParentId Order1 '节点 1' 0 102 '节点 1.1' 1 103 '节点 2' 0 204 '节点 1.1.1' 2 105 '节点 2.1' 3 106 '节点 1.2' 1 20

这是一个图表,其中我们有 [id] Name.根节点 0 是虚构的.

<前>[0] 根/[1] 节点 1 [3] 节点 2/ [2] 节点 1.1 [6] 节点 1.2 [5] 节点 2.1/[4] 节点 1.1.1

您将使用什么简约方法将其作为正确排序、正确缩进的树输出到 HTML(或文本,就此而言)?

进一步假设你只有基本的数据结构(数组和哈希图),没有带有父/子引用的花哨对象,没有 ORM,没有框架,只有你的两只手.该表表示为一个结果集,可以随机访问.

伪代码或者简单的英文都可以,这纯粹是一个概念问题.

额外的问题:有没有一种从根本上更好的方法来在 RDBMS 中存储这样的树结构?

<小时>

编辑和补充

回答一位评论者(Mark Bessey 的)问题:根节点不是必需的,因为它无论如何都不会显示.ParentId = 0 是表达这些是顶级"的约定.Order 列定义了如何对具有相同父节点的节点进行排序.

我所说的结果集"可以被描绘成一个哈希映射数组(保持在那个术语中).对于我的例子,本来就已经存在了.有些答案会加倍努力并先构建它,但这没关系.

树可以任意深.每个节点可以有 N 个子节点.不过,我并没有完全想到数百万个条目"树.

不要误认为我选择的节点命名('Node 1.1.1')是依赖的东西.节点同样可以称为Frank"或Bob",没有暗示命名结构,这只是为了使其可读.

我已经发布了我自己的解决方案,所以你们可以把它拆成碎片.

解决方案

现在 MySQL 8.0 支持递归查询,可以说所有流行的SQL数据库都支持递归以标准语法查询.

带有递归 MyTree AS (SELECT * FROM MyTable WHERE ParentId 为 NULL联合所有SELECT m.* FROM MyTABLE AS m JOIN MyTree AS t ON m.ParentId = t.Id)从 MyTree 中选择 *;

我在我的演示文稿 Recursive Query Throwdown 中测试了 MySQL 8.0 中的递归查询2017.

以下是我 2008 年的原始回答:

<小时>

有多种方法可以在关系数据库中存储树结构数据.您在示例中显示的内容使用两种方法:

  • 邻接表(父"列)和
  • 路径枚举(您姓名栏中的虚线数字).

另一种解决方案称为嵌套集,它也可以存储在同一个表中.阅读适用于 Smarties 的 SQL 中的树和层次结构"Joe Celko 了解有关这些设计的更多信息.

我通常更喜欢一种称为闭包表(又名邻接关系")的设计,用于存储树结构数据.它需要另一个表,但查询树很容易.

我在演示文稿中介绍了闭包表使用 SQL 和 PHP 的分层数据模型 和我的书 SQL 反模式:避免数据库编程的陷阱.>

CREATE TABLE ClosureTable (祖先 ID INT NOT NULL REFERENCES FlatTable(id),后代 ID INT NOT NULL REFERENCES FlatTable(id),主键(ancestor_id,descendant_id));

将所有路径存储在闭包表中,其中存在从一个节点到另一个节点的直接祖先.为每个节点包含一行以引用自身.例如,使用您在问题中显示的数据集:

INSERT INTO ClosureTable (ancestor_id,descendant_id) VALUES(1,1), (1,2), (1,4), (1,6),(2,2), (2,4),(3,3), (3,5),(4,4),(5,5),(6,6);

现在你可以像这样从节点 1 开始得到一棵树:

SELECT f.*FROM FlatTable fJOIN ClosureTable a ON (f.id = a.descendant_id)WHERE a.ancestor_id = 1;

输出(在 MySQL 客户端中)如下所示:

+----+|身份证 |+----+|1 ||2 ||4 ||6 |+----+

换句话说,节点 3 和 5 被排除在外,因为它们是单独层次结构的一部分,而不是从节点 1 下降.

<小时>

回复:来自 e-satis 的关于直系子女(或直系父母)的评论.您可以向 ClosureTable 添加path_length"列,以便更轻松地专门查询直系子代或父代(或任何其他距离).

INSERT INTO ClosureTable (ancestor_id,descendant_id, path_length) VALUES(1,1,0), (1,2,1), (1,4,2), (1,6,1),(2,2,0), (2,4,1),(3,3,0), (3,5,1),(4,4,0),(5,5,0),(6,6,0);

然后您可以在搜索中添加一个术语来查询给定节点的直接子节点.这些是 path_length 为 1 的后代.

SELECT f.*FROM FlatTable fJOIN ClosureTable a ON (f.id = a.descendant_id)WHERE a.ancestor_id = 1AND path_length = 1;+----+|身份证 |+----+|2 ||6 |+----+

<小时>

来自@ashraf 的重新评论:[按名称] 对整棵树进行排序怎么样?"

这是一个示例查询,用于返回节点 1 的所有后代节点,将它们连接到包含其他节点属性(例如 name)的 FlatTable,并按名称排序.

SELECT f.nameFROM FlatTable fJOIN ClosureTable a ON (f.id = a.descendant_id)WHERE a.ancestor_id = 1按 f.name 排序;

<小时>

来自@Nate 的重新评论:

SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS 面包屑FROM FlatTable fJOIN ClosureTable a ON (f.id = a.descendant_id)JOIN ClosureTable b ON (b.descendant_id = a.descendant_id)WHERE a.ancestor_id = 1GROUP BY a.descendant_id按 f.name 排序+------------+------------+|姓名 |面包屑|+------------+------------+|节点 1 |1 ||节点 1.1 |1,2 ||节点 1.1.1 |1,2,4 ||节点 1.2 |1,6 |+------------+------------+

<小时>

今天有用户提出了修改建议.SO 版主批准了编辑,但我正在撤消它.

编辑建议上面最后一个查询中的 ORDER BY 应该是 ORDER BY b.path_length, f.name,大概是为了确保排序与层次结构相匹配.但这不起作用,因为它会在Node 1.2"之后订购Node 1.1.1".

如果您希望排序以合理的方式与层次结构匹配,这是可能的,但不能简单地按路径长度排序.例如,请参阅我对 MySQL Closure Table 分层数据库 - 如何以正确的顺序拉取信息.

Assume you have a flat table that stores an ordered tree hierarchy:

Id   Name         ParentId   Order
 1   'Node 1'            0      10
 2   'Node 1.1'          1      10
 3   'Node 2'            0      20
 4   'Node 1.1.1'        2      10
 5   'Node 2.1'          3      10
 6   'Node 1.2'          1      20

Here's a diagram, where we have [id] Name. Root node 0 is fictional.

                       [0] ROOT
                          /     
              [1] Node 1          [3] Node 2
              /                          
    [2] Node 1.1     [6] Node 1.2      [5] Node 2.1
          /          
 [4] Node 1.1.1

What minimalistic approach would you use to output that to HTML (or text, for that matter) as a correctly ordered, correctly indented tree?

Assume further you only have basic data structures (arrays and hashmaps), no fancy objects with parent/children references, no ORM, no framework, just your two hands. The table is represented as a result set, which can be accessed randomly.

Pseudo code or plain English is okay, this is purely a conceptional question.

Bonus question: Is there a fundamentally better way to store a tree structure like this in a RDBMS?


EDITS AND ADDITIONS

To answer one commenter's (Mark Bessey's) question: A root node is not necessary, because it is never going to be displayed anyway. ParentId = 0 is the convention to express "these are top level". The Order column defines how nodes with the same parent are going to be sorted.

The "result set" I spoke of can be pictured as an array of hashmaps (to stay in that terminology). For my example was meant to be already there. Some answers go the extra mile and construct it first, but thats okay.

The tree can be arbitrarily deep. Each node can have N children. I did not exactly have a "millions of entries" tree in mind, though.

Don't mistake my choice of node naming ('Node 1.1.1') for something to rely on. The nodes could equally well be called 'Frank' or 'Bob', no naming structure is implied, this was merely to make it readable.

I have posted my own solution so you guys can pull it to pieces.

解决方案

Now that MySQL 8.0 supports recursive queries, we can say that all popular SQL databases support recursive queries in standard syntax.

WITH RECURSIVE MyTree AS (
    SELECT * FROM MyTable WHERE ParentId IS NULL
    UNION ALL
    SELECT m.* FROM MyTABLE AS m JOIN MyTree AS t ON m.ParentId = t.Id
)
SELECT * FROM MyTree;

I tested recursive queries in MySQL 8.0 in my presentation Recursive Query Throwdown in 2017.

Below is my original answer from 2008:


There are several ways to store tree-structured data in a relational database. What you show in your example uses two methods:

  • Adjacency List (the "parent" column) and
  • Path Enumeration (the dotted-numbers in your name column).

Another solution is called Nested Sets, and it can be stored in the same table too. Read "Trees and Hierarchies in SQL for Smarties" by Joe Celko for a lot more information on these designs.

I usually prefer a design called Closure Table (aka "Adjacency Relation") for storing tree-structured data. It requires another table, but then querying trees is pretty easy.

I cover Closure Table in my presentation Models for Hierarchical Data with SQL and PHP and in my book SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

CREATE TABLE ClosureTable (
  ancestor_id   INT NOT NULL REFERENCES FlatTable(id),
  descendant_id INT NOT NULL REFERENCES FlatTable(id),
  PRIMARY KEY (ancestor_id, descendant_id)
);

Store all paths in the Closure Table, where there is a direct ancestry from one node to another. Include a row for each node to reference itself. For example, using the data set you showed in your question:

INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
  (1,1), (1,2), (1,4), (1,6),
  (2,2), (2,4),
  (3,3), (3,5),
  (4,4),
  (5,5),
  (6,6);

Now you can get a tree starting at node 1 like this:

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1;

The output (in MySQL client) looks like the following:

+----+
| id |
+----+
|  1 | 
|  2 | 
|  4 | 
|  6 | 
+----+

In other words, nodes 3 and 5 are excluded, because they're part of a separate hierarchy, not descending from node 1.


Re: comment from e-satis about immediate children (or immediate parent). You can add a "path_length" column to the ClosureTable to make it easier to query specifically for an immediate child or parent (or any other distance).

INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES
  (1,1,0), (1,2,1), (1,4,2), (1,6,1),
  (2,2,0), (2,4,1),
  (3,3,0), (3,5,1),
  (4,4,0),
  (5,5,0),
  (6,6,0);

Then you can add a term in your search for querying the immediate children of a given node. These are descendants whose path_length is 1.

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
  AND path_length = 1;

+----+
| id |
+----+
|  2 | 
|  6 | 
+----+


Re comment from @ashraf: "How about sorting the whole tree [by name]?"

Here's an example query to return all nodes that are descendants of node 1, join them to the FlatTable that contains other node attributes such as name, and sort by the name.

SELECT f.name
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
ORDER BY f.name;


Re comment from @Nate:

SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS breadcrumbs
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id) 
JOIN ClosureTable b ON (b.descendant_id = a.descendant_id) 
WHERE a.ancestor_id = 1 
GROUP BY a.descendant_id 
ORDER BY f.name

+------------+-------------+
| name       | breadcrumbs |
+------------+-------------+
| Node 1     | 1           |
| Node 1.1   | 1,2         |
| Node 1.1.1 | 1,2,4       |
| Node 1.2   | 1,6         |
+------------+-------------+


A user suggested an edit today. SO moderators approved the edit, but I am reversing it.

The edit suggested that the ORDER BY in the last query above should be ORDER BY b.path_length, f.name, presumably to make sure the ordering matches the hierarchy. But this doesn't work, because it would order "Node 1.1.1" after "Node 1.2".

If you want the ordering to match the hierarchy in a sensible way, that is possible, but not simply by ordering by the path length. For example, see my answer to MySQL Closure Table hierarchical database - How to pull information out in the correct order.

这篇关于将平面表解析为树的最有效/最优雅的方法是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆