如何使用递归查询向后遍历层次结构树结构 [英] How to traverse a hierarchical tree-structure structure backwards using recursive queries

本文介绍了如何使用递归查询向后遍历层次结构树结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用PostgreSQL 9.1来查询树的分层结构数据,该数据由与节点连接的边(或元素)组成。数据实际上是用于流网络的,但我已将问题抽象为简单的数据类型。考虑示例 tree 表。每个边缘都有长度和面积属性,用于确定网络中的一些有用指标。

 创建温度表树(
边缘文本PRIMARY KEY,
from_node整数唯一不为空,-也可以用作PK
to_node整数参考树(from_node),
模式字符变化(5),-冗余,但说明性的
长度数字NOT NULL,
区域数字NOT NULL,
fwd_path text [],-可选的有序序列,对调试
fwd_search_depth整数很有用,
fwd_length数字,
rev_path text [],-可选的无序集,对调试
rev_search_depth整数,
rev_length数字,
rev_area数字
很有用。
CREATE INDEX ON树(to_node);
插入树(边缘,from_node,to_node,模式,长度,面积)值
('A',1,4,'start',1.1,0.9),
('B ',2,4,'开始',1.2,1.3),
('C',3,5,'开始',1.8,2.4),
('D',4,5, NULL,1.2,1.3),
('E',5,NULL,'end',1.1,0.9);

下面可以说明,其中A-E表示的边与节点1-5连接。 NULL to_node (Ø)表示结束节点。 from_node 始终是唯一的,因此可以充当PK。如果该网络像的文档提供了一个很好的示例,说明了如何在递归查询中使用搜索图。因此,要获取正向信息,查询将从末尾开始,然后向后工作:

 使用递归search_graph AS( 
-从结束节点
开始选择E.from_node,1 AS search_depth,E.length
,ARRAY [E.edge] AS路径-可选
从树E位置E.to_node是NULL

UNION ALL
-累积每个边,向后(上游)工作
SELECT o.from_node,sg.search_depth + 1,sg.length + o .length
,o.edge || sg.path-可选
FROM tree o,search_graph sg
W.E.to_node = sg.from_node

更新树SET
fwd_path = sg.path,
fwd_search_depth = sg.search_depth,
fwd_length = sg.length
FROM search_graph AS sg在sg.from_node = tree.from_node;

选择边,from_node,to_node,fwd_path,fwd_search_depth,fwd_length
FROM tree ORDER BY边;

边| from_node | to_node | fwd_path | fwd_search_depth | fwd_length
------ + ----------- + --------- + ---------- + ----- ------------- + ------------
A | 1 | 4 | {A,D,E} | 3 | 3.4
B | 2 | 4 | {B,D,E} | 3 | 3.5
C | 3 | 5 | {C,E} | 2 | 2.9
D | 4 | 5 | {D,E} | 2 | 2.3
E | 5 | | {E} | 1 | 1.1

以上说明很有意义,并且可以很好地扩展到大型网络。例如,我可以看到边缘 B 是末端的3个边缘,前进路径是 {B,D,E} 从头到尾的总长度为3.5。



但是,我无法找到构建反向查询的好方法。也就是说,从每个边缘开始,累积的上游边缘,长度和面积是多少。使用有递归,我最好的是:

 有递归search_graph AS(
-从起始节点
开始选择S.from_node,S.to_node,1 AS search_depth,S.length,S.area
,ARRAY [S.edge] AS路径- -可选的
FROM树位于from_node IN(
-起始节点具有一个from_node没有任何to_node
SELECT from_node FROM树,除了选择to_node FROM树之外)

UNION全部
-累积边缘,向前移动
SELECT c.from_node,c.to_node,sg.search_depth + 1,sg.length + c.length,sg.area + c.area
,c.edge || sg.path-可选
从树c,search_graph sg
WHERE c.from_node = sg.to_node

更新树SET
rev_path = sg.path,
rev_search_depth = sg.search_depth,
rev_length = sg.length,
rev_area = sg.area
FROM search_graph AS sg在sg.from_node =树。 from_node;

选择边,from_node,to_node,rev_path,rev_search_depth,rev_length,rev_area
FROM tree ORDER BY边;

边| from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------ + ----------- + --------- + ---------- + ----- ------------- + ------------ + ----------
A | 1 | 4 | {A} | 1 | 1.1 | 0.9
B | 2 | 4 | {B} | 1 | 1.2 | 1.3
C | 3 | 5 | {C} | 1 | 1.8 | 2.4
D | 4 | 5 | {D,A} | 2 | 2.3 | 2.2
E | 5 | | {E,C} | 2 | 2.9 | 3.3

我想在递归查询的第二项中建立聚合,因为每个下游边都连接到1个或多个上游边缘,但是集合不适用于递归查询。另外,我知道联接是草率的,因为有递归结果对 edge 有多个联接条件。



反向/向后查询的预期结果是:

 边缘from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area 
------ + ----------- + --------- + ------------- +- ---------------- + ------------ + ----------
A | 1 | 4 | {A} | 1 | 1.1 | 0.9
B | 2 | 4 | {B} | 1 | 1.2 | 1.3
C | 3 | 5 | {C} | 1 | 1.8 | 2.4
D | 4 | 5 | {A,B,D} | 3 | 3.5 | 3.5
E | 5 | | {A,B,C,D,E} | 5 | 6.4 | 6.8

如何编写此更新查询?



请注意,我最终更关心累积准确的长度和面积总计,并且路径属性用于调试。在我的实际情况中,前进路径最多可达数百条,我希望大型和复杂集水区的前进路径数以万计。

解决方案

更新2:
我重写了原始的递归查询,以便所有累加/聚合都在递归部分之外完成。它应该比此答案的先前版本更好。
这与@a_horse_with_no_name的答案非常相似。

 
递归search_graph(edge,from_node,to_node,length,area,start_node)AS

SELECT edge, from_node,to_node,长度,区域,from_node AS start_node
FROM tree
UNION ALL
选择o.edge,o.from_node,o.to_node,o.length,o.area, p.start_node
从树o
JOIN search_graph p ON p.from_node = o.to_node

选择array_agg(edge)AS edges
-, array_agg(from_node)AS节点
,count(edge)AS edge_count
,sum(length)AS length_sum
,sum(area)AS area_sum
FROM search_graph
GROUP BY start_node
ORDER BY start_node
;

结果符合预期:

  start_node |边缘edge_count | length_sum | area_sum 
------------ + ------------- + ------------ + ----- ------- + ------------
1 | {A} | 1 | 1.1 | 0.9
2 | {B} | 1 | 1.2 | 1.3
3 | {C} | 1 | 1.8 | 2.4
4 | {D,B,A} | 3 | 3.5 | 3.5
5 | {E,D,C,B,A} | 5 | 6.4 | 6.8


I'm using PostgreSQL 9.1 to query hierarchical tree-structured data, consisting of edges (or elements) with connections to nodes. The data are actually for stream networks, but I've abstracted the problem to simple data types. Consider the example tree table. Each edge has length and area attributes, which are used to determine some useful metrics from the network.

CREATE TEMP TABLE tree (
  edge text PRIMARY KEY,
  from_node integer UNIQUE NOT NULL, -- can also act as PK
  to_node integer REFERENCES tree (from_node),
  mode character varying(5), -- redundant, but illustrative
  length numeric NOT NULL,
  area numeric NOT NULL,
  fwd_path text[], -- optional ordered sequence, useful for debugging
  fwd_search_depth integer,
  fwd_length numeric,
  rev_path text[], -- optional unordered set, useful for debugging
  rev_search_depth integer,
  rev_length numeric,
  rev_area numeric
);
CREATE INDEX ON tree (to_node);
INSERT INTO tree(edge, from_node, to_node, mode, length, area) VALUES
 ('A', 1, 4, 'start', 1.1, 0.9),
 ('B', 2, 4, 'start', 1.2, 1.3),
 ('C', 3, 5, 'start', 1.8, 2.4),
 ('D', 4, 5, NULL, 1.2, 1.3),
 ('E', 5, NULL, 'end', 1.1, 0.9);

Which can be illustrated below, where the edges represented by A-E are connected with nodes 1-5. The NULL to_node (Ø) represents the end node. The from_node is always unique, so it can act as PK. If this network flows like a drainage basin, it would be from top to bottom, where the starting tributary edges are A, B, C and the ending outflow edge is E.

The documentation for WITH provide a nice example of how to use search graphs in recursive queries. So, to get the "forwards" information, the query starts at the end, and works backwards:

WITH RECURSIVE search_graph AS (
  -- Begin at ending nodes
  SELECT E.from_node, 1 AS search_depth, E.length
    , ARRAY[E.edge] AS path -- optional
  FROM tree E WHERE E.to_node IS NULL

  UNION ALL
  -- Accumulate each edge, working backwards (upstream)
  SELECT o.from_node, sg.search_depth + 1, sg.length + o.length
    , o.edge|| sg.path -- optional
  FROM tree o, search_graph sg
  WHERE o.to_node = sg.from_node
)
UPDATE tree SET
  fwd_path = sg.path,
  fwd_search_depth = sg.search_depth,
  fwd_length = sg.length
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;

SELECT edge, from_node, to_node, fwd_path, fwd_search_depth, fwd_length
FROM tree ORDER BY edge;

 edge | from_node | to_node | fwd_path | fwd_search_depth | fwd_length
------+-----------+---------+----------+------------------+------------
 A    |         1 |       4 | {A,D,E}  |                3 |        3.4
 B    |         2 |       4 | {B,D,E}  |                3 |        3.5
 C    |         3 |       5 | {C,E}    |                2 |        2.9
 D    |         4 |       5 | {D,E}    |                2 |        2.3
 E    |         5 |         | {E}      |                1 |        1.1

The above makes sense, and scales well for large networks. For example, I can see edge B is 3 edges from the end, and the forward path is {B,D,E} with a total length of 3.5 from the tip to the end.

However, I cannot figure out a good way to build a reverse query. That is, from each edge, what are the accumulated "upstream" edges, lengths and areas. Using WITH RECURSIVE, the best I have is:

WITH RECURSIVE search_graph AS (
  -- Begin at starting nodes
  SELECT S.from_node, S.to_node, 1 AS search_depth, S.length, S.area
    , ARRAY[S.edge] AS path -- optional
  FROM tree S WHERE from_node IN (
    -- Starting nodes have a from_node without any to_node
    SELECT from_node FROM tree EXCEPT SELECT to_node FROM tree)

  UNION ALL
  -- Accumulate edges, working forwards
  SELECT c.from_node, c.to_node, sg.search_depth + 1, sg.length + c.length, sg.area + c.area
         , c.edge || sg.path -- optional
  FROM tree c, search_graph sg
  WHERE c.from_node = sg.to_node
)
UPDATE tree SET
  rev_path = sg.path,
  rev_search_depth = sg.search_depth,
  rev_length = sg.length,
  rev_area = sg.area
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;

SELECT edge, from_node, to_node, rev_path, rev_search_depth, rev_length, rev_area
FROM tree ORDER BY edge;

 edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------+-----------+---------+----------+------------------+------------+----------
 A    |         1 |       4 | {A}      |                1 |        1.1 |      0.9
 B    |         2 |       4 | {B}      |                1 |        1.2 |      1.3
 C    |         3 |       5 | {C}      |                1 |        1.8 |      2.4
 D    |         4 |       5 | {D,A}    |                2 |        2.3 |      2.2
 E    |         5 |         | {E,C}    |                2 |        2.9 |      3.3

I would like to build aggregates into the second term of the recursive query, since each downstream edge connects to 1 or many upstream edges, but aggregates are not allowed with recursive queries. Also, I'm aware that the join is sloppy, since the with recursive result has multiple join conditions for edge.

The expected result for the reverse / backwards query is:

 edge | from_node | to_node |  rev_path   | rev_search_depth | rev_length | rev_area
------+-----------+---------+-------------+------------------+------------+----------
 A    |         1 |       4 | {A}         |                1 |        1.1 |      0.9
 B    |         2 |       4 | {B}         |                1 |        1.2 |      1.3
 C    |         3 |       5 | {C}         |                1 |        1.8 |      2.4
 D    |         4 |       5 | {A,B,D}     |                3 |        3.5 |      3.5
 E    |         5 |         | {A,B,C,D,E} |                5 |        6.4 |      6.8

How can I write this update query?

Note that I'm ultimately more concerned about accumulating accurate length and area totals, and that the path attributes are for debugging. In my real-world case, forwards paths are up to a couple hundred, and I expect reverse paths in the tens of thousands for large and complex catchments.

解决方案

UPDATE 2: I rewrote the original recursive query so that all accumulation/aggregation is done outside the recursive part. It should perform better than the previous version of this answer. This is very much alike the answer from @a_horse_with_no_name for a similar question.

  WITH 
    RECURSIVE search_graph(edge, from_node, to_node, length, area, start_node) AS
    (
        SELECT edge, from_node, to_node, length, area, from_node AS "start_node"
        FROM tree
        UNION ALL
        SELECT o.edge, o.from_node, o.to_node, o.length, o.area, p.start_node
        FROM tree o
    JOIN search_graph p ON p.from_node = o.to_node
    )
    SELECT array_agg(edge) AS "edges"
       -- ,array_agg(from_node) AS "nodes"
          ,count(edge) AS "edge_count"
          ,sum(length) AS "length_sum"
          ,sum(area) AS "area_sum"
    FROM search_graph
    GROUP BY start_node
    ORDER BY start_node
;

Results are as expected:

 start_node | edges       | edge_count | length_sum |  area_sum
------------+-------------+------------+------------+------------
  1         | {A}         |          1 |        1.1 |       0.9
  2         | {B}         |          1 |        1.2 |       1.3
  3         | {C}         |          1 |        1.8 |       2.4
  4         | {D,B,A}     |          3 |        3.5 |       3.5
  5         | {E,D,C,B,A} |          5 |        6.4 |       6.8

这篇关于如何使用递归查询向后遍历层次结构树结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆