使用递归查询访问有向图,就好像它是无向图一样 [英] Visiting a directed graph as if it were an undirected one, using a recursive query

查看:20
本文介绍了使用递归查询访问有向图,就好像它是无向图一样的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

关于访问存储在数据库中的有向图,我需要你的帮助.

I need your help about the visit of a directed graph stored in a database.

考虑下面的有向图

1->2 
2->1,3 
3->1

一个表存储这些关系:

create database test;
c test;

create table ownership (
    parent bigint,
    child  bigint,
    primary key (parent, child)
);

insert into ownership (parent, child) values (1, 2);
insert into ownership (parent, child) values (2, 1);
insert into ownership (parent, child) values (2, 3);
insert into ownership (parent, child) values (3, 1);

我想提取从节点可达的图的所有半连接边(即忽略方向的连接边).即,如果我从 parent=1 开始,我希望有以下输出

I'd like to extract all the semi-connected edges (i.e. the connected edges ignoring the direction) of the graph reachable from a node. I.e., if I start from parent=1, I'd like to have the following output

1,2
2,1
2,3
3,1

我正在使用 postgresql.

我修改了 Postgres 手册上的示例,它解释了递归查询,并且我已经调整了连接条件以向上"和向下"(这样做我忽略了方向).我的查询如下:

I've modified the example on Postgres' manual which explains recursive queries, and I've adapted the join condition to go "up" and "down" (doing so I ignore the directions). My query is the following one:

c test

WITH RECURSIVE graph(parent, child, path, depth, cycle) AS (
SELECT o.parent, o.child, ARRAY[ROW(o.parent, o.child)], 0, false
    from ownership o
    where o.parent = 1
UNION ALL
SELECT 
    o.parent, o.child,
    path||ROW(o.parent, o.child), 
    depth+1, 
    ROW(o.parent, o.child) = ANY(path)
    from 
        ownership o, graph g
    where 
        (g.parent = o.child or g.child = o.parent) 
        and not cycle

)
select  g.parent, g.child, g.path, g.cycle
from
    graph g

其输出如下:

 parent | child |               path                | cycle 
--------+-------+-----------------------------------+-------
      1 |     2 | {"(1,2)"}                         | f
      2 |     1 | {"(1,2)","(2,1)"}                 | f
      2 |     3 | {"(1,2)","(2,3)"}                 | f
      3 |     1 | {"(1,2)","(3,1)"}                 | f
      1 |     2 | {"(1,2)","(2,1)","(1,2)"}         | t
      1 |     2 | {"(1,2)","(2,3)","(1,2)"}         | t
      3 |     1 | {"(1,2)","(2,3)","(3,1)"}         | f
      1 |     2 | {"(1,2)","(3,1)","(1,2)"}         | t
      2 |     3 | {"(1,2)","(3,1)","(2,3)"}         | f
      1 |     2 | {"(1,2)","(2,3)","(3,1)","(1,2)"} | t
      2 |     3 | {"(1,2)","(2,3)","(3,1)","(2,3)"} | t
      1 |     2 | {"(1,2)","(3,1)","(2,3)","(1,2)"} | t
      3 |     1 | {"(1,2)","(3,1)","(2,3)","(3,1)"} | t
(13 rows)

我有一个问题:查询多次提取相同的边,因为它们是通过不同的路径到达的,我想避免这种情况.如果我将外部查询修改为

I have a problem: the query extracts the same edges many times, as they are reached through different paths, and I'd like to avoid this. If I modify the outer query into

select  distinct g.parent, g.child from graph

我得到了想要的结果,但是 WITH 查询仍然效率低下,因为不需要的连接已经完成.那么,是否有一种解决方案可以在不使用distinct 的情况下从给定的数据库中提取图的可达边?

I have the desired result, but inefficiencies remain in the WITH query, as unneeded joins are done. So, is there a solution to extract the reachable edges of a graph in a db, starting from a given one, without using distinct?

我还有另一个问题(这个问题解决了,看底部):正如你从输出中看到的,循环只有在第二次到达节点时才会停止.IE.我有 (1,2) (2,3) (1,2).我想在再次循环最后一个节点之前停止循环,即具有 (1,2) (2,3).我尝试修改 where 条件如下

I also have another problem (this one is solved, look at the bottom): as you can see from the output, cycles stop only when a node is reached for the second time. I.e. I have (1,2) (2,3) (1,2). I'd like to stop the cycle before cycling over that last node again, i.e. having (1,2) (2,3). I've tried to modify the where condition as follows

where
    (g.parent = o.child or g.child = o.parent) 
    and (ROW(o.parent, o.child) <> any(path))
    and not cycle

避免访问已经访问过的边缘,但它不起作用,我不明白为什么 ((ROW(o.parent, o.child) <> any(path)) 应该在再次进入循环边缘之前避免循环但不起作用).如何在关闭循环的节点前一步停止循环?

to avoid visiting already visited edges, but it doesn't work and I cannot understand why ((ROW(o.parent, o.child) <> any(path)) should avoid cycling before going on the cycled edge again but doesn't work). How can I do to stop cycles one step before the node that closes the cycle?

编辑:按照 danihp 的建议,解决我使用的第二个问题

Edit: as danihp suggested, to solve the second problem I used

where
    (g.parent = o.child or g.child = o.parent) 
    and not (ROW(o.parent, o.child) = any(path))
    and not cycle

现在输出不包含循环.行从 13 行到 6 行,但我仍然有重复项,因此提取所有边而没有重复项和不同项的主要(第一个)问题仍然存在.当前输出带有 而不是 ROW

and now the output contains no cycles. Rows went from 13 to 6, but I still have duplicates, so the main (the first) problem of extracting all the edges without duplicates and without distinct is still alive. Current output with and not ROW

 parent | child |           path            | cycle 
--------+-------+---------------------------+-------
      1 |     2 | {"(1,2)"}                 | f
      2 |     1 | {"(1,2)","(2,1)"}         | f
      2 |     3 | {"(1,2)","(2,3)"}         | f
      3 |     1 | {"(1,2)","(3,1)"}         | f
      3 |     1 | {"(1,2)","(2,3)","(3,1)"} | f
      2 |     3 | {"(1,2)","(3,1)","(2,3)"} | f
(6 rows)

编辑 #2::按照 Erwin Brandstetter 的建议,我修改了我的查询,但如果我没有错,建议的查询提供的行数比我的多(ROW 比较仍然存在,因为对我来说似乎更清楚,即使我知道字符串比较会更有效).使用新查询,我获得了 20 行,而我的获得了 6 行

Edit #2:: following what Erwin Brandstetter suggested, I modified my query, but if I'm not wrong, the proposed query gives MORE rows than mine (ROW comparison is still there as it seems more clear to me, even I understood that string comparison will be more efficient). Using the new query, I obtain 20 rows, while mine gives 6 rows

WITH RECURSIVE graph(parent, child, path, depth) AS (
SELECT o.parent, o.child, ARRAY[ROW(o.parent, o.child)], 0
    from ownership o
    where 1 in (o.child, o.parent)
UNION ALL
SELECT 
    o.parent, o.child,
    path||ROW(o.parent, o.child), 
    depth+1
    from 
        ownership o, graph g
    where 
        g.child in (o.parent, o.child) 
        and ROW(o.parent, o.child) <> ALL(path)

)
select  g.parent, g.child from graph g

编辑 3:所以,正如 Erwin Brandstetter 所指出的,最后一个查询仍然是错误的,而正确的可以在他的答案中找到.

Edit 3: so, as Erwin Brandstetter pointed out, the last query was still wrong, while the right one can be found in his answer.

当我发布我的第一个查询时,我不明白我错过了一些连接,因为它发生在以下情况:如果我从节点 3 开始,数据库选择行 (2,3)(3,1).然后,查询的第一个归纳步骤将选择,从这些行中加入行 (1,2)(2,3)(3,1),缺少应该包含在结果中的行 (2,1),因为算法在概念上暗示((2,1) 是接近"(3,1) )

When I posted my first query, I hadn't understood that I was missing some joins, as it happens in the following case: if I start with the node 3, the db selects the rows (2,3) and (3,1). Then, the first inductive step of the query would select, joining from these rows, the rows (1,2), (2,3) and (3,1), missing the row (2,1) that should be included in the result as conceptually the algorithm would imply ( (2,1) is "near" (3,1) )

当我尝试改编 Postgresql 手册中的示例时,我尝试加入 ownership 的父子节点是正确的,但是我错误地没有保存 graph 必须在每个步骤中加入.

When I tried to adapt the example in Postgresql manual, I was right trying to join ownership's parent and child, but I was wrong not saving the value of graph that had to be joined in each step.

这些类型的查询似乎根据起始节点(即根据基本步骤中选择的行集)生成不同的行集.因此,我认为在基本步骤中仅选择包含起始节点的一行会很有用,因为无论如何您都会得到任何其他相邻"节点.

These type of queries seem to generate a different set of rows depending on the starting node (i.e. depending on the set of rows selected in the base step). So, I think it could be useful to select just one row containing the starting node in the base step, as you'll get any other "adjacent" node anyway.

推荐答案

可以这样工作:

WITH RECURSIVE graph AS (
    SELECT parent
          ,child
          ,',' || parent::text || ',' || child::text || ',' AS path
          ,0 AS depth
    FROM   ownership
    WHERE  parent = 1

    UNION ALL
    SELECT o.parent
          ,o.child
          ,g.path || o.child || ','
          ,g.depth + 1
    FROM   graph g
    JOIN   ownership o ON o.parent = g.child
    WHERE  g.path !~~ ('%,' || o.parent::text || ',' || o.child::text || ',%')
    )
SELECT  *
FROM    graph

你提到了性能,所以我朝那个方向进行了优化.

You mentioned performance, so I optimized in that direction.

  • 仅在定义的方向上遍历图形.

不需要列cycle,而是将其设为排除条件.少走一步.这也是对以下问题的直接回答:

No need for a column cycle, make it an exclusion condition instead. One less step to go. That is also the direct answer to:

如何在关闭节点的节点之前停止循环一步循环?

How can I do to stop cycles one step before the node that closes the cycle?

  • 使用字符串来记录路径.比行数组更小、更快.仍然包含所有必要的信息.不过,可能会随着非常大的 bigint 数字而改变.

    • Use a string to record the path. Smaller and faster than an array of rows. Still contains all necessary information. Might change with very big bigint numbers, though.

      使用 LIKE 运算符 (~~) 检查循环应该会快得多.

      Check for cycles with the LIKE operator (~~), should be much faster.

      如果您不希望随着时间的推移超过 2147483647 行,请使用普通的 integer 列而不是 bigint.更小更快.

      If you don't expect more that 2147483647 rows over the course of time, use plain integer columns instead of bigint. Smaller and faster.

      确保在 parent 上有一个 index.child 上的索引与我的查询无关.(除了您在两个方向上遍历边缘的原始版本.)

      Be sure to have an index on parent. Index on child is irrelevant for my query. (Other than in your original where you traverse edges in both directions.)

      对于巨大的图形,我会切换到plpgsql过程,您可以在其中将路径作为临时表维护为一行每个步骤和匹配的索引.不过,这会带来一些开销,但会因巨大的图表而得到回报.

      For huge graphs I would switch to a plpgsql procedure, where you can maintain the path as temp table with one row per step and a matching index. A bit of an overhead, that will pay off with huge graphs, though.

      原始查询中的问题:

      WHERE (g.parent = o.child or g.child = o.parent) 
      

      在过程中的任何一点,您的遍历都只有一个端点.当您在两个方向上浏览有向图时,端点可以是父节点或子节点——但不能同时是两者.你必须保存每一步的端点,然后:

      There is only one endpoint of your traversal at any point in the process. As you wlak the directed graph in both directions, the endpoint can be either parent or child - but not both of them. You have to save the endpoint of every step, and then:

      WHERE g.child IN (o.parent, o.child) 
      

      违反方向也让你的起始条件有问题:

      The violation of the direction also makes your starting condition questionable:

      WHERE parent = 1
      

      应该是

      WHERE 1 IN (parent, child)
      

      并且两行 (1,2)(2,1) 以这种方式有效地重复...

      And the two rows (1,2) and (2,1) are effectively duplicates this way ...

      • 忽略方向
      • 每条路径仍然只走任何边缘一次.
      • 使用 ARRAY 作为路径
      • 在路径中保存原始方向,而不是实际方向.

      注意,这种方式 (2,1)(1,2) 是有效的重复项,但两者可以在同一路径中使用.

      Note, that this way (2,1) and (1,2) are effective duplicates, but both can be used in the same path.

      我介绍列leaf,它保存了每一步的实际端点.

      I introduce the column leaf which saves the actual endpoint of every step.

      WITH RECURSIVE graph AS (
          SELECT CASE WHEN parent = 1 THEN child ELSE parent END AS leaf
                ,ARRAY[ROW(parent, child)] AS path
                ,0 AS depth
          FROM   ownership
          WHERE  1 in (child, parent)
      
          UNION ALL
          SELECT CASE WHEN o.parent = g.leaf THEN o.child ELSE o.parent END -- AS leaf
                ,path || ROW(o.parent, o.child) -- AS path
                ,depth + 1 -- AS depth
          FROM   graph g
          JOIN   ownership o ON g.leaf in (o.parent, o.child) 
          AND    ROW(o.parent, o.child) <> ALL(path)
          )
      SELECT *
      FROM   graph
      

      这篇关于使用递归查询访问有向图,就好像它是无向图一样的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆