使用递归查询来访问有向图,就好像它是无向图 [英] Visiting a directed graph as if it were an undirected one, using a recursive query
问题描述
考虑以下有向图
1-> 2
2-> 1,3
3-> 1
表存储这些关系:
create database test ;
\c test;
创建表所有权(
父级bigint,
子级bigint,
主键(父级,子级)
);
插入所有权(父,子)值(1,2);
插入所有权(父,子)值(2,1);
插入所有权(父,子)值(2,3);
插入所有权(父,子)值(3,1);
我想提取所有半连接边(即连接边忽略方向)从节点可到达的图形。也就是说,如果我从parent = 1开始,我想要有以下输出:
1,2
2,1
2,3
3,1
我正在使用 postgresql 。
我修改了上的例子,它解释了递归查询,并且我已经调整了连接条件以上和下(这样做,我忽略了方向)。我的查询如下:
\c测试
带有RECURSIVE图(父,子(
SELECT o.parent,o.child,ARRAY [ROW(o.parent,o.child)],0,false
所有权o
where o.parent = 1
UNION ALL
SELECT
o.parent,o.child,
path || ROW(o.parent,o.child),
深度+ 1,
ROW(o.parent,o.child)= ANY(路径)
从
所有权o,图g
其中
(g。 parent = o.child或g.child = o.parent)
并且不循环
)
选择g.parent,g.child,g.path,g.cycle
from
graph g
其输出如下:
父母|孩子|路径|循环
-------- + ------- + ---------------------------- ------- + -------
1 | 2 | {(1,2)} | f
2 | 1 | {(1,2),(2,1)} | f
2 | 3 | {(1,2),(2,3)} | f
3 | 1 | {(1,2),(3,1)} | f
1 | 2 | {(1,2),(2,1),(1,2)} | t
1 | 2 | {(1,2),(2,3),(1,2)} | t
3 | 1 | {(1,2),(2,3),(3,1)} | f
1 | 2 | {(1,2),(3,1),(1,2)} | t
2 | 3 | {(1,2),(3,1),(2,3)} | f
1 | 2 | {(1,2),(2,3),(3,1),(1,2)} | t
2 | 3 | {(1,2),(2,3),(3,1),(2,3)} | t
1 | 2 | {(1,2),(3,1),(2,3),(1,2)} | t
3 | 1 | {(1,2),(3,1),(2,3),(3,1)} | t
(13行)
我有一个问题: 查询会多次提取相同的边,因为它们是通过不同的路径到达的,我想避免这种情况。如果我将外部查询修改为
,请从图
$ c中选择不同的g.parent,g.child $ c>
我得到了期望的结果,但是由于不需要的连接完成,所以WITH查询中仍然存在效率低下的问题。
那么,是否有解决方案可以从一个给定的数据库中提取图形的可到达边缘,而不使用不同的?
我还遇到了另一个问题(这个问题已解决,请查看底部):正如您从输出中看到的那样,仅当第二次到达节点时,循环才会停止。即我有(1,2)(2,3)(1,2)
。
我想在再次循环最后一个节点之前停止循环,即有(1,2)(2,3)
。
我试图修改where条件,如下所示:
其中
(g。 parent = o.child或g.child = o.parent)
和(ROW(o.parent,o.child)<>任何(路径))
并且不循环
避免访问已经访问过的边,但它不起作用,我无法理解为什么( (ROW(o.parent,o.child)<>任何(路径)
)应该避免循环再次进入循环边缘,但不起作用)。 如何才能在关闭循环的节点之前停止循环?
$ b
编辑 :正如danihp所建议的那样,为了解决我使用的第二个问题
其中
(g.parent = o.child或g.child = o.parent)
而不是(ROW(o.parent,o.child)= any(path))
并且不循环
现在输出不包含循环。行从13到6,但我仍然有重复,所以提取所有没有重复和没有重复的边的主要(第一个)问题仍然存在。当前输出而不是ROW
父母|孩子|路径|循环
-------- + ------- + --------------------------- + -------
1 | 2 | {(1,2)} | f
2 | 1 | {(1,2),(2,1)} | f
2 | 3 | {(1,2),(2,3)} | f
3 | 1 | {(1,2),(3,1)} | f
3 | 1 | {(1,2),(2,3),(3,1)} | f
2 | 3 | {(1,2),(3,1),(2,3)} | f
(6行)
编辑#2: em::按照Erwin Brandstetter的建议,我修改了我的查询,但是如果我没有错,建议的查询会给出比我更多的行(行比较仍然存在,因为它对我来说似乎更清楚,即使是我了解字符串比较会更有效率)。
使用新的查询,我获得了20行,而我的行有6行
WITH RECURSIVE图(父,子,路径,深度)AS(
SELECT o.parent,o.child,ARRAY [ROW(o.parent,o.child)],0
从所有权o
where 1 in( o.child,o.parent)
UNION ALL
SELECT
o.parent,o.child,
path || ROW(o.parent,o.child),
深度+ 1
从
所有权o,图g
其中
g.child在(o.parent,o.child)
和ROW(o .parent,o.child)<> ALL(路径)
)
从图中选择g.parent,g.child g
编辑3 :正如Erwin Brandstetter指出的那样,而在他的回答中可以找到正确的一个。
当我发布我的第一个查询时,我并没有理解我错过了一些连接,因为它发生在以下情况:如果我从节点3开始,数据库选择行
(2,3)
和(3,1)
。然后,查询的第一个归纳步骤将从这些行中选择行(1,2)
,(2,3)
和(3,1)
,缺少应包含在结果中的行(2,1),因为概念上算法意味着((2,1)
是near(3,1)
)
当我尝试修改Postgresql手册中的示例时,我正确地尝试加入
所有权
的父级和子级,但我错误地未保存图
必须在每一步中加入。
这些类型的查询似乎会生成一组不同的行取决于起始节点(即取决于在基本步骤中选择的行集)。所以,我认为在基本步骤中只选择一行包含起始节点是有用的,因为无论如何你会得到任何其他的相邻节点。
解决方案可以这样工作:
WITH RECURSIVE graph AS(
SELECT parent
,child
,','|| parent :: text ||','|| child :: text ||','AS path
,0 AS depth
FROM ownership
WHERE parent = 1
UNION ALL
SELECT o.parent
,o.child
,g.path || o.child | |','
,g.depth + 1
FROM图g
JOIN所有权o ON o.parent = g.child
WHERE g.path!~~(' ,'|| o.parent :: text ||','|| o.child :: text ||',%')
)
SELECT *
FROM graph
您提到了性能,所以我在这方面进行了优化。
< h3>主要观点:
-
仅以定义的方向浏览图表。 无需列
循环
,请改为排除条件。减少一步。这也是直接的答案:
$ b
如何停止循环在关闭
循环的节点之前执行一步?
-
使用 string 来记录路径。比行数组更小更快。仍包含所有必要的信息。不过,使用非常大的
bigint
数字可能会发生变化。 检查包含 code> LIKE 运算符(~~
)应该快得多。 -
如果您在一段时间内不希望出现更多的2147483647行,请使用plain
integer
列代替bigint
一>。更小更快。 在
小孩
上的索引与我的查询无关。 (除了在原来的方向上遍历两个方向的边缘)。 巨大图我会切换到 plpgsql过程,您可以在该过程中将路径保存为临时表,每步一行,并匹配索引。虽然有一些开销,但它还是可以带来巨大的图表。 -
原始查询中的问题:
WHERE(g.parent = o.child或g.child = o.parent)
在流程中的任何点都只有一个端点。当你在两个方向上扫描有向图时,端点可以是父或子 - 但不是两者都有。您必须保存每一步的终点,然后:
WHERE g.child IN(o.parent,o.child )
违反方向也会使您的出发条件受到质疑:
WHERE parent = 1
必须是
WHERE 1 IN(parent,child)
并且两行(1,2)
和(2,1)
$ h $>
$ b $ h $>
$ b
(2,1)
和(1,2)
是有效的重复项,但两者都可以使用相同路径。
我介绍列 leaf $
$ b
保存每一步的实际终点。 WITH RECURSIVE图AS(
SELECT CASE WHEN parent = 1 THEN child ELSE parent END AS leaf
,ARRAY [ROW(parent,child)] AS path
,0 AS深度
所有权
WHERE 1(孩子,父母)
UNION ALL
选择案例当o.parent = g.leaf THEN o.child ELSE o .parent END - AS leaf
,path || ROW(o.parent,o.child) - AS路径
,深度+ 1 - AS深度
FROM图形g
JOIN所有权o ON g.leaf in(o.parent, o.child)
AND ROW(o.parent,o.child)<> ALL(路径)
)
SELECT *
FROM图
I need your help about the visit of a directed graph stored in a database.
Consider the following directed graph
1->2
2->1,3
3->1
A table stores those relations:
create database test;
\c test;
create table ownership (
parent bigint,
child bigint,
primary key (parent, child)
);
insert into ownership (parent, child) values (1, 2);
insert into ownership (parent, child) values (2, 1);
insert into ownership (parent, child) values (2, 3);
insert into ownership (parent, child) values (3, 1);
I'd like to extract all the semi-connected edges (i.e. the connected edges ignoring the direction) of the graph reachable from a node. I.e., if I start from parent=1, I'd like to have the following output
1,2
2,1
2,3
3,1
I'm using postgresql.
I've modified the example on Postgres' manual which explains recursive queries, and I've adapted the join condition to go "up" and "down" (doing so I ignore the directions). My query is the following one:
\c test
WITH RECURSIVE graph(parent, child, path, depth, cycle) AS (
SELECT o.parent, o.child, ARRAY[ROW(o.parent, o.child)], 0, false
from ownership o
where o.parent = 1
UNION ALL
SELECT
o.parent, o.child,
path||ROW(o.parent, o.child),
depth+1,
ROW(o.parent, o.child) = ANY(path)
from
ownership o, graph g
where
(g.parent = o.child or g.child = o.parent)
and not cycle
)
select g.parent, g.child, g.path, g.cycle
from
graph g
its output follows:
parent | child | path | cycle
--------+-------+-----------------------------------+-------
1 | 2 | {"(1,2)"} | f
2 | 1 | {"(1,2)","(2,1)"} | f
2 | 3 | {"(1,2)","(2,3)"} | f
3 | 1 | {"(1,2)","(3,1)"} | f
1 | 2 | {"(1,2)","(2,1)","(1,2)"} | t
1 | 2 | {"(1,2)","(2,3)","(1,2)"} | t
3 | 1 | {"(1,2)","(2,3)","(3,1)"} | f
1 | 2 | {"(1,2)","(3,1)","(1,2)"} | t
2 | 3 | {"(1,2)","(3,1)","(2,3)"} | f
1 | 2 | {"(1,2)","(2,3)","(3,1)","(1,2)"} | t
2 | 3 | {"(1,2)","(2,3)","(3,1)","(2,3)"} | t
1 | 2 | {"(1,2)","(3,1)","(2,3)","(1,2)"} | t
3 | 1 | {"(1,2)","(3,1)","(2,3)","(3,1)"} | t
(13 rows)
I have a problem: the query extracts the same edges many times, as they are reached through different paths, and I'd like to avoid this. If I modify the outer query into
select distinct g.parent, g.child from graph
I have the desired result, but inefficiencies remain in the WITH query, as unneeded joins are done. So, is there a solution to extract the reachable edges of a graph in a db, starting from a given one, without using distinct?
I also have another problem (this one is solved, look at the bottom): as you can see from the output, cycles stop only when a node is reached for the second time. I.e. I have (1,2) (2,3) (1,2)
.
I'd like to stop the cycle before cycling over that last node again, i.e. having (1,2) (2,3)
.
I've tried to modify the where condition as follows
where
(g.parent = o.child or g.child = o.parent)
and (ROW(o.parent, o.child) <> any(path))
and not cycle
to avoid visiting already visited edges, but it doesn't work and I cannot understand why ((ROW(o.parent, o.child) <> any(path)
) should avoid cycling before going on the cycled edge again but doesn't work). How can I do to stop cycles one step before the node that closes the cycle?
Edit: as danihp suggested, to solve the second problem I used
where
(g.parent = o.child or g.child = o.parent)
and not (ROW(o.parent, o.child) = any(path))
and not cycle
and now the output contains no cycles. Rows went from 13 to 6, but I still have duplicates, so the main (the first) problem of extracting all the edges without duplicates and without distinct is still alive. Current output with and not ROW
parent | child | path | cycle
--------+-------+---------------------------+-------
1 | 2 | {"(1,2)"} | f
2 | 1 | {"(1,2)","(2,1)"} | f
2 | 3 | {"(1,2)","(2,3)"} | f
3 | 1 | {"(1,2)","(3,1)"} | f
3 | 1 | {"(1,2)","(2,3)","(3,1)"} | f
2 | 3 | {"(1,2)","(3,1)","(2,3)"} | f
(6 rows)
Edit #2:: following what Erwin Brandstetter suggested, I modified my query, but if I'm not wrong, the proposed query gives MORE rows than mine (ROW comparison is still there as it seems more clear to me, even I understood that string comparison will be more efficient). Using the new query, I obtain 20 rows, while mine gives 6 rows
WITH RECURSIVE graph(parent, child, path, depth) AS (
SELECT o.parent, o.child, ARRAY[ROW(o.parent, o.child)], 0
from ownership o
where 1 in (o.child, o.parent)
UNION ALL
SELECT
o.parent, o.child,
path||ROW(o.parent, o.child),
depth+1
from
ownership o, graph g
where
g.child in (o.parent, o.child)
and ROW(o.parent, o.child) <> ALL(path)
)
select g.parent, g.child from graph g
Edit 3: so, as Erwin Brandstetter pointed out, the last query was still wrong, while the right one can be found in his answer.
When I posted my first query, I hadn't understood that I was missing some joins, as it happens in the following case: if I start with the node 3, the db selects the rows (2,3)
and (3,1)
. Then, the first inductive step of the query would select, joining from these rows, the rows (1,2)
, (2,3)
and (3,1)
, missing the row (2,1) that should be included in the result as conceptually the algorithm would imply ( (2,1)
is "near" (3,1)
)
When I tried to adapt the example in Postgresql manual, I was right trying to join ownership
's parent and child, but I was wrong not saving the value of graph
that had to be joined in each step.
These type of queries seem to generate a different set of rows depending on the starting node (i.e. depending on the set of rows selected in the base step). So, I think it could be useful to select just one row containing the starting node in the base step, as you'll get any other "adjacent" node anyway.
Could work like this:
WITH RECURSIVE graph AS (
SELECT parent
,child
,',' || parent::text || ',' || child::text || ',' AS path
,0 AS depth
FROM ownership
WHERE parent = 1
UNION ALL
SELECT o.parent
,o.child
,g.path || o.child || ','
,g.depth + 1
FROM graph g
JOIN ownership o ON o.parent = g.child
WHERE g.path !~~ ('%,' || o.parent::text || ',' || o.child::text || ',%')
)
SELECT *
FROM graph
You mentioned performance, so I optimized in that direction.
Major points:
Traverse the graph only in the defined direction.
No need for a column
cycle
, make it an exclusion condition instead. One less step to go. That is also the direct answer to:
How can I do to stop cycles one step before the node that closes the cycle?
Use a string to record the path. Smaller and faster than an array of rows. Still contains all necessary information. Might change with very big
bigint
numbers, though.Check for cycles with the
LIKE
operator (~~
), should be much faster.If you don't expect more that 2147483647 rows over the course of time, use plain
integer
columns instead ofbigint
. Smaller and faster.Be sure to have an index on
parent
. Index onchild
is irrelevant for my query. (Other than in your original where you traverse edges in both directions.)For huge graphs I would switch to a plpgsql procedure, where you can maintain the path as temp table with one row per step and a matching index. A bit of an overhead, that will pay off with huge graphs, though.
Problems in your original query:
WHERE (g.parent = o.child or g.child = o.parent)
There is only one endpoint of your traversal at any point in the process. As you wlak the directed graph in both directions, the endpoint can be either parent or child - but not both of them. You have to save the endpoint of every step, and then:
WHERE g.child IN (o.parent, o.child)
The violation of the direction also makes your starting condition questionable:
WHERE parent = 1
Would have to be
WHERE 1 IN (parent, child)
And the two rows (1,2)
and (2,1)
are effectively duplicates this way ...
Additional solution after comment
- Ignore direction
- Still walk any edge only once per path.
- Use ARRAY for path
- Save original direction in path, not actual direction.
Note, that this way (2,1)
and (1,2)
are effective duplicates, but both can be used in the same path.
I introduce the column leaf
which saves the actual endpoint of every step.
WITH RECURSIVE graph AS (
SELECT CASE WHEN parent = 1 THEN child ELSE parent END AS leaf
,ARRAY[ROW(parent, child)] AS path
,0 AS depth
FROM ownership
WHERE 1 in (child, parent)
UNION ALL
SELECT CASE WHEN o.parent = g.leaf THEN o.child ELSE o.parent END -- AS leaf
,path || ROW(o.parent, o.child) -- AS path
,depth + 1 -- AS depth
FROM graph g
JOIN ownership o ON g.leaf in (o.parent, o.child)
AND ROW(o.parent, o.child) <> ALL(path)
)
SELECT *
FROM graph
这篇关于使用递归查询来访问有向图,就好像它是无向图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!