使用递归查询在Oracle SQL中使用有向图仅访问每个节点一次 [英] Directed graph in Oracle SQL using recursive query visiting each node only once

查看:31
本文介绍了使用递归查询在Oracle SQL中使用有向图仅访问每个节点一次的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

说明

在我们的问题域中,我们正在研究一组连接在一起以形成图形的边.从给定的一个或多个节点开始,我们必须列出整个图中连接到给定的一个或多个节点的所有链接.我们必须从左到右,从上到下显示这些链接.

In our problem domain we are working on a set of edges that connect together forming a graph. Starting from a given node (or nodes), we have to list all links within the entire graph that are connected to the given node (or nodes). We have to show these links from left-to-right, top-to-bottom.

对于循环数量有限的图形,我们有一个针对此问题的有效查询.循环次数越多,执行时间就成倍增加.

We have a working query for this problem for graphs with a limited number of loops. Higher number of loops increases the execution time exponentially.

我们需要在递归过程中限制对同一节点的访问,以找到可行的解决方案.

We need to limit visits to the same node during recursion to have a working solution.

下面的示例仅包含一个循环,但是此循环已引起86个额外的和过时的行.

The example below contains only a single loop, but this single loop is already causing 86 additional and obsolete rows.

在类似的文章中,使用ROW和ANY运算符为postgresql提供了解决方案,但我找不到Oracle解决方案.

In similar posts solutions are provided for postgresql using ROW and ANY operators, but I could not find an Oracle solution.

我们正在寻找解决方案的替代方法,或者是将访问次数限制在同一边缘的方法.

We are searching for either an alternative for our solution or a way to limit the number of visits to the same edges.

任何帮助将不胜感激!

类似

使用递归查询访问有向图,就好像它是无向图一样提供了Postgresql解决方案.我们必须使用Oracle11g.

Visiting a directed graph as if it were an undirected one, using a recursive query provides a solution in postgresql. We are required to use Oracle11g.

示例

边缘

A-B, B-D, C-A, C-E, C-F, H-F, E-B, G-D, G-I

图形

    A
  /   \
C - E - B - D
  \       /
H - F   G - I

DDL和DML

CREATE TABLE EDGE (
  FROM_ID VARCHAR(10),
  TO_ID   VARCHAR(10)
);

INSERT INTO EDGE VALUES ('A', 'B');
INSERT INTO EDGE VALUES ('E', 'B');
INSERT INTO EDGE VALUES ('C', 'E');
INSERT INTO EDGE VALUES ('C', 'A');
INSERT INTO EDGE VALUES ('C', 'F');
INSERT INTO EDGE VALUES ('B', 'D');
INSERT INTO EDGE VALUES ('G', 'D');
INSERT INTO EDGE VALUES ('H', 'F');
INSERT INTO EDGE VALUES ('G', 'I');

输入

nodes: 'A'

必需的输出

C   A
C   E
C   F
H   F
A   B
E   B
B   D
G   D
G   I

当前解决方案

我们当前的解决方案完全返回了我们所需要的,但是如上所述,每个额外的循环都会成倍地增加执行时间.

Our current solution returns exactly what we need, but as mentioned above each additional loop increases the execution time exponentially.

SELECT
  c.LVL,
  c.FROM_ID,
  c.TO_ID,
  CASE
  WHEN lag(C.TO_ID)
       OVER (
         PARTITION BY C.LVL
         ORDER BY C.LVL, C.TO_ID ) = C.TO_ID
    THEN C.LVL || '-' || C.TO_ID
  WHEN lead(C.TO_ID)
       OVER (
         PARTITION BY C.LVL
         ORDER BY C.LVL, C.TO_ID ) = C.TO_ID
    THEN C.LVL || '-' || C.TO_ID
  ELSE C.LVL || '-' || C.FROM_ID
  END GROUP_ID
FROM (
       WITH chain(LVL, FROM_ID, TO_ID ) AS (
         SELECT
           1            LVL,
           root.FROM_ID FROM_ID,
           root.TO_ID   TO_ID
         FROM EDGE root
         WHERE root.TO_ID IN (:nodes)
               OR (root.FROM_ID IN (:nodes) AND NOT EXISTS(
             SELECT *
             FROM EDGE
             WHERE TO_ID IN (:nodes)
         ))
         UNION ALL
         SELECT
           LVL +
           CASE
           WHEN previous.TO_ID = the_next.FROM_ID
             THEN 1
           WHEN previous.TO_ID = the_next.TO_ID
             THEN 0
           WHEN previous.FROM_ID = the_next.FROM_ID
             THEN 0
           ELSE -1
           END              LVL,
           the_next.FROM_ID FROM_ID,
           the_next.TO_ID   TO_ID
         FROM EDGE the_next
           JOIN chain previous ON previous.TO_ID = the_next.FROM_ID
                                  OR the_next.TO_ID = previous.FROM_ID
                                  OR (previous.TO_ID = the_next.TO_ID AND previous.FROM_ID <> the_next.FROM_ID)
                                  OR (previous.TO_ID <> the_next.TO_ID AND previous.FROM_ID = the_next.FROM_ID)
       )
         SEARCH BREADTH FIRST BY FROM_ID SET ORDER_ID
         CYCLE FROM_ID, TO_ID SET CYCLE TO 1 DEFAULT 0
       SELECT
         C.*,
         row_number()
         OVER (
           PARTITION BY LVL, FROM_ID, TO_ID
           ORDER BY ORDER_ID ) rank
       FROM chain C
       ORDER BY LVL, FROM_ID, TO_ID
     ) C
WHERE C.rank = 1;

推荐答案

为了使遍历算法不返回到已经访问过的边缘,确实可以将访问过的边缘保持在某个位置.正如您已经发现的那样,使用字符串连接不会取得很大的成功.但是,还有其他可用的值级联"技术可用...

For keeping the traversing algorithm out of returning to already visited edges, one can indeed keep the visited edges somewhere. As you already found out, you won't get much of a success with a string concatenation. However, there are other usable "value concatenation" techniques available...

您必须可以使用一个方便的模式级标量集合:

You have to have one handy schema-level collection of scalars at your disposal:

create or replace type arr_strings is table of varchar2(64);

然后您可以在每次迭代中将访问的边缘收集到该集合中:

And then you can collect the visited edges to that collection in each iteration:

with nondirected$ as (
    select from_id, to_id, from_id||'-'||to_id as edge_desc
    from edge
    where from_id != to_id
    union all
    select to_id, from_id, from_id||'-'||to_id as edge_desc
    from edge
    where (to_id, from_id) not in (
            select from_id, to_id
            from edge
        )
),
graph$(lvl, from_id, to_id, edge_desc, visited_edges) as (
    select 1, from_id, to_id, edge_desc,
        arr_strings(edge_desc)
    from nondirected$ R
    where from_id in (&nodes)
    --
    union all
    --
    select
        lvl+1,
        Y.from_id, Y.to_id, Y.edge_desc,
        X.visited_edges multiset union arr_strings(Y.edge_desc)
    from graph$ X
        join nondirected$ Y
            on Y.from_id = X.to_id
    where not exists (
            select 1
            from table(X.visited_edges) Z
            where Y.edge_desc = Z.column_value
        )
)
search breadth first by edge_desc set order_id
    cycle edge_desc set is_cycle to 1 default 0,
ranked_graph$ as (
    select C.*,
        row_number() over (partition by edge_desc order by lvl, order_id) as rank$
    from graph$ C
--    where is_cycle = 0
)
select *
from ranked_graph$
--where rank$ <= 1
order by lvl, order_id
;

注释

  1. 我通过 union (对输入的一组反向边)将有向图预处理为一个无向图.这应该使递归遍历谓词更易于阅读.仅出于我的目的,以便于更轻松地读取和编写SQL.当然,您不必这样做.
  2. 我记得几年前在Oracle 11.2上尝试过类似的方法.我记得那是失败的,尽管我不记得为什么.在12.2上,它运行正常.也尝试一下11g.我没有空位.
  3. 由于每次迭代都执行,所以除了遍历内部联接之外,还进行了反向联接,因此,我真诚地怀疑这会提高性能.但是,它肯定解决了减少递归嵌套数量的问题.
  4. 您可能必须自行解决所需的订购,正如您可能从我的评论中了解的那样.:-)
  1. I pre-processed the directed graph to a nondirected one by union-ing a set of reverse edges to the input. That should make the recursive traversal predicates easier to read. Solely for my purposes of easier reading+writing of the SQL. You don't have to do that, of course.
  2. I remember trying something like this a few years ago on Oracle 11.2. And I remember that it was failing, though I don't remember why. On 12.2, it ran OK. Just give it a try on 11g, too; I don't have one available.
  3. Since each iteration does, in addition to the traversal inner join, also an anti-join, I sincerely doubt that this is going to be more performant. However, it for sure solves the problem of lowering the number of recursive nestings.
  4. You'll have to resolve the desired ordering on your own, as you probably understood from my comments. :-)

将重新访问的边限制为零

在SQL中,不能.您提到的PostgreSQL解决方案可以做到这一点.但是,在Oracle中,您不能这样做.对于每个遍历联接,您都必须测试所有其他遍历联接的行.这将意味着某种聚合或分析……Oracle禁止并抛出ORA异常.

In SQL, you can't. The PostgreSQL solution you mentioned does do it. In Oracle, however, you can't. You would have to, for each traversal join, test rows for all other traversal joins. And that would mean some kind of aggregation or analytics... which Oracle forbids and throws out an ORA exception.

要抢救PLSQL吗?

不过,您可以在PL/SQL中执行此操作.它应该具有多少性能,取决于要在数据库中预取边缘时要花费多少内存,以及愿意从当前"节点遍历图时要花费多少SQL往返次数,或者是否愿意使用与您宁愿与常规 arr_output 集合 l_visited_nodes 进行反联接相比,更多的内存可将访问的节点保留在按边索引的精美索引中.您有多种选择,明智地选择.

You can do it in PL/SQL, though. How much performant it is supposed to be, depends on how much memory you want to spend on prefetching edges from DB vs. how many SQL roundtrips you are willing to take to traverse the graph from "current" nodes or if you are willing to use even more memory to keep the visited nodes in a fancy indexed-by-edge collection vs. if you rather anti-join against a regular arr_output collection l_visited_nodes. You have multiple choices, choose wisely.

无论如何,对于最广泛使用SQL引擎的最简单情况,这可能是您正在寻找的代码...

Anyway, for the simplest scenario with heavier use of SQL engine, this might be the code you're looking for...

create or replace
package pkg_so_recursive_traversal
is


type rec_output                     is record (
    from_id                             edge.from_id%type,
    to_id                               edge.to_id%type,
    lvl                                 integer
);
type arr_output                     is table of rec_output;


function traverse_a_graph
    ( i_from                        in arr_strings
    , i_is_directed                 in varchar2 default 'NO' )
    return arr_output
    pipelined;


end pkg_so_recursive_traversal;
/
create or replace
package body pkg_so_recursive_traversal
is


function traverse_a_graph
    ( i_from                        in arr_strings
    , i_is_directed                 in varchar2 )
    return arr_output
    pipelined
is
    l_next_edges                    arr_output;
    l_current_edges                 arr_output;
    l_visited_edges                 arr_output := arr_output();
    l_out                           rec_output;
    i                               pls_integer;
    l_is_directed                   varchar2(32) := case when i_is_directed = 'YES' then 'YES' else 'NO' end;
begin
    select E.from_id, E.to_id, 0
    bulk collect into l_next_edges
    from table(i_from) F
        join edge E
            on F.column_value in (E.from_id, case when l_is_directed = 'YES' then null else E.to_id end)
    where E.from_id != E.to_id;

    l_out.lvl := 0;

    loop
        dbms_output.put_line(l_next_edges.count());
        exit when l_next_edges.count() <= 0;
        l_out.lvl := l_out.lvl + 1;

        -- spool the edges to output
        i := l_next_edges.first();
        while i is not null loop
            l_out.from_id := l_next_edges(i).from_id;
            l_out.to_id := l_next_edges(i).to_id;
            pipe row(l_out);
            i := l_next_edges.next(i);
        end loop;

        l_current_edges := l_next_edges;
        l_visited_edges := l_visited_edges multiset union l_current_edges;

        -- find next edges
        select unique E.from_id, E.to_id, 0
        bulk collect into l_next_edges
        from table(l_current_edges) CE
            join edge E
                on CE.to_id in (E.from_id, case when l_is_directed = 'YES' then null else E.to_id end)
                or l_is_directed = 'NO' and CE.from_id in (E.from_id, E.to_id)
        where E.from_id != E.to_id
            and not exists (
                select 1
                from table(l_visited_edges) VE
                where VE.from_id = E.from_id
                    and VE.to_id = E.to_id
            );
    end loop;

    return;
end;


end pkg_so_recursive_traversal;

/

当调用 A 的起始节点并认为图形是无向的...

When called for the starting node of A and considering the graph to be undirected...

select *
from table(pkg_so_recursive_traversal.traverse_a_graph(
        i_from => arr_strings('A'),
        i_is_directed => 'NO'
    ));

...它产生...

... it yields...

FROM_ID    TO_ID             LVL
---------- ---------- ----------
A          B                   1
C          A                   1
C          E                   2
B          D                   2
C          F                   2
E          B                   2
G          D                   3
H          F                   3
G          I                   4

注释

  1. 再次,我没有做出任何努力来保持您所请求的顺序,因为您说这并不重要.
  2. 这是对 edge 表的多次SQL往返(对于您的示例输入而言,恰好是5次).与具有冗余边缘访问功能的纯SQL解决方案相比,这可能会或可能不会对性能造成更大的影响.正确测试更多解决方案,看看哪一种最适合您.
  3. 这段特定的代码将在12c及更高版本上运行.对于11g及更低版本,您必须在架构级别声明 rec_output arr_output 类型.
  1. Again, I did not put any effort to keep the ordering you requested, as you said it's not that important.
  2. This is doing multiple (exactly 5 for your example inputs) SQL roundtrips to the edge table. That may or may not be a bigger performance hit when compared to the pure-SQL solution with redundant edge visiting. Test more solutions properly, see which one works for you the best.
  3. This particular piece of code will work on 12c and higher. For 11g and lower you'll have to declare the rec_output and arr_output types on the schema level.

这篇关于使用递归查询在Oracle SQL中使用有向图仅访问每个节点一次的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆