多次防止递归CTE访问节点 [英] Prevent recursive CTE visiting nodes multiple times

查看:134
本文介绍了多次防止递归CTE访问节点的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑以下简单的DAG:

  1-> 2-> 3-> 4 

以及一个表格#bar,描述了这个(我正在使用SQL Server 2005):

  parent_id child_id 
1 2
2 3
3 4
// ...其他边缘,没有连接到上面的子图上

现在想象一下,我有一些其他任意标准选择第一个和最后一个边缘,即1→2和3→4。我想用这些来查找我的图形的其余部分。



我可以编写递归CTE,如下所示(我使用来自 MSDN ):

 < (
)//选择第一个和最后一个边的锚成员:
从#bar中选择parent_id,child_id其中parent_id在(1,3)
union all
//递归成员:
select#bar。* from #bar
在#bar.parent_id = foo.child_id
$ b $上加入foo b从foo中选择parent_id,child_id

然而,这会导致边缘3> 4被选中两次:

  parent_id child_id 
1 2
3 4
2 3
3 4 //第二次出现!

如何防止查询被递归到已经描述的子图中?如果在查询的递归成员部分中,我可以引用到目前为止已由递归CTE检索到的所有数据(并提供一个在递归成员中指示的谓词,排除节点已经访问过)。但是,我想我可以访问由递归成员的最后一次迭代返回的数据。



当存在时,是很多这样的重复。有没有办法来防止这种不必要的额外递归?



请注意,我可以在语句的最后一行使用select distinct来获得所需的结果,但是这似乎是在所有(重复)递归完成后应用,所以我不认为这是一个理想的解决方案。



编辑 - hainstech建议停止递归,方法是添加一个谓词以排除在启动集中明确指定的递归路径,即仅递归,其中foo.child_id不在(1,3) code>。这适用于上述情况,因为它很简单 - 所有重复部分都在节点的锚集内开始。它不能解决它们可能不是的一般情况。例如,考虑将边1-> 4和4-> 5添加到上述集合。 Edge 4-> 5将被捕获两次,即使使用建议的谓词。 : CTE 是递归的。

>

当您的 CTE 有多个初始条件时,这意味着它们也有不同的递归堆栈,并且没有办法在另一个堆栈中使用来自堆栈的信息。



在您的示例中,递归堆栈将如下所示:

<$ (1) - 第一个IN条件
(1,2)
(1,2,3)
(1,2,3,4)
(1,2,3) - 不再有子女
(1,2) - 不再有子女
(1) - 没有子女,进入第二个IN条件

(3) - 第二个条件
(3,4)
(3) - 不再有子女,返回

正如你所看到的,这些递归堆栈不会相交。



你可能会将访问值记录在临时表中, code> JOIN 每个值都带有temptable,如果找到它,不要跟随此值,但 SQL Server 不支持这些东西。



所以你只需使用 SELECT DISTINCT


Consider the following simple DAG:

  1->2->3->4

And a table, #bar, describing this (I'm using SQL Server 2005):

parent_id   child_id
1           2
2           3
3           4
//... other edges, not connected to the subgraph above

Now imagine that I have some other arbitrary criteria that select the first and last edges, i.e. 1->2 and 3->4. I want to use these to find the rest of my graph.

I can write a recursive CTE as follows (I'm using terminology from MSDN):

with foo(parent_id,child_id) as (
// anchor member that happens to select first and last edges:
select parent_id,child_id from #bar where parent_id in (1,3)
union all
// recursive member:
select #bar.* from #bar
join foo on #bar.parent_id = foo.child_id
)
select parent_id,child_id from foo

However, this results in edge 3->4 being selected twice:

parent_id  child_id
1          2
3          4
2          3
3          4    // 2nd appearance!

How can I prevent the query from recursing into subgraphs that have already been described? I could achieve this if, in my "recursive member" part of the query, I could reference all data that has been retrieved by the recursive CTE so far (and supply a predicate indicating in the recursive member excluding nodes already visited). However, I think I can access data that was returned by the last iteration of the recursive member only.

This will not scale well when there is a lot of such repetition. Is there a way of preventing this unnecessary additional recursion?

Note that I could use "select distinct" in the last line of my statement to achieve the desired results, but this seems to be applied after all the (repeated) recursion is done, so I don't think this is an ideal solution.

Edit - hainstech suggests stopping recursion by adding a predicate to exclude recursing down paths that were explicitly in the starting set, i.e. recurse only where foo.child_id not in (1,3). That works for the case above only because it simple - all the repeated sections begin within the anchor set of nodes. It doesn't solve the general case where they may not be. e.g., consider adding edges 1->4 and 4->5 to the above set. Edge 4->5 will be captured twice, even with the suggested predicate. :(

解决方案

The CTE's are recursive.

When your CTE's have multiple initial conditions, that means they also have different recursion stacks, and there is no way to use information from one stack in another stack.

In your example, the recursion stacks will go as follows:

(1) - first IN condition
(1, 2)
(1, 2, 3)
(1, 2, 3, 4)
(1, 2, 3) - no more children
(1, 2) - no more children
(1) - no more children, going to second IN condition

(3) - second condition
(3, 4)
(3) - no more children, returning

As you can see, these recursion stack do not intersect.

You could probably record the visited values in a temporary table, JOIN each value with the temptable and do not follow this value it if it's found, but SQL Server does not support these things.

So you just use SELECT DISTINCT.

这篇关于多次防止递归CTE访问节点的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆