在递归CTE中检测重复项 [英] Detect duplicate items in recursive CTE

查看：128 发布时间：2020/5/29 20:12:18 sql postgresql common-table-expression

本文介绍了在递归CTE中检测重复项的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我在数据库中存储了一组依赖项。我正在寻找直接或间接依赖于当前对象的所有对象。由于对象可以依赖零个或多个其他对象，因此对象1两次依赖对象1（9依赖于4和5，两者都依赖于1）是完全合理的。我想获取不依赖复制的所有依赖于当前对象的对象的列表。

如果存在循环，这将变得更加复杂。没有循环，一个人可以使用DISTINCT，尽管经过长链不只一次以在末尾剔除它们仍然是一个问题。但是，使用循环时，递归CTE不要与已经看到的东西结合起来很重要。

所以到目前为止，我的情况是这样的：

 使用递归的__dependents AS（
 SELECT对象，array [object.id] AS seen_objects 
 FROM Instant_object_dependents（_objectid ）object 
 UNION ALL 
 SELECT对象d.seen_objects || object.id 
 FROM __dependents d 
 JOIN Instant_object_dependents（（d.object）.id）object 
 ON object.id<> ALL（d.seen_objects）
）SELECT（object）。* FROM __depends;

（它在存储过程中，所以我可以传入 _objectid ）

不幸的是，当我之前在当前链中看到它时，它只是省略了一个给定的对象，如果递归CTE是

理想情况下，解决方案是使用SQL而不是PLPGSQL，但是任何一种都可以。 / p>

例如，我在postgres中进行了设置：

  create表对象依赖关系（
 id int，
dependon int 
）; 
 
创建对象依赖关系的索引（dependson）； 
 
插入对象相关性值（1、2），（1、4），（2、3），（2、4），（3、4）;

然后我尝试运行此命令：

 与递归rdeps一样（
从对象依赖中选择dep 
 dep 
其中dep.dependson = 4-起点
并入所有
从对象相关性中选择dep 
 dep 
 join rdeps r（$ r 
 on（r.dep）.id = dep.dependson 
）select（dep）.id from rdeps;

我期望输出为 1、2、3。

但是，这种情况会永远持续下去（我也不明白）。如果我添加级别支票（选择dep，级别为0 ，... 选择dep，级别+ 1 ，在上，级别< 3 ），我看到2和3在重复。相反，如果我添加可见检查：

 ，递归rdeps为（
选择dep，array [id]为从对象依赖性dep中看到
 deb 
其中dep.dependson = 4-起点
合并所有
 select dep，r.seen || dep.id 
从对象依赖性dep 
加入rdeps r（$ r 
）（r.dep）.id = dep.dependson and dep.id<> ALL（r.seen）
）从rdeps中选择（dep）.id ;

然后我得到1，2，3，2，3，它停止了。我可以在外部select中使用 DISTINCT ，但是由于没有循环，因此只能合理地处理此数据。有了更大的数据集和更多的循环，我们将继续增加CTE的输出，只是让DISTINCT减少它的输出。我希望CTE在已经在其他地方看到该特定值时就停止该分支。

Edit ：这不仅仅是循环检测（尽管可能会有周期）。这是关于直接或间接地发现此对象引用的所有内容。因此，如果我们有1-> 2-> 3-> 5-> 6-> 7和2-> 4-> 5，我们可以从1开始，转到2，从那里我们可以转到3和4这些分支中的5个将移至5，但我不需要两个分支都可以-第一个分支可以移至5，而另一个可以停在该位置。然后我们继续进行6和7。大多数循环检测将找不到任何循环，并两次返回5、6、7。考虑到我希望我的大多数生产数据都具有0-3个立即引用，并且大多数引用也是这样，从一个对象到另一个对象有多个分支是很普遍的，而向下跳转这些分支将不会

解决方案

您可以使用tablefunc模块中存在的connectby函数。 / p>

首先，您需要启用模块

 创建扩展tablefunc;

然后，您可以使用connectby函数（根据您在问题中提供的示例表，

 从connectby（'objectdependencies'，'id'，'dependson'，' 4'，0）
 AS t（id int，dependon int，level int）
其中id！= 4;

这将返回：
1
2
3

以下是文档中参数的解释：

  connectby（text relname，文本keyid_fld，文本parent_keyid_fld 
 [，文本orderby_fld]，文本start_with，int max_depth 
 [，文本branch_delim]）

relname源关系的名称

keyid_fld关键字字段的名称

parent_keyid_fld父键字段的名称

orderby_fld用于排序同级的字段名称（可选）

start_with行的键值起始于

max_depth下降到的最大深度，或者为无限深度为零

branch_delim字符串，用于在分支输出中分隔键（可选）

请查阅文档以获取更多信息。
https://www.postgresql.org/docs/9.5/ static / tablefunc.html

I have a set of dependencies stored in my database. I'm looking to find all the objects that depend on the current one, whether directly or indirectly. Since objects can depend zero or more other objects, it's perfectly reasonable that object 1 is depended on by object 9 twice (9 depends on 4 and 5, both of which depend on 1). I'd like to get the list of all the objects that depend on the current object without duplication.

This gets more complex if there are loops. Without loops, one could use DISTINCT, though going through long chains more than once only to cull them at the end is still a problem. With loops, however, it becomes important that the RECURSIVE CTE doesn't union with something it has already seen.

So what I have so far looks like this:

WITH RECURSIVE __dependents AS (
  SELECT object, array[object.id] AS seen_objects
  FROM immediate_object_dependents(_objectid) object
  UNION ALL
  SELECT object, d.seen_objects || object.id
  FROM __dependents d
  JOIN immediate_object_dependents((d.object).id) object
    ON object.id <> ALL (d.seen_objects)
) SELECT (object).* FROM __dependents;

(It's in a stored procedure, so I can pass in _objectid)

Unfortunately, this just omits a given object when I've seen it before in the current chain, which would be fine if a recursive CTE was being done depth-first, but when it's breadth-first, it becomes problematic.

Ideally, the solution would be in SQL rather than PLPGSQL, but either one works.

As an example, I set this up in postgres:

create table objectdependencies (
  id int,
  dependson int
);

create index on objectdependencies (dependson);

insert into objectdependencies values (1, 2), (1, 4), (2, 3), (2, 4), (3, 4);

And then I tried running this:

with recursive rdeps as (
  select dep
  from objectdependencies dep
  where dep.dependson = 4 -- starting point
  union all
  select dep
  from objectdependencies dep
  join rdeps r
    on (r.dep).id = dep.dependson
) select (dep).id from rdeps;

I'm expecting "1, 2, 3" as output.

However, this somehow goes on forever (which I also don't understand). If I add in a level check (select dep, 0 as level, ... select dep, level + 1, on ... and level < 3), I see that 2 and 3 are repeating. Conversely, if I add a seen check:

with recursive rdeps as (
  select dep, array[id] as seen
  from objectdependencies dep
  where dep.dependson = 4 -- starting point
  union all
  select dep, r.seen || dep.id
  from objectdependencies dep
  join rdeps r
    on (r.dep).id = dep.dependson and dep.id <> ALL (r.seen)
) select (dep).id from rdeps;

then I get 1, 2, 3, 2, 3, and it stops. I could use DISTINCT in the outer select, but that only reasonably works on this data because there is no loop. With a larger dataset and more loops, we will continue to grow the CTE's output only to have the DISTINCT pare it back down. I would like the CTE to simply stop that branch when it's already seen that particular value somewhere else.

Edit: this is not simply about cycle detection (though there can be cycles). It's about uncovering everything referenced by this object, directly and indirectly. So if we have 1->2->3->5->6->7 and 2->4->5, we can start at 1, go to 2, from there we can go to 3 and 4, both of those branches will go to 5, but I don't need both branches to do so - the first one can go to 5, and the other can simply stop there. Then we go on to 6 and 7. Most cycle detection will find no cycles and return 5, 6, 7 all twice. Given that I expect most of my production data to have 0-3 immediate references, and most of those to be likewise, it will be very common for there to be multiple branches from one object to another, and going down those branches will be not only redundant but a huge waste of time and resource.

解决方案

You can use the connectby function which exists in the tablefunc module.

First you need to enable the module

CREATE EXTENSION tablefunc;

Then you can use the connectby function (based on the sample table you provided in the question it will as follows):

SELECT distinct id
FROM connectby('objectdependencies', 'id', 'dependson', '4', 0)
AS t(id int, dependson int, level int)
where id != 4;

This will return: 1 2 3

Here is an explanation of the parameters from documentation:

connectby(text relname, text keyid_fld, text parent_keyid_fld
          [, text orderby_fld ], text start_with, int max_depth
          [, text branch_delim ])

relname Name of the source relation
keyid_fld Name of the key field
parent_keyid_fld Name of the parent-key field
orderby_fld Name of the field to order siblings by (optional)
start_with Key value of the row to start at
max_depth Maximum depth to descend to, or zero for unlimited depth
branch_delim String to separate keys with in branch output (optional)

please consult the documentation for more information. https://www.postgresql.org/docs/9.5/static/tablefunc.html

这篇关于在递归CTE中检测重复项的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

在递归CTE中检测重复项 [英] Detect duplicate items in recursive CTE

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

在递归CTE中检测重复项 [英] Detect duplicate items in recursive CTE

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭