如何使用WITH RECURSIVE子句进行选择 [英] How to select using WITH RECURSIVE clause

查看:937
本文介绍了如何使用WITH RECURSIVE子句进行选择的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经在Google上搜索并阅读了一些文章,例如
此postgreSQL手册页
此博客页
并尝试进行查询我自己取得了一定程度的成功(其中的一部分挂起,而其他的则挂起并且运行良好),
,但是到目前为止,我还不能完全完全理解魔术的工作原理。



任何人都可以给出非常清晰的解释来说明这种查询语义和执行过程吗,
可以更好地基于典型样本(例如因式计算或<$ c $中的完整树扩展) c>(id,parent_id,name)表?



什么是基本准则和典型的错误? code>具有递归查询?

解决方案

首先,让我们尝试简化并阐明手册页。为简化起见,现在仅考虑 union all 和递归子句(以及 union 之后):

 使用递归伪实体名称(列名称)AS(
Initial-SELECT
UNION ALL
使用伪实体名
进行递归选择
使用伪实体名

为澄清这一点,让我们用伪代码描述查询执行过程:

  working-recordset = Initial-SELECT 

的结果将工作记录集附加到空的外部记录集

while(工作记录集不为空)开始

新的工作记录集=递归选择
的结果,将先前的工作记录集作为伪实体名称

将工作记录集追加到外部记录集

结束

整体结果=外部选择
的结果,将外部记录集作为伪实体名称

n更短-数据库引擎执行初始选择,将其结果行作为工作集。然后,它对工作集重复执行递归选择,每次用获得的查询结果替换工作集的内容。当递归选择返回空集时,此过程结束。并首先收集由初始选择然后递归选择给出的所有结果行,并将其馈送到外部选择,该结果将成为整体查询结果。



此查询正在计算阶乘,共3项:

 ,递归阶乘(F,n)AS(
选择1 F ,3 n
联合所有
SELECT F * n F,n-1 n从阶乘那里n> 1

从阶乘SELECT F其中n = 1

初始选择 SELECT 1 F,3 n 为我们提供初始值:3表示参数,1表示函数值。

递归选择 SELECT F * n F,n-1来自阶乘,其中n> 1 表示每次我们都需要将上一个函子值乘以最后一个参数值和减量参数值。

数据库引擎是这样执行的:



首先,它执行initail select,这给出了工作记录集的初始状态:

  F | n 
-+-
1 | 3

然后使用递归查询转换工作记录集并获取其第二种状态:

  F | n 
-+-
3 | 2

然后是第三种状态:

  F | n 
-+-
6 | 1

在第三种状态下, n> 1之后没有行递归选择中的/ code>条件,因此工作集是循环出口。



外部记录集现在包含所有行,这些行由初始选择和递归选择返回:

  F | n 
-+-
1 | 3
3 | 2
6 | 1

外部选择从外部记录集中过滤出所有中间结果,仅显示最终阶乘值,该值成为整体查询结果:

  F 
-
6

现在让我们考虑表 forest(id,parent_id,name)

  id | parent_id |名称
--- + ----------- + -----------------
1 | |项目1
2 | 1 |子项目1.1
3 | 1 |子项目1.2
4 | 1 |子项目1.3
5 | 3 |子项目1.2.1
6 | |项目2
7 | 6 |子项目2.1
8 | |项目3

'扩展全树'在这里意味着对人类中的树项进行排序可读的深度优先级,同时计算其级别和(也许)路径。如果不使用WITH RECURSIVE子句(或PostgreSQL不支持的Oracle CONNECT BY子句),那么两个任务(正确的排序和计算级别或路径)都无法在一个(甚至是恒定数量的)SELECT中解决。但是此递归查询确实可以完成工作(嗯,几乎可以做到,请参见下面的注释):

 与递归fulltree(id,parent_id ,level,name,path)AS(
SELECT id,parent_id,1作为level,name,name ||''作为从forest的路径,parent_id为null
UNION ALL
SELECT t。 id,t.parent_id,ft.level + 1作为级别,t.name,ft.path ||'/'|| t.name作为从森林t,fulltree ft的路径
,其中t.parent_id = ft。 id

通过路径
从全树顺序中选择*

数据库引擎执行它像这样:



首先,它执行initail select,这将给出 forest 表中所有最高级别的项(根) :

  id | parent_id |级别|名称|路径
--- + ----------- + ------- + ------------------ +- --------------------------------------
1 | | 1 |项目1 |项目1
8 | | 1 |项目3 |项目3
6 | | 1 |项目2 | item 2

然后,它执行递归选择,从而给出森林表:

  id | parent_id |级别|名称|路径
--- + ----------- + ------- + ------------------ +- --------------------------------------
2 | 1 | 2 |子项目1.1 |项目1 /子项目1.1
3 | 1 | 2 |分项目1.2 |项目1 /子项目1.2
4 | 1 | 2 |子项目1.3 |项目1 /子项目1.3
7 | 6 | 2 |分项目2.1 |项目2 /子项目2.1

然后,它再次执行递归选择,检索3d级项目:

  id | parent_id |级别|名称|路径
--- + ----------- + ------- + ------------------ +- --------------------------------------
5 | 3 | 3 |子项目1.2.1 |项目1 /子项目1.2 /子项目1.2.1

现在它再次执行递归选择,尝试检索



外部SELECT设置正确的人类可读行顺序,并在path列上排序:



p>

  id | parent_id |级别|名称|路径
--- + ----------- + ------- + ------------------ +- --------------------------------------
1 | | 1 |项目1 |项目1
2 | 1 | 2 |子项目1.1 |项目1 /子项目1.1
3 | 1 | 2 |分项目1.2 |项目1 /子项目1.2
5 | 3 | 3 |子项目1.2.1 |项目1 /子项目1.2 /子项目1.2.1
4 | 1 | 2 |子项目1.3 |项目1 /子项目1.3
6 | | 1 |项目2 |项目2
7 | 6 | 2 |分项目2.1 |项目2 /子项目2.1
8 | | 1 |项目3 |项目3

注意:仅当有项目名称中没有标点字符排序规则<< c $ c> / 。如果我们在项目1 * 中重新命名项目2 ,它将破坏行顺序,介于之间项目1 及其后代。

更稳定的解决方案是使用制表符( E'\t')作为查询中的路径分隔符(可以用更具可读性的路径分隔符代替)稍后:在外部选择,然后移至人类等)。制表符分隔的路径将保持正确的顺序,直到项目名称中包含制表符或控制字符为止-可以轻松检查和排除它们而不会损失可用性。



简单修改上一个查询即可扩展任意子树-您只需要用 perent_id = 1 替换条件 parent_id为空 (例如)。请注意,此查询变体将返回相对于 项目1 的所有级别和路径。



现在介绍典型错误。特定于递归查询的最明显的典型错误是在递归选择中定义错误停止条件,这会导致无限循环。



例如,如果我们在上面的阶乘样本中省略了其中n> 1 条件,则将永远不会执行递归选择给出一个空集(因为我们没有条件可以过滤出单行)并且循环将无限期地继续。



这是某些查询挂起的最可能原因(另一个非特定但仍然可能的原因是选择效果很差,它会在有限但很长的时间内执行)。



针对特定RECURSIVE的查询据我所知,要提到的准则。但是我想建议(相当明显)逐步的递归查询构建过程。




  • 分别构建和调试初始选择。


  • 用脚手架和RECURSIVE构造

    进行包装,然后开始构建和调试递归选择。




推荐的scuffolding结构如下:

 使用递归rec(<您的列名>)AS(
<您已准备就绪且可以工作的初始SELECT>
UNION ALL
<您现在正在调试的递归SELECT>

SELECT *从记录限制1000

此最简单的外部选择将输出整个外部记录集,众所周知,该记录集包含初始选择和重新执行选择的每次执行的所有输出行,并按其原始输出顺序循环执行-就像上面的示例一样! 限制1000 部分将防止挂起,而用超大输出代替,您可以在其中看到丢失的停止点。




  • 在调试初始和递归选择后,构建并调试外部选择。



现在最后要提到的是-在<$>中使用 union 而不是 union all 的区别c $ c>和递归子句。它引入了行唯一性约束,这在我们的执行伪代码中导致额外的两行:

  working-recordset = Initial-SELECT $的结果b 
$ b丢弃工作记录集中的重复行/ *特定于工会的* /

将工作记录集追加到空的外部记录集

while(working-记录集不为空)开始

新的工作记录集=递归选择的结果
将先前的工作记录集作为伪实体名称

丢弃重复的行以及从工作记录集/ *特定于工会* /

在外部记录集
中重复的行将工作记录集追加到外部记录集

end

整体结果=外部选择
的结果,将外部记录集作为伪实体名称


I have googled and read throug some articles like this postgreSQL manual page or this blog page and tried making queries myself with a moderate success (part of them hangs, while others works good and fast), but so far I can not completely understand how this magic works.

Can anybody give very clear explanation demonstrating such query semantics and execution process, better based on typical samples like factorial calculation or full tree expansion from (id,parent_id,name) table?

And what are the basic guidlines and typical mistakes that one should know to make good with recursive queries?

解决方案

First of all, let us try to simplify and clarify algorithm description given on the manual page. To simplify it consider only union all in with recursive clause for now (and union later):

WITH RECURSIVE pseudo-entity-name(column-names) AS (
    Initial-SELECT
UNION ALL
    Recursive-SELECT using pseudo-entity-name
)
Outer-SELECT using pseudo-entity-name

To clarify it let us describe query execution process in pseudo code:

working-recordset = result of Initial-SELECT

append working-recordset to empty outer-recordset

while( working-recordset is not empty ) begin

    new working-recordset = result of Recursive-SELECT 
        taking previous working-recordset as pseudo-entity-name

    append working-recordset to outer-recordset

end

overall-result = result of Outer-SELECT 
    taking outer-recordset as pseudo-entity-name

Or even shorter - Database engine executes initial select, taking its result rows as working set. Then it repeatedly executes recursive select on the working set, each time replacing contents of the working set with query result obtained. This process ends when empty set is returned by recursive select. And all result rows given firstly by initial select and then by recursive select are gathered and feeded to outer select, which result becomes overall query result.

This query is calculating factorial of 3:

WITH RECURSIVE factorial(F,n) AS (
    SELECT 1 F, 3 n
UNION ALL
    SELECT F*n F, n-1 n from factorial where n>1
)
SELECT F from factorial where n=1

Initial select SELECT 1 F, 3 n gives us initial values: 3 for argument and 1 for function value.
Recursive select SELECT F*n F, n-1 n from factorial where n>1 states that every time we need to multiply last funcion value by last argument value and decrement argument value.
Database engine executes it like this:

First of all it executes initail select, which gives the initial state of working recordset:

F | n
--+--
1 | 3

Then it transforms working recordset with recursive query and obtain its second state:

F | n
--+--
3 | 2

Then third state:

F | n
--+--
6 | 1

In the third state there is no row which follows n>1 condition in recursive select, so forth working set is loop exits.

Outer recordset now holds all the rows, returned by initial and recursive select:

F | n
--+--
1 | 3
3 | 2
6 | 1

Outer select filters out all intermediate results from outer recordset, showing only final factorial value which becomes overall query result:

F 
--
6

And now let us consider table forest(id,parent_id,name):

id | parent_id | name
---+-----------+-----------------
1  |           | item 1
2  | 1         | subitem 1.1
3  | 1         | subitem 1.2
4  | 1         | subitem 1.3
5  | 3         | subsubitem 1.2.1
6  |           | item 2
7  | 6         | subitem 2.1
8  |           | item 3

'Expanding full tree' here means sorting tree items in human-readable depth-first order while calculating their levels and (maybe) paths. Both tasks (of correct sorting and calculating level or path) are not solvable in one (or even any constant number of) SELECT without using WITH RECURSIVE clause (or Oracle CONNECT BY clause, which is not supported by PostgreSQL). But this recursive query does the job (well, almost does, see the note below):

WITH RECURSIVE fulltree(id,parent_id,level,name,path) AS (
    SELECT id, parent_id, 1 as level, name, name||'' as path from forest where parent_id is null
UNION ALL
    SELECT t.id, t.parent_id, ft.level+1 as level, t.name, ft.path||' / '||t.name as path
    from forest t, fulltree ft where t.parent_id = ft.id
)
SELECT * from fulltree order by path

Database engine executes it like this:

Firstly, it executes initail select, which gives all highest level items (roots) from forest table:

id | parent_id | level | name             | path
---+-----------+-------+------------------+----------------------------------------
1  |           | 1     | item 1           | item 1
8  |           | 1     | item 3           | item 3
6  |           | 1     | item 2           | item 2

Then, it executes recursive select, which gives all 2nd level items from forest table:

id | parent_id | level | name             | path
---+-----------+-------+------------------+----------------------------------------
2  | 1         | 2     | subitem 1.1      | item 1 / subitem 1.1
3  | 1         | 2     | subitem 1.2      | item 1 / subitem 1.2
4  | 1         | 2     | subitem 1.3      | item 1 / subitem 1.3
7  | 6         | 2     | subitem 2.1      | item 2 / subitem 2.1

Then, it executes recursive select again, retrieving 3d level items:

id | parent_id | level | name             | path
---+-----------+-------+------------------+----------------------------------------
5  | 3         | 3     | subsubitem 1.2.1 | item 1 / subitem 1.2 / subsubitem 1.2.1

And now it executes recursive select again, trying to retrieve 4th level items, but there are none of them, so the loop exits.

The outer SELECT sets the correct human-readable row order, sorting on path column:

id | parent_id | level | name             | path
---+-----------+-------+------------------+----------------------------------------
1  |           | 1     | item 1           | item 1
2  | 1         | 2     | subitem 1.1      | item 1 / subitem 1.1
3  | 1         | 2     | subitem 1.2      | item 1 / subitem 1.2
5  | 3         | 3     | subsubitem 1.2.1 | item 1 / subitem 1.2 / subsubitem 1.2.1
4  | 1         | 2     | subitem 1.3      | item 1 / subitem 1.3
6  |           | 1     | item 2           | item 2
7  | 6         | 2     | subitem 2.1      | item 2 / subitem 2.1
8  |           | 1     | item 3           | item 3

NOTE: Resulting row order will remain correct only while there are no punctuation characters collation-preceeding / in the item names. If we rename Item 2 in Item 1 *, it will break row order, standing between Item 1 and its descendants.
More stable solution is using tab character (E'\t') as path separator in query (which can be substituted by more readable path separator later: in outer select, before displaing to human or etc). Tab separated paths will retain correct order until there are tabs or control characters in the item names - which easily can be checked and ruled out without loss of usability.

It is very simple to modify last query to expand any arbitrary subtree - you need only to substitute condition parent_id is null with perent_id=1 (for example). Note that this query variant will return all levels and paths relative to Item 1.

And now about typical mistakes. The most notable typical mistake specific to recursive queries is defining ill stop conditions in recursive select, which results in infinite looping.

For example, if we omit where n>1 condition in factorial sample above, execution of recursive select will never give an empty set (because we have no condition to filter out single row) and looping will continue infinitely.

That is the most probable reason why some of your queries hang (the other non-specific but still possible reason is very ineffective select, which executes in finite but very long time).

There are not much RECURSIVE-specific querying guidlines to mention, as far as I know. But I would like to suggest (rather obvious) step by step recursive query building procedure.

  • Separately build and debug your initial select.

  • Wrap it with scaffolding WITH RECURSIVE construct
    and begin building and debugging your recursive select.

The recommended scuffolding construct is like this:

WITH RECURSIVE rec( <Your column names> ) AS (
    <Your ready and working initial SELECT>
UNION ALL
    <Recursive SELECT that you are debugging now>
)
SELECT * from rec limit 1000

This simplest outer select will output the whole outer recordset, which, as we know, contains all output rows from initial select and every execution of recusrive select in a loop in their original output order - just like in samples above! The limit 1000 part will prevent hanging, replacing it with oversized output in which you will be able to see the missed stop point.

  • After debugging initial and recursive select build and debug your outer select.

And now the last thing to mention - the difference in using union instead of union all in with recursive clause. It introduces row uniqueness constraint which results in two extra lines in our execution pseudocode:

working-recordset = result of Initial-SELECT

discard duplicate rows from working-recordset /*union-specific*/

append working-recordset to empty outer-recordset

while( working-recordset is not empty ) begin

    new working-recordset = result of Recursive-SELECT 
        taking previous working-recordset as pseudo-entity-name

    discard duplicate rows and rows that have duplicates in outer-recordset 
        from working-recordset /*union-specific*/

    append working-recordset to outer-recordset

end

overall-result = result of Outer-SELECT 
    taking outer-recordset as pseudo-entity-name

这篇关于如何使用WITH RECURSIVE子句进行选择的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆