查找生成林(有递归,PostgreSQL 9.5) [英] Finding the spanning forest (WITH RECURSIVE, PostgreSQL 9.5)

查看:45
本文介绍了查找生成林(有递归,PostgreSQL 9.5)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个 identities (即别名)表,用于任意数量的人.每行都有一个先前的名称和一个新名称.在生产中,大约有100万行.例如:

I have a table of identities (i.e. aliases) for an arbitrary number of people. Each row has a previous name and a new name. In production, there are about 1 M rows. For example:

id, old, new
---
1, 'Albert', 'Bob'
2, 'Bob', 'Charles'
3, 'Mary', 'Nancy'
4, 'Charles', 'Albert'
5, 'Lydia', 'Nancy'
6, 'Zoe', 'Zoe'

我想要的是生成 users 的列表,并引用它们各自的身份.这类似于在每个已连接身份图中查找所有节点,或查找生成林:

What I want is to generate the list of users and reference all of their respective identities. This is analogous to finding all of the nodes in each graph of connected identities, or finding the spanning forest:

User 1: Albert, Bob, Charles (identities: 1,2,4)
User 2: Mary, Nancy, Lydia (identities: 3,5)
User 3: Zoe (identities: 6)

我一直在修改PostgreSQL的 WITH RECURSIVE ,但是它为每个生成了集合和子集.例如:

I've been tinkering with PostgreSQL's WITH RECURSIVE, but it yields both sets and subsets for each. For example:

1,2,4 <-- spanning tree: good
2     <-- subset: discard
3,5   <-- spanning tree: good
4     <-- subset: discard
5     <-- subset: discard
6     <-- spanning tree: good

我只需要为每个用户生成完整的身份集(即生成树),该怎么做?

SQLFiddle: http://sqlfiddle.com/#!15/9eaed/4以我的最新尝试.这是代码:

SQLFiddle: http://sqlfiddle.com/#!15/9eaed/4 with my latest attempt. Here's the code:

WITH RECURSIVE search_graph AS (
   SELECT id
    , id AS min_id
    , ARRAY[id] AS path
    , ARRAY[old,new] AS emails
   FROM   identities

   UNION 

   SELECT identities.id
    , LEAST(identities.id, sg.min_id)
    , (sg.path || identities.id)
    , (sg.emails || identities.old || identities.new)

   FROM search_graph sg
   JOIN identities ON (identities.old = ANY(sg.emails) OR identities.new = ANY(sg.emails))
   WHERE  identities.id <> ALL(sg.path)
)
SELECT array_agg(DISTINCT(p)) from search_graph, unnest(path) p GROUP BY min_id;

结果:

1,2,4
2
3,5
4
5
6

推荐答案

我不久前写了一个类似问题的答案:如何查找无向图的所有连接子图.在这个问题中,我使用了SQL Server.有关中间CTE的详细说明,请参见该答案.我将该查询改编为Postgres.

I wrote an answer to a similar question a while ago: How to find all connected subgraphs of an undirected graph. In that question I used SQL Server. See that answer for detailed explanation of intermediate CTEs. I adapted that query to Postgres.

使用Postgres数组功能而不是将路径串联到 text 列中,可以更高效地编写代码.

It may be written more efficiently using Postgres array feature instead of concatenating the path into a text column.

WITH RECURSIVE
CTE_Idents
AS
(
    SELECT old AS Ident
    FROM identities

    UNION

    SELECT new AS Ident
    FROM identities
)
,CTE_Pairs
AS
(
    SELECT old AS Ident1, new AS Ident2
    FROM identities
    WHERE old <> new

    UNION

    SELECT new AS Ident1, old AS Ident2
    FROM identities
    WHERE old <> new
)
,CTE_Recursive
AS
(
    SELECT
        CTE_Idents.Ident AS AnchorIdent 
        , Ident1
        , Ident2
        , ',' || Ident1 || ',' || Ident2 || ',' AS IdentPath
        , 1 AS Lvl
    FROM 
        CTE_Pairs
        INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1

    UNION ALL

    SELECT 
        CTE_Recursive.AnchorIdent 
        , CTE_Pairs.Ident1
        , CTE_Pairs.Ident2
        , CTE_Recursive.IdentPath || CTE_Pairs.Ident2 || ',' AS IdentPath
        , CTE_Recursive.Lvl + 1 AS Lvl
    FROM
        CTE_Pairs
        INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
    WHERE
        CTE_Recursive.IdentPath NOT LIKE ('%,' || CTE_Pairs.Ident2 || ',%')
)
,CTE_RecursionResult
AS
(
    SELECT AnchorIdent, Ident1, Ident2
    FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
    SELECT AnchorIdent, Ident1 AS Ident
    FROM CTE_RecursionResult

    UNION

    SELECT AnchorIdent, Ident2 AS Ident
    FROM CTE_RecursionResult
)
,CTE_Groups
AS
(
  SELECT
    CTE_Idents.Ident
    ,array_agg(COALESCE(CTE_CleanResult.Ident, CTE_Idents.Ident) 
        ORDER BY COALESCE(CTE_CleanResult.Ident, CTE_Idents.Ident)) AS AllIdents
  FROM
    CTE_Idents
    LEFT JOIN CTE_CleanResult ON CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
  GROUP BY CTE_Idents.Ident
)
SELECT AllIdents
FROM CTE_Groups
GROUP BY AllIdents
;

我在示例数据中添加了一行(7,X,Y).

I added one row (7,X,Y) to your sample data.

SQL提琴

结果

|          allidents |
|--------------------|
|   Lydia,Mary,Nancy |
| Albert,Bob,Charles |
|                X,Y |
|                Zoe |

这篇关于查找生成林(有递归,PostgreSQL 9.5)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆