查找生成林(有递归,PostgreSQL 9.5) [英] Finding the spanning forest (WITH RECURSIVE, PostgreSQL 9.5)
问题描述
我有一个 identities
(即别名)表,用于任意数量的人.每行都有一个先前的名称和一个新名称.在生产中,大约有100万行.例如:
I have a table of identities
(i.e. aliases) for an arbitrary number of people. Each row has a previous name and a new name. In production, there are about 1 M rows. For example:
id, old, new
---
1, 'Albert', 'Bob'
2, 'Bob', 'Charles'
3, 'Mary', 'Nancy'
4, 'Charles', 'Albert'
5, 'Lydia', 'Nancy'
6, 'Zoe', 'Zoe'
我想要的是生成 users
的列表,并引用它们各自的身份.这类似于在每个已连接身份图中查找所有节点,或查找生成林:
What I want is to generate the list of users
and reference all of their respective identities. This is analogous to finding all of the nodes in each graph of connected identities, or finding the spanning forest:
User 1: Albert, Bob, Charles (identities: 1,2,4)
User 2: Mary, Nancy, Lydia (identities: 3,5)
User 3: Zoe (identities: 6)
我一直在修改PostgreSQL的 WITH RECURSIVE
,但是它为每个生成了集合和子集.例如:
I've been tinkering with PostgreSQL's WITH RECURSIVE
, but it yields both sets and subsets for each. For example:
1,2,4 <-- spanning tree: good
2 <-- subset: discard
3,5 <-- spanning tree: good
4 <-- subset: discard
5 <-- subset: discard
6 <-- spanning tree: good
我只需要为每个用户生成完整的身份集(即生成树),该怎么做?
SQLFiddle: http://sqlfiddle.com/#!15/9eaed/4以我的最新尝试.这是代码:
SQLFiddle: http://sqlfiddle.com/#!15/9eaed/4 with my latest attempt. Here's the code:
WITH RECURSIVE search_graph AS (
SELECT id
, id AS min_id
, ARRAY[id] AS path
, ARRAY[old,new] AS emails
FROM identities
UNION
SELECT identities.id
, LEAST(identities.id, sg.min_id)
, (sg.path || identities.id)
, (sg.emails || identities.old || identities.new)
FROM search_graph sg
JOIN identities ON (identities.old = ANY(sg.emails) OR identities.new = ANY(sg.emails))
WHERE identities.id <> ALL(sg.path)
)
SELECT array_agg(DISTINCT(p)) from search_graph, unnest(path) p GROUP BY min_id;
结果:
1,2,4
2
3,5
4
5
6
推荐答案
我不久前写了一个类似问题的答案:如何查找无向图的所有连接子图.在这个问题中,我使用了SQL Server.有关中间CTE的详细说明,请参见该答案.我将该查询改编为Postgres.
I wrote an answer to a similar question a while ago: How to find all connected subgraphs of an undirected graph. In that question I used SQL Server. See that answer for detailed explanation of intermediate CTEs. I adapted that query to Postgres.
使用Postgres数组功能而不是将路径串联到 text
列中,可以更高效地编写代码.
It may be written more efficiently using Postgres array feature instead of concatenating the path into a text
column.
WITH RECURSIVE
CTE_Idents
AS
(
SELECT old AS Ident
FROM identities
UNION
SELECT new AS Ident
FROM identities
)
,CTE_Pairs
AS
(
SELECT old AS Ident1, new AS Ident2
FROM identities
WHERE old <> new
UNION
SELECT new AS Ident1, old AS Ident2
FROM identities
WHERE old <> new
)
,CTE_Recursive
AS
(
SELECT
CTE_Idents.Ident AS AnchorIdent
, Ident1
, Ident2
, ',' || Ident1 || ',' || Ident2 || ',' AS IdentPath
, 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1
UNION ALL
SELECT
CTE_Recursive.AnchorIdent
, CTE_Pairs.Ident1
, CTE_Pairs.Ident2
, CTE_Recursive.IdentPath || CTE_Pairs.Ident2 || ',' AS IdentPath
, CTE_Recursive.Lvl + 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
WHERE
CTE_Recursive.IdentPath NOT LIKE ('%,' || CTE_Pairs.Ident2 || ',%')
)
,CTE_RecursionResult
AS
(
SELECT AnchorIdent, Ident1, Ident2
FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
SELECT AnchorIdent, Ident1 AS Ident
FROM CTE_RecursionResult
UNION
SELECT AnchorIdent, Ident2 AS Ident
FROM CTE_RecursionResult
)
,CTE_Groups
AS
(
SELECT
CTE_Idents.Ident
,array_agg(COALESCE(CTE_CleanResult.Ident, CTE_Idents.Ident)
ORDER BY COALESCE(CTE_CleanResult.Ident, CTE_Idents.Ident)) AS AllIdents
FROM
CTE_Idents
LEFT JOIN CTE_CleanResult ON CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
GROUP BY CTE_Idents.Ident
)
SELECT AllIdents
FROM CTE_Groups
GROUP BY AllIdents
;
我在示例数据中添加了一行(7,X,Y)
.
I added one row (7,X,Y)
to your sample data.
结果
| allidents |
|--------------------|
| Lydia,Mary,Nancy |
| Albert,Bob,Charles |
| X,Y |
| Zoe |
这篇关于查找生成林(有递归,PostgreSQL 9.5)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!