如何找到无向图的所有连通子图 [英] How to find all connected subgraphs of an undirected graph

查看：87 发布时间：2021/8/25 18:47:09 sql sql-server sql-server-2008

本文介绍了如何找到无向图的所有连通子图的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我需要一些帮助来解决我正在努力解决的问题.

I need some help for a problem that i am struggling to solve.

示例表:

ID |Identifier1 | Identifier2 
---------------------------------
1  |      a     | c         
2  |      b     | f         
3  |      a     | g         
4  |      c     | h        
5  |      b     | j         
6  |      d     | f         
7  |      e     | k  
8  |      i     |          
9  |      l     | h

我想对两列之间相互关联的标识符进行分组，并分配一个唯一的组 ID.

I want to group identifiers that are related with each other between two columns and assign a unique group id.

期望的输出:

Identifier | Gr_ID    |    Gr.Members                 
---------------------------------------------------
a       |      1      |   (a,c,g,h,l)  
b       |      2      |   (b,d,f,j)       
c       |      1      |   (a,c,g,h,l)  
d       |      2      |   (b,d,f,j)       
e       |      3      |   (e,k)                 
f       |      2      |   (b,d,f,j)       
g       |      1      |   (a,c,g,h,l)  
h       |      1      |   (a,c,g,h,l)  
j       |      2      |   (b,d,f,j)       
k       |      3      |   (e,k)                 
l       |      1      |   (a,c,g,h,l)  
i       |      4      |   (i)

注意:Gr.Members 列不是必须的，主要是为了更清晰的查看.

Note:the column Gr.Members is not necessary, mostly is used for a clearer view.

所以组的定义是:如果一行属于一个组与该组的至少一行共享至少一个标识符

So the definition for a group is: A row belongs to a group if it shares at least one identifier with at least one row of this group

但是必须将组 ID 分配给每个标识符(由两列的并集选择)而不是分配给行.

But the group id has to be assigned to each identifier(selected by the union of the two columns) not to the row.

有关如何构建查询以提供所需输出的任何帮助?

Any help on how to build a query to give the desired output?

谢谢.

更新:以下是一些额外的样本集及其预期输出.

Update: Below are some extra sample sets with their expected output.

给定表格:

Identifier1 | Identifier2   
----------------------------
    a       |   f
    a       |   g
    a       |  NULL
    b       |   c
    b       |   a
    b       |   h
    b       |   j
    b       |  NULL
    b       |  NULL
    b       |   g
    c       |   k
    c       |   b
    d       |   l
    d       |   f
    d       |   g
    d       |   m
    d       |   a
    d       |  NULL
    d       |   a
    e       |   c
    e       |   b
    e       |  NULL

预期输出:所有记录都应该属于同一个组，组 ID = 1.

Expected output: all the records should belong to the same group with group ID = 1.

给定表:

Identifier1 | Identifier2
--------------------------
a           |   a
b           |   b
c           |   a
c           |   b
c           |   c

预期输出:记录应该在同一个组中，组 ID = 1.

Expected output: The records should be in the same group with group ID = 1.

推荐答案

这是一个不使用游标，而是使用单个递归查询的变体.

Here is a variant that doesn't use cursor, but uses a single recursive query.

本质上，它将数据视为图中的边并递归遍历图的所有边，在检测到循环时停止.然后它将所有找到的循环放在组中，并给每个组一个编号.

Essentially, it treats the data as edges in a graph and traverses recursively all edges of the graph, stopping when the loop is detected. Then it puts all found loops in groups and gives each group a number.

请参阅下面有关其工作原理的详细说明.我建议您运行查询 CTE-by-CTE 并检查每个中间结果以了解它的作用.

See the detailed explanations of how it works below. I recommend you to run the query CTE-by-CTE and examine each intermediate result to understand what it does.

示例 1

DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1));
INSERT INTO @T (ID, Ident1, Ident2) VALUES
(1, 'a', 'a'),
(2, 'b', 'b'),
(3, 'c', 'a'),
(4, 'c', 'b'),
(5, 'c', 'c');

示例 2

我添加了一个带有 z 值的行，以便有多个带有未配对值的行.

I added one more row with z value to have multiple rows with unpaired values.

DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1));
INSERT INTO @T (ID, Ident1, Ident2) VALUES
(1, 'a', 'a'),
(1, 'a', 'c'),
(2, 'b', 'f'),
(3, 'a', 'g'),
(4, 'c', 'h'),
(5, 'b', 'j'),
(6, 'd', 'f'),
(7, 'e', 'k'),
(8, 'i', NULL),
(88, 'z', 'z'),
(9, 'l', 'h');

示例 3

DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1));
INSERT INTO @T (ID, Ident1, Ident2) VALUES
(1, 'a', 'f'),
(2, 'a', 'g'),
(3, 'a', NULL),
(4, 'b', 'c'),
(5, 'b', 'a'),
(6, 'b', 'h'),
(7, 'b', 'j'),
(8, 'b', NULL),
(9, 'b', NULL),
(10, 'b', 'g'),
(11, 'c', 'k'),
(12, 'c', 'b'),
(13, 'd', 'l'),
(14, 'd', 'f'),
(15, 'd', 'g'),
(16, 'd', 'm'),
(17, 'd', 'a'),
(18, 'd', NULL),
(19, 'd', 'a'),
(20, 'e', 'c'),
(21, 'e', 'b'),
(22, 'e', NULL);

查询

WITH
CTE_Idents
AS
(
    SELECT Ident1 AS Ident
    FROM @T

    UNION

    SELECT Ident2 AS Ident
    FROM @T
)
,CTE_Pairs
AS
(
    SELECT Ident1, Ident2
    FROM @T
    WHERE Ident1 <> Ident2

    UNION

    SELECT Ident2 AS Ident1, Ident1 AS Ident2
    FROM @T
    WHERE Ident1 <> Ident2
)
,CTE_Recursive
AS
(
    SELECT
        CAST(CTE_Idents.Ident AS varchar(8000)) AS AnchorIdent 
        , Ident1
        , Ident2
        , CAST(',' + Ident1 + ',' + Ident2 + ',' AS varchar(8000)) AS IdentPath
        , 1 AS Lvl
    FROM 
        CTE_Pairs
        INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1

    UNION ALL

    SELECT 
        CTE_Recursive.AnchorIdent 
        , CTE_Pairs.Ident1
        , CTE_Pairs.Ident2
        , CAST(CTE_Recursive.IdentPath + CTE_Pairs.Ident2 + ',' AS varchar(8000)) AS IdentPath
        , CTE_Recursive.Lvl + 1 AS Lvl
    FROM
        CTE_Pairs
        INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
    WHERE
        CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000))
)
,CTE_RecursionResult
AS
(
    SELECT AnchorIdent, Ident1, Ident2
    FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
    SELECT AnchorIdent, Ident1 AS Ident
    FROM CTE_RecursionResult

    UNION

    SELECT AnchorIdent, Ident2 AS Ident
    FROM CTE_RecursionResult
)
SELECT
    CTE_Idents.Ident
    ,CASE WHEN CA_Data.XML_Value IS NULL 
    THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END AS GroupMembers
    ,DENSE_RANK() OVER(ORDER BY 
        CASE WHEN CA_Data.XML_Value IS NULL 
        THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END
    ) AS GroupID
FROM
    CTE_Idents
    CROSS APPLY
    (
        SELECT CTE_CleanResult.Ident+','
        FROM CTE_CleanResult
        WHERE CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
        ORDER BY CTE_CleanResult.Ident FOR XML PATH(''), TYPE
    ) AS CA_XML(XML_Value)
    CROSS APPLY
    (
        SELECT CA_XML.XML_Value.value('.', 'NVARCHAR(MAX)')
    ) AS CA_Data(XML_Value)
WHERE
    CTE_Idents.Ident IS NOT NULL
ORDER BY Ident;

结果 1

+-------+--------------+---------+
| Ident | GroupMembers | GroupID |
+-------+--------------+---------+
| a     | a,b,c,       |       1 |
| b     | a,b,c,       |       1 |
| c     | a,b,c,       |       1 |
+-------+--------------+---------+

结果 2

+-------+--------------+---------+
| Ident | GroupMembers | GroupID |
+-------+--------------+---------+
| a     | a,c,g,h,l,   |       1 |
| b     | b,d,f,j,     |       2 |
| c     | a,c,g,h,l,   |       1 |
| d     | b,d,f,j,     |       2 |
| e     | e,k,         |       3 |
| f     | b,d,f,j,     |       2 |
| g     | a,c,g,h,l,   |       1 |
| h     | a,c,g,h,l,   |       1 |
| i     | i            |       4 |
| j     | b,d,f,j,     |       2 |
| k     | e,k,         |       3 |
| l     | a,c,g,h,l,   |       1 |
| z     | z            |       5 |
+-------+--------------+---------+

结果 3

+-------+--------------------------+---------+
| Ident |       GroupMembers       | GroupID |
+-------+--------------------------+---------+
| a     | a,b,c,d,e,f,g,h,j,k,l,m, |       1 |
| b     | a,b,c,d,e,f,g,h,j,k,l,m, |       1 |
| c     | a,b,c,d,e,f,g,h,j,k,l,m, |       1 |
| d     | a,b,c,d,e,f,g,h,j,k,l,m, |       1 |
| e     | a,b,c,d,e,f,g,h,j,k,l,m, |       1 |
| f     | a,b,c,d,e,f,g,h,j,k,l,m, |       1 |
| g     | a,b,c,d,e,f,g,h,j,k,l,m, |       1 |
| h     | a,b,c,d,e,f,g,h,j,k,l,m, |       1 |
| j     | a,b,c,d,e,f,g,h,j,k,l,m, |       1 |
| k     | a,b,c,d,e,f,g,h,j,k,l,m, |       1 |
| l     | a,b,c,d,e,f,g,h,j,k,l,m, |       1 |
| m     | a,b,c,d,e,f,g,h,j,k,l,m, |       1 |
+-------+--------------------------+---------+

工作原理

我将使用第二组示例数据进行说明.

How it works

I'll use the second set of sample data for this explanation.

CTE_Idents

CTE_Idents 给出出现在 Ident1 和 Ident2 列中的所有标识符的列表.由于它们可以以任何顺序出现，我们UNION 将两列放在一起.UNION 还会删除任何重复项.

CTE_Idents gives the list of all Identifiers that appear in both Ident1 and Ident2 columns. Since they can appear in any order we UNION both columns together. UNION also removes any duplicates.

+-------+
| Ident |
+-------+
| NULL  |
| a     |
| b     |
| c     |
| d     |
| e     |
| f     |
| g     |
| h     |
| i     |
| j     |
| k     |
| l     |
| z     |
+-------+

CTE_Pairs

CTE_Pairs 给出图在两个方向上的所有边的列表.同样，UNION 用于删除任何重复项.

CTE_Pairs gives the list of all edges of the graph in both directions. Again, UNION is used to remove any duplicates.

+--------+--------+
| Ident1 | Ident2 |
+--------+--------+
| a      | c      |
| a      | g      |
| b      | f      |
| b      | j      |
| c      | a      |
| c      | h      |
| d      | f      |
| e      | k      |
| f      | b      |
| f      | d      |
| g      | a      |
| h      | c      |
| h      | l      |
| j      | b      |
| k      | e      |
| l      | h      |
+--------+--------+

CTE_Recursive

CTE_Recursive 是查询的主要部分，它从每个唯一标识符开始递归遍历图.这些起始行由 UNION ALL 的第一部分产生.UNION ALL 的第二部分递归地连接到自身，将 Ident2 链接到 Ident1.由于我们预先制作了 CTE_Pairs 所有边都写在两个方向上，所以我们总是可以只将 Ident2 链接到 Ident1 并且我们将获得所有路径在图中.同时，查询构建 IdentPath - 到目前为止已遍历的以逗号分隔的标识符字符串.它用于 WHERE 过滤器:

CTE_Recursive is the main part of the query that recursively traverses the graph starting from each unique Identifier. These starting rows are produced by the first part of UNION ALL. The second part of UNION ALL recursively joins to itself linking Ident2 to Ident1. Since we pre-made CTE_Pairs with all edges written in both directions, we can always link only Ident2 to Ident1 and we'll get all paths in the graph. At the same time the query builds IdentPath - a string of comma-delimited Identifiers that have been traversed so far. It is used in the WHERE filter:

CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000))

一旦我们遇到之前包含在路径中的标识符，递归就会停止，因为连接的节点列表已用完.AnchorIdent 是递归的起始标识符，稍后将用于对结果进行分组.Lvl 并没有真正使用，我把它包括进来是为了更好地理解正在发生的事情.

As soon as we come across the Identifier that had been included in the Path before, the recursion stops as the list of connected nodes is exhausted. AnchorIdent is the starting Identifier for the recursion, it will be used later to group results. Lvl is not really used, I included it for better understanding of what is going on.

+-------------+--------+--------+-------------+-----+
| AnchorIdent | Ident1 | Ident2 |  IdentPath  | Lvl |
+-------------+--------+--------+-------------+-----+
| a           | a      | c      | ,a,c,       |   1 |
| a           | a      | g      | ,a,g,       |   1 |
| b           | b      | f      | ,b,f,       |   1 |
| b           | b      | j      | ,b,j,       |   1 |
| c           | c      | a      | ,c,a,       |   1 |
| c           | c      | h      | ,c,h,       |   1 |
| d           | d      | f      | ,d,f,       |   1 |
| e           | e      | k      | ,e,k,       |   1 |
| f           | f      | b      | ,f,b,       |   1 |
| f           | f      | d      | ,f,d,       |   1 |
| g           | g      | a      | ,g,a,       |   1 |
| h           | h      | c      | ,h,c,       |   1 |
| h           | h      | l      | ,h,l,       |   1 |
| j           | j      | b      | ,j,b,       |   1 |
| k           | k      | e      | ,k,e,       |   1 |
| l           | l      | h      | ,l,h,       |   1 |
| l           | h      | c      | ,l,h,c,     |   2 |
| l           | c      | a      | ,l,h,c,a,   |   3 |
| l           | a      | g      | ,l,h,c,a,g, |   4 |
| j           | b      | f      | ,j,b,f,     |   2 |
| j           | f      | d      | ,j,b,f,d,   |   3 |
| h           | c      | a      | ,h,c,a,     |   2 |
| h           | a      | g      | ,h,c,a,g,   |   3 |
| g           | a      | c      | ,g,a,c,     |   2 |
| g           | c      | h      | ,g,a,c,h,   |   3 |
| g           | h      | l      | ,g,a,c,h,l, |   4 |
| f           | b      | j      | ,f,b,j,     |   2 |
| d           | f      | b      | ,d,f,b,     |   2 |
| d           | b      | j      | ,d,f,b,j,   |   3 |
| c           | h      | l      | ,c,h,l,     |   2 |
| c           | a      | g      | ,c,a,g,     |   2 |
| b           | f      | d      | ,b,f,d,     |   2 |
| a           | c      | h      | ,a,c,h,     |   2 |
| a           | h      | l      | ,a,c,h,l,   |   3 |
+-------------+--------+--------+-------------+-----+

CTE_CleanResult

CTE_CleanResult 只留下 CTE_Recursive 的相关部分，并再次使用 UNION 合并 Ident1 和 Ident2.


CTE_CleanResult leaves only relevant parts from CTE_Recursive and again merges both Ident1 and Ident2 using UNION.
+-------------+-------+
| AnchorIdent | Ident |
+-------------+-------+
| a           | a     |
| a           | c     |
| a           | g     |
| a           | h     |
| a           | l     |
| b           | b     |
| b           | d     |
| b           | f     |
| b           | j     |
| c           | a     |
| c           | c     |
| c           | g     |
| c           | h     |
| c           | l     |
| d           | b     |
| d           | d     |
| d           | f     |
| d           | j     |
| e           | e     |
| e           | k     |
| f           | b     |
| f           | d     |
| f           | f     |
| f           | j     |
| g           | a     |
| g           | c     |
| g           | g     |
| g           | h     |
| g           | l     |
| h           | a     |
| h           | c     |
| h           | g     |
| h           | h     |
| h           | l     |
| j           | b     |
| j           | d     |
| j           | f     |
| j           | j     |
| k           | e     |
| k           | k     |
| l           | a     |
| l           | c     |
| l           | g     |
| l           | h     |
| l           | l     |
+-------------+-------+

最终选择
现在我们需要为每个 AnchorIdent 构建一串以逗号分隔的 Ident 值.CROSS APPLY 和 FOR XML 可以做到.DENSE_RANK() 计算每个 AnchorIdent 的 GroupID 数字.
Now we need to build a string of comma-separated Ident values for each AnchorIdent. 
CROSS APPLY with FOR XML does it.
DENSE_RANK() calculates the GroupID numbers for each AnchorIdent.

                        这篇关于如何找到无向图的所有连通子图的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

如何找到无向图的所有连通子图 [英] How to find all connected subgraphs of an undirected graph

问题描述

推荐答案

工作原理

How it works

相关文章

数据库最新文章

热门教程

热门工具

登录关闭

如何找到无向图的所有连通子图 [英] How to find all connected subgraphs of an undirected graph

问题描述

推荐答案

工作原理

How it works

相关文章

数据库最新文章

热门教程

热门工具

登录 关闭

登录关闭