如何创建存储过程以在用户之间的连接表中查找团体 [英] How to create a stored procedure to find cliques in the table of connections between users

查看:36
本文介绍了如何创建存储过程以在用户之间的连接表中查找团体的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

寻找一种从大型数据集中检索社区的方法,我遇到了一篇有关算法的文章,该算法似乎适用于大型数据集.无论如何,数据存储在两个表中:用户(节点)和连接,我想通过纯sql查询来检索社区,而无需自定义应用程序的帮助(我正在使用SQL Server 2008).

Loooking for a way to retrieve community from a large dataset I came across an article about the algorithm which seems to be apropriate for large datasets. Anyway the data is stored two tables: users (nodes) and connections and I would like to retrieve the communities by pure sql queries without help of custom applications (I'm using SQL Server 2008).

检索 cliques 的算法如下:

Read the graph G
Generate set neighbors(v) for every vertex of G
for each vertex v of G
call recursive_find_cliques(v, neighbors(v))
end for

Function recursive_find_cliques(x, n)
for each vertex t ∈ n by ascending order calculate set sigma
if sigma is not empty
extend x with t
call recursive_find_cliques(x, sigma)
end if
end for

其中sigma是一组顶点,可以与v及其相邻像素组成三角形.

where sigma is the set of vertices that could constitute triangles with v and its neighbors.

我已经创建了一个存储过程,该存储过程返回所选节点的邻居表,但是到目前为止,我还没有使用sql函数和高级查询,因此,问题如下:

I already created a stored procedure which returns a table of neighbors of selected node but so far I haven't delat with sql functions and advanced queries so the question is the following:

有人知道如何重写上面的SQL中的算法,以获得这套集团?作为问题可能有点抽象,我可能指出主要问题是创建一个递归函数(recursive_find_cliques(x,n))其中以表(n)作为参数).

Does anyone know how to rewrite the algorithm above in sql in order to get the set of cliques? As the question might be a little abstract, I may point out that the main problem is to create a recursive function (recursive_find_cliques(x, n)) which takes a table (n) as an argument).

谢谢!

这是到目前为止创建的存储过程:

Here is sthe stored procedure created so far:

CREATE PROCEDURE [dbo].[Peamc_Test]
AS
BEGIN

SET XACT_ABORT ON
BEGIN TRAN

SET NOCOUNT ON;

CREATE TABLE #Users
(
UserId int NOT NULL,
userLabel varchar(50) PRIMARY KEY NOT NULL,
Observed bit NOT NULL
)

CREATE TABLE #Neighbors
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
Retrieved bit NOT NULL
)

CREATE TABLE #ConnectedVertices
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
)

CREATE TABLE #Cliques
(
CliqueId int NOT NULL,
UserId varchar(50) NOT NULL,
)

DECLARE @UsersCount int
DECLARE @ii int
DECLARE @User varchar(50)
DECLARE @NeighborsCount int

INSERT INTO #Users(UserId, userLabel, Observed) SELECT user_id, userLabel, 0 FROM dbo.test_users WHERE user_id IS NOT NULL
SELECT @UsersCount = COUNT(*) FROM #Users
SELECT @ii = 1
WHILE @ii <= @UsersCount
BEGIN
--select user
SELECT TOP 1 @User = userLabel FROM #Users WHERE Observed = 0 ORDER BY UserId

UPDATE #Users SET Observed = 1 WHERE userLabel = @User

--Get user's neighbors
DELETE FROM #Neighbors
INSERT INTO #Neighbors(UserId, userLabel, Retrieved)
SELECT u.user_id, t2.neighbor, 0 FROM ( SELECT CALLING_NEIGHBORS.neighbor FROM ( SELECT mc.calling_party AS neighbor FROM monthly_connections_test mc WHERE mc.called_party = @User) AS CALLING_NEIGHBORS INNER JOIN (SELECT mc.called_party AS neighbor FROM monthly_connections_test mc WHERE mc.calling_party = @User) AS CALLED_NEIGHBORS ON CALLING_NEIGHBORS.neighbor = CALLED_NEIGHBORS.neighbor) AS t2 INNER JOIN test_users u ON t2.neighbor = u.userLabel
SELECT @NeighborsCount = COUNT(*) FROM #Neighbors
SELECT @ii = @ii + 1
--HERE the function recursive_find_cliques has to search for cliques and insert the found ones in #cliques

END

SELECT * FROM #Cliques

END

它还没有返回任何东西,因为它还没有完成.尽管它会为当前选定的节点检索所有邻居,然后下一步是实现recursive_find_cliques函数.

It does'not return anything yet as it is not finished. It though retrieves all neighbors for the currently selected nodes and the next step is to implement recursive_find_cliques function.

推荐答案

我意识到,只有当每个集团都有至少一个该集团中其他任何人都没有提及的用户时,我的第一个答案才有效.换句话说,将找不到诸如A-B,B-C,C-A之类的封闭集团.

I realised that my first answer only works when each clique has at least one user who is not referred to by any others in that clique. In other words, closed cliques like A-B, B-C, C-A will not be found.

这是解决此问题的解决方案.同样,我们有ID为1..20的用户.邻居关系有几种情况需要处理:

Here is a solution which solves this. Again we have users with IDs, now 1..20. There are several cases of neighbouring relations that need to be handled:

与简单情况相比,很难为每个集团找到一个独特的启动器.我们只需一点技巧即可实现:

Compared to the simple case, it is harder to find a unique starter for each clique. We achieve this with a little sleight of hand:

  • 对邻居进行重新排序,以便对于所有引用A-B,A小于B,而忽略任何A = B.

  • Reorder the neighbours so that for all references A-B, A is less than B, ignoring any A=B.

如果有可能导致循环的X-A,请从其中删除所有A-X引用.这将永远不会完全删除对A的引用,因为保留了X-A并将A-X添加到递归中.

From these, remove any A-X references if there are any X-A, which could cause a loop. This will never remove references to A completely because X-A remains and A-X will be added in the recursion.

结果集是开始"用户,我们使用它们来填充CTE:

The resultant set are the 'starting' users and we use them to prime the CTE:

-- Get all pairs, where UserA < UserB, dropping any A=B or B=A
WITH LRNeighbours(A, B) AS (
    SELECT
        Neighbours.UserA, Neighbours.UserB
    FROM
        Neighbours
    WHERE
        Neighbours.UserA < Neighbours.UserB
UNION ALL
    SELECT DISTINCT
        Neighbours.UserB, Neighbours.UserA
    FROM
        Neighbours
    WHERE
        Neighbours.UserA > Neighbours.UserB
),
-- Isolate those that are not referred to by a higher numbered key
Starters(userid) AS (
    SELECT DISTINCT
        A
    FROM    
        LRNeighbours
    WHERE 
        A NOT IN (
            SELECT 
                B
            FROM
                LRNeighbours
        )
),
-- The recursive Common Table Expression
cliques(userid, clique) AS (
    -- Number starters 1..N
    SELECT  
        userid, ROW_NUMBER() OVER(ORDER BY userid) AS clique
    FROM
        Starters
UNION ALL
    -- Recurse, adding users referred by siblings, avoiding starters themselves
    SELECT  
        B, clique 
    FROM
        LRNeighbours INNER JOIN 
        cliques ON
            LRNeighbours.A = cliques.userid 
            AND B NOT IN (
                SELECT
                    userid
                FROM
                    starters
            )
)
SELECT DISTINCT
    clique, userid 
FROM 
    cliques 
ORDER BY 
    clique, userid 

结果:

1   1
1   2
2   3
2   4
3   5
3   6
3   7
3   8
4   9
4   10
4   11
4   12
4   13
5   14
5   15
5   16
5   17
5   18
5   19
5   20

这篇关于如何创建存储过程以在用户之间的连接表中查找团体的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆