在SQL中实现不相交集近似（联合查找） [英] Implementing Disjoint Set Approximation (Union Find) in SQL

查看：272 发布时间：2020/5/30 0:48:11 sql postgresql graph-theory

本文介绍了在SQL中实现不相交集近似（联合查找）的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

使用SQL实现近似不相交集的最佳方法是什么？

What would be the best way to implement approximate Disjoint Sets using SQL?

详细信息

我有一个边表，存储为 [vertex_a，vertex_b] 的两列表。

I have a table of edges, stored as a two-column table of [vertex_a, vertex_b].

我需要一张包含不同集合的表，存储为 [vertex，set_id] ，每个顶点一行，并用不相交的set_id标记每个顶点。

I need a table of distinct sets, stored as [vertex, set_id] with one row per vertex, labeling each vertex with a disjoint set_id.

约束

必须是纯SQL实现。它可以利用Postgres特有的功能，但是最好使用纯ANSI SQL。

结果可能是近似的-可以在实际连接一些集合时将它们标记为不相交。最好调整近似范围，例如通过增加迭代次数。

没有库（没有Boost，Numpy，Scipy）。必须是SQL。

大多数集合将包含1到3个顶点。大集合很少，预计最大为10个顶点。

Must be a purely SQL implementation. It can leverage Postgres-specific functions, but pure ANSI SQL highly preferred.
The result can be approximate- it's acceptable to label a few sets as disjoint when they're actually connected. It's even better if the approximation bounds can be adjusted- by increasing iterations for example.
Libraries are out (no Boost, Numpy, Scipy). Must be SQL.
Most sets will contain 1 to 3 vertices. Very few large sets, expected max to be 10 vertices.

推荐答案

此纯sql代码在5分钟内（约8核/ 32 gb ram）将大约35000条记录分组。

This pure sql code grouped about 35000 records in 5 minutes (8 cores/32 gb ram). Enjoy.

--table with RELATIONS, idea is to place every related item in a bucket
create table RELATIONS
(
    bucket int,        -- initially 0
    bucketsub int,    -- initially 0
    relnr1 float,    
    relnr2 float    -- relnr1 = a, relnr2 = b means a and b are related
)

create table ##BucketRelnrs ( relnr float ); --table functions as temp list
declare @bucket int; 
declare @bucketsub int;
declare @nrOfUpdates int;
declare @relnr float;

set @bucket=0;
set @relnr=0;
WHILE @relnr>=0 and @bucket<50000 --to prevent the while loop from hanging.
BEGIN
    set @bucket = @bucket+1
    set @bucketsub=1;

    set @relnr = (select isnull(min(relnr1),-1) from RELATIONS where bucket=0); --pick the smallest relnr that has not been assigned a bucket yet
    set @nrOfUpdates = (select count(*) from RELATIONS where bucket=0 and (relnr1=@relnr or relnr2=@relnr));
    update RELATIONS set bucket=@bucket, bucketsub=@bucketsub where bucket=0 and (relnr1=@relnr or relnr2=@relnr);
    set @bucketsub = @bucketsub+1;

    WHILE @nrOfUpdates>0 and @bucketsub<=10    --to prevent the inner while loop from hanging, actually determines the number of iterations
    BEGIN
        --refill temp list with newly found related relnrs
        truncate table ##BucketRelnrs;
        insert into ##BucketRelnrs
        select distinct relnr1 from RELATIONS where bucket=@bucket
        union select distinct relnr2 from RELATIONS where bucket=@bucket;

        --calculate the number of relations that will be updated next, if zero then stop iteration
        set @nrOfUpdates =
        (
            select count(*) from RELATIONS where bucket=0
            and (relnr1 in (select relnr from ##BucketRelnrs) or relnr2 in (select relnr from ##BucketRelnrs))
        );

        --update the RELATIONS table
        update RELATIONS set bucket=@bucket, bucketsub=@bucketsub where bucket=0
        and (relnr1 in (select relnr from ##BucketRelnrs) or relnr2 in (select relnr from ##BucketRelnrs));

        set @bucketsub = @bucketsub+1;
    END
END

drop table ##BucketRelnrs; --clean temp table

这篇关于在SQL中实现不相交集近似（联合查找）的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

在SQL中实现不相交集近似（联合查找） [英] Implementing Disjoint Set Approximation (Union Find) in SQL

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

在SQL中实现不相交集近似（联合查找） [英] Implementing Disjoint Set Approximation (Union Find) in SQL

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭