SQL分组交叉/重叠行 [英] SQL grouping interescting/overlapping rows

查看:155
本文介绍了SQL分组交叉/重叠行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在Postgres中有一张下表,其中两列 a_sno b_sno 中有重叠的数据。



 创建表数据
(a_sno整数不为null,
b_sno整数不为null,
PRIMARY KEY (a_sno,b_sno)
);

插入数据(a_sno,b_sno)值
(4,5)
,(5,4)
,(5,6)
,(6,5)
,(6,7)
,(7,6)
,(9,10)
,(9,13)
,(10,9)
,(13,9)
,(10,13)
,(13,10)
,(10,14)
,(14,10)
,(13,14)
,(14,13)
,(11,15)
,(15,11);

从前6行中可以看到两个数据值4,5,6和7列相交/重叠,需要划分为一组。第7-16行和第17-18行将分别标记为第2组和第3行。



结果输出应如下所示:

  group |价值
------ + ------
1 | 4
1 | 5
1 | 6
1 | 7
2 | 9
2 | 10
2 | 13
2 | 14
3 | 11
3 | 15


解决方案

假定所有对都以它们的镜像组合形式存在(4,5)(5,4)。但是以下解决方案也可以在没有镜像重复对象的情况下工作。



简单情况



所有连接都可以排列单个升序和类似的复杂性,例如我在 fiddle

中添加的不可能,我们可以在rCTE中使用没有重复项的解决方案:



我首先获得最小的 a_sno 每组,并具有最小关联的 b_sno

  SELECT row_number( )OVER(ORDER BY a_sno)AS grp 
,a_sno,min(b_sno)AS b_sno
FROM data d
WHERE a_sno< b_sno
并且不存在(
从数据
中选择1 b_sno = d.a_sno
AND a_sno< b_sno

由a_sno进行分组;

这只需要一个查询级别,因为窗口函数可以基于聚合:





结果:

  grp a_sno b_sno 
1 4 5
2 9 10
3 11 15

我避免使用分支和重复的(重复的)行-长链可能会更贵。我在相关子查询中使用 ORDER BY b_sno LIMIT 1 使其在递归CTE中运行。





性能的关键是匹配索引,PK约束已经提供了该索引主键(a_sno,b_sno):不是相反的方式 (b_sno,a_sno)







 以递归t AS(
SELECT row_number()OVER(OR DER d.a_sno)AS AS grp
,a_sno,min(b_sno)AS b_sno- -最小的
FROM data d
WHERE a_sno< b_sno
AND NOT EXISTS(
从dat选择1 a
其中b_sno = d.a_sno
AND a_sno< b_sno

GROUP BY a_sno


,cte AS(
SELECT grp,b_sno AS sno FROM t

UNION ALL
SELECT c.grp
,(SELECT b_sno-相关子查询
从数据
那里a_sno = c.sno
AND a_sno< b_sno
按b_sno
的限制1)
从cte c
c.sno不为空

选择*从cte
在sno不为NULL-用NULL消除行
UNION ALL-无重复
SELECT grp,a_sno FROM t
ORDER BY grp,sno;



简单案例



所有节点都可以从根开始有一个或多个分支(最小的 sno )以升序到达。



这次,得到 all 个更大的 sno 并删除重复的节点,这些节点可以通过 UNION 进行多次访问最后:





 
SELECT rank()OVER(ORDER BY d.a_sno)AS grp
,a_sno,b_sno-从数据d
中获取最小a_sno
的所有行a_sno< b_sno
并且不存在(
从数据
中选择1 b_sno = d.a_sno
AND a_sno< b_sno


,cte AS(
SELECT grp,b_sno AS sno FROM t

UNION ALL
SELECT c.grp,d.b_sno
FROM cte c
JOIN数据d ON d.a_sno = c.sno
AND d.a_sno< d.b_sno-连接到所有连接的行

SELECT grp,sno FROM cte
UNION-消除重复
SELECT grp,a_sno FROM t-添加第一行
ORDER BY grp,sno;

与第一个解决方案不同,我们在这里没有得到最后一行为NULL(由相关的子查询)。



两者的效果都很好-特别是在长链/许多分支的情况下。所需结果:



SQL Fiddle (添加行来演示难度)。



无向图



如果存在无法通过递增遍历从根部获得的局部最小值,则上述解决方案将不起作用。在这种情况下,请考虑法兴的解决方案


I have the following table in Postgres that has overlapping data in the two columns a_sno and b_sno.

create table data
( a_sno integer not null,  
  b_sno integer not null,
  PRIMARY KEY (a_sno,b_sno)
);

insert into data (a_sno,b_sno) values
  ( 4, 5 )
, ( 5, 4 )
, ( 5, 6 )
, ( 6, 5 )
, ( 6, 7 )
, ( 7, 6 )
, ( 9, 10)
, ( 9, 13)
, (10, 9 )
, (13, 9 )
, (10, 13)
, (13, 10)
, (10, 14)
, (14, 10)
, (13, 14)
, (14, 13)
, (11, 15)
, (15, 11);

As you can see from the first 6 rows data values 4,5,6 and 7 in the two columns intersects/overlaps that need to partitioned to a group. Same goes for rows 7-16 and rows 17-18 which will be labeled as group 2 and 3 respectively.

The resulting output should look like this:

group | value
------+------
1     | 4
1     | 5
1     | 6
1     | 7
2     | 9
2     | 10
2     | 13
2     | 14
3     | 11
3     | 15

解决方案

Assuming that all pairs exists in their mirrored combination as well (4,5) and (5,4). But the following solutions work without mirrored dupes just as well.

Simple case

All connections can be lined up in a single ascending sequence and complications like I added in the fiddle are not possible, we can use this solution without duplicates in the rCTE:

I start by getting minimum a_sno per group, with the minimum associated b_sno:

SELECT row_number() OVER (ORDER BY a_sno) AS grp
     , a_sno, min(b_sno) AS b_sno
FROM   data d
WHERE  a_sno < b_sno
AND    NOT EXISTS (
   SELECT 1 FROM data
   WHERE  b_sno = d.a_sno
   AND    a_sno < b_sno
   )
GROUP  BY a_sno;

This only needs a single query level since a window function can be built on an aggregate:

Result:

grp  a_sno  b_sno
1    4      5
2    9      10
3    11     15

I avoid branches and duplicated (multiplicated) rows - potentially much more expensive with long chains. I use ORDER BY b_sno LIMIT 1 in a correlated subquery to make this fly in a recursive CTE.

Key to performance is a matching index, which is already present provided by the PK constraint PRIMARY KEY (a_sno,b_sno): not the other way round (b_sno, a_sno):

WITH RECURSIVE t AS (
   SELECT row_number() OVER (ORDER BY d.a_sno) AS grp
        , a_sno, min(b_sno) AS b_sno  -- the smallest one
   FROM   data d
   WHERE  a_sno < b_sno
   AND    NOT EXISTS (
      SELECT 1 FROM data
      WHERE  b_sno = d.a_sno
      AND    a_sno < b_sno
      )
   GROUP  BY a_sno
   )

, cte AS (
   SELECT grp, b_sno AS sno FROM t

   UNION ALL
   SELECT c.grp
       , (SELECT b_sno  -- correlated subquery
          FROM   data
          WHERE  a_sno = c.sno
          AND    a_sno < b_sno
          ORDER  BY b_sno
          LIMIT  1)
   FROM   cte  c
   WHERE  c.sno IS NOT NULL
   )
SELECT * FROM cte
WHERE  sno IS NOT NULL   -- eliminate row with NULL
UNION  ALL               -- no duplicates
SELECT grp, a_sno FROM t
ORDER  BY grp, sno;

Less simple case

All nodes can be reached in ascending order with one or more branches from the root (smallest sno).

This time, get all greater sno and de-duplicate nodes that may be visited multiple times with UNION at the end:

WITH RECURSIVE t AS (
   SELECT rank() OVER (ORDER BY d.a_sno) AS grp
        , a_sno, b_sno  -- get all rows for smallest a_sno
   FROM   data d
   WHERE  a_sno < b_sno
   AND    NOT EXISTS (
      SELECT 1 FROM data
      WHERE  b_sno = d.a_sno
      AND    a_sno < b_sno
      )
   )   
, cte AS (
   SELECT grp, b_sno AS sno FROM t

   UNION ALL
   SELECT c.grp, d.b_sno
   FROM   cte  c
   JOIN   data d ON d.a_sno = c.sno
                AND d.a_sno < d.b_sno  -- join to all connected rows
   )
SELECT grp, sno FROM cte
UNION                     -- eliminate duplicates
SELECT grp, a_sno FROM t  -- add first rows
ORDER  BY grp, sno;

Unlike the first solution, we don't get a last row with NULL here (caused by the correlated subquery).

Both should perform very well - especially with long chains / many branches. Result as desired:

SQL Fiddle (with added rows to demonstrate difficulty).

Undirected graph

If there are local minima that cannot be reached from the root with ascending traversal, the above solutions won't work. Consider Farhęg's solution in this case.

这篇关于SQL分组交叉/重叠行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆