如果时间间隔重叠,则对行进行分组 [英] Group rows if they have overlapping time intervals

查看:47
本文介绍了如果时间间隔重叠,则对行进行分组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要一个 SQL 查询来创建一个新字段 group_id 来标识每个 ID 中具有重叠 start_timeend_time 的记录 间隔.一个可接受的解决方案将为每个 ID 和重叠的一组间隔具有唯一的 group_id.例子:计算了 group_id 的 times 表示例

I need an SQL query to create a new field group_id which identifies records within each ID that have overlapping start_time and end_time intervals. An acceptable solution will have a unique group_id for each ID and overlapping set of intervals. Example: sample times table with group_id computed

ID	START_TIME	END_TIME	GROUP_ID
100	10:00:00	12:00:00	1
100	10:15:00	12:30:00	1
100	12:15:00	12:45:00	1
100	13:00:00	14:00:00	2
101	09:00:00	13:00:00	1
101	09:30:00	13:30:00	1
101	10:00:00	10:20:00	1
101	10:19:59	11:15:00	1
101	10:21:00	10:30:00	1
101	11:00:00	12:30:00	1
101	11:30:00	12:35:00	1
102	10:01:00	11:25:00	1
102	11:01:00	11:30:00	1
105	10:00:00	10:20:00	1
105	10:21:00	10:30:00	2
105	10:30:01	11:00:00	3
106	10:00:00	10:22:00	1
107	10:19:57	10:20:01	1
108	10:01:01	10:16:59	1

附加信息:对于给定的ID,如果它的任何区间重叠,则相应的记录属于同一组,因此应该具有相同的group_id.当 A 的 start_time 和/或 end_time 在 B 的 start_timeend_time 之间时,A 记录与另一条记录 B 重叠.

Additional Info: For a given ID, if any of its intervals overlap then the corresponding records belong to the same group, and thus should have the same group_id. A record A overlaps another record B when A’s start_time and/or end_time is between B’s start_time and end_time.

在示例中,ID = 100 有四个间隔.前三个重叠 => 第二个记录与第一个重叠(10:15 的 start_time 在 10 的 start_timeend_time 之间:00 到 12:00),第三个与第二个重叠(12:15 的 start_time 在 10 的 start_timeend_time 之间:15 至 12:30).因此,它们都具有相同的 group_id 1.ID = 100 的第四个区间不与该 ID,因此它成为自己的组,并带有新的 group_id.最后一条记录有一个完全不同的ID,所以它开始了第三组,也有一个新的group_id.

In the example, ID = 100 has four intervals. The first three overlap => the second record overlaps with the first (the start_time of 10:15 is between the start_time and end_time of 10:00 to 12:00) and the third overlaps with the second (the start_time of 12:15 is between the start_time and end_time of 10:15 to 12:30). Because of this, they all have the same group_id of 1. The fourth interval for ID = 100 does not overlap any of the other intervals within that ID, and so it becomes its own group with a new group_id. The last record has a completely different ID and so it starts a third group also with a new group_id.

我试过这个 MYSQL 脚本.输出不会重置组 ID 并按串行顺序继续.想知道哪些更改可以使其发挥作用.

edit: I've tried this MYSQL script. The output does not reset the group ID and continues in the serial order. Would like to know what changes can make it work.

  
WITH C1 AS (
SELECT *,
  CASE 
WHEN start_time <= MAX(IFnull(end_time,'9999-12-31 00:00:00.000')) OVER(
  partition by id
  ORDER BY start_time 
  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
  ) 
  THEN 0 
  ELSE 1 
END AS isstart
FROM activity
) 
SELECT ID,start_time,end_time,
   SUM(isstart) OVER(ORDER BY ID ROWS UNBOUNDED PRECEDING) AS DG 
FROM C1;

Query Output:
100	10:00:00	12:00:00	1
100	10:15:00	12:30:00	1
100	12:15:00	12:45:00	1
100	13:00:00	14:00:00	2
101	09:00:00	13:00:00	3
101	09:30:00	13:30:00	3
101	10:00:00	10:20:00	3
101	10:19:59	11:15:00	3
101	10:21:00	10:30:00	3
101	11:00:00	12:30:00	3
101	11:30:00	12:35:00	3
102	10:01:00	11:25:00	4
102	11:01:00	11:30:00	4
105	10:00:00	10:20:00	5
105	10:21:00	10:30:00	6
105	10:30:01	11:00:00	7
106	10:00:00	10:22:00	8
107	10:19:57	10:20:01	9
108	10:01:01	10:16:59	10

(去掉mysql-server标签)

(Removing the mysql-server tag)

推荐答案

WITH C1 AS (
SELECT *,
  CASE 
WHEN start_time <= MAX(IFnull(end_time,'9999-12-31 00:00:00.000')) OVER(
  partition by id
  ORDER BY start_time 
  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
  ) 
  THEN 0 
  ELSE 1 
END AS isstart
FROM activity
) 
SELECT ID,start_time,end_time,
   SUM(isstart) OVER(partition by id ORDER BY ID ROWS UNBOUNDED PRECEDING) AS DG 
FROM C1;

这应该对你有用

这篇关于如果时间间隔重叠,则对行进行分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆