在SQL中一遍合并间隔 [英] Merging intervals in one pass in SQL

查看:88
本文介绍了在SQL中一遍合并间隔的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有一个包含两列的表:startend,都是整数,并且该表由第一列然后是第二列排序.每行代表一个间隔.

Let's say I have a table with two columns: start and end, both integers, and the table is ordered by the first, then second column. Each row represents an interval.

我需要的是合并的间隔表:所有重叠或相邻的间隔都变成一个.

What I need is the table of merged intervals: all overlapping or adjacent intervals gobbled up into one.

它可以用JOIN查询构造,但是行数是二次的,在我的情况下是400万行(我决定组成这个问题,因为查询仍在运行).

It can be constructed with a JOIN query, but that is quadratic in the number of rows, which is 4 million rows in my case (I decided to compose this question because the query is still running).

也可以通过遍历每一行并跟踪最大结束时间以单次的方式完成-但是如何在标准SQL中执行此操作或进行等效操作?在SQL中有任何 O(n)方法可以做到吗?我现在正在使用SQLite.特定于SQLite的解决方案这次也对我有帮助.

It can also be done in a single pass, by running through each row and keeping track of the maximum end time - but how to do that, or something equivalent, in standard SQL? Is there any O(n) way to do it in SQL? I'm using SQLite right now; a SQLite-specific solution would also help me out this time.

从相关问题的答案( 1 2 4 5 6 8 9 )我不知道是否有可能.

From the answers to related questions (1, 2, 3, 4, 5, 6, 7, 8, 9) I can't tell whether it's possible.

可以吗?

推荐答案

好,这是在MySQL中有效的解决方案(我不知道它是否在SQlite中有效).我认为但不能证明那是O(n)(丢弃最初对事件表进行排序所花费的时间,即,是否已经按照我认为的问题进行了排序.)

Well, here is a solution that works in MySQL (I don't know if it will work in SQlite). I think, but cannot prove, that is O(n) (discarding the time it takes to sort the events table initially, i.e. if it is already sorted as I think the question states.)

> SELECT * from events;
+-------+-----+
| start | end |
+-------+-----+
|     1 |   9 |
|     5 |   8 |
|     8 |  11 |
|    11 |  13 |
|    17 |  25 |
|    18 |  26 |
|    33 |  42 |
|    59 |  81 |
|    61 |  87 |
|    97 | 132 |
|   105 | 191 |
|   107 | 240 |
|   198 | 213 |
|   202 | 215 |
+-------+-----+
14 rows in set (0.00 sec)


SET @interval_id = 0;
SET @interval_end = 0;

SELECT
  MIN(start) AS start,
  MAX(end) AS end
  FROM
    (SELECT
       @interval_id := IF(start > @interval_end,
                          @interval_id + 1,
                          @interval_id) AS interval_id,
       @interval_end := IF(start < @interval_end,
                           GREATEST(@interval_end, end),
                           end) AS interval_end,
       events.*
     FROM events
     ORDER BY start,end) tmp
  GROUP BY interval_id;

+-------+------+
| start | end  |
+-------+------+
|     1 |   13 |
|    17 |   26 |
|    33 |   42 |
|    59 |   87 |
|    97 |  240 |
+-------+------+
5 rows in set (0.00 sec)

这篇关于在SQL中一遍合并间隔的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆