扁平交叉时间盘 [英] Flattening intersecting timespans

查看:119
本文介绍了扁平交叉时间盘的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有大量数据,具有给定ID的开始和结束时间,我需要将所有相交和相邻的时间段平坦化为一个组合的时间段。下面发布的示例数据都是相同的ID,所以我没有列出。

I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan. The sample data posted below is all for the same ID so I didn't list it.

为了使事情更清晰,请查看03.06的示例数据.2009:

To make things a bit clearer, take a look at the sample data for 03.06.2009:

以下时间戳重叠或重复,需要合并成一次。

The following timespans are overlapping or contiunous and need to merge into one timespan


  • 05:54:48 - 10:00:13

  • 09:26:45 - 09:59:40

最后的时间范围是从05:54:48到10:00:13。由于在10:00:13和10:12:50之间有差距,我们也有以下时间段:

The resulting timespan would be from 05:54:48 to 10:00:13. Since there's a gap between 10:00:13 and 10:12:50 we also have the following timespans:


  • 10:12:50 - 10:27:25

  • 10:13:12 - 11:14:56

  • 10:27:25 - 10:27:31

  • 10:27:39 - 13:53:38

  • 11:14:56 - 11:15:03

  • 11:15:30 - 14:02:14

  • 13:53:38 - 13:53:43

  • 14 :02:14 - 14:02:31

  • 10:12:50 - 10:27:25
  • 10:13:12 - 11:14:56
  • 10:27:25 - 10:27:31
  • 10:27:39 - 13:53:38
  • 11:14:56 - 11:15:03
  • 11:15:30 - 14:02:14
  • 13:53:38 - 13:53:43
  • 14:02:14 - 14:02:31

导致一个合并的时间段从10:12:50到14:02: 31,因为它们是重叠的或相邻的。

which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.

下面你将会找到我需要的示例数据和扁平化数据。持续时间栏仅供参考。

Below you will find the sample data and the flattened data as I would need it. The duration column is just informative.

任何解决方案 - 不管是SQL还是不支持。

Any solution - be it SQL or not - is appreciated.

编辑:由于有许多不同而有趣的解决方案,我正在通过添加约束来改进我的原始问题,以查看最佳(如果有)解决方案泡沫:

EDIT: Since there are lots of different and interesting solutions I'm refining my original question by adding constraints to see the "best" (if there is one) solution bubble up:


  • 我从另一个系统通过ODBC获取数据。没有办法更改我的表格布局或添加索引

  • 数据仅由日期列索引(时间部分不是)

  • 每天大约有2.5k行

  • 数据的估计使用模式大致如下:


      <大部分时间(让我们说90%),用户只会查询一到两天(2.5k - 5k行)
    • 有时(9%)的范围将由一个月(〜75k行)

    • 很少(1%)范围将达一年(约900k行)

    • I'm getting the data via ODBC from another system. There's no way to change the table layout for me or adding indexes
    • The data is indexed only by the date column (the time part isn't)
    • There are about 2.5k rows for every day
    • The estimated usage pattern of the data is roughly as follows:
      • Most of the time (lets say 90%) the user will query just one or two days (2.5k - 5k rows)
      • Sometimes (9%) the range will be up to a month (~75k rows)
      • Rarely (1%) the range will be up to a year (~900k rows)

      在这些约束之内,最好的解决方案是什么?恐怕大多数解决方案将会非常缓慢,因为他们加入了日期和时间的组合,这不是我的情况下的一个索引领域。

      Within these constraints, what would be the best solution? I'm afraid that most of the solutions will be horribly slow since they join on the combination of date and time, which is not an index field in my case.

      您在客户端或服务器端执行所有合并?您是否首先创建一个优化的临时表,并使用该表中的一个提出的解决方案?我没有时间来测试解决方案,但是现在我会告诉你最适合我的工作。

      Would you do all the merging on the client or the server side? Would you first create an optimized temp table and use one of the proposed solutions with that table? I didn't have the time to test the solutions until now but I will keep you informed what works best for me.

      样本数据

      Date       | Start    | Stop
      -----------+----------+---------
      02.06.2009 | 05:55:28 | 09:58:27
      02.06.2009 | 10:15:19 | 13:58:24
      02.06.2009 | 13:58:24 | 13:58:43
      03.06.2009 | 05:54:48 | 10:00:13
      03.06.2009 | 09:26:45 | 09:59:40
      03.06.2009 | 10:12:50 | 10:27:25
      03.06.2009 | 10:13:12 | 11:14:56
      03.06.2009 | 10:27:25 | 10:27:31
      03.06.2009 | 10:27:39 | 13:53:38
      03.06.2009 | 11:14:56 | 11:15:03
      03.06.2009 | 11:15:30 | 14:02:14
      03.06.2009 | 13:53:38 | 13:53:43
      03.06.2009 | 14:02:14 | 14:02:31
      04.06.2009 | 05:48:27 | 09:58:59
      04.06.2009 | 06:00:00 | 09:59:07
      04.06.2009 | 10:15:52 | 13:54:52
      04.06.2009 | 10:16:01 | 13:24:20
      04.06.2009 | 13:24:20 | 13:24:24
      04.06.2009 | 13:24:32 | 14:00:39
      04.06.2009 | 13:54:52 | 13:54:58
      04.06.2009 | 14:00:39 | 14:00:49
      05.06.2009 | 05:53:58 | 09:59:12
      05.06.2009 | 10:16:05 | 13:59:08
      05.06.2009 | 13:59:08 | 13:59:16
      06.06.2009 | 06:04:00 | 10:00:00
      06.06.2009 | 10:16:54 | 10:18:40
      06.06.2009 | 10:18:40 | 10:18:45
      06.06.2009 | 10:23:00 | 13:57:00
      06.06.2009 | 10:23:48 | 13:57:54
      06.06.2009 | 13:57:21 | 13:57:38
      06.06.2009 | 13:57:54 | 13:57:58
      07.06.2009 | 21:59:30 | 01:58:49
      07.06.2009 | 22:12:16 | 01:58:39
      07.06.2009 | 22:12:25 | 01:58:28
      08.06.2009 | 02:10:33 | 05:56:11
      08.06.2009 | 02:10:43 | 05:56:23
      08.06.2009 | 02:10:49 | 05:55:59
      08.06.2009 | 05:55:59 | 05:56:01
      08.06.2009 | 05:56:11 | 05:56:14
      08.06.2009 | 05:56:23 | 05:56:27
      

      平整结果

      Date       | Start    | Stop     | Duration
      -----------+----------+----------+---------
      02.06.2009 | 05:55:28 | 09:58:27 | 04:02:59
      02.06.2009 | 10:15:19 | 13:58:43 | 03:43:24
      03.06.2009 | 05:54:48 | 10:00:13 | 04:05:25
      03.06.2009 | 10:12:50 | 14:02:31 | 03:49:41
      04.06.2009 | 05:48:27 | 09:59:07 | 04:10:40
      04.06.2009 | 10:15:52 | 14:00:49 | 03:44:58
      05.06.2009 | 05:53:58 | 09:59:12 | 04:05:14
      05.06.2009 | 10:16:05 | 13:59:16 | 03:43:11
      06.06.2009 | 06:04:00 | 10:00:00 | 03:56:00
      06.06.2009 | 10:16:54 | 10:18:45 | 00:01:51
      06.06.2009 | 10:23:00 | 13:57:58 | 03:34:58
      07.06.2009 | 21:59:30 | 01:58:49 | 03:59:19
      08.06.2009 | 02:10:33 | 05:56:27 | 03:45:54
      


      推荐答案

      。我使用DATETIME列。分开存储时间是我的看法中的一个错误,因为当时间到了午夜的时候你会有问题。你可以调整这个来处理这种情况,如果你需要的话。该解决方案还假设开始和结束时间不为NULL。再次,如果不是这样,您可以根据需要进行调整。

      Here is a SQL only solution. I used DATETIME for the columns. Storing the time separate is a mistake in my opinion, as you will have problems when the times go past midnight. You can adjust this to handle that situation though if you need to. The solution also assumes that the start and end times are NOT NULL. Again, you can adjust as needed if that's not the case.

      解决方案的一般要点是获取与其他任何其他内容不重叠的所有开始时间跨越,获得与任何跨度不重叠的所有结束时间,然后将两者结合在一起。

      The general gist of the solution is to get all of the start times that don't overlap with any other spans, get all of the end times that don't overlap with any spans, then match the two together.

      结果与您的预期结果相符,但在一种情况下,手头检查看起来像你的预期输出错误。 6日应该在2009-06-06 10:18:45.000结束。

      The results match your expected results except in one case, which checking by hand looks like you have a mistake in your expected output. On the 6th there should be a span that ends at 2009-06-06 10:18:45.000.

      SELECT
           ST.start_time,
           ET.end_time
      FROM
      (
           SELECT
                T1.start_time
           FROM
                dbo.Test_Time_Spans T1
           LEFT OUTER JOIN dbo.Test_Time_Spans T2 ON
                T2.start_time < T1.start_time AND
                T2.end_time >= T1.start_time
           WHERE
                T2.start_time IS NULL
      ) AS ST
      INNER JOIN
      (
           SELECT
                T3.end_time
           FROM
                dbo.Test_Time_Spans T3
           LEFT OUTER JOIN dbo.Test_Time_Spans T4 ON
                T4.end_time > T3.end_time AND
                T4.start_time <= T3.end_time
           WHERE
                T4.start_time IS NULL
      ) AS ET ON
           ET.end_time > ST.start_time
      LEFT OUTER JOIN
      (
           SELECT
                T5.end_time
           FROM
                dbo.Test_Time_Spans T5
           LEFT OUTER JOIN dbo.Test_Time_Spans T6 ON
                T6.end_time > T5.end_time AND
                T6.start_time <= T5.end_time
           WHERE
                T6.start_time IS NULL
      ) AS ET2 ON
           ET2.end_time > ST.start_time AND
           ET2.end_time < ET.end_time
      WHERE
           ET2.end_time IS NULL
      

      这篇关于扁平交叉时间盘的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆