Get list with start and end values from table of datetimes

Problem description

Currently I have a table structured like this:

DeviceID      Timestamp            Value
----------------------------------------
Device1       1.1.2011 10:00:00    3
Device1       1.1.2011 10:00:01    4
Device1       1.1.2011 10:00:02    4
Device1       1.1.2011 10:00:04    3
Device1       1.1.2011 10:00:05    4
Device1       1.1.2011 14:23:14    8
Device1       1.1.2011 14:23:15    7
Device1       1.1.2011 14:23:17    4
Device1       1.1.2011 14:23:18    2

As you can see, values arrive from a device, each with a timestamp (the column type is datetime).

The problem is that the device can be started and stopped at any point, and nothing in the data directly indicates that a start or stop has occurred. But from the list of timestamps it is quite easy to tell when a start or stop occurred, because whenever the timestamps of two consecutive rows are within five seconds of each other, they belong to the same measurement.

Now I'd like to get a list like this out of that data:

DeviceID      Started              Ended
Device1       1.1.2011 10:00:00    1.1.2011 10:00:05
Device1       1.1.2011 14:23:14    1.1.2011 14:23:18

So, any ideas on how to do this quickly? All I can think of is using some kind of cursor and comparing each pair of datetimes by hand. But I think this will get really slow, because we have to inspect each value in each row.

So is there a better SQL solution that doesn't rely on cursors?

I have now tested all the given answers. On reading, they all look good and take some interesting approaches. Unfortunately, all of them (so far) have failed on the real data. The biggest problem seems to be the sheer amount of data (currently there are about 3.5 million entries in the table). Running a given query on only a small subset produces the expected results, but running it against the whole table performs very badly.

I have to test further and check whether I can split the data into chunks and pass only part of it to one of these algorithms to get this going. But maybe one of you has another smart idea for getting the results a little faster.

Okay, this information might help, too: currently there are about 3.5 million entries in the table. Here are the column types and indexes:

  • _ID
    • int
    • Primary Key
    • Grouped Index
    • not mentioned in my example, because it isn't needed for this query
  • DeviceID
    • int
    • not null
    • Index
  • Timestamp
    • datetime
    • not null
    • Index
  • Value
    • several unindexed columns of different types (int, real, tinyint)
    • all nullable

Maybe this helps to improve your existing (or new) solutions to the given problem.
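
Given that the device and timestamp columns are indexed separately, one thing that may be worth trying is a single composite index on both, so that a query ordering rows by time within each device can read them in that order rather than sorting 3.5 million rows. A minimal sketch, assuming SQL Server, a table named `data`, and the column names `DeviceID` and `Timestamp` from the example; the index name is hypothetical:

```sql
-- Assumptions: SQL Server, table `data`, columns DeviceID and Timestamp.
-- IX_data_DeviceID_Timestamp is a made-up name for illustration.
create nonclustered index IX_data_DeviceID_Timestamp
  on data (DeviceID, Timestamp);
```

Whether the optimizer actually uses such an index depends on the query (for example, `select *` forces lookups for the remaining columns), so the execution plan should be checked on the real data.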

Recommended answer

-- Table variable to store the gaps between measurements
declare @T table
(
  DeviceID varchar(10),
  PrevPeriodEnd datetime,
  NextPeriodStart datetime
)

-- Get the gaps: number the rows per device by time, then pair each row
-- with its successor and keep the pairs more than five seconds apart
;with cte as
(
  select *,
    row_number() over(partition by DeviceID order by Timestamp) as rn
  from data
)
insert into @T
select
  C1.DeviceID,
  C1.Timestamp as PrevPeriodEnd,
  C2.Timestamp as NextPeriodStart
from cte as C1
  inner join cte as C2
    on C1.rn = C2.rn-1 and
       C1.DeviceID = C2.DeviceID and
       datediff(s, C1.Timestamp, C2.Timestamp) > 5

-- Build islands from gaps in @T
;with cte1 as
(
  -- Add first and last timestamp of each device to the gaps
  select DeviceID, PrevPeriodEnd, NextPeriodStart
  from @T
  union all
  -- Last timestamp closes the final period
  select DeviceID, max(Timestamp) as PrevPeriodEnd, null as NextPeriodStart
  from data
  group by DeviceID
  union all
  -- First timestamp opens the first period
  select DeviceID, null as PrevPeriodEnd, min(Timestamp) as NextPeriodStart
  from data
  group by DeviceID
),
cte2 as
(
  -- The row with a null PrevPeriodEnd sorts first within each device
  select *,
    row_number() over(partition by DeviceID order by PrevPeriodEnd) as rn
  from cte1
)
-- Each period runs from one row's NextPeriodStart to the next row's PrevPeriodEnd
select
  C1.DeviceID,
  C1.NextPeriodStart as PeriodStart,
  C2.PrevPeriodEnd as PeriodEnd
from cte2 as C1
  inner join cte2 as C2
    on C1.DeviceID = C2.DeviceID and
       C1.rn = C2.rn-1
order by C1.DeviceID, C1.NextPeriodStart
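
The same gaps-and-islands idea can also be written without the self-joins. This is a sketch of a different technique from the answer above, not a replacement for it: it assumes SQL Server 2012 or later (for `lag` and a windowed `sum` with `order by`) and the same `data` table with `DeviceID` and `Timestamp` columns. The names `flagged`, `numbered`, `IsStart` and `MeasurementNo` are made up for illustration.

```sql
-- Assumption: SQL Server 2012+; table `data` as in the answer above.
;with flagged as
(
  select
    DeviceID,
    Timestamp,
    -- 1 when this row opens a new measurement: there is no earlier row for
    -- the device, or the previous row is more than five seconds older
    case
      when lag(Timestamp) over(partition by DeviceID order by Timestamp) is null
        or datediff(s,
             lag(Timestamp) over(partition by DeviceID order by Timestamp),
             Timestamp) > 5
      then 1 else 0
    end as IsStart
  from data
),
numbered as
(
  select
    DeviceID,
    Timestamp,
    -- Running count of starts gives every row its measurement number
    sum(IsStart) over(partition by DeviceID order by Timestamp
                      rows unbounded preceding) as MeasurementNo
  from flagged
)
select
  DeviceID,
  min(Timestamp) as Started,
  max(Timestamp) as Ended
from numbered
group by DeviceID, MeasurementNo
order by DeviceID, Started
```

On the nine sample rows from the question this should produce the two measurements listed there. Because it reads the table once per CTE instead of joining it to itself, it may scale better on 3.5 million rows, but that is an expectation to verify against the real data rather than a measured result.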
      
