Get list with start and end values from table of datetimes
Problem description
Currently I have a table that is built up like this:
DeviceID Timestamp Value
----------------------------------------
Device1 1.1.2011 10:00:00 3
Device1 1.1.2011 10:00:01 4
Device1 1.1.2011 10:00:02 4
Device1 1.1.2011 10:00:04 3
Device1 1.1.2011 10:00:05 4
Device1 1.1.2011 14:23:14 8
Device1 1.1.2011 14:23:15 7
Device1 1.1.2011 14:23:17 4
Device1 1.1.2011 14:23:18 2
As you can see, values come in from a device, each with a given timestamp (the column type is datetime).
The problem is that the device can be started and stopped at any point, and the data contains no direct information that a start or stop has occurred. From the list of timestamps, however, it is quite easy to tell when a start or stop occurred: whenever the timestamps of two consecutive rows are within five seconds of each other, they belong to the same measurement.
Now I'd like to get a list like this out of the data:
DeviceID Started Ended
----------------------------------------
Device1 1.1.2011 10:00:00 1.1.2011 10:00:05
Device1 1.1.2011 14:23:14 1.1.2011 14:23:18
So, any ideas how to do this quickly? All I can think of is using some kind of cursor and comparing each pair of datetimes by hand. But I think this will get really slow, because we have to inspect every value in every row.
So is there a better SQL solution that doesn't rely on cursors?
I have now tested all the given answers. On reading, they all look good and take some interesting approaches. Unfortunately, all of them (so far) have failed on the real data. The biggest problem seems to be the volume of data (there are currently about 3.5 million entries in the table). Running a given query on only a small subset produces the expected results, but running it on the whole table leads to very bad performance.
I have to test further and examine whether I can split the data into chunks and pass only part of it to one of the given algorithms to get this going. But maybe one of you has another smart idea for getting the results a little faster.
Okay, this information might help, too: there are currently about 3.5 million entries in the table. Here are the column types and indexes:
- _ID
  - int
  - Primary key
  - Grouped index
  - Not mentioned in my example, because it isn't needed for this query
- DeviceID
  - int
  - Not null
  - Index
- Timestamp
  - datetime
  - Not null
  - Index
- Value
  - Several unindexed columns of different types (int, real, tinyint)
  - All nullable
Maybe this helps to improve your existing (or new) solutions to the given problem.
Recommended answer
-- Table variable to store the gaps
declare @T table
(
    DeviceID varchar(10),
    PrevPeriodEnd datetime,
    NextPeriodStart datetime
)

-- Get the gaps: consecutive rows of one device more than 5 seconds apart
;with cte as
(
    select *,
           row_number() over(partition by DeviceID order by Timestamp) as rn
    from data
)
insert into @T
select C1.DeviceID,
       C1.Timestamp as PrevPeriodEnd,
       C2.Timestamp as NextPeriodStart
from cte as C1
    inner join cte as C2
        on C1.rn = C2.rn-1 and
           C1.DeviceID = C2.DeviceID and
           datediff(s, C1.Timestamp, C2.Timestamp) > 5

-- Build islands from gaps in @T
;with cte1 as
(
    -- Add first and last timestamp to gaps
    select DeviceID, PrevPeriodEnd, NextPeriodStart
    from @T
    union all
    select DeviceID, max(Timestamp) as PrevPeriodEnd, null as NextPeriodStart
    from data
    group by DeviceID
    union all
    select DeviceID, null as PrevPeriodEnd, min(Timestamp) as NextPeriodStart
    from data
    group by DeviceID
),
cte2 as
(
    -- Number the gap rows per device; the null PrevPeriodEnd row sorts first
    select *,
           row_number() over(partition by DeviceID order by PrevPeriodEnd) as rn
    from cte1
)
-- A period runs from the end of one gap to the start of the next
select C1.DeviceID,
       C1.NextPeriodStart as PeriodStart,
       C2.PrevPeriodEnd as PeriodEnd
from cte2 as C1
    inner join cte2 as C2
        on C1.DeviceID = C2.DeviceID and
           C1.rn = C2.rn-1
order by C1.DeviceID, C1.NextPeriodStart
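To try the query above on the question's sample rows, a setup along the following lines might be used. This is only a sketch under stated assumptions: the table name `data` is taken from the query above, the column types are illustrative (the question says the real DeviceID column is int), and the dates are read as 1 January 2011. The second statement is a different technique from the self-join in the answer: it marks period starts with `lag()` and numbers the periods with a running sum, which assumes SQL Server 2012 or later. Whether it performs better on 3.5 million rows is untested and would need to be measured against the real indexes.

```sql
-- Assumption: a scratch table named data, matching the name used in the answer.
-- Column types are illustrative, not the asker's real schema.
create table data
(
    DeviceID    varchar(10) not null,
    [Timestamp] datetime    not null,
    Value       int         null
);

-- Sample rows from the question
insert into data (DeviceID, [Timestamp], Value) values
    ('Device1', '2011-01-01T10:00:00', 3),
    ('Device1', '2011-01-01T10:00:01', 4),
    ('Device1', '2011-01-01T10:00:02', 4),
    ('Device1', '2011-01-01T10:00:04', 3),
    ('Device1', '2011-01-01T10:00:05', 4),
    ('Device1', '2011-01-01T14:23:14', 8),
    ('Device1', '2011-01-01T14:23:15', 7),
    ('Device1', '2011-01-01T14:23:17', 4),
    ('Device1', '2011-01-01T14:23:18', 2);

-- Variant using lag() instead of the self-join (assumes SQL Server 2012+)
with marked as
(
    -- IsStart = 1 for the first row of a device and for any row that
    -- follows the previous row by more than 5 seconds
    select DeviceID,
           [Timestamp],
           case
               when datediff(second,
                             lag([Timestamp]) over(partition by DeviceID
                                                   order by [Timestamp]),
                             [Timestamp]) <= 5
               then 0
               else 1
           end as IsStart
    from data
),
grouped as
(
    -- Running sum of the start markers gives each period its own number
    select DeviceID,
           [Timestamp],
           sum(IsStart) over(partition by DeviceID
                             order by [Timestamp]
                             rows unbounded preceding) as PeriodNo
    from marked
)
select DeviceID,
       min([Timestamp]) as PeriodStart,
       max([Timestamp]) as PeriodEnd
from grouped
group by DeviceID, PeriodNo
order by DeviceID, PeriodStart;
```

On the nine sample rows, both this variant and the answer's query should return the two periods listed in the question (10:00:00 to 10:00:05 and 14:23:14 to 14:23:18).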