SQL - Convert Time Series Events into On/Off Pairs (handling potential missing On's or Off's)


Problem description

In SQL Server, I have a set of time series on/off events which look like this (for simplicity I have only shown one Alarm Number, but there are many in the same table):

'Alarms' Table:
AlarmNumber   Time                      AlarmState
1592          2020-01-02 01:52:02       1
1592          2020-01-02 01:58:07       0
1592          2020-04-28 03:46:49       1
1592          2020-04-28 06:19:10       0
1592          2020-06-04 00:25:22       1
1592          2020-08-27 01:57:03       1
1592          2020-08-27 05:16:32       0
1592          2020-09-17 02:51:57       0

I am trying to convert these into On/Off pairs:

Output I am trying to achieve, ideally as an SQL View:
AlarmNumber   StartTime                 EndTime
1592          2020-01-02 01:52:02       2020-01-02 01:58:07
1592          2020-04-28 03:46:49       2020-04-28 06:19:10
1592          2020-06-04 00:25:22       NULL
1592          2020-08-27 01:57:03       2020-08-27 05:16:32
1592          NULL                      2020-09-17 02:51:57

If I had a clean data set, with no missing 'On' or 'Off' events, I could achieve this with:

select tOn.AlarmNumber, tOn.Time StartTime, tOff.Time EndTime
from (
select AlarmNumber, Time, 
       ROW_NUMBER() Over(Partition by AlarmNumber order by Time) EventID
from Alarms where AlarmState = 1
) tOn
LEFT JOIN (
select AlarmNumber, Time, 
       ROW_NUMBER() Over(Partition by AlarmNumber order by Time) EventID
from Alarms where AlarmState = 0
) tOff
on (tOn.AlarmNumber = tOff.AlarmNumber and tOn.EventID = tOff.EventID)

(Code modified from Adriano Carneiro's answer at "T-SQL Start and end date times from a single column")

My question: can anyone think of an efficient way of processing the 'Alarms' table to achieve my sample output, which handles missing on/off events (shown as NULLs in the sample output)?

My fallback is to use a cursor and a WHILE loop, but I was hoping there might be a way to do it by grouping pairs of On/Off events together; I just haven't been able to get that working. I have 500k+ events, so it is a large data set to iterate over.

Any ideas are welcome!

Thank you, Thomas

------ UPDATE 1st Nov 2020 ------

Two great solutions have been provided and both work correctly and provide identical results on a sample of 80,000 rows of messy real world data.

  • GMB's solution is easier to read, but runs slightly slower
  • gotqn's solution uses more lines of code, but ran about 50% faster on my test server

Recommended answer

Once the rows are numbered in time order per alarm, SELECT them in parts and combine the results using UNION ALL:

DECLARE @DataSource TABLE
(
    [AlarmNumber] INT
   ,[Time] DATETIME2(0)
   ,[AlarmState] INT
);

INSERT INTO @DataSource ([AlarmNumber], [Time], [AlarmState])
VALUES (1592, '2020-01-02 01:52:02', 1)
      ,(1592, '2020-01-02 01:58:07', 0)
      ,(1592, '2020-04-28 03:46:49', 1)
      ,(1592, '2020-04-28 06:19:10', 0)
      ,(1592, '2020-06-04 00:25:22', 1)
      ,(1592, '2020-08-27 01:57:03', 1)
      ,(1592, '2020-08-27 05:16:32', 0)
      ,(1592, '2020-09-17 02:51:57', 0);

-- Add a rowID column to the data
WITH DataSource AS
(
    SELECT * ,ROW_NUMBER() Over(Partition by AlarmNumber order by [Time]) rowID
    FROM @DataSource
)

-- This is just here so we can sort the result at the end
SELECT * FROM (

-- Select rows of DataSource where there is an ON and subsequent OFF event (DS1 Alarm is ON and DS2 Alarm is OFF)
-- This also catches where there is an ON, but no subsequent OFF (DS2.Time will be NULL)
    SELECT DS1.AlarmNumber
            ,DS1.Time As StartTime
            ,DS2.Time As EndTime
    FROM DataSource DS1
    LEFT JOIN DataSource DS2
        ON DS1.[rowID] = DS2.[rowID] - 1
        AND DS1.AlarmNumber = DS2.AlarmNumber
        AND DS2.[AlarmState] = 0
    WHERE DS1.[AlarmState] = 1

    UNION ALL

    -- Select rows of DataSource where there is an OFF and there is no matching ON (aka it turned OFF without ever turning ON)
    SELECT DS2.AlarmNumber
            ,NULL As StartTime
            ,DS2.Time As EndTime
    FROM DataSource DS2

    INNER JOIN DataSource DS1
        ON DS2.[rowID] -1 = DS1.[rowID]
        AND DS1.[AlarmState] = 0
        AND DS2.AlarmNumber = DS1.AlarmNumber
    
    WHERE DS2.[AlarmState] = 0

    UNION ALL

    -- Select rows of DataSource where the first event for this alarm number is an OFF (it would otherwise be missed by the above)
    SELECT DS1.AlarmNumber
            ,NULL As StartTime
            ,DS1.Time As EndTime
    FROM DataSource DS1
    WHERE DS1.[AlarmState] = 0 AND DS1.rowID = 1
) z
ORDER BY COALESCE(StartTime,EndTime), AlarmNumber
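
Since the question asks for a view, the same pairing rules can also be sketched with LAG and LEAD instead of self-joins. This is only an illustrative alternative, not the code from either answer mentioned in the update, and it has not been benchmarked. It assumes SQL Server 2012 or later (for LAG/LEAD), that the table lives in the dbo schema, that AlarmState is never NULL, and that no two events for the same alarm share a timestamp. The view name AlarmOnOffPairs is hypothetical.

```sql
-- Hypothetical view name; assumes dbo.Alarms(AlarmNumber, [Time], AlarmState)
-- as shown in the question, with AlarmState = 1 for ON and 0 for OFF.
CREATE VIEW dbo.AlarmOnOffPairs
AS
WITH Ordered AS
(
    -- Look at each event's immediate neighbours within the same alarm
    SELECT AlarmNumber
          ,[Time]
          ,AlarmState
          ,LAG(AlarmState)  OVER (PARTITION BY AlarmNumber ORDER BY [Time]) AS PrevState
          ,LEAD(AlarmState) OVER (PARTITION BY AlarmNumber ORDER BY [Time]) AS NextState
          ,LEAD([Time])     OVER (PARTITION BY AlarmNumber ORDER BY [Time]) AS NextTime
    FROM dbo.Alarms
)
SELECT AlarmNumber
       -- An ON event always starts a row; an orphan OFF has no start
      ,CASE WHEN AlarmState = 1 THEN [Time] END AS StartTime
       -- An ON is closed only by an OFF that follows it directly;
       -- an orphan OFF supplies its own time as the end
      ,CASE WHEN AlarmState = 1 AND NextState = 0 THEN NextTime
            WHEN AlarmState = 0 THEN [Time]
       END AS EndTime
FROM Ordered
WHERE AlarmState = 1
   -- Keep an OFF only when it was not consumed by a preceding ON:
   -- either it is the first event for the alarm or it follows another OFF
   OR (AlarmState = 0 AND (PrevState IS NULL OR PrevState = 0));
```

A view cannot carry its own ORDER BY, so sorting belongs in the query that reads it, for example SELECT * FROM dbo.AlarmOnOffPairs ORDER BY COALESCE(StartTime, EndTime), AlarmNumber. On the eight sample rows above, the rules produce the five rows listed in the question's desired output.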
