如何识别表中哪些行满足特定条件,但条件是基于前一行的数据?提供的示例 [英] How can I identify which rows in a table have met a certain condition, but the condition is based on data in previous rows? Example provided

查看:38
本文介绍了如何识别表中哪些行满足特定条件,但条件是基于前一行的数据?提供的示例的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在处理一个包含以下数据的表:

ObjectId EventId EventDate1 342 2017-10-271 342 2018-01-061 343 2018-04-181 401 2018-10-151 342 2018-11-121 342 2018-11-291 401 2018-12-101 342 2019-02-211 343 2019-04-231 401 2019-11-041 343 2020-02-152 342 2018-06-082 343 2018-09-182 342 2018-10-02

我需要标记对象(由 ObjectId 标识)发生了所有 3 个事件(由 EventId 值 342、343 和 401 标识)的第一条记录.然后,该过程应以剩余的记录重新开始.我试过使用窗口函数来让它工作,但是重新开始"识别任何其他事件的过程让我感到困惑.

该算法在上述数据集上执行的输出为:

ObjectId EventId EventDate EventsComplete1 342 2017-10-27 01 342 2018-01-06 01 343 2018-04-18 01 401 2018-10-15 11 342 2018-11-12 01 342 2018-11-29 01 401 2018-12-10 01 342 2019-02-21 01 343 2019-04-23 11 401 2019-11-04 01 343 2020-02-15 02 342 2018-06-08 02 343 2018-09-18 02 342 2018-10-02 0

这是将在示例中创建数据集的查询.

select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDateunion select 1 as ObjectId, 342 as EventId, cast('2018-01-06' as date) as EventDateunion select 1 as ObjectId, 343 as EventId, cast('2018-04-18' as date) as EventDateunion select 1 as ObjectId, 401 as EventId, cast('2018-10-15' as date) as EventDateunion select 1 as ObjectId, 342 as EventId, cast('2018-11-12' as date) as EventDateunion select 1 as ObjectId, 342 as EventId, cast('2018-11-29' as date) as EventDateunion select 1 as ObjectId, 401 as EventId, cast('2018-12-10' as date) as EventDateunion select 1 as ObjectId, 342 as EventId, cast('2019-02-21' as date) as EventDateunion select 1 as ObjectId, 343 as EventId, cast('2019-04-23' as date) as EventDateunion select 1 as ObjectId, 401 as EventId, cast('2019-11-04' as date) as EventDateunion select 1 as ObjectId, 343 as EventId, cast('2020-02-15' as date) as EventDateunion select 2 as ObjectId, 342 as EventId, cast('2018-06-08' as date) as EventDateunion select 2 as ObjectId, 343 as EventId, cast('2018-09-18' as date) as EventDateunion select 2 as ObjectId, 342 as EventId, cast('2018-10-02' as date) as EventDate

解决方案

以下基于 Set 的解决方案.

除使用位域外,未尝试任何优化过程.它有效,这对我来说已经足够了.我可以看到一些可能的简化点

我应该补充一点,真的,这个问题目前是未定义的,因为如果两个不同的事件可以在同一日期发生,那么我们应该将它们处理的顺序没有定义发生.因此,在这些情况下,第一个 CTE 中分配的行号是任意的.样本数据中未出现此类情况.

使用字符串连接路径 - 150 毫秒.

切换到位而不是字符串,仍然比光标(~15 ms)慢(~30 ms)

select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDate成吨union all select 1, 342, cast('2018-01-06' as date)union all select 1, 343, cast('2018-04-18' as date)union all select 1, 401, cast('2018-10-15' as date)union all select 1, 342, cast('2018-11-12' as date)union all select 1, 342, cast('2018-11-29' as date)union all select 1, 401, cast('2018-12-10' as date)union all select 1, 342, cast('2019-02-21' as date)union all select 1, 343, cast('2019-04-23' as date)union all select 1, 401, cast('2019-11-04' as date)union all select 1, 343, cast('2020-02-15' as date)union all select 2, 342, cast('2018-06-08' as date)union all select 2, 343, cast('2018-09-18' as date)union all select 2, 342, cast('2018-10-02' as date);走编号为-- 只需添加一个行号以使其更易于遵循(选择objectid,事件,活动日期,rn = row_number() over (partition by objectid order by eventdate asc),bits = cast(power(2, case eventid when 342 then 0 when 343 then 1 else 2 end) as tinyint)从T),路径为-- 每一行的不同 eventid 的连接路径,作为一个位域(选择 n.objectid,n.eventid,n.事件日期,根 = n.rn,名词,位从编号 n联合所有选择 n.objectid,n.eventid,n.事件日期,p.root,名词,p.bits |n.bits从路径 p在 n.objectid = p.objectid 上加入编号为 n和 n.rn >p.n和 p.bits &n.bits = 0),候选人作为-- 具有包含所有 3 个值的路径的行(位 = 7)(选择   *从     (选择根,嗯,候选人 = iif(rn = min(rn) over(按根分区),1, 0)从路径其中位 = 7) C其中 c.candidate = 1)-- 按行号顺序获取没有较早候选者的候选行-- 有一个根到尾的路径,该路径与该候选路径重叠选择不同的n.objectid,n.eventid,n.事件日期,isnull(c.candidate, 0)从编号 n左加入候选人 c on c.rn = n.rn并且不存在(选择 *从候选人上一页其中 prev.rn 

lulz 的纯游标.

<预><代码>声明@triplets 表(objectid int, eventid int, eventdate date);声明 c 游标 fast_forward for选择 objectid, eventid, eventdate from t order by objectid, eventdate asc;宣布@ob int、@prevob int、@event int、@dt 日期、@bits tinyint = 0;打开 c;从 c 中取 next 到 @ob, @event, @dt;而@@fetch_status = 0开始如果(@ob = @prevob)开始如果@event = 342 设置@bits |= 1;否则如果@event = 343 设置@bits |= 2;否则如果@event = 401 设置@bits |= 4;如果(@bits = 7)开始插入@triplets 值(@ob、@event、@dt);设置@bits = 0结尾结尾否则选择@bits = 0,@prevob = @ob;从 c 中取 next 到 @ob, @event, @dt;结尾关闭 c;解除分配c;选择 t.*, iif(tt.objectid 为 null, 0, 1)从T在 t.objectid = tt.objectid 上左加入 @triplets tt和 t.eventid = tt.eventid和 t.eventdate = tt.eventdate;

I'm working with a table that contains the following data:

ObjectId   EventId   EventDate
1          342       2017-10-27
1          342       2018-01-06
1          343       2018-04-18
1          401       2018-10-15
1          342       2018-11-12
1          342       2018-11-29
1          401       2018-12-10
1          342       2019-02-21
1          343       2019-04-23
1          401       2019-11-04
1          343       2020-02-15
2          342       2018-06-08
2          343       2018-09-18
2          342       2018-10-02

I need to flag the first record where all 3 events (identified by EventId values 342, 343, and 401) have occurred for an object (identified by ObjectId). Then, the process should start again with the remaining records. I've tried using windowed functions to get this to work, but the "starting over" process of identifying any additional occurrences is tripping me up.

The output of this algorithm performed on the above data set is:

ObjectId   EventId   EventDate    EventsComplete
1          342       2017-10-27   0
1          342       2018-01-06   0
1          343       2018-04-18   0
1          401       2018-10-15   1
1          342       2018-11-12   0
1          342       2018-11-29   0
1          401       2018-12-10   0
1          342       2019-02-21   0
1          343       2019-04-23   1
1          401       2019-11-04   0
1          343       2020-02-15   0
2          342       2018-06-08   0
2          343       2018-09-18   0
2          342       2018-10-02   0

Here's a query that will create the data set in the example.

select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-01-06' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2018-04-18' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2018-10-15' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-11-12' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-11-29' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2018-12-10' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2019-02-21' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2019-04-23' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2019-11-04' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2020-02-15' as date) as EventDate
union select 2 as ObjectId, 342 as EventId, cast('2018-06-08' as date) as EventDate
union select 2 as ObjectId, 343 as EventId, cast('2018-09-18' as date) as EventDate
union select 2 as ObjectId, 342 as EventId, cast('2018-10-02' as date) as EventDate

解决方案

Set based solution below.

No optimisation passes have been attempted other than using a bitfield. It works, that's enough for me. I can see a few points of possible simplification

I should add that, really, this problem is currently undefined, because if two different events can occur on the same date, there is no definition for the order in which we should treat them to have occured. So the row number allocated in the first CTE is arbitrary in those cases. No such cases occur in the sample data.

Using string concatenated paths - 150 ms.

Switching to bits instead of strings, still slower (~30 ms) than the cursor (~15 ms)

select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDate
into t
union all select 1, 342, cast('2018-01-06' as date)
union all select 1, 343, cast('2018-04-18' as date)
union all select 1, 401, cast('2018-10-15' as date)
union all select 1, 342, cast('2018-11-12' as date)
union all select 1, 342, cast('2018-11-29' as date)
union all select 1, 401, cast('2018-12-10' as date)
union all select 1, 342, cast('2019-02-21' as date)
union all select 1, 343, cast('2019-04-23' as date)
union all select 1, 401, cast('2019-11-04' as date)
union all select 1, 343, cast('2020-02-15' as date)
union all select 2, 342, cast('2018-06-08' as date)
union all select 2, 343, cast('2018-09-18' as date)
union all select 2, 342, cast('2018-10-02' as date);
go

with numbered as  
-- just adding a row number to make it easier to follow
(
   select   objectid, 
            eventid, 
            eventdate, 
            rn = row_number() over (partition by objectid order by eventdate asc),
            bits = cast(power(2, case eventid when 342 then 0 when 343 then 1 else 2 end) as tinyint)
   from     t
),
paths as  
-- the concatenated paths of distinct eventid for each row, as a bitfield
(
   select      n.objectid, 
               n.eventid, 
               n.eventdate, 
               root = n.rn, 
               n.rn, 
               bits
   from        numbered n
   union all   
   select      n.objectid, 
               n.eventid, 
               n.eventdate, 
               p.root, 
               n.rn, 
               p.bits | n.bits
   from        paths       p
   join        numbered    n  on n.objectid = p.objectid
                                 and n.rn > p.rn
                                 and p.bits & n.bits = 0 
),
candidates as 
-- a row that has a path containing all 3 values (bits = 7)
(
   select   *
   from     (
               select   root, 
                        rn,
                        candidate = iif
                        (
                           rn = min(rn) over (partition by root), 
                           1, 0
                        )
               from     paths
               where    bits = 7
            ) c            
   where    c.candidate = 1
)
-- get the candidate rows where no earlier candidiate in row number order
-- has a root-to-end path which overlaps the path for this candidate
select      distinct 
            n.objectid,
            n.eventid,
            n.eventdate,
            isnull(c.candidate, 0)
from        numbered   n
left join   candidates c on c.rn = n.rn
                            and not exists 
                            (
                               select * 
                               from candidates prev
                               where prev.rn < c.rn
                                     and prev.rn > c.root
                                     and prev.root < c.rn
                            )
order by    n.objectid, 
            n.eventdate, 
            n.eventid

Pure cursor for the lulz.


declare @triplets table(objectid int, eventid int, eventdate date);
declare c cursor fast_forward for 
select objectid, eventid, eventdate from t order by objectid, eventdate asc;
declare 
   @ob int, @prevob int, @event int, @dt date, 
   @bits tinyint = 0;
open c;
fetch next from c into @ob, @event, @dt;
while @@fetch_status = 0
begin
    if (@ob = @prevob)
    begin           
        if @event = 342 set @bits |= 1;
        else if @event = 343 set @bits |= 2;
        else if @event = 401 set @bits |= 4;

        if (@bits = 7) 
        begin
            insert @triplets values (@ob, @event, @dt);
            set @bits = 0
        end
    end
    else select @bits = 0, @prevob = @ob;
    fetch next from c into @ob, @event, @dt;
end
close c;
deallocate c;
select      t.*, iif(tt.objectid is null, 0, 1)
from        t
left join   @triplets tt    on  t.objectid = tt.objectid 
                                and t.eventid = tt.eventid
                                and t.eventdate = tt.eventdate;

这篇关于如何识别表中哪些行满足特定条件,但条件是基于前一行的数据?提供的示例的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆