查找连续行子集的最小值和最大值 - 间隙和岛屿 [英] Find min and max for subsets of consecutive rows - gaps and islands

查看:33
本文介绍了查找连续行子集的最小值和最大值 - 间隙和岛屿的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

尝试构建查询.

输入按列 'rn' 中的 rownumber 排序,对于 'name' 中的每个唯一值从 1 开始,并在 'act' 中定义给定的条目序列.在act"列中,它包含多次出现的两个值,>sleep<;和>醒来<.目标是为这些值之一的每组连续行找到 startt 和 endd 的最小值和最大值.

The input is ordered by rownumber in column 'rn' starting with 1 for each unique value in 'name' and defining a given sequence of entries in 'act'. In column 'act' it holds two values in multiple occurence, >sleep< and >wake<. The goal is to find for each consecutive set of rows of one of those values the minimum and maximum value of startt and endd.

这应该是输入:

name       act        rn     startt endd
---------- ---------- ------ ------ ------
jimmy      sleep      1      1      3
jimmy      wake       2      4      7
jimmy      wake       3      8      10
jimmy      sleep      4      11     13
karen      wake       1      1      4
karen      sleep      2      5      7
karen      wake       3      8      9
karen      wake       4      10     12
karen      wake       5      13     14
karen      sleep      6      15     17
karen      sleep      7      18     20

所需的输出:

name       act        startt endd   
---------- ---------- ------ ------ 
jimmy      sleep      1      3      
jimmy      wake       4      10     
jimmy      sleep      11     13     
karen      wake       1      4      
karen      sleep      5      7      
karen      wake       8      14     
karen      sleep      15     20  

输入源不提供更多列.每个子集中的成员数量可能比这个例子中的要多得多.

The source of the input does not provide further columns. The number of members in each subset can be very much higher then in this example.

我尝试了不同的聚合方法,但都没有奏效.我相信使用 LEADLAGG 和进一步的诡计可能会让我到达那里,但这似乎非常不优雅.我认为区分每个子集是关键,即为其所有成员创建一个唯一的标识符.有了这个,minmax 的聚合就很简单了.也许我错了.也许这是不可能的.也许是自我加入.也许是一个递归的 cte.我不知道.

I tried different ways of aggregating, but none worked. I believe using LEAD and LAGG and further trickery might get me there, but that appears to be awfully unelegant. I have the notion it is key to differentiate each subset, i.e. create an identifier unique to all its members. With this at hand an aggregate with min and max is simple. Maybe i'm wrong. Maybe it's impossible. Maybe a self join. Maybe a recursive cte. I don't know.

所以:有人知道如何获得这个吗?非常感谢帮助.

So: does anybody know how to get this? Help is much appreciated.

更新:

感谢 Gordon Linoff、shawnt00 和其他发表评论的贡献者.根据您的建议,我觉得我的逻辑关闭工具箱存在重大差距.

Thank You to Gordon Linoff, shawnt00 and the other contributors who commented. With Your advice I feel major gaps in my toolbox of logic closing.

感兴趣的人:

declare @t table (
    name nvarchar(10)
    ,act nvarchar (10)
    ,startt smallint
    ,endd smallint
    )

insert into @t (
    name
    ,act
    ,startt
    ,endd
    )
values
     ('jimmy','sleep', 1,3)
    ,('jimmy','wake', 4,7)
    ,('jimmy','wake', 8,10)
    ,('jimmy','sleep', 11,13)
    ,('karen','wake', 1,4)
    ,('karen','sleep', 5,7)
    ,('karen','wake', 8,9)
    ,('karen','wake', 10,12)
    ,('karen','wake', 13,14)
    ,('karen','sleep', 15,17)
    ,('karen','sleep', 18,20)

; --- all rows, no aggregating
with 
cte as (
select 
    name
    ,act
    ,row_number() over (partition by name order by name,startt) rn
    ,row_number() over (partition by name, act order by name,startt) act_n
    ,startt
    ,endd
from
    @t )
select
    name
    ,act
    ,startt
    ,endd
    ,rn
    ,act_n
    ,rn - act_n diff
from 
    cte
order by 
    name
    ,rn

;--- aggregating for the desired ouput
with 
cte as (
select 
    name
    ,act
    ,row_number() over (partition by name order by name,startt) rn
    ,row_number() over (partition by name, act order by name,startt) act_n
    ,startt
    ,endd
from 
    @t )
select
    name
    ,act
    ,min(startt) startt
    ,max(endd)   endd
    ,min(rn)     min_rn
    ,max(rn)     max_rn
from 
    cte
group by 
    name
    ,act
    ,rn - act_n
order by 
    name
    ,min(rn)

推荐答案

您要查找相似行的连续组,然后进行聚合.我喜欢行号差异的方法:

You want to find consecutive groups of similar rows and then aggregation. I like the difference of row numbers approach:

select name, act, min(startt) as startt, max(endd) as endd
from (select i.*,
             row_number() over (partition by name, act order by rn) as seqnum_na,
             row_number() over (partition by name order by rn) as seqnum_n
      from input i
     ) i
group by (seqnum_n - seqnum_na), name, act;

您可以通过查看子查询的作用来了解其工作原理.

You can see how this works by looking at what the subquery does.

这篇关于查找连续行子集的最小值和最大值 - 间隙和岛屿的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆