按ID对行进行分组,并找到具有日期间隔的最大/最小(date_from,date_to) [英] Group rows by ID and find max/min(date_from, date_to) with date gaps

查看:69
本文介绍了按ID对行进行分组,并找到具有日期间隔的最大/最小(date_from,date_to)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要按ID对数据进行分组并找到最大/最小(date_from,date_to).但是,如果有日期间隔,应该在新的一行.

I need to group data by id and find max/min (date_from, date_to). But if there is date gap it should be new row.

我有以下数据:

SYS_ID  ITEM_ID DATE_FROM   DATE_TO
1       1       01.01.2019  20.01.2019
1       1       15.01.2019  10.02.2019
1       1       15.02.2019  20.02.2019
1       1       18.02.2019  10.03.2019
1       1       10.03.2019  22.03.2019
1       2       01.01.2019  10.01.2019
1       2       15.01.2019  25.01.2019

结果应为:

SYS_ID  ITEM_ID DATE_FROM   DATE_TO
1       1       01.01.2019  10.02.2019
1       1       15.02.2019  22.03.2019
1       2       01.01.2019  10.01.2019
1       2       15.01.2019  25.01.2019

有没有一种方法可以不使用光标?

Is there a way to do this without using cursor?

推荐答案

使用间隙和孤岛方法

实时测试: http://sqlfiddle.com/#!18/0174b/3

with gap_detector as
(
     select
        sys_id, item_id,
        date_from, date_to,
        case when 
            lag(date_to) 
            over(partition by sys_id, item_id order by date_from) >= date_from
        then
            0
        else
            1
        end as gap
     from tbl
 )
 , grouper as
 (
     select
         sys_id, item_id,
         date_from, date_to,
         sum(gap) over(partition by sys_id, item_id order by date_from) as grp
     from gap_detector   
)  
select
    sys_id, item_id,
    min(date_from) as date_from,
    max(date_to) as date_to
from grouper
group by sys_id, item_id, grp

输出:

| sys_id | item_id |  date_from |    date_to |
|--------|---------|------------|------------|
|      1 |       1 | 2019-01-01 | 2019-02-10 |
|      1 |       1 | 2019-02-15 | 2019-03-22 |
|      1 |       2 | 2019-01-01 | 2019-01-10 |
|      1 |       2 | 2019-01-15 | 2019-01-25 |


工作原理

首先,我们需要检测前一行(使用 lag )的date_to是否与当前date_from重叠.


How it works

First we need to detect if date_to from previous row (using lag) overlaps with current date_from.

请注意,我们有独立的date_from集,即 sys_id + item_id 组合的前一行(例如1,1)与另一个 sys_id + item_id 组合(例如1,2).因此, 1,2 的前一个date_to不是 2019年3月22日,而是 NULL .我们可以通过对它们进行分区来正确识别每个 sys_id + item_id 组合的前一行,即按sys_id,item_id 进行 partition.

Note that we have independent sets of date_from, that is, the previous row of sys_id + item_id combo (e.g., 1,1) does not overlap with another sys_id + item_id combo (e.g., 1,2). So the first previous date_to of 1,2 is not March 22, 2019, it's NULL instead. We can properly identify the previous row of each sys_id + item_id combo by partitioning them, i.e., partition by sys_id, item_id.

话虽如此,我们才能确定上一行的date_to是否与当前date_from重叠:

With that said here is how we can identify if date_to from previous row overlaps with current date_from:

  • 如果当前date_from与上一date_to重叠,请不要将当前date_from与上一行隔离开来,我们可以通过将当前行的值设置为0来实现.
  • 否则,如果当前date_from与上一个date_to不重叠,则通过将当前行标记为空白,将当前行与上一行隔离(换句话说,是 gap )将其值设置为1.稍后将说明为什么我们需要1和0.
  • If the current date_from overlaps with previous date_to, don't isolate the current date_from from the previous row, we can do this by giving the current row a value of 0.
  • Otherwise, if the current date_from does not overlap with previous date_to, isolate (in other word gap) the current row from previous row, by marking the current row as a gap, we can do this by giving it a value of 1. It will come later why we need 1 and 0.

实时测试: http://sqlfiddle.com/#!18/0174b/7

with gap_detector as
(
     select
        sys_id, item_id,
        date_from, date_to,
        case when 
            lag(date_to) 
            over(partition by sys_id, item_id order by date_from) >= date_from
        then
            0
        else
            1
        end as gap
     from tbl
)
select * 
from gap_detector
order by sys_id, item_id, date_from

输出:

| sys_id | item_id |  date_from |    date_to | gap |
|--------|---------|------------|------------|-----|
|      1 |       1 | 2019-01-01 | 2019-01-20 |   1 |
|      1 |       1 | 2019-01-15 | 2019-02-10 |   0 |
|      1 |       1 | 2019-02-15 | 2019-02-20 |   1 |
|      1 |       1 | 2019-02-18 | 2019-03-10 |   0 |
|      1 |       1 | 2019-03-10 | 2019-03-22 |   0 |
|      1 |       2 | 2019-01-01 | 2019-01-10 |   1 |
|      1 |       2 | 2019-01-15 | 2019-01-25 |   1 |        

下一步是通过对间隔标记(1和0)进行累加总计,将属于彼此的岛进行分组.通过在 sys_id + item_id 组合的窗口上执行 sum(gap)来完成总计.

Next step is to group the the islands that belong to each other by doing a running total over the gap markers (1 and 0). Running total is done by doing a sum(gap) over the window of sys_id + item_id combo.

sys_id + item_id 组合的每个窗口都可以通过对它们进行 partition 来独立地进行操作,即 partition bysys_id,item_id

Each window of sys_id + item_id combo can be operated on independently by doing a partition on them, i.e., partition by sys_id, item_id

实时测试: http://sqlfiddle.com/#!18/0174b/12

with gap_detector as
(
     select
        sys_id, item_id,
        date_from, date_to,
        case when 
            lag(date_to) 
            over(partition by sys_id, item_id order by date_from) >= date_from
        then
            0
        else
            1
        end as gap
     from tbl
 )
 , grouper as
 (
     select
         sys_id, item_id,
         date_from, date_to,
         gap,
         sum(gap) over(partition by sys_id, item_id order by date_from) as grp
     from gap_detector   
)  
select sys_id, item_id, date_from, date_to, gap, grp
from grouper

输出:

| sys_id | item_id |  date_from |    date_to | gap | grp |
|--------|---------|------------|------------|-----|-----|
|      1 |       1 | 2019-01-01 | 2019-01-20 |   1 |   1 |
|      1 |       1 | 2019-01-15 | 2019-02-10 |   0 |   1 |
|      1 |       1 | 2019-02-15 | 2019-02-20 |   1 |   2 |
|      1 |       1 | 2019-02-18 | 2019-03-10 |   0 |   2 |
|      1 |       1 | 2019-03-10 | 2019-03-22 |   0 |   2 |
|      1 |       2 | 2019-01-01 | 2019-01-10 |   1 |   1 |
|      1 |       2 | 2019-01-15 | 2019-01-25 |   1 |   2 |

最后,现在我们已经能够识别出哪些岛属于彼此(以 grp 表示),只需对这些 grp 标记来标识date_from和date_to在每个岛群( grp )上何时开始.

Finally, now that we are able to identify which islands belong to each other (denoted by grp), it's just a matter of doing a group by on those grp markers to identify when date_from and date_to started on each group (grp) of islands.

实时测试: http://sqlfiddle.com/#!18/0174b/13

select
    sys_id, item_id,
    min(date_from) as date_from,
    max(date_to) as date_to
from grouper
group by sys_id, item_id, grp

输出:

| sys_id | item_id |  date_from |    date_to |
|--------|---------|------------|------------|
|      1 |       1 | 2019-01-01 | 2019-02-10 |
|      1 |       1 | 2019-02-15 | 2019-03-22 |
|      1 |       2 | 2019-01-01 | 2019-01-10 |
|      1 |       2 | 2019-01-15 | 2019-01-25 |

这篇关于按ID对行进行分组,并找到具有日期间隔的最大/最小(date_from,date_to)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆