按ID对行进行分组,并找到具有日期间隔的最大/最小(date_from,date_to) [英] Group rows by ID and find max/min(date_from, date_to) with date gaps
问题描述
我需要按ID对数据进行分组并找到最大/最小(date_from,date_to).但是,如果有日期间隔,应该在新的一行.
I need to group data by id and find max/min (date_from, date_to). But if there is date gap it should be new row.
我有以下数据:
SYS_ID ITEM_ID DATE_FROM DATE_TO
1 1 01.01.2019 20.01.2019
1 1 15.01.2019 10.02.2019
1 1 15.02.2019 20.02.2019
1 1 18.02.2019 10.03.2019
1 1 10.03.2019 22.03.2019
1 2 01.01.2019 10.01.2019
1 2 15.01.2019 25.01.2019
结果应为:
SYS_ID ITEM_ID DATE_FROM DATE_TO
1 1 01.01.2019 10.02.2019
1 1 15.02.2019 22.03.2019
1 2 01.01.2019 10.01.2019
1 2 15.01.2019 25.01.2019
有没有一种方法可以不使用光标?
Is there a way to do this without using cursor?
推荐答案
使用间隙和孤岛方法
实时测试: http://sqlfiddle.com/#!18/0174b/3
with gap_detector as
(
select
sys_id, item_id,
date_from, date_to,
case when
lag(date_to)
over(partition by sys_id, item_id order by date_from) >= date_from
then
0
else
1
end as gap
from tbl
)
, grouper as
(
select
sys_id, item_id,
date_from, date_to,
sum(gap) over(partition by sys_id, item_id order by date_from) as grp
from gap_detector
)
select
sys_id, item_id,
min(date_from) as date_from,
max(date_to) as date_to
from grouper
group by sys_id, item_id, grp
输出:
| sys_id | item_id | date_from | date_to |
|--------|---------|------------|------------|
| 1 | 1 | 2019-01-01 | 2019-02-10 |
| 1 | 1 | 2019-02-15 | 2019-03-22 |
| 1 | 2 | 2019-01-01 | 2019-01-10 |
| 1 | 2 | 2019-01-15 | 2019-01-25 |
工作原理
首先,我们需要检测前一行(使用 lag
)的date_to是否与当前date_from重叠.
How it works
First we need to detect if date_to from previous row (using lag
) overlaps with current date_from.
请注意,我们有独立的date_from集,即 sys_id
+ item_id
组合的前一行(例如1,1)与另一个 sys_id
+ item_id
组合(例如1,2).因此, 1,2
的前一个date_to不是 2019年3月22日
,而是 NULL
.我们可以通过对它们进行分区来正确识别每个 sys_id
+ item_id
组合的前一行,即按sys_id,item_id 进行 partition.
Note that we have independent sets of date_from, that is, the previous row of sys_id
+ item_id
combo (e.g., 1,1) does not overlap with another sys_id
+ item_id
combo (e.g., 1,2). So the first previous date_to of 1,2
is not March 22, 2019
, it's NULL
instead. We can properly identify the previous row of each sys_id
+ item_id
combo by partitioning them, i.e., partition by sys_id, item_id
.
话虽如此,我们才能确定上一行的date_to是否与当前date_from重叠:
With that said here is how we can identify if date_to from previous row overlaps with current date_from:
- 如果当前date_from与上一date_to重叠,请不要将当前date_from与上一行隔离开来,我们可以通过将当前行的值设置为0来实现.
- 否则,如果当前date_from与上一个date_to不重叠,则通过将当前行标记为空白,将当前行与上一行隔离(换句话说,是
gap
)将其值设置为1.稍后将说明为什么我们需要1和0.
- If the current date_from overlaps with previous date_to, don't isolate the current date_from from the previous row, we can do this by giving the current row a value of 0.
- Otherwise, if the current date_from does not overlap with previous date_to, isolate (in other word
gap
) the current row from previous row, by marking the current row as a gap, we can do this by giving it a value of 1. It will come later why we need 1 and 0.
实时测试: http://sqlfiddle.com/#!18/0174b/7
with gap_detector as
(
select
sys_id, item_id,
date_from, date_to,
case when
lag(date_to)
over(partition by sys_id, item_id order by date_from) >= date_from
then
0
else
1
end as gap
from tbl
)
select *
from gap_detector
order by sys_id, item_id, date_from
输出:
| sys_id | item_id | date_from | date_to | gap |
|--------|---------|------------|------------|-----|
| 1 | 1 | 2019-01-01 | 2019-01-20 | 1 |
| 1 | 1 | 2019-01-15 | 2019-02-10 | 0 |
| 1 | 1 | 2019-02-15 | 2019-02-20 | 1 |
| 1 | 1 | 2019-02-18 | 2019-03-10 | 0 |
| 1 | 1 | 2019-03-10 | 2019-03-22 | 0 |
| 1 | 2 | 2019-01-01 | 2019-01-10 | 1 |
| 1 | 2 | 2019-01-15 | 2019-01-25 | 1 |
下一步是通过对间隔标记(1和0)进行累加总计,将属于彼此的岛进行分组.通过在 sys_id
+ item_id
组合的窗口上执行 sum(gap)
来完成总计.
Next step is to group the the islands that belong to each other by doing a running total over the gap markers (1 and 0). Running total is done by doing a sum(gap)
over the window of sys_id
+ item_id
combo.
sys_id
+ item_id
组合的每个窗口都可以通过对它们进行 partition
来独立地进行操作,即 partition bysys_id,item_id
Each window of sys_id
+ item_id
combo can be operated on independently by doing a partition
on them, i.e., partition by sys_id, item_id
实时测试: http://sqlfiddle.com/#!18/0174b/12
with gap_detector as
(
select
sys_id, item_id,
date_from, date_to,
case when
lag(date_to)
over(partition by sys_id, item_id order by date_from) >= date_from
then
0
else
1
end as gap
from tbl
)
, grouper as
(
select
sys_id, item_id,
date_from, date_to,
gap,
sum(gap) over(partition by sys_id, item_id order by date_from) as grp
from gap_detector
)
select sys_id, item_id, date_from, date_to, gap, grp
from grouper
输出:
| sys_id | item_id | date_from | date_to | gap | grp |
|--------|---------|------------|------------|-----|-----|
| 1 | 1 | 2019-01-01 | 2019-01-20 | 1 | 1 |
| 1 | 1 | 2019-01-15 | 2019-02-10 | 0 | 1 |
| 1 | 1 | 2019-02-15 | 2019-02-20 | 1 | 2 |
| 1 | 1 | 2019-02-18 | 2019-03-10 | 0 | 2 |
| 1 | 1 | 2019-03-10 | 2019-03-22 | 0 | 2 |
| 1 | 2 | 2019-01-01 | 2019-01-10 | 1 | 1 |
| 1 | 2 | 2019-01-15 | 2019-01-25 | 1 | 2 |
最后,现在我们已经能够识别出哪些岛属于彼此(以 grp
表示),只需对这些 grp
标记来标识date_from和date_to在每个岛群( grp
)上何时开始.
Finally, now that we are able to identify which islands belong to each other (denoted by grp
), it's just a matter of doing a group by
on those grp
markers to identify when date_from and date_to started on each group (grp
) of islands.
实时测试: http://sqlfiddle.com/#!18/0174b/13
select
sys_id, item_id,
min(date_from) as date_from,
max(date_to) as date_to
from grouper
group by sys_id, item_id, grp
输出:
| sys_id | item_id | date_from | date_to |
|--------|---------|------------|------------|
| 1 | 1 | 2019-01-01 | 2019-02-10 |
| 1 | 1 | 2019-02-15 | 2019-03-22 |
| 1 | 2 | 2019-01-01 | 2019-01-10 |
| 1 | 2 | 2019-01-15 | 2019-01-25 |
这篇关于按ID对行进行分组,并找到具有日期间隔的最大/最小(date_from,date_to)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!