Overlapping gaps and islands in a school vacation setup
Question
I have to work with this periods table:
id | starts_on | ends_on
----+------------+------------
678 | 2019-12-21 | 2019-12-22
534 | 2019-12-23 | 2020-01-04
679 | 2019-12-28 | 2019-12-29
9 | 2020-01-01 | 2020-01-01
776 | 2020-01-04 | 2020-01-05
7 | 2020-01-06 | 2020-01-06
777 | 2020-01-11 | 2020-01-12
It lists all periods during which students don't have to go to school. Unfortunately, some of the periods overlap. This happens when a weekend or a public holiday falls within a school vacation (each of them has its own row in periods).
With the help of "Find rows with adjourning date ranges and accumulate their durations" and "Gaps and islands for school vacations in a country with federal states" I ended up with this query:
SELECT p.id, p.starts_on, p.ends_on, grp,
       (Max(ends_on) OVER (PARTITION BY grp) - Min(starts_on) OVER (PARTITION BY grp)
       ) + 1 AS duration,
       Array_agg(p.id) OVER (PARTITION BY grp)
FROM (SELECT p.*,
             Count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day') OVER
                 (PARTITION BY 1
                  ORDER BY starts_on
                 ) AS grp
      FROM (SELECT p.*,
                   lag(ends_on) OVER (PARTITION BY 1 ORDER BY starts_on) AS prev_eo
            FROM (SELECT p.id, p.starts_on, p.ends_on
                  FROM periods p
                  WHERE starts_on > '2019-12-15' AND
                        starts_on < '2020-01-15'
                 ) p
           ) p
     ) p;
This is the result I get:
id | starts_on | ends_on | grp | duration | array_agg
----+------------+------------+-----+----------+---------------
678 | 2019-12-21 | 2019-12-22 | 0 | 15 | {678,534,679}
534 | 2019-12-23 | 2020-01-04 | 0 | 15 | {678,534,679}
679 | 2019-12-28 | 2019-12-29 | 0 | 15 | {678,534,679}
9 | 2020-01-01 | 2020-01-01 | 1 | 1 | {9}
776 | 2020-01-04 | 2020-01-05 | 2 | 3 | {776,7}
7 | 2020-01-06 | 2020-01-06 | 2 | 3 | {776,7}
777 | 2020-01-11 | 2020-01-12 | 3 | 2 | {777}
The first three rows form grp 0 (ids 678, 534 and 679).
But ids 9, 776 and 7 should belong to that grp too. Unfortunately, they overlap with earlier periods, and the query does not handle that. Is it possible to get a result like this (I don't care about the order)?
id | starts_on | ends_on | grp | duration | array_agg
----+------------+------------+-----+----------+---------------
678 | 2019-12-21 | 2019-12-22 | 0 | 17 | {678,534,679,9,776,7}
534 | 2019-12-23 | 2020-01-04 | 0 | 17 | {678,534,679,9,776,7}
679 | 2019-12-28 | 2019-12-29 | 0 | 17 | {678,534,679,9,776,7}
9 | 2020-01-01 | 2020-01-01 | 0 | 17 | {678,534,679,9,776,7}
776 | 2020-01-04 | 2020-01-05 | 0 | 17 | {678,534,679,9,776,7}
7 | 2020-01-06 | 2020-01-06 | 0 | 17 | {678,534,679,9,776,7}
777 | 2020-01-11 | 2020-01-12 | 1 | 2 | {777}
I want to know how long the total island (grp 0) is in days and which period ids it contains.
Sandbox: https://rextester.com/SHVL41709
Recommended answer
This is an interesting variant of your other problems. The issue is that lag() only looks at the one preceding row to check for an overlap. Instead, you want to look at all preceding rows.
Fortunately, you can use a cumulative max() for this purpose:
SELECT p.id, p.starts_on, p.ends_on, grp,
       (Max(ends_on) OVER (PARTITION BY grp) - Min(starts_on) OVER (PARTITION BY grp)
       ) + 1 AS duration,
       Array_agg(p.id) OVER (PARTITION BY grp)
FROM (SELECT p.*,
             -- A row starts a new island when every earlier period ended before the day preceding its start.
             Count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day') OVER
                 (PARTITION BY 1
                  ORDER BY starts_on
                 ) AS grp
      FROM (SELECT p.*,
                   -- Latest end date among all preceding rows, not just the previous one.
                   MAX(ends_on) OVER (ORDER BY starts_on
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_eo
            FROM (SELECT p.id, p.starts_on, p.ends_on
                  FROM periods p
                  WHERE starts_on > '2019-12-15' AND
                        starts_on < '2020-01-15'
                 ) p
           ) p
     ) p;
I'm not sure what the PARTITION BY 1 is supposed to be doing (partitioning by a constant puts all rows into a single partition, the same as omitting the clause), so I left it out of the max() window.
Here is a rextester.
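To make the difference between the two window functions visible, here is a minimal sketch that computes both candidates for prev_eo side by side. It assumes PostgreSQL and inlines the sample rows from the question as a CTE so that it runs without the real periods table. The column aliases prev_end_lag and prev_end_max are illustrative names, not part of the original query.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
WITH periods (id, starts_on, ends_on) AS (
    VALUES (678, DATE '2019-12-21', DATE '2019-12-22'),
           (534, DATE '2019-12-23', DATE '2020-01-04'),
           (679, DATE '2019-12-28', DATE '2019-12-29'),
           (9,   DATE '2020-01-01', DATE '2020-01-01'),
           (776, DATE '2020-01-04', DATE '2020-01-05'),
           (7,   DATE '2020-01-06', DATE '2020-01-06'),
           (777, DATE '2020-01-11', DATE '2020-01-12')
)
SELECT id, starts_on, ends_on,
       -- End date of the immediately preceding row only (illustrative alias).
       lag(ends_on) OVER (ORDER BY starts_on) AS prev_end_lag,
       -- Latest end date among all preceding rows (illustrative alias).
       max(ends_on) OVER (ORDER BY starts_on
                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_end_max
FROM periods
ORDER BY starts_on;
```

For id 9, prev_end_lag is 2019-12-29 (the end of id 679), while prev_end_max is 2020-01-04 (the end of id 534). The lag() value is earlier than the day before id 9 starts, so the original query opens a new group there. The cumulative max is not, so id 9 stays in the first island.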
To anticipate your next question: this has one challenge. If start times are ever equal, then the cumulative max is not stable. In that case, you either want to remove the duplicates or make the sorting for the cumulative max stable.