Overlapping gaps and islands in a school vacation setup

Question

I have to work with this periods table:

id  | starts_on  |  ends_on   
----+------------+------------
678 | 2019-12-21 | 2019-12-22
534 | 2019-12-23 | 2020-01-04
679 | 2019-12-28 | 2019-12-29
  9 | 2020-01-01 | 2020-01-01
776 | 2020-01-04 | 2020-01-05
  7 | 2020-01-06 | 2020-01-06
777 | 2020-01-11 | 2020-01-12

It lists all periods during which students don't have to go to school. Unfortunately, some of the periods overlap. This happens when a weekend or a public holiday falls within a school vacation (each of them has its own row in periods).

With the help of "Find rows with adjourning date ranges and accumulate their durations" and "Gaps and islands for school vacations in a country with federal states", I ended up with this query:

SELECT p.id, p.starts_on, p.ends_on, grp,
      (Max(ends_on) OVER (PARTITION BY grp) - Min(starts_on) OVER (PARTITION BY grp) 
      ) + 1 AS duration, Array_agg(p.id) OVER (PARTITION BY grp) 
FROM (SELECT p.*,
            Count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day') OVER
                (PARTITION BY 1 
                  ORDER BY starts_on
                ) AS grp 
      FROM (SELECT p.*,
                  lag(ends_on) OVER (PARTITION BY 1 ORDER BY starts_on) AS prev_eo 
            FROM (SELECT p.id, p.starts_on, p.ends_on FROM periods p
            WHERE starts_on > '2019-12-15' AND
                  starts_on < '2020-01-15' ) p 
          ) p 
  ) p;

This results in:

id  | starts_on  |  ends_on   | grp | duration |   array_agg   
----+------------+------------+-----+----------+---------------
678 | 2019-12-21 | 2019-12-22 |   0 |       15 | {678,534,679}
534 | 2019-12-23 | 2020-01-04 |   0 |       15 | {678,534,679}
679 | 2019-12-28 | 2019-12-29 |   0 |       15 | {678,534,679}
  9 | 2020-01-01 | 2020-01-01 |   1 |        1 | {9}
776 | 2020-01-04 | 2020-01-05 |   2 |        3 | {776,7}
  7 | 2020-01-06 | 2020-01-06 |   2 |        3 | {776,7}
777 | 2020-01-11 | 2020-01-12 |   3 |        2 | {777}

The first three rows form grp 0 (ids 678, 534 and 679).

But the ids 9, 776 and 7 should belong to that grp too. Unfortunately, because they overlap with earlier periods, they end up in separate groups. Is it possible to get a result somewhat like this (I don't care about the order)?

id  | starts_on  |  ends_on   | grp | duration |   array_agg   
----+------------+------------+-----+----------+---------------
678 | 2019-12-21 | 2019-12-22 |   0 |       17 | {678,534,679,9,776,7}
534 | 2019-12-23 | 2020-01-04 |   0 |       17 | {678,534,679,9,776,7}
679 | 2019-12-28 | 2019-12-29 |   0 |       17 | {678,534,679,9,776,7}
  9 | 2020-01-01 | 2020-01-01 |   0 |       17 | {678,534,679,9,776,7}
776 | 2020-01-04 | 2020-01-05 |   0 |       17 | {678,534,679,9,776,7}
  7 | 2020-01-06 | 2020-01-06 |   0 |       17 | {678,534,679,9,776,7}
777 | 2020-01-11 | 2020-01-12 |   1 |        2 | {777}

I want to know how long the total island (grp 0) is in days and which period ids it contains.

Sandbox: https://rextester.com/SHVL41709

Recommended answer

This is an interesting variant of your other problems. The issue is that lag() only looks at the one preceding row to check for an overlap. Instead, you want to look at all preceding rows.
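To make the lag() problem visible, the following sketch (PostgreSQL assumed) lists, for every row, the previous end date that lag() sees and whether the row would be flagged as the start of a new island. The sample rows are copied from the question into an inline VALUES list so the query runs on its own. The column names prev_eo_lag and starts_new_island are made up for this illustration.

```sql
-- Sample data from the question, inlined so the query is self-contained.
WITH periods (id, starts_on, ends_on) AS (
    VALUES (678, DATE '2019-12-21', DATE '2019-12-22'),
           (534, DATE '2019-12-23', DATE '2020-01-04'),
           (679, DATE '2019-12-28', DATE '2019-12-29'),
           (  9, DATE '2020-01-01', DATE '2020-01-01'),
           (776, DATE '2020-01-04', DATE '2020-01-05'),
           (  7, DATE '2020-01-06', DATE '2020-01-06'),
           (777, DATE '2020-01-11', DATE '2020-01-12')
)
SELECT id, starts_on, ends_on,
       -- end date of the single preceding row only
       lag(ends_on) OVER (ORDER BY starts_on) AS prev_eo_lag,
       -- true when there appears to be a gap of at least one day
       lag(ends_on) OVER (ORDER BY starts_on) < starts_on - 1 AS starts_new_island
FROM periods
ORDER BY starts_on;
```

Rows 9 and 776 are flagged here even though the long period 534 (ending 2020-01-04) still covers them: the row directly before each of them is a short period that ends earlier, and that is all lag() can see.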

Fortunately, you can use a cumulative max() for this purpose:

SELECT p.id, p.starts_on, p.ends_on, grp,
      (Max(ends_on) OVER (PARTITION BY grp) - Min(starts_on) OVER (PARTITION BY grp) 
      ) + 1 AS duration, Array_agg(p.id) OVER (PARTITION BY grp) 
FROM (SELECT p.*,
            Count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day') OVER
                (PARTITION BY 1 
                  ORDER BY starts_on
                ) AS grp 
      FROM (SELECT p.*,
                  MAX(ends_on) OVER (ORDER BY starts_on ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_eo 
            FROM (SELECT p.id, p.starts_on, p.ends_on 
                  FROM periods p
                  WHERE starts_on > '2019-12-15' AND
                        starts_on < '2020-01-15'
                 ) p 
          ) p 
  ) p;

I'm not sure what the PARTITION BY 1 is supposed to be doing, so I left it out of the window for the cumulative max().

Here is a rextester.
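If one row per island is enough, the same grouping can be collapsed with GROUP BY instead of repeating the aggregates on every row. This is a sketch of a variant, not part of the original answer. It assumes PostgreSQL, inlines the sample rows from the question, and uses the made-up CTE names marked and grouped.

```sql
WITH periods (id, starts_on, ends_on) AS (
    VALUES (678, DATE '2019-12-21', DATE '2019-12-22'),
           (534, DATE '2019-12-23', DATE '2020-01-04'),
           (679, DATE '2019-12-28', DATE '2019-12-29'),
           (  9, DATE '2020-01-01', DATE '2020-01-01'),
           (776, DATE '2020-01-04', DATE '2020-01-05'),
           (  7, DATE '2020-01-06', DATE '2020-01-06'),
           (777, DATE '2020-01-11', DATE '2020-01-12')
),
marked AS (
    -- latest end date among all earlier rows, not just the previous one
    SELECT id, starts_on, ends_on,
           max(ends_on) OVER (ORDER BY starts_on
                              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_eo
    FROM periods
),
grouped AS (
    -- running count of island starts gives the island number
    SELECT id, starts_on, ends_on,
           count(*) FILTER (WHERE prev_eo < starts_on - 1)
               OVER (ORDER BY starts_on) AS grp
    FROM marked
)
SELECT grp,
       min(starts_on)                     AS island_start,
       max(ends_on)                       AS island_end,
       max(ends_on) - min(starts_on) + 1  AS duration,
       array_agg(id ORDER BY starts_on)   AS ids
FROM grouped
GROUP BY grp
ORDER BY grp;
```

For the sample data this yields two islands: grp 0 from 2019-12-21 to 2020-01-06 (17 days, ids 678, 534, 679, 9, 776, 7) and grp 1 from 2020-01-11 to 2020-01-12 (2 days, id 777).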

To anticipate your next question: this approach has one challenge. If two start dates are ever equal, the cumulative max is not stable. In that case, you either want to remove the duplicates or make the sort order for the cumulative max stable.
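One way to make the ordering deterministic is to add a unique column as a tie-breaker in both window definitions. The sketch below assumes PostgreSQL and assumes that id is unique. The row with id 999 is hypothetical, added only so that two periods share a start date.

```sql
WITH periods (id, starts_on, ends_on) AS (
    VALUES (678, DATE '2019-12-21', DATE '2019-12-22'),
           (534, DATE '2019-12-23', DATE '2020-01-04'),
           (999, DATE '2019-12-23', DATE '2019-12-24'),  -- hypothetical duplicate start date
           (777, DATE '2020-01-11', DATE '2020-01-12')
)
SELECT p.*,
       count(*) FILTER (WHERE prev_eo < starts_on - 1)
           OVER (ORDER BY starts_on, id) AS grp
FROM (SELECT p.*,
             -- id breaks ties, so the row order (and prev_eo) is reproducible
             max(ends_on) OVER (ORDER BY starts_on, id
                                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_eo
      FROM periods p
     ) p
ORDER BY starts_on, id;
```

Using the same ORDER BY list in both windows keeps the island counter aligned with the order in which prev_eo was computed.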
