Find rows with adjoining date ranges and accumulate their durations
My PostgreSQL database stores school vacations, public holidays and weekend dates so that parents can plan their vacations. School vacations are often adjacent to weekends or public holidays. I want to display the total number of non-school days for a school vacation, including any adjacent weekend or public holiday.
Example Data
locations
SELECT id, name, is_federal_state
FROM locations
WHERE is_federal_state = true;
| id | name | is_federal_state |
|----|-------------------|------------------|
| 2 | Baden-Württemberg | true |
| 3 | Bayern | true |
holiday_or_vacation_types
SELECT id, name FROM holiday_or_vacation_types;
| id | name |
|----|-----------------------|
| 1 | Herbst |
| 8 | Wochenende |
"Herbst" is German for "autumn" and "Wochenende" is German for "weekend".
periods
SELECT id, starts_on, ends_on, holiday_or_vacation_type_id
FROM periods
WHERE location_id = 2
ORDER BY starts_on;
| id | starts_on | ends_on | holiday_or_vacation_type_id |
|-----|--------------|--------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 8 |
Task
I want to select all periods where location_id equals 2, and I want to calculate the duration of each period in days. That can be done with this SQL query:
SELECT id, starts_on, ends_on,
(ends_on - starts_on + 1) AS duration,
holiday_or_vacation_type_id
FROM periods
WHERE location_id = 2
ORDER BY starts_on;
| id | starts_on | ends_on | duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | 8 |
Any human looking at the calendar would see that the ids 670 (weekend), 532 (fall vacation) and 533 (fall vacation) are adjacent to each other, so together they add up to a 6-day vacation period. So far I compute this with a separate program, but that takes quite a lot of resources (the actual table contains some 500,000 rows).
Problem 1
Which SQL query would produce the following output (it adds a real_duration column)? Is that even possible with SQL?
| id | starts_on | ends_on | duration | real_duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 6 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 6 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 6 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | 2 | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | 2 | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | 2 | 8 |
Problem 2
Is it possible to list the adjacent periods in a part_of_range field? This would be the result. Can that be done with SQL?
| id | starts_on | ends_on | duration | part_of_range | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 670,532,533 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 670,532,533 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 670,532,533 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | | 8 |
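This is a gaps-and-islands problem. In this case you can use lag() to see where an island starts (a row whose start is more than one day after the previous row's end) and then take a cumulative count of those starts to number the islands.
The final operation is some aggregation (using window functions):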
SELECT p.*,
(max(ends_on) OVER (PARTITION BY location_id, grp) - min(starts_on) OVER (PARTITION BY location_id, grp)) + 1 AS real_duration,
-- partition by grp as well, so that only the ids of the same island are collected
array_agg(p.id) OVER (PARTITION BY location_id, grp) AS part_of_range
FROM (SELECT p.*,
-- running count of island starts = island number within the location
count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day') OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
FROM (SELECT id, starts_on, ends_on, location_id, holiday_or_vacation_type_id,
lag(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
FROM periods
) p
) p;
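To make the intermediate steps easier to follow, here is a sketch that applies the same idea to the six sample rows from the question. The rows are inlined as a CTE named periods, which shadows the real table for this one statement. The CTE names with_prev and with_grp, the window name island, and the use of string_agg with a CASE to leave single-period islands empty are illustrative choices of mine, not part of the original answer. It assumes starts_on and ends_on are of type date and that periods within one location do not overlap or nest, because lag() only looks at the directly preceding row.

```sql
-- Sample rows copied from the question (location_id = 2 only).
WITH periods (id, starts_on, ends_on, location_id, holiday_or_vacation_type_id) AS (
    VALUES
        (670, DATE '2019-10-26', DATE '2019-10-27', 2, 8),
        (532, DATE '2019-10-28', DATE '2019-10-30', 2, 1),
        (533, DATE '2019-10-31', DATE '2019-10-31', 2, 1),
        (671, DATE '2019-11-02', DATE '2019-11-03', 2, 8),
        (672, DATE '2019-11-09', DATE '2019-11-10', 2, 8),
        (673, DATE '2019-11-16', DATE '2019-11-17', 2, 8)
),
with_prev AS (
    -- Step 1: attach the end date of the preceding period (NULL for the first row).
    SELECT p.*,
           lag(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
    FROM periods p
),
with_grp AS (
    -- Step 2: a row starts a new island when there is a gap of at least one day
    -- after the previous period; the running count of such rows numbers the islands.
    SELECT w.*,
           count(*) FILTER (WHERE prev_eo < starts_on - 1)
               OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
    FROM with_prev w
)
-- Step 3: aggregate per island while keeping one output row per period.
SELECT id, starts_on, ends_on,
       ends_on - starts_on + 1 AS duration,
       max(ends_on) OVER island - min(starts_on) OVER island + 1 AS real_duration,
       CASE WHEN count(*) OVER island > 1
            THEN string_agg(id::text, ',') OVER island
       END AS part_of_range,  -- NULL when the period stands alone
       holiday_or_vacation_type_id
FROM with_grp
-- The explicit frame makes every aggregate see the whole island,
-- and the ORDER BY fixes the order of the ids in part_of_range.
WINDOW island AS (PARTITION BY location_id, grp
                  ORDER BY starts_on
                  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY starts_on;
```

For the sample rows, ids 670, 532 and 533 fall into the same island, so they get real_duration 6 and part_of_range 670,532,533. The three remaining weekends each form their own island with real_duration 2 and a NULL part_of_range. To run this against the real table, drop the inlined periods CTE and add a WHERE location_id = 2 filter in with_prev.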