Find rows with adjoining date ranges and accumulate their durations


Problem description

My PostgreSQL database stores school vacations, public holidays and weekend dates so that parents can plan their vacations. School vacations are often directly preceded or followed by weekends or public holidays. I want to display the total number of non-school days for a school vacation. That total should include any adjoining weekend or public holiday.

Example Data

locations

SELECT id, name, is_federal_state 
FROM locations 
WHERE is_federal_state = true;

| id | name              | is_federal_state |
|----|-------------------|------------------|
| 2  | Baden-Württemberg | true             |
| 3  | Bayern            | true             |

holiday_or_vacation_types

SELECT id, name FROM holiday_or_vacation_types;

| id | name                  |
|----|-----------------------|
| 1  | Herbst                |
| 8  | Wochenende            |

"Herbst" is German for "autumn" and "Wochenende" is German for "weekend".

periods

SELECT id, starts_on, ends_on, holiday_or_vacation_type_id 
FROM periods 
WHERE location_id = 2 
ORDER BY starts_on;

| id  | starts_on    | ends_on      | holiday_or_vacation_type_id |
|-----|--------------|--------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 8                           |

Task

I want to select all periods where location_id equals 2. And I want to calculate the duration of each period in days. That can be done with this SQL query:

SELECT id, starts_on, ends_on, 
       (ends_on - starts_on + 1) AS duration, 
       holiday_or_vacation_type_id 
FROM periods 
WHERE location_id = 2 
ORDER BY starts_on;

| id  | starts_on    | ends_on      | duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2        | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 3        | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1        | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 2        | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 2        | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 2        | 8                           |

Any human looking at the calendar would see that the ids 670 (weekend), 532 (fall vacation) and 533 (fall vacation) are adjacent, so they add up to a 6-day vacation period. So far I compute this with a separate program, but that takes quite a lot of resources (the actual table contains some 500,000 items).

Problem 1

Which SQL query would result in the following output (it adds a real_duration column)? Is that even possible with SQL?

| id  | starts_on    | ends_on      | duration | real_duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2        | 6             | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 3        | 6             | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1        | 6             | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 2        | 2             | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 2        | 2             | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 2        | 2             | 8                           |

Problem 2

Is it possible to list the adjoining periods in a part_of_range field? This would be the result. Can that be done with SQL?

| id  | starts_on    | ends_on      | duration | part_of_range | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2        | 670,532,533   | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 3        | 670,532,533   | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1        | 670,532,533   | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 2        |               | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 2        |               | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 2        |               | 8                           |

Solution

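This is a gaps-and-islands problem. In this case you can use lag() to fetch the end date of the previous period, treat a row as the start of a new island when at least one day lies between that end date and its own start, and then take a cumulative count of those island starts to number the islands.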
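To make the lag-then-count step easier to follow, here is a minimal sketch that exposes the intermediate columns on the six sample rows from the question. The rows are inlined in a CTE so it runs without the real table; the column name starts_new_island is made up for illustration, and the date arithmetic `starts_on - 1` assumes starts_on and ends_on are of type DATE (which the question's `ends_on - starts_on + 1` suggests).

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH periods (id, location_id, starts_on, ends_on) AS (
    VALUES (670, 2, DATE '2019-10-26', DATE '2019-10-27'),
           (532, 2, DATE '2019-10-28', DATE '2019-10-30'),
           (533, 2, DATE '2019-10-31', DATE '2019-10-31'),
           (671, 2, DATE '2019-11-02', DATE '2019-11-03'),
           (672, 2, DATE '2019-11-09', DATE '2019-11-10'),
           (673, 2, DATE '2019-11-16', DATE '2019-11-17')
),
flagged AS (
    SELECT p.*,
           -- end date of the previous period at the same location (NULL for the first row)
           lag(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
    FROM periods p
)
SELECT id, starts_on, ends_on, prev_eo,
       -- true when at least one day lies between the previous end and this start
       (prev_eo < starts_on - 1) AS starts_new_island,
       -- running count of island starts so far = island number
       count(*) FILTER (WHERE prev_eo < starts_on - 1)
           OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
FROM flagged
ORDER BY location_id, starts_on;
```

On the sample rows, grp comes out as 0 for ids 670, 532 and 533, and as 1, 2 and 3 for ids 671, 672 and 673, so every row in the same uninterrupted stretch of days shares one grp value.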

The final step aggregates over each island, using window functions:

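SELECT p.*, 
       -- span of the whole island: latest end minus earliest start, inclusive
       (max(ends_on) OVER (PARTITION BY location_id, grp)
        - min(starts_on) OVER (PARTITION BY location_id, grp)) + 1 AS real_duration,
       -- ids of all periods in the same island (partition by grp too, not only location_id)
       array_agg(p.id) OVER (PARTITION BY location_id, grp) AS part_of_range
FROM (SELECT p.*, 
             -- running count of gaps seen so far = island number
             count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day')
                 OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
      FROM (SELECT id, starts_on, ends_on, location_id, holiday_or_vacation_type_id, 
                   -- end date of the previous period at the same location
                   lag(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
            FROM periods 
           ) p
     ) p;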
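
The question asks for part_of_range as a comma-separated list that stays empty for periods without neighbours, while a plain array_agg() would return a one-element array for those rows. The following is one possible way to get that shape, sketched against the sample rows inlined in a CTE. The window names island and ordered_island are made up here, and DATE columns are assumed.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH periods (id, location_id, starts_on, ends_on) AS (
    VALUES (670, 2, DATE '2019-10-26', DATE '2019-10-27'),
           (532, 2, DATE '2019-10-28', DATE '2019-10-30'),
           (533, 2, DATE '2019-10-31', DATE '2019-10-31'),
           (671, 2, DATE '2019-11-02', DATE '2019-11-03'),
           (672, 2, DATE '2019-11-09', DATE '2019-11-10'),
           (673, 2, DATE '2019-11-16', DATE '2019-11-17')
),
flagged AS (
    SELECT p.*,
           lag(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
    FROM periods p
),
grouped AS (
    SELECT f.*,
           -- island number: running count of gaps of at least one day
           count(*) FILTER (WHERE prev_eo < starts_on - 1)
               OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
    FROM flagged f
)
SELECT id, starts_on, ends_on,
       ends_on - starts_on + 1 AS duration,
       max(ends_on) OVER island - min(starts_on) OVER island + 1 AS real_duration,
       -- list the ids only when the island has more than one period; NULL otherwise
       CASE WHEN count(*) OVER island > 1
            THEN string_agg(id::text, ',') OVER ordered_island
       END AS part_of_range
FROM grouped
WINDOW island AS (PARTITION BY location_id, grp),
       -- explicit full frame so every row sees the complete, ordered id list
       ordered_island AS (PARTITION BY location_id, grp ORDER BY starts_on
                          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY location_id, starts_on;
```

The explicit frame matters: with only ORDER BY in the window, the default frame ends at the current row, so string_agg() would return a growing prefix of the list instead of the full list on every row.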
