Complex 'Gaps and Islands' issue
Problem description
I have a table in a Postgres DB like this:
person | eventdate | type
--------------------------------------
<uuid-1> | 2016-05-14 | 300
<uuid-3> | 2016-05-14 | 300
<uuid-1> | 2016-05-15 | 301
<uuid-1> | 2016-05-16 | 301
<uuid-1> | 2016-05-18 | 304
<uuid-1> | 2016-05-22 | 300
<uuid-2> | 2016-05-22 | 304
<uuid-2> | 2016-05-27 | 301
<uuid-1> | 2016-05-30 | 300
<uuid-1> | 2016-06-01 | 300
<uuid-2> | 2016-06-15 | 501
<uuid-2> | 2016-06-16 | 301
<uuid-4> | 2016-06-16 | 300
<uuid-5> | 2016-06-20 | 300
<uuid-1> | 2016-06-21 | 300
<uuid-2> | 2016-06-21 | 300
<uuid-2> | 2016-06-23 | 301
<uuid-2> | 2016-06-30 | 300
<uuid-3> | 2016-06-30 | 300
<uuid-4> | 2016-06-30 | 300
The table contains non-consecutive day entries, one for each day an employee is absent, with different reasons (types) of absence. However, an absence period can span several of these days: any absence entry that is within 5 days of a previous absence of the same type is still considered part of the same absence 'period'.
I need output listing each employee's absence periods, with the start and end date of each period, plus the total number of days within that multi-day period.
This is further complicated by the fact that different types of absence are considered the same for the purpose of this report. So in the example above, types 300, 301 and 304 would be treated as the same.
So from my example above, the following is what I am after:
person | startdate | enddate | days | type
--------------------------------------------------------------------
<uuid-1> | 2016-05-14 | 2016-05-22 | 5 | 300
<uuid-3> | 2016-05-14 | 2016-05-14 | 1 | 300
<uuid-2> | 2016-05-22 | 2016-05-27 | 2 | 304
<uuid-1> | 2016-05-30 | 2016-06-01 | 2 | 300
<uuid-2> | 2016-06-15 | 2016-06-15 | 1 | 501
<uuid-2> | 2016-06-16 | 2016-06-16 | 1 | 301
<uuid-4> | 2016-06-16 | 2016-06-16 | 1 | 300
<uuid-5> | 2016-06-20 | 2016-06-20 | 1 | 300
<uuid-1> | 2016-06-21 | 2016-06-21 | 1 | 300
<uuid-2> | 2016-06-21 | 2016-06-23 | 2 | 300
<uuid-2> | 2016-06-30 | 2016-06-30 | 1 | 300
<uuid-3> | 2016-06-30 | 2016-06-30 | 1 | 300
<uuid-4> | 2016-06-30 | 2016-06-30 | 1 | 300
How do I query this table to produce this output?
Recommended answer
It's unclear how you determine the type for each period. I chose the minimum number.
Assuming this base table definition:
CREATE TABLE tbl (person text, eventdate date, type int);
Basically, I suggest window functions in two nested subqueries to identify members of the same period (island). Then aggregate:
SELECT person, period
     , min(eventdate) AS startdate
     , max(eventdate) AS enddate
     , count(*)       AS days
     , min(type)      AS type
FROM  (
   SELECT person, eventdate, type
        , count(gap) OVER (PARTITION BY person ORDER BY eventdate) AS period
   FROM  (
      SELECT person, eventdate, type
           , CASE WHEN lag(eventdate) OVER (PARTITION BY person ORDER BY eventdate)
                     > eventdate - 6  -- within 5 days
                  THEN NULL           -- same period
                  ELSE TRUE           -- next period
             END AS gap
      FROM   tbl
      ) sub
   ) sub
GROUP  BY person, period
ORDER  BY person, period;
结果(基于您的示例数据):
Result (based on your example data):
person | period | startdate | enddate | days | type
----------+--------+------------+------------+------+------
<uuid-1> | 1 | 2016-05-14 | 2016-05-22 | 5 | 300
<uuid-1> | 2 | 2016-05-30 | 2016-06-01 | 2 | 300
<uuid-1> | 3 | 2016-06-21 | 2016-06-21 | 1 | 300
<uuid-2> | 1 | 2016-05-22 | 2016-05-27 | 2 | 301
<uuid-2> | 2 | 2016-06-15 | 2016-06-23 | 4 | 300
<uuid-2> | 3 | 2016-06-30 | 2016-06-30 | 1 | 300
<uuid-3> | 1 | 2016-05-14 | 2016-05-14 | 1 | 300
<uuid-3> | 2 | 2016-06-30 | 2016-06-30 | 1 | 300
<uuid-4> | 1 | 2016-06-16 | 2016-06-16 | 1 | 300
<uuid-4> | 2 | 2016-06-30 | 2016-06-30 | 1 | 300
<uuid-5> | 1 | 2016-06-20 | 2016-06-20 | 1 | 300
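To check the logic outside the database, here is a minimal Python sketch of the same three steps the query performs. First, it flags a gap when the previous date for the same person is more than 5 days back (the lag() step). Second, it turns the running count of gap flags into a period number (the count(gap) OVER step). Third, it aggregates per person and period. The function name absence_periods and the max_gap_days parameter are hypothetical, chosen for illustration. Like the query, it ignores type when forming periods and reports the minimum type.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (person, eventdate, type)
ROWS = [
    ("uuid-1", date(2016, 5, 14), 300),
    ("uuid-3", date(2016, 5, 14), 300),
    ("uuid-1", date(2016, 5, 15), 301),
    ("uuid-1", date(2016, 5, 16), 301),
    ("uuid-1", date(2016, 5, 18), 304),
    ("uuid-1", date(2016, 5, 22), 300),
    ("uuid-2", date(2016, 5, 22), 304),
    ("uuid-2", date(2016, 5, 27), 301),
    ("uuid-1", date(2016, 5, 30), 300),
    ("uuid-1", date(2016, 6, 1), 300),
    ("uuid-2", date(2016, 6, 15), 501),
    ("uuid-2", date(2016, 6, 16), 301),
    ("uuid-4", date(2016, 6, 16), 300),
    ("uuid-5", date(2016, 6, 20), 300),
    ("uuid-1", date(2016, 6, 21), 300),
    ("uuid-2", date(2016, 6, 21), 300),
    ("uuid-2", date(2016, 6, 23), 301),
    ("uuid-2", date(2016, 6, 30), 300),
    ("uuid-3", date(2016, 6, 30), 300),
    ("uuid-4", date(2016, 6, 30), 300),
]


def absence_periods(rows, max_gap_days=5):
    """Group absence rows into periods, mirroring the SQL query's steps.

    Returns (person, period, startdate, enddate, days, type) tuples,
    ordered by person and period.
    """
    # Steps 1 and 2: per person, in date order, start a new period when there
    # is no previous date or it is more than max_gap_days back (the gap flag),
    # and number periods with a running count of those flags.
    labelled = []
    for person, group in groupby(sorted(rows), key=itemgetter(0)):
        period, prev = 0, None
        for _, eventdate, type_ in group:
            if prev is None or (eventdate - prev).days > max_gap_days:
                period += 1
            labelled.append((person, period, eventdate, type_))
            prev = eventdate

    # Step 3: aggregate each (person, period) island.
    result = []
    for (person, period), members in groupby(labelled, key=itemgetter(0, 1)):
        members = list(members)
        dates = [m[2] for m in members]
        result.append((
            person,
            period,
            min(dates),                  # startdate
            max(dates),                  # enddate
            len(members),                # days, like count(*)
            min(m[3] for m in members),  # type, like min(type)
        ))
    return result


for row in absence_periods(ROWS):
    print(row)
```

Run on the question's sample rows, it yields the same 11 rows as the result table above, one tuple per (person, period).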
If the same day for the same person can be entered multiple times with different types, and you only want to count distinct days, make it: count(DISTINCT eventdate) AS days.
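To see the difference, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres. This is an assumption made only so the example runs without a server, and the dates are stored as plain text here. The rows for the hypothetical person uuid-9 are invented to show one day entered twice with different types: count(*) counts both rows, while count(DISTINCT eventdate) counts that day once.

```python
import sqlite3

# Hypothetical rows: the same person and day entered twice with different types.
rows = [
    ("uuid-9", "2016-07-01", 300),
    ("uuid-9", "2016-07-01", 301),
    ("uuid-9", "2016-07-02", 300),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (person text, eventdate date, type int)")
con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)

# count(*) counts rows; count(DISTINCT eventdate) counts calendar days.
counts = con.execute(
    """
    SELECT person,
           count(*)                  AS all_rows,
           count(DISTINCT eventdate) AS days
    FROM tbl
    GROUP BY person
    """
).fetchall()
con.close()

print(counts)  # [('uuid-9', 3, 2)]
```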
Related, with detailed explanation:
- Select longest continuous sequence
- How to label groups in postgresql when group belonging depends on the preceding line?
BTW, eventdate - 6 works for data type date, but not for timestamp (there, subtract an interval such as interval '6 days' instead):
- How do I determine the last day of the previous month using PostgreSQL?
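The comparison lag(eventdate) > eventdate - 6 in the query reads "within 5 days" as "at most 5 calendar days apart". The following Python check confirms that reading, using date and timedelta. The helper names are hypothetical. Python has no date-minus-integer shorthand, so timedelta(days=6) plays roughly the role that interval '6 days' would play for a timestamp column in Postgres.

```python
from datetime import date, timedelta


def same_period_sql_style(prev, current):
    # Mirrors the query's test: lag(eventdate) > eventdate - 6
    return prev > current - timedelta(days=6)


def same_period_within_5_days(prev, current):
    # Reading "within 5 days" as: at most 5 calendar days apart
    return (current - prev).days <= 5


current = date(2016, 6, 21)
# The two formulations agree for every whole-day gap from 0 to 10 days.
for gap in range(11):
    prev = current - timedelta(days=gap)
    assert same_period_sql_style(prev, current) == same_period_within_5_days(prev, current)

print(same_period_sql_style(date(2016, 6, 16), current))  # True  (5 days apart)
print(same_period_sql_style(date(2016, 6, 15), current))  # False (6 days apart)
```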