Complex 'Gaps and Islands' issue

Problem description

I have a table in a Postgres DB like this:

person    |     eventdate     |  type 
--------------------------------------
<uuid-1>  |   2016-05-14      |   300
<uuid-3>  |   2016-05-14      |   300
<uuid-1>  |   2016-05-15      |   301
<uuid-1>  |   2016-05-16      |   301
<uuid-1>  |   2016-05-18      |   304
<uuid-1>  |   2016-05-22      |   300
<uuid-2>  |   2016-05-22      |   304
<uuid-2>  |   2016-05-27      |   301
<uuid-1>  |   2016-05-30      |   300
<uuid-1>  |   2016-06-01      |   300
<uuid-2>  |   2016-06-15      |   501
<uuid-2>  |   2016-06-16      |   301
<uuid-4>  |   2016-06-16      |   300
<uuid-5>  |   2016-06-20      |   300
<uuid-1>  |   2016-06-21      |   300
<uuid-2>  |   2016-06-21      |   300
<uuid-2>  |   2016-06-23      |   301
<uuid-2>  |   2016-06-30      |   300
<uuid-3>  |   2016-06-30      |   300
<uuid-4>  |   2016-06-30      |   300

The table contains non-consecutive day entries, one for each day an employee is absent, recording the reason (type) of the absence. However, an absence period can span several of these days: any absence entry that is within 5 days of a previous absence of the same type is still considered part of the same absence 'period'.

I need output listing each employee's absence periods, with the start and end date of each period plus the total number of days within that multi-date period.

This is further complicated by the fact that different types of absence are considered the same for the purpose of this report. So in the example above, types 300, 301 and 304 would be treated as the same.

So, for the example above, this is the output I am after:

person    |     startdate     |       enddate     |  days   |  type 
--------------------------------------------------------------------
<uuid-1>  |     2016-05-14    |     2016-05-22    |   5     |   300
<uuid-3>  |     2016-05-14    |     2016-05-14    |   1     |   300
<uuid-2>  |     2016-05-22    |     2016-05-27    |   2     |   304
<uuid-1>  |     2016-05-30    |     2016-06-01    |   2     |   300
<uuid-2>  |     2016-06-15    |     2016-06-15    |   1     |   501
<uuid-2>  |     2016-06-16    |     2016-06-16    |   1     |   301
<uuid-4>  |     2016-06-16    |     2016-06-16    |   1     |   300
<uuid-5>  |     2016-06-20    |     2016-06-20    |   1     |   300
<uuid-1>  |     2016-06-21    |     2016-06-21    |   1     |   300
<uuid-2>  |     2016-06-21    |     2016-06-23    |   2     |   300
<uuid-2>  |     2016-06-30    |     2016-06-30    |   1     |   300
<uuid-3>  |     2016-06-30    |     2016-06-30    |   1     |   300
<uuid-4>  |     2016-06-30    |     2016-06-30    |   1     |   300

How do I query this table to produce this output?

Recommended answer

It's unclear how you determine the type for each period. I chose the minimum type number.

Assuming the base table is defined as:

CREATE TABLE tbl (person text, eventdate date, type int);

Basically, I suggest window functions in two nested subqueries to identify members of the same period (island). Then aggregate:

SELECT person, period
     , min(eventdate) AS startdate
     , max(eventdate) AS enddate
     , count(*)       AS days
     , min(type)      AS type
FROM  (
   SELECT person, eventdate, type
        , count(gap) OVER (PARTITION BY person ORDER BY eventdate) AS period  -- running count of gap flags (NULLs not counted)
   FROM  (
      SELECT person, eventdate, type
           , CASE WHEN lag(eventdate) OVER (PARTITION BY person ORDER BY eventdate)
                     > eventdate - 6  -- within 5 days
                  THEN NULL           -- same period
                  ELSE TRUE           -- next period
             END AS gap
      FROM   tbl
      ) sub
   ) sub
GROUP  BY person, period
ORDER  BY person, period;

Result (based on your example data):

  person  | period | startdate  |  enddate   | days | type
----------+--------+------------+------------+------+------
 <uuid-1> |      1 | 2016-05-14 | 2016-05-22 |    5 |  300
 <uuid-1> |      2 | 2016-05-30 | 2016-06-01 |    2 |  300
 <uuid-1> |      3 | 2016-06-21 | 2016-06-21 |    1 |  300
 <uuid-2> |      1 | 2016-05-22 | 2016-05-27 |    2 |  301
 <uuid-2> |      2 | 2016-06-15 | 2016-06-23 |    4 |  300
 <uuid-2> |      3 | 2016-06-30 | 2016-06-30 |    1 |  300
 <uuid-3> |      1 | 2016-05-14 | 2016-05-14 |    1 |  300
 <uuid-3> |      2 | 2016-06-30 | 2016-06-30 |    1 |  300
 <uuid-4> |      1 | 2016-06-16 | 2016-06-16 |    1 |  300
 <uuid-4> |      2 | 2016-06-30 | 2016-06-30 |    1 |  300
 <uuid-5> |      1 | 2016-06-20 | 2016-06-20 |    1 |  300
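
To trace the window-function logic outside the database, here is a minimal Python sketch of the same algorithm. It is an illustration, not the answer's method: it assumes the rows are already loaded in memory as (person, eventdate, type) tuples, and the function name absence_periods is hypothetical. Rows are sorted per person by date. A row stays in the current period when the previous date is later than eventdate minus 6 days, which is the lag test. Otherwise a gap starts a new period, the equivalent of the running count(gap). Each period is then aggregated with min, max, count and min(type).

```python
from datetime import date, timedelta
from itertools import groupby


def absence_periods(rows, max_gap_days=5):
    """Group (person, eventdate, type) rows into absence periods (islands).

    Hypothetical helper mirroring the SQL above: a row continues the current
    period when the previous row of the same person is later than
    eventdate - (max_gap_days + 1) days, i.e. at most max_gap_days earlier.
    Returns (person, period, startdate, enddate, days, type) tuples,
    ordered by person and period like ORDER BY person, period.
    """
    result = []
    # Sort like PARTITION BY person ORDER BY eventdate
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for person, group in groupby(ordered, key=lambda r: r[0]):
        period = 0
        prev_date = None
        members = {}  # period number -> list of (eventdate, type)
        for _, eventdate, typ in group:
            # lag(eventdate) > eventdate - 6  ->  same period, no gap flag
            same = (prev_date is not None
                    and prev_date > eventdate - timedelta(days=max_gap_days + 1))
            if not same:
                period += 1  # the running count(gap) increases at each gap
            members.setdefault(period, []).append((eventdate, typ))
            prev_date = eventdate
        # Aggregate like min/max/count(*)/min(type) ... GROUP BY person, period
        for p, items in members.items():
            dates = [d for d, _ in items]
            types = [t for _, t in items]
            result.append((person, p, min(dates), max(dates), len(items), min(types)))
    return result


# Three rows of <uuid-2>: 05-22 and 05-27 form one period (5-day gap),
# 06-15 starts a second period.
rows = [("<uuid-2>", date(2016, 5, 22), 304),
        ("<uuid-2>", date(2016, 5, 27), 301),
        ("<uuid-2>", date(2016, 6, 15), 501)]
for row in absence_periods(rows):
    print(row)
```

Run against the full sample data from the question, the sketch should return the same 11 rows as the result table above. The type merging in the answer is also visible here: period 2 of <uuid-2> absorbs the 501 entry and reports min(type) = 300.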

If the same day for the same person can be entered multiple times with different types, and you only want to count distinct days, make it: count(DISTINCT eventdate) AS days.
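
The difference between count(*) and count(DISTINCT eventdate) only appears when one period contains duplicate dates. A duplicate never opens a new period anyway, because its lag value equals eventdate, which is always greater than eventdate - 6. The tiny Python sketch below shows both counts for one such period. The helper name day_counts is hypothetical, and it assumes you already have the eventdate values of a single period.

```python
from datetime import date


def day_counts(period_dates):
    """Return (count(*), count(DISTINCT eventdate)) for one period's rows.

    Hypothetical helper: period_dates holds the eventdate of every row
    in a single absence period, duplicates included.
    """
    return len(period_dates), len(set(period_dates))


# Same person, same day entered twice with different types (e.g. 300 and 301)
dates = [date(2016, 5, 14), date(2016, 5, 14), date(2016, 5, 15)]
print(day_counts(dates))  # (3, 2)
```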

Related answers with more detailed explanations:

  • Select longest continuous sequence
  • How to label groups in postgresql when group belonging depends on the preceding line?

BTW, eventdate - 6 works for data type date (subtracting an integer from a date yields a date), but not for timestamp, where you would subtract an interval instead:

  • How do I determine the last day of the previous month using PostgreSQL?
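
The off-by-one in the condition is easy to misread. With date values, eventdate - 6 is six calendar days earlier, so lag(eventdate) > eventdate - 6 is true exactly when the previous absence is at most 5 days earlier. The Python sketch below checks that boundary. The function same_period is hypothetical. Python's timedelta subtraction works for both date and datetime, so this sketch does not reproduce the PostgreSQL date/timestamp distinction discussed above.

```python
from datetime import date, timedelta


def same_period(prev, current, max_gap_days=5):
    """Mirror of the SQL test lag(eventdate) > eventdate - 6 for dates.

    Hypothetical helper: True when prev is at most max_gap_days days
    before current, i.e. both rows belong to the same absence period.
    """
    return prev > current - timedelta(days=max_gap_days + 1)


print(same_period(date(2016, 5, 22), date(2016, 5, 27)))  # True  (5-day gap)
print(same_period(date(2016, 6, 1), date(2016, 6, 7)))    # False (6-day gap)
print(same_period(date(2016, 6, 23), date(2016, 6, 30)))  # False (7-day gap)
```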
