从重叠的日期范围中获取不同的连续日期范围 [英] Get distinct consecutive date ranges from overlapping date ranges

查看:50
本文介绍了从重叠的日期范围中获取不同的连续日期范围的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要从重叠日期列表中获取彼此不重叠的日期范围列表,并在重叠期间获取硬币总数.我试过用谷歌搜索一个例子,但到目前为止没有运气.我可能没有使用正确的关键词?

I need to get a list of date ranges that are NOT overlapping with each other from a list of overlapping dates and get the sum of coins during that overlap. I have tried googling for an example but no luck so far. I might not be using the right key words?

我有一个重叠日期的列表

I have a list of overlapping dates

1.1.2018 - 31.1.2018 80
7.1.2018 - 10.1.2018 10
7.1.2018 - 31.1.2018 10
11.1.2018 - 31.1.2018 5
25.1.2018 - 27.1.2018 5
2.2.2018 - 23.2.2018 100

期望的结果是

1.1.2018 - 6.7.2018 80 coins
7.1.2018 - 10.1.2018 100 coins
11.1.2018 - 24.1.2018 95 coins
25.1.2018 - 27.1.2018 100 coins
28.1.2018 - 31.1.2018 95 coins
2.2.2018 - 23.2.2018 100 coins

这是它应该如何工作的图

Here is a figure how it should work

|------------------------------|
       |---|
       |-----------------------|
           |-------------------|
                      |---|
                                   |----------------------|
Outcome              
|------|---|----------|---|----|   |----------------------|
   80   100     95     100  95                100

这是我的测试数据

drop table coinsonperiod2;
create table coinsonperiod2(
  id serial,
  startdate date,
  enddate date,
  coins integer,
  userid integer
);
insert into coinsonperiod2 (startdate, enddate, coins,userid) values
  ('2018-01-01','2018-01-31', 80,1)
, ('2018-01-07','2018-01-10', 10,1)
, ('2018-01-07','2018-01-31', 10,1)
, ('2018-01-11','2018-01-31', 5,1)
, ('2018-01-25','2018-01-27', 5,1)
, ('2018-02-02','2018-02-23', 100,2)
, ('2018-01-01','2018-01-31', 80,2)
, ('2018-01-07','2018-01-10', 10,2)
, ('2018-01-07','2018-01-31', 10,2)
, ('2018-01-11','2018-01-31', 5,2)
, ('2018-01-25','2018-01-27', 5,2)
, ('2018-02-02','2018-02-23', 100,3)
; 

更新:实际上,StephenM 和 joops 的答案不符合我想要的结果.两个答案都显示结束日期错误.

UPDATE: Actually StephenM's and joops answers do not meet my desired outcome. Both answers show enddate wrong.

当一个时期结束时,下一个应该在第二天开始(如果有间隙,则更晚).在我想要的结果中,1.1.2018-6.1.2018 包括第 6 天.6th 和 7th 之间没有差距,因为 7th 包含在 7.1.2018-10.1.2018 中.

When one period ends the next should start next day (or later if there is a gap). In my desired outcome 1.1.2018-6.1.2018 includes the 6th day. There is no gap between 6th and 7th because 7th is included in 7.1.2018-10.1.2018.

更新2:现在我明白了开区间、半开区间和闭区间的区别.在 joops 解决方案中,必须针对半开区间进行计算,但我想要的结果是闭区间.这就是为什么必须减少结束日期以使结果为闭区间的原因.如果我错了,请纠正我.

UPDATE2: Now I understood what is the difference between open, half open and closed intervals. In joops solution, calculation must be done against half open intervals, but my desired outcome is closed interval. That is why enddate must be reduced to make the outcome as closed interval. Correct me if I am wrong.

我还在示例数据中添加了 userid,并进一步修改了 joops 解决方案.这是给出了我想要的结果的查询.

I also added userid in the sample data and modified joops solution some more. Here is the query that gives me my desired outcome.

with changes AS (
  SELECT
    userid,
    startdate AS tickdate,
    coins,
    1         AS cover
  FROM coinsonperiod2
  UNION ALL
  -- add 1 day to correct intervals into half open intervals, so the calculation is correct
  SELECT
    userid,
    1 + enddate AS tickdate,
    -1 * coins,
    -1          AS cover
  FROM coinsonperiod2
)
, sumchanges  AS (
    SELECT
      userid,
      tickdate,
      SUM(coins) AS change,
      SUM(cover) AS cover
    FROM changes
    GROUP BY tickdate, userid
)
, aggregated AS (
    SELECT
      userid   AS userid,
      tickdate AS startdate,
      lead(tickdate)
      over www AS enddate,
      sum(change)
      OVER www AS cash,
      sum(cover)
      OVER www AS cover
    FROM sumchanges
    WINDOW www AS (
      partition by userid
      ORDER BY tickdate )
)
-- reduce 1 day from the enddate to make closed interval
SELECT
userid
, startdate
, enddate-1 as enddate
, cash
, cover
FROM aggregated
WHERE cover > 0
ORDER BY userid, startdate
;

结果:

推荐答案

逻辑是:

  • 在区间的开始将其值添加到累积和
  • 在区间的结束从这个总和中减去它的值
  • 但是为了扫描日期变更线,我们必须收集所有(唯一的)日期/时间戳,无论是开始还是停止.
  • at the beginning of an interval add its value to a cumulative sum
  • at the end of an interval substract its value from this sum
  • but in order to sweep the dateline, we'll have to collect al the (unique) date/time stamps, either start or stop.

所以重点是:将数据从一系列间隔转换为一系列(开始/停止)事件,然后聚合这些数据.

So the point is: convert the data from a series of intervals to a series of (start/stop) events, and aggregate over these.

-- \i tmp.sql

create table coinsonperiod(
  id serial,
  startdate date,
  enddate date,
  coins integer
);
insert into coinsonperiod (startdate, enddate, coins) values
  ('2018-01-01','2018-01-31', 80)
, ('2018-01-07','2018-01-10', 10)
, ('2018-01-07','2018-01-31', 10)
, ('2018-01-11','2018-01-31', 5)
, ('2018-01-25','2018-01-27', 5)
, ('2018-02-02','2018-02-23', 100)
        ;

WITH changes AS (
    SELECT startdate AS tickdate , coins
            , 1 AS cover
    FROM coinsonperiod
    UNION ALL
    -- add 1 day to convert to half-open intervals
    SELECT 1+enddate AS tickdate, -1* coins
            , -1 AS cover
    FROM coinsonperiod
    )
, sumchanges  AS (
        SELECT tickdate, SUM(coins) AS change, SUM(cover) AS cover
        FROM changes
        GROUP BY tickdate
        )
, aggregated AS (
        SELECT
        tickdate AS startdate
        , lead(tickdate) over www AS enddate
        , sum(change) OVER www AS cash
          -- number of covered intervals
        , sum(cover) OVER www AS cover
        FROM sumchanges
        WINDOW www AS (ORDER BY tickdate)
        )
             -- substract one day from enddate to correct back to closed intervals
SELECT startdate, enddate-1 AS enddate, cash, cover
FROM aggregated
WHERE cover > 0
ORDER BY startdate
        ;

这篇关于从重叠的日期范围中获取不同的连续日期范围的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆