如何忽略间隔为30天的行? [英] How to ignore rows with moving 30 day interval?

查看:48
本文介绍了如何忽略间隔为30天的行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我陷入了SQL问题.假设我们在Redshift中有一个像这样的数据集:

I got stuck with a SQL problem. Let's say we have a dataset like this in Redshift :

account_id  day          event_id
111         2019-01-01   1000
111         2019-01-02   1001
111         2019-01-02   1002
111         2019-01-10   1003
111         2019-01-25   1004
111         2019-02-05   1005
111         2019-02-24   1006
111         2019-02-28   1007
111         2019-03-02   1008
111         2019-03-15   1009
222         2019-01-01   1000
222         2019-01-02   1001
222         2019-01-02   1002
222         2019-01-10   1003
222         2019-01-25   1004
222         2019-02-05   1005
222         2019-02-24   1006
222         2019-02-28   1007
222         2019-03-02   1008
222         2019-03-15   1009

我需要选择在窗口PER ACCOUNT_ID的30天后发生的event_id,但是然后根据我发现的第一个事件日期更改新窗口的开始日期.

I need to pick event_ids that happen after 30 days of window PER ACCOUNT_ID, but then change starting date of new window based on the first event date I find.

所以在这种情况下,对于ACCOUNT_IDS 111和222都是这样:

so in this case FOR BOTH ACCOUNT_IDS 111 and 222:

  • 我们首先选择event_id = 1000,然后我们应该忽略一切,直到2月1日(30天)
  • 然后我们选择event_id = 1005,直到3月5日,我们都应忽略所有内容(因为event_id = 1005在2月5日发生)
  • 然后我们在3月15日提取event_id = 1009,并且应该忽略所有内容,直到4月15日...

你得到图片了.

该怎么做?

推荐答案

我也找不到纯粹基于窗口函数的解决方案.

I also couldn't find a solution purely based on window functions.

但是在PostgreSql中,递归CTE可以做到这一点.

But in PostgreSql a recursive CTE works for this.

临时表用于具有可用于连接到下一条记录的ID.

The temp table is used to have an id that can be used to connect to the next record.

CREATE TEMPORARY TABLE tempEventDates (
 id SERIAL primary key, 
 account_id int not null,
 day date not null,
 min_day date not null,
 event_id int not null
);

INSERT INTO tempEventDates (account_id, day, min_day, event_id)
SELECT account_id, day,
MIN(day) OVER (PARTITION BY account_id) as min_day, event_id
FROM yourtable
GROUP BY account_id, day, event_id
ORDER BY account_id, day, event_id;

WITH RECURSIVE RCTE AS
(
    SELECT id, account_id, event_id, day, min_day
    FROM tempEventDates
    WHERE day = min_day

    UNION ALL

    SELECT t.id, t.account_id, t.event_id, t.day, 
     CASE WHEN t.day > c.min_day + interval '30 days' THEN t.day ELSE c.min_day END
    FROM RCTE c
    JOIN tempEventDates t 
      ON t.account_id = c.account_id 
     AND t.id = c.id + 1
)
SELECT account_id, day, event_id
FROM RCTE
WHERE day = min_day
ORDER BY account_id, day;

关于妊娠期妊娠的测试此处

A test on rextester here

这篇关于如何忽略间隔为30天的行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆