如何忽略间隔为30天的行? [英] How to ignore rows with moving 30 day interval?
问题描述
我陷入了SQL问题.假设我们在Redshift中有一个像这样的数据集:
I got stuck with a SQL problem. Let's say we have a dataset like this in Redshift :
account_id day event_id
111 2019-01-01 1000
111 2019-01-02 1001
111 2019-01-02 1002
111 2019-01-10 1003
111 2019-01-25 1004
111 2019-02-05 1005
111 2019-02-24 1006
111 2019-02-28 1007
111 2019-03-02 1008
111 2019-03-15 1009
222 2019-01-01 1000
222 2019-01-02 1001
222 2019-01-02 1002
222 2019-01-10 1003
222 2019-01-25 1004
222 2019-02-05 1005
222 2019-02-24 1006
222 2019-02-28 1007
222 2019-03-02 1008
222 2019-03-15 1009
我需要选择在窗口PER ACCOUNT_ID的30天后发生的event_id,但是然后根据我发现的第一个事件日期更改新窗口的开始日期.
I need to pick event_ids that happen after 30 days of window PER ACCOUNT_ID, but then change starting date of new window based on the first event date I find.
所以在这种情况下,对于ACCOUNT_IDS 111和222都是这样:
so in this case FOR BOTH ACCOUNT_IDS 111 and 222:
- 我们首先选择event_id = 1000,然后我们应该忽略一切,直到2月1日(30天)
- 然后我们选择event_id = 1005,直到3月5日,我们都应忽略所有内容(因为event_id = 1005在2月5日发生)
- 然后我们在3月15日提取event_id = 1009,并且应该忽略所有内容,直到4月15日...
你得到图片了.
该怎么做?
推荐答案
我也找不到纯粹基于窗口函数的解决方案.
I also couldn't find a solution purely based on window functions.
但是在PostgreSql中,递归CTE可以做到这一点.
But in PostgreSql a recursive CTE works for this.
临时表用于具有可用于连接到下一条记录的ID.
The temp table is used to have an id that can be used to connect to the next record.
CREATE TEMPORARY TABLE tempEventDates (
id SERIAL primary key,
account_id int not null,
day date not null,
min_day date not null,
event_id int not null
);
INSERT INTO tempEventDates (account_id, day, min_day, event_id)
SELECT account_id, day,
MIN(day) OVER (PARTITION BY account_id) as min_day, event_id
FROM yourtable
GROUP BY account_id, day, event_id
ORDER BY account_id, day, event_id;
WITH RECURSIVE RCTE AS
(
SELECT id, account_id, event_id, day, min_day
FROM tempEventDates
WHERE day = min_day
UNION ALL
SELECT t.id, t.account_id, t.event_id, t.day,
CASE WHEN t.day > c.min_day + interval '30 days' THEN t.day ELSE c.min_day END
FROM RCTE c
JOIN tempEventDates t
ON t.account_id = c.account_id
AND t.id = c.id + 1
)
SELECT account_id, day, event_id
FROM RCTE
WHERE day = min_day
ORDER BY account_id, day;
关于妊娠期妊娠的测试此处
A test on rextester here
这篇关于如何忽略间隔为30天的行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!