Efficient time series querying in Postgres
Question
I have a table in my PG db that looks somewhat like this:
id | widget_id | for_date | score |
Each referenced widget has many of these rows. There is at most one row per widget per day, and there are gaps on days without an entry.
What I want is a result that contains every widget for every date since X. The dates are brought in via generate_series:
SELECT date.date::date
FROM generate_series('2012-01-01'::timestamp with time zone,'now'::text::date::timestamp with time zone, '1 day') date(date)
ORDER BY date.date DESC;
If a given widget_id has no entry for a date, I want to use the most recent earlier entry. Say widget 1337 has no entry on 2012-05-10 but does have one on 2012-05-08; then I want the result set to show the 2012-05-08 entry for 2012-05-10 as well:
Actual data:
widget_id | for_date | score
1336 | 2012-05-07 | 20
1337 | 2012-05-07 | 12
1337 | 2012-05-08 | 41
1337 | 2012-05-11 | 500
Desired output based on generate series:
widget_id | for_date | score
1336 | 2012-05-07 | 20
1337 | 2012-05-07 | 12
1336 | 2012-05-08 | 20
1337 | 2012-05-08 | 41
1336 | 2012-05-09 | 20
1337 | 2012-05-09 | 41
1336 | 2012-05-10 | 20
1337 | 2012-05-10 | 41
1336 | 2012-05-11 | 20
1337 | 2012-05-11 | 500
Eventually I want to boil this down into a view, so that I have a consistent data set per day that I can query easily.
Edit: made the sample data and expected result set clearer.
Recommended answer
select
    widget_id,
    for_date,
    case
        when score is not null then score
        -- gap day: take the score of the row that opens this (widget_id, c) group
        else first_value(score) over (partition by widget_id, c order by for_date)
    end score
from (
    select
        a.widget_id,
        a.for_date,
        s.score,
        -- running count of non-null scores; it only increases on days with an entry
        count(score) over(partition by a.widget_id order by a.for_date) c
    from (
        -- every widget paired with every day between the first and last entry
        select widget_id, g.d::date for_date
        from (
            select distinct widget_id
            from score
        ) s
        cross join
        generate_series(
            (select min(for_date) from score),
            (select max(for_date) from score),
            '1 day'
        ) g(d)
    ) a
    left join
    score s on a.widget_id = s.widget_id and a.for_date = s.for_date
) s
order by widget_id, for_date
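
The query relies on count(score) skipping NULLs: the running count stays flat across gap days, so each real entry and the gap days that follow it share one value of c, and that value can serve as a grouping key. Since the question asks for a view, the same idea could be wrapped up roughly as below. This is only a sketch under assumptions: the view name widget_daily_score and the aliases w, grp and filled are made up for illustration, the table is assumed to be called score as in the answer, and the series runs up to current_date (as in the question's own generate_series call) rather than to the last stored date.

```sql
-- Hypothetical view name; table "score" and its columns are taken from the answer above.
create view widget_daily_score as
select
    widget_id,
    for_date,
    -- each (widget_id, grp) group holds at most one non-null score,
    -- so max() simply spreads that value over the gap days after it
    max(score) over (partition by widget_id, grp) as score
from (
    select
        a.widget_id,
        a.for_date,
        s.score,
        -- running count of real entries; unchanged on days without a row
        count(s.score) over (partition by a.widget_id order by a.for_date) as grp
    from (
        -- one row per widget per calendar day
        select w.widget_id, g.d::date as for_date
        from (select distinct widget_id from score) w
        cross join generate_series(
            (select min(for_date) from score)::timestamp,
            current_date::timestamp,   -- assumption: fill up to today
            interval '1 day'
        ) g(d)
    ) a
    left join score s
        on s.widget_id = a.widget_id and s.for_date = a.for_date
) filled;
```

With the sample data from the question, select * from widget_daily_score where for_date = date '2012-05-10' would show widget 1337 with score 41, carried forward from 2012-05-08. Days before a widget's first entry have nothing to carry forward and come back with a NULL score; adding where score is not null to queries against the view hides them.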