Time series querying in Postgres

Question

This is a follow-on question to @Erwin's answer to Efficient time series querying in Postgres.

To keep things simple, I'll use the same table structure as that question:

id | widget_id | for_date | score |

The original question was how to get the score of each widget for every date in a range. If a widget has no entry on a given date, the score from that widget's previous entry should be shown instead. The solution using a cross join and a window function works well if all the data lies within the queried range. My problem is that I want the previous score even if it lies outside the date range being queried.

Sample data:

INSERT INTO score (id, widget_id, for_date, score) values
(1, 1337, '2012-04-07', 52),
(2, 2222, '2012-05-05', 99),
(3, 1337, '2012-05-07', 112),
(4, 2222, '2012-05-07', 101);

When I query for the range May 5th to May 10th, 2012 (i.e. generate_series('2012-05-05'::date, '2012-05-10'::date, '1d')), I would like to get the following:

DAY          WIDGET_ID  SCORE
May, 05 2012    1337    52
May, 05 2012    2222    99
May, 06 2012    1337    52
May, 06 2012    2222    99
May, 07 2012    1337    112
May, 07 2012    2222    101
May, 08 2012    1337    112
May, 08 2012    2222    101
May, 09 2012    1337    112
May, 09 2012    2222    101
May, 10 2012    1337    112
May, 10 2012    2222    101
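To make the requested semantics concrete, here is a minimal Python sketch (not from the original thread; the names `SCORES`, `date_range`, and `expected_scores` are hypothetical). It derives the expected output from the sample rows: for each day and widget, it takes the score of the latest entry dated on or before that day, even when that entry precedes the queried range.

```python
from datetime import date, timedelta

# Sample rows from the question: (id, widget_id, for_date, score)
SCORES = [
    (1, 1337, date(2012, 4, 7), 52),
    (2, 2222, date(2012, 5, 5), 99),
    (3, 1337, date(2012, 5, 7), 112),
    (4, 2222, date(2012, 5, 7), 101),
]


def date_range(start, end):
    """Inclusive daily series, analogous to generate_series(start, end, '1d')."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def expected_scores(rows, start, end):
    """Return (day, widget_id, score) for every day in [start, end] and every
    widget, using the latest entry with for_date <= day, even if that entry
    lies before start. Score is None if the widget has no entry yet."""
    widgets = sorted({widget_id for _, widget_id, _, _ in rows})
    result = []
    for day in date_range(start, end):
        for widget in widgets:
            earlier = [(for_date, score) for _, w, for_date, score in rows
                       if w == widget and for_date <= day]
            # Pick the entry with the greatest for_date, if there is one.
            latest = max(earlier, key=lambda entry: entry[0]) if earlier else None
            result.append((day, widget, latest[1] if latest else None))
    return result


for day, widget, score in expected_scores(SCORES, date(2012, 5, 5), date(2012, 5, 10)):
    print(day.isoformat(), widget, score)
```

Running it prints the same twelve rows as the table above, with dates in ISO format.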

The best solution so far (also by @Erwin) is:

SELECT a.day, a.widget_id, s.score
FROM  (
   SELECT d.day, w.widget_id
         ,max(s.for_date) OVER (PARTITION BY w.widget_id ORDER BY d.day) AS effective_date
   FROM  (SELECT generate_series('2012-05-05'::date, '2012-05-10'::date, '1d')::date AS day) d
   CROSS  JOIN (SELECT DISTINCT widget_id FROM score) AS w
   LEFT   JOIN score s ON s.for_date = d.day AND s.widget_id = w.widget_id
   ) a
LEFT JOIN  score s ON s.for_date = a.effective_date AND s.widget_id = a.widget_id
ORDER BY a.day, a.widget_id;

But as you can see in this SQL Fiddle, it produces NULL scores for widget 1337 on the first two days. I would like to see the earlier score of 52 from row 1 in their place.
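Why the window-function query returns NULL can be seen in a small Python model of it (hypothetical names; a sketch of the query's logic, not of how Postgres executes it). The first LEFT JOIN only matches entries whose for_date equals a day in the generated series, so widget 1337's April 7 entry is never seen and the running max(for_date) stays NULL until May 7.

```python
from datetime import date, timedelta

# Sample rows from the question: (id, widget_id, for_date, score)
SCORES = [
    (1, 1337, date(2012, 4, 7), 52),
    (2, 2222, date(2012, 5, 5), 99),
    (3, 1337, date(2012, 5, 7), 112),
    (4, 2222, date(2012, 5, 7), 101),
]


def window_approach(rows, start, end):
    """Model of the window-function query from the question."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    widgets = sorted({w for _, w, _, _ in rows})
    # LEFT JOIN score s ON s.for_date = d.day AND s.widget_id = w.widget_id:
    # only entries dated exactly on a generated day can ever match.
    by_key = {(w, for_date): score for _, w, for_date, score in rows}
    result = []
    for widget in widgets:
        # Running max(s.for_date) OVER (PARTITION BY widget_id ORDER BY day);
        # stays None until an entry falls on a day inside the range.
        effective_date = None
        for day in days:
            if (widget, day) in by_key:
                effective_date = day
            # Outer LEFT JOIN on effective_date: no match while it is None.
            result.append((day, widget, by_key.get((widget, effective_date))))
    result.sort(key=lambda row: (row[0], row[1]))  # ORDER BY a.day, a.widget_id
    return result


for row in window_approach(SCORES, date(2012, 5, 5), date(2012, 5, 10))[:4]:
    print(row)
```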

Is it possible to do this in an efficient way?

Answer

As @Roman mentioned, DISTINCT ON can solve this. Details in this related answer:

  • Select first row in each GROUP BY group?

Subqueries are generally a bit faster than CTEs, though, so this version does without a CTE:

SELECT DISTINCT ON (d.day, w.widget_id)
       d.day, w.widget_id, s.score
FROM   generate_series('2012-05-05'::date, '2012-05-10'::date, '1d') d(day)
CROSS  JOIN (SELECT DISTINCT widget_id FROM score) AS w
LEFT   JOIN score s ON s.widget_id = w.widget_id AND s.for_date <= d.day
ORDER  BY d.day, w.widget_id, s.for_date DESC;

You can use a set-returning function like generate_series() as if it were a table in the FROM list.

SQL Fiddle
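The DISTINCT ON query can be modeled in Python as well, as a rough sketch of its semantics (the function name is hypothetical; Postgres does not necessarily evaluate it this way). It joins every entry dated on or before each day, sorts by day, widget and for_date descending, then keeps the first row of each (day, widget) group.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (id, widget_id, for_date, score)
SCORES = [
    (1, 1337, date(2012, 4, 7), 52),
    (2, 2222, date(2012, 5, 5), 99),
    (3, 1337, date(2012, 5, 7), 112),
    (4, 2222, date(2012, 5, 7), 101),
]


def distinct_on_approach(rows, start, end):
    """Model of the DISTINCT ON query: (day, widget_id, score) rows."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    widgets = sorted({w for _, w, _, _ in rows})
    joined = []
    for day in days:                  # generate_series(...) d(day)
        for widget in widgets:        # CROSS JOIN (SELECT DISTINCT widget_id FROM score)
            # LEFT JOIN score s ON s.widget_id = w.widget_id AND s.for_date <= d.day
            matches = [(for_date, score) for _, w, for_date, score in rows
                       if w == widget and for_date <= day]
            for for_date, score in matches or [(None, None)]:  # unmatched pair -> NULLs
                joined.append((day, widget, for_date, score))
    # ORDER BY d.day, w.widget_id, s.for_date DESC, done as two stable sorts.
    joined.sort(key=lambda row: row[2] or date.min, reverse=True)
    joined.sort(key=lambda row: (row[0], row[1]))
    # DISTINCT ON (d.day, w.widget_id): keep only the first row of each group.
    result = []
    for (day, widget), group in groupby(joined, key=lambda row: (row[0], row[1])):
        _, _, _, score = next(group)
        result.append((day, widget, score))
    return result


for row in distinct_on_approach(SCORES, date(2012, 5, 5), date(2012, 5, 10))[:4]:
    print(row)
```

Because the join condition is s.for_date <= d.day rather than equality, the April 7 entry is visible on May 5 and 6, which is exactly what the window-function version was missing.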

One multicolumn index should be the key to performance:

CREATE INDEX score_multi_idx ON score (widget_id, for_date, score);

The third column, score, is included only to make it a covering index in Postgres 9.2 or later, where index-only scans can answer the query from the index alone. You would not include it in earlier versions.
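An index on (widget_id, for_date) keeps each widget's entries in for_date order, so the latest entry on or before a given day can be located with a short search instead of a scan. As an illustrative analogy only (sorted Python lists stand in for the B-tree, and `build_index` / `latest_score` are hypothetical names), that lookup looks like the sketch below; keeping score next to for_date mirrors why the third column makes the index covering. Whether the planner actually uses the index this way depends on the plan it picks.

```python
from bisect import bisect_right
from datetime import date

# Sample rows from the question: (id, widget_id, for_date, score)
SCORES = [
    (1, 1337, date(2012, 4, 7), 52),
    (2, 2222, date(2012, 5, 5), 99),
    (3, 1337, date(2012, 5, 7), 112),
    (4, 2222, date(2012, 5, 7), 101),
]


def build_index(rows):
    """Stand-in for an index on (widget_id, for_date, score): per widget,
    parallel lists of for_date and score, sorted by for_date."""
    grouped = {}
    for _, widget, for_date, score in rows:
        grouped.setdefault(widget, []).append((for_date, score))
    index = {}
    for widget, entries in grouped.items():
        entries.sort()
        index[widget] = ([d for d, _ in entries], [s for _, s in entries])
    return index


def latest_score(index, widget, day):
    """Score of the latest entry with for_date <= day, via binary search;
    the score comes straight from the 'index', no table lookup needed."""
    dates, scores = index.get(widget, ([], []))
    pos = bisect_right(dates, day)  # number of entries with for_date <= day
    return scores[pos - 1] if pos else None


idx = build_index(SCORES)
print(latest_score(idx, 1337, date(2012, 5, 5)))
```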

Of course, if you have many widgets and a wide range of days, the CROSS JOIN produces a lot of rows, and that has a cost. Only select the widgets and days you actually need.
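To put rough numbers on that cost, here is a Python sketch (hypothetical names; logical row counts only, the actual plan may differ). It counts the day × widget pairs produced by the CROSS JOIN and the rows remaining after the LEFT JOIN, where each pair is matched with every earlier entry before DISTINCT ON discards all but one.

```python
from datetime import date, timedelta

# Sample rows from the question: (id, widget_id, for_date, score)
SCORES = [
    (1, 1337, date(2012, 4, 7), 52),
    (2, 2222, date(2012, 5, 5), 99),
    (3, 1337, date(2012, 5, 7), 112),
    (4, 2222, date(2012, 5, 7), 101),
]


def row_counts(rows, start, end):
    """Return (pairs, joined):
    - pairs: rows from generate_series CROSS JOIN distinct widget_ids
    - joined: rows after LEFT JOIN ... s.for_date <= d.day, before DISTINCT ON
    """
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    widgets = {w for _, w, _, _ in rows}
    pairs = len(days) * len(widgets)
    joined = 0
    for day in days:
        for widget in widgets:
            earlier = sum(1 for _, w, d, _ in rows if w == widget and d <= day)
            joined += max(earlier, 1)  # LEFT JOIN keeps an unmatched pair as one NULL row
    return pairs, joined


print(row_counts(SCORES, date(2012, 5, 5), date(2012, 5, 10)))
```

With the sample data this gives 12 pairs and 20 joined rows; the joined count also grows with each widget's history, so narrowing the widgets and days keeps both numbers down.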
