Get difference of another field between first and last timestamps of grouping


Problem description


I have a very large table called sensor_values with the columns timestamp, value, sensor_id, and another table called sensors with the columns sensor_id and name.

I often perform a pivot query to get summed data grouped by day like this:

SELECT MIN(to_char(s1.timestamp::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ')) AS time,
      SUM(CASE WHEN s1.sensor_id = 572 THEN s1.value ELSE 0.0 END) AS "Nickname1",
      SUM(CASE WHEN s1.sensor_id = 542 THEN s1.value ELSE 0.0 END) AS "Nickname2",
      SUM(CASE WHEN s1.sensor_id = 571 THEN s1.value ELSE 0.0 END) AS "Nickname3"
FROM sensor_values s1
WHERE s1.timestamp::timestamptz >= '2013-10-14T00:00:00+00:00'::timestamptz
AND s1.timestamp::timestamptz <= '2013-10-18T00:00:00+00:00'::timestamptz
AND s1.sensor_id IN (572, 542, 571, 540, 541, 573)
GROUP BY date_trunc('day', s1.timestamp) ORDER BY 1 ; 

This works OK if a bit slow. However, is it possible to write a similar query that instead of summing the groups it gets the difference between the latest and earliest timestamps in each grouping, i.e. day in this case?

This is because I have some sensor data that is ever increasing (electrical kwh meter) and would like to know the consumption in a particular time frame.

Solution

Step 1: Release the handbrakes

... if a bit slow

SELECT to_char(MIN(ts)::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
      ,SUM(CASE WHEN sensor_id = 572 THEN value ELSE 0.0 END) AS nickname1
      ,SUM(CASE WHEN sensor_id = 542 THEN value ELSE 0.0 END) AS nickname2
      ,SUM(CASE WHEN sensor_id = 571 THEN value ELSE 0.0 END) AS nickname3
FROM   sensor_values
-- LEFT JOIN sensor_values_cleaned s2 USING (sensor_id, ts)
WHERE  ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp
AND    ts <  '2013-10-18T00:00:00+00:00'::timestamptz::timestamp
AND    sensor_id IN (572, 542, 571, 540, 541, 573)
GROUP  BY ts::date
ORDER  BY 1;

Major points

  • Replace reserved words (in standard SQL) in your identifiers.
    timestamp -> ts
    time -> min_time

  • Since the join is on identical column names you can use the simpler USING clause in the join condition: USING (sensor_id, ts)
    However, since the second table sensor_values_cleaned is 100% irrelevant to this query, I removed it entirely.

  • As @joop already advised, switch min() and to_char() in your first output column. This way, Postgres can determine the minimum from the original column value, which is generally faster and may be able to utilize an index. In this specific case, ordering by date is also cheaper than ordering by text, which would also have to consider collation rules.

  • A similar consideration applies to your WHERE condition. Instead of:
    WHERE ts::timestamptz >= '2013-10-14T00:00:00+00:00'::timestamptz

    use:
    WHERE  ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp

    The second one is sargable and can utilize a plain index on ts - to great effect on performance in big tables!

  • Using ts::date instead of date_trunc('day', ts). Simpler, faster, same result.

  • Most probably your second WHERE condition is slightly incorrect. Generally, you would exclude the upper border. Instead of:

    AND    ts <=  '2013-10-18T00:00:00+00:00' ...

    use:

    AND    ts <   '2013-10-18T00:00:00+00:00' ...

  • When mixing timestamp and timestamptz one needs to be aware of the effects. For instance, your WHERE condition doesn't cut at 00:00 local time (except if local time coincides with UTC). Details here:
    Ignoring timezones altogether in Rails and PostgreSQL
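
To illustrate the last point, here is a minimal sketch of how the timestamp bound produced by the double cast depends on the session time zone. The zone 'Europe/Vienna' is only an assumed example (it was at UTC+2 on that date); the last statement shows one way to pin the conversion to UTC regardless of the session setting.

```sql
-- Sketch: the same literal yields different timestamp bounds per session time zone.
BEGIN;

SET LOCAL timezone = 'UTC';
SELECT '2013-10-14T00:00:00+00:00'::timestamptz::timestamp AS bound;  -- 2013-10-14 00:00:00

SET LOCAL timezone = 'Europe/Vienna';  -- assumed example zone, CEST (UTC+2) on that date
SELECT '2013-10-14T00:00:00+00:00'::timestamptz::timestamp AS bound;  -- 2013-10-14 02:00:00

-- Explicit conversion, independent of the session setting:
SELECT timestamptz '2013-10-14T00:00:00+00:00' AT TIME ZONE 'UTC' AS bound;  -- 2013-10-14 00:00:00

ROLLBACK;  -- SET LOCAL only lasts until the end of the transaction
```

Which bound is the right one depends on whether the values stored in ts are meant as UTC or as local time.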

Step 2: Your request

...the difference between the latest and earliest timestamps in each grouping

And by that I suppose you mean:
...the difference between the value of the latest and earliest timestamps ...
Otherwise it would be much simpler.

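Use window functions for that, in particular first_value() and last_value(). Careful with the combination: you want a non-default window frame for last_value() in this case. With ORDER BY, the default frame only reaches up to the current row (and its peers), so last_value() would not see the last row of the partition. Compare: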
PostgreSQL aggregate or window function to return just the last value
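
A small self-contained sketch of that pitfall, using three made-up readings for one sensor (the values are invented for illustration; only the column names follow the question):

```sql
-- Sketch: last_value() with the default frame vs. an explicit full-partition frame.
WITH sensor_values(sensor_id, ts, value) AS (
   VALUES (572, timestamp '2013-10-14 00:15', 100.0)
        , (572, timestamp '2013-10-14 12:00', 104.5)
        , (572, timestamp '2013-10-14 23:45', 110.0)
)
SELECT ts
     , value
       -- default frame ends at the current row: returns the row's own value
     , last_value(value) OVER (PARTITION BY sensor_id ORDER BY ts) AS last_default_frame
       -- frame spans the whole partition: returns 110.0 for every row
     , last_value(value) OVER (PARTITION BY sensor_id ORDER BY ts
                               RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_full_frame
FROM   sensor_values
ORDER  BY ts;
```

first_value() does not need the extended frame, because the default frame always starts at the first row of the partition.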

I combine this with DISTINCT ON, which is more convenient in this case than GROUP BY (which would need another subquery level):

SELECT DISTINCT ON (ts::date, sensor_id)
       ts::date AS day
      ,to_char((min(ts)  OVER (PARTITION BY ts::date))::timestamptz
              ,'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
      ,sensor_id
      ,last_value(value)    OVER (PARTITION BY ts::date, sensor_id ORDER BY ts
                     RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
       - first_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts)
                                                                   AS val_range
FROM   sensor_values
WHERE  ts >= '2013-10-14T00:00:00+0'::timestamptz::timestamp
AND    ts <  '2013-10-18T00:00:00+0'::timestamptz::timestamp
AND    sensor_id IN (540, 541, 542, 571, 572, 573)
ORDER  BY ts::date, sensor_id;

-> SQLfiddle demo.
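
For readers without access to the fiddle, the same technique can be tried on inline data. This is only a sketch: the readings are made up, the CTE merely stands in for the real table, and the min_time column is left out for brevity.

```sql
-- Sketch with invented readings; the CTE shadows the real sensor_values table.
WITH sensor_values(sensor_id, ts, value) AS (
   VALUES (572, timestamp '2013-10-14 00:15', 100.0)
        , (572, timestamp '2013-10-14 23:45', 110.0)
        , (572, timestamp '2013-10-15 00:15', 110.5)
        , (572, timestamp '2013-10-15 23:45', 125.0)
        , (542, timestamp '2013-10-14 01:00',  50.0)
        , (542, timestamp '2013-10-14 22:00',  53.5)
        , (542, timestamp '2013-10-15 02:00',  54.0)
        , (542, timestamp '2013-10-15 20:00',  60.0)
)
SELECT DISTINCT ON (ts::date, sensor_id)   -- keep one row per day and sensor
       ts::date AS day
     , sensor_id
     , last_value(value)  OVER (PARTITION BY ts::date, sensor_id ORDER BY ts
                    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
     - first_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts) AS val_range
FROM   sensor_values
ORDER  BY ts::date, sensor_id;
-- 2013-10-14 | 542 |  3.5
-- 2013-10-14 | 572 | 10.0
-- 2013-10-15 | 542 |  6.0
-- 2013-10-15 | 572 | 14.5
```

Every row of a (day, sensor_id) partition carries the same val_range, so it does not matter which one DISTINCT ON keeps.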

Step 3: Pivot table

Building on the query above I use crosstab() from the additional module tablefunc:

SELECT * FROM crosstab(
   $$SELECT DISTINCT ON (1,3)
            ts::date AS day
           ,to_char((min(ts) OVER (PARTITION BY ts::date))::timestamptz,'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
           ,sensor_id
           ,last_value(value)    OVER (PARTITION BY ts::date, sensor_id ORDER BY ts RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
            - first_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts) AS val_range
     FROM   sensor_values
     WHERE  ts >= '2013-10-14T00:00:00+0'::timestamptz::timestamp
     AND    ts <  '2013-10-18T00:00:00+0'::timestamptz::timestamp
     AND    sensor_id IN (540, 541, 542, 571, 572, 573)
     ORDER  BY 1, 3$$

   ,$$VALUES (540), (541), (542), (571), (572), (573)$$
   )
AS ct (day date, min_time text, s540 numeric, s541 numeric, s542 numeric, s571 numeric, s572 numeric, s573 numeric);

Returns (and much faster than before):

    day     |         min_time         | s540  | s541  | s542  | s571  | s572  | s573
------------+--------------------------+-------+-------+-------+-------+-------+-------
 2013-10-14 | 2013-10-14 03:00:00 CEST | 18.82 | 18.98 | 19.97 | 19.47 | 17.56 | 21.27
 2013-10-15 | 2013-10-15 00:15:00 CEST | 22.59 | 24.20 | 22.90 | 21.27 | 22.75 | 22.23
 2013-10-16 | 2013-10-16 00:16:00 CEST | 23.74 | 22.52 | 22.23 | 23.22 | 23.03 | 22.98
 2013-10-17 | 2013-10-17 00:17:00 CEST | 21.68 | 24.54 | 21.15 | 23.58 | 23.04 | 21.94
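
Note that crosstab() is not part of core Postgres. Assuming Postgres 9.1 or later, the tablefunc module has to be installed once per database before the query above can run, along these lines:

```sql
-- Run once per database; requires sufficient privileges.
CREATE EXTENSION IF NOT EXISTS tablefunc;
```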
