Get difference of another field between first and last timestamps of grouping
I have a very large table called sensor_values with the columns timestamp, value, sensor_id and another table called sensors which has, say, sensor_id and name.
I often perform a pivot query to get summed data grouped by day like this:
SELECT MIN(to_char(s1.timestamp::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ')) AS time,
SUM(CASE WHEN s1.sensor_id = 572 THEN s1.value ELSE 0.0 END) AS "Nickname1",
SUM(CASE WHEN s1.sensor_id = 542 THEN s1.value ELSE 0.0 END) AS "Nickname2",
SUM(CASE WHEN s1.sensor_id = 571 THEN s1.value ELSE 0.0 END) AS "Nickname3"
FROM sensor_values s1
WHERE s1.timestamp::timestamptz >= '2013-10-14T00:00:00+00:00'::timestamptz
AND s1.timestamp::timestamptz <= '2013-10-18T00:00:00+00:00'::timestamptz
AND s1.sensor_id IN (572, 542, 571, 540, 541, 573)
GROUP BY date_trunc('day', s1.timestamp) ORDER BY 1 ;
This works OK if a bit slow. However, is it possible to write a similar query that instead of summing the groups it gets the difference between the latest and earliest timestamps in each grouping, i.e. day in this case?
This is because I have some sensor data that is ever increasing (electrical kwh meter) and would like to know the consumption in a particular time frame.
Step 1: Release the handbrakes
... if a bit slow
SELECT to_char(MIN(ts)::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
,SUM(CASE WHEN sensor_id = 572 THEN value ELSE 0.0 END) AS nickname1
,SUM(CASE WHEN sensor_id = 542 THEN value ELSE 0.0 END) AS nickname2
,SUM(CASE WHEN sensor_id = 571 THEN value ELSE 0.0 END) AS nickname3
FROM sensor_values
-- LEFT JOIN sensor_values_cleaned s2 USING (sensor_id, ts)
WHERE ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp
AND ts < '2013-10-18T00:00:00+00:00'::timestamptz::timestamp
AND sensor_id IN (572, 542, 571, 540, 541, 573)
GROUP BY ts::date
ORDER BY 1;
Major points
- Replace reserved words (in standard SQL) in your identifiers:
  timestamp -> ts
  time -> min_time
- Since the join is on identical column names you can use the simpler USING clause in the join condition: USING (sensor_id, ts)
  However, since the second table sensor_values_cleaned is 100% irrelevant to this query, I removed it entirely.
- As @joop already advised, switch min() and to_char() in your first output column. This way, Postgres can determine the minimum from the original column value, which is generally faster and may be able to utilize an index. In this specific case, ordering by date is also cheaper than ordering by a text, which would also have to consider collation rules.
- A similar consideration applies to your WHERE condition:
  WHERE ts::timestamptz >= '2013-10-14T00:00:00+00:00'::timestamptz
  WHERE ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp
  The second one is sargable and can utilize a plain index on ts, to great effect on performance in big tables!
- Use ts::date instead of date_trunc('day', ts). Simpler, faster, same result.
- Most probably your second WHERE condition is slightly incorrect. Generally, you would exclude the upper border:
  AND ts <= '2013-10-18T00:00:00+00:00' ...
  AND ts < '2013-10-18T00:00:00+00:00' ...
- When mixing timestamp and timestamptz one needs to be aware of the effects. For instance, your WHERE condition doesn't cut at 00:00 local time (except if local time coincides with UTC). Details here:
  Ignoring timezones altogether in Rails and PostgreSQL
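To make the last point concrete, here is a small sketch of how the cast chain in the WHERE condition depends on the session time zone. The zone Europe/Vienna is only an assumption for illustration (it matches the CEST shown in the result further down); any non-UTC setting behaves similarly.

```sql
-- Sketch only: shows that timestamptz -> timestamp uses the session time zone.
BEGIN;

SET LOCAL timezone = 'Europe/Vienna';   -- assumed zone, UTC+2 on this date
SELECT '2013-10-14T00:00:00+00:00'::timestamptz::timestamp AS bound;
-- 2013-10-14 02:00:00

SET LOCAL timezone = 'UTC';
SELECT '2013-10-14T00:00:00+00:00'::timestamptz::timestamp AS bound;
-- 2013-10-14 00:00:00

-- Independent of the session setting: convert explicitly with AT TIME ZONE.
SELECT '2013-10-14T00:00:00+00:00'::timestamptz AT TIME ZONE 'UTC' AS bound;
-- 2013-10-14 00:00:00

ROLLBACK;
```

Which variant is right depends on whether the values stored in ts are meant as local time or as UTC, which the question does not say.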
Step 2: Your request
...the difference between the latest and earliest timestamps in each grouping
And by that I suppose you mean:
...the difference between the value of the latest and earliest timestamps ...
Otherwise it would be much simpler.
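Use window functions for that, in particular first_value() and last_value(). Careful with the combination: you want a non-default window frame for last_value() in this case, because with ORDER BY the default frame ends at the current row (and its peers), so last_value() would otherwise just return the current row's own value. Compare:
PostgreSQL aggregate or window function to return just the last value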
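The effect of the frame can be seen on a few made-up rows. This is a minimal sketch with invented values, not data from the question:

```sql
-- Sketch only: three invented meter readings for one day.
SELECT ts
     , value
     , last_value(value)  OVER (ORDER BY ts) AS last_default_frame
     , last_value(value)  OVER (ORDER BY ts
                                RANGE BETWEEN UNBOUNDED PRECEDING
                                          AND UNBOUNDED FOLLOWING) AS last_full_frame
     , first_value(value) OVER (ORDER BY ts) AS first_val
FROM  (VALUES (timestamp '2013-10-14 00:00', 100.0)
            , (timestamp '2013-10-14 12:00', 110.0)
            , (timestamp '2013-10-14 23:45', 118.5)) AS t(ts, value)
ORDER  BY ts;

-- last_default_frame: 100.0, 110.0, 118.5  (just the current row each time)
-- last_full_frame   : 118.5 on every row
-- first_val         : 100.0 on every row   (default frame is fine here)
```

first_value() works with the default frame because that frame always starts at the first row of the partition.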
I combine this with DISTINCT ON, which is more convenient in this case than GROUP BY (which would need another subquery level):
SELECT DISTINCT ON (ts::date, sensor_id)
ts::date AS day
,to_char((min(ts) OVER (PARTITION BY ts::date))::timestamptz
,'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
,sensor_id
,last_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
- first_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts)
AS val_range
FROM sensor_values
WHERE ts >= '2013-10-14T00:00:00+0'::timestamptz::timestamp
AND ts < '2013-10-18T00:00:00+0'::timestamptz::timestamp
AND sensor_id IN (540, 541, 542, 571, 572, 573)
ORDER BY ts::date, sensor_id;
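To try the technique without the real table, the following sketch runs a reduced form of the query against a throwaway table. The table name sensor_values_demo and all readings are invented for illustration; the min_time column is left out and a named WINDOW clause is used only to keep the example short.

```sql
-- Sketch only: hypothetical demo table with invented, ever-increasing readings.
BEGIN;

CREATE TEMP TABLE sensor_values_demo (
   ts        timestamp
 , sensor_id int
 , value     numeric
) ON COMMIT DROP;

INSERT INTO sensor_values_demo VALUES
   ('2013-10-14 00:15', 540, 1000.0)
 , ('2013-10-14 12:00', 540, 1010.5)
 , ('2013-10-14 23:45', 540, 1018.8)
 , ('2013-10-15 00:15', 540, 1019.0)
 , ('2013-10-15 23:45', 540, 1041.6)
 , ('2013-10-14 00:15', 541,  500.0)
 , ('2013-10-14 23:45', 541,  519.0);

-- One row per (day, sensor): last reading of the day minus first reading.
SELECT DISTINCT ON (ts::date, sensor_id)
       ts::date AS day
     , sensor_id
     , last_value(value) OVER w - first_value(value) OVER w AS val_range
FROM   sensor_values_demo
WINDOW w AS (PARTITION BY ts::date, sensor_id ORDER BY ts
             RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER  BY ts::date, sensor_id;

--     day     | sensor_id | val_range
-- 2013-10-14  |       540 |      18.8
-- 2013-10-14  |       541 |      19.0
-- 2013-10-15  |       540 |      22.6

COMMIT;
```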
Step 3: Pivot table
Building on the query above I use crosstab() from the additional module tablefunc:
SELECT * FROM crosstab(
$$SELECT DISTINCT ON (1,3)
ts::date AS day
,to_char((min(ts) OVER (PARTITION BY ts::date))::timestamptz,'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
,sensor_id
,last_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
- first_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts) AS val_range
FROM sensor_values
WHERE ts >= '2013-10-14T00:00:00+0'::timestamptz::timestamp
AND ts < '2013-10-18T00:00:00+0'::timestamptz::timestamp
AND sensor_id IN (540, 541, 542, 571, 572, 573)
ORDER BY 1, 3$$
,$$VALUES (540), (541), (542), (571), (572), (573)$$
)
AS ct (day date, min_time text, s540 numeric, s541 numeric, s542 numeric, s571 numeric, s572 numeric, s573 numeric);
Returns (and much faster than before):
day | min_time | s540 | s541 | s542 | s571 | s572 | s573
------------+--------------------------+-------+-------+-------+-------+-------+-------
2013-10-14 | 2013-10-14 03:00:00 CEST | 18.82 | 18.98 | 19.97 | 19.47 | 17.56 | 21.27
2013-10-15 | 2013-10-15 00:15:00 CEST | 22.59 | 24.20 | 22.90 | 21.27 | 22.75 | 22.23
2013-10-16 | 2013-10-16 00:16:00 CEST | 23.74 | 22.52 | 22.23 | 23.22 | 23.03 | 22.98
2013-10-17 | 2013-10-17 00:17:00 CEST | 21.68 | 24.54 | 21.15 | 23.58 | 23.04 | 21.94
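Since tablefunc is an additional module, crosstab() is only available after the extension has been installed in the database. A minimal sketch of the two-parameter form on inline values (three numbers borrowed from the result above, reduced to two sensors and without the min_time column):

```sql
-- Once per database; needs sufficient privileges.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Sketch only: first query yields (row name, category, value),
-- second query lists the categories in output column order.
SELECT * FROM crosstab(
   $$VALUES ('2013-10-14'::date, 540, 18.82)
          , ('2013-10-14'::date, 541, 18.98)
          , ('2013-10-15'::date, 540, 22.59)
     ORDER  BY 1, 2$$
  ,$$VALUES (540), (541)$$
   )
AS ct (day date, s540 numeric, s541 numeric);

--     day     | s540  | s541
-- 2013-10-14  | 18.82 | 18.98
-- 2013-10-15  | 22.59 |        (NULL: no row for sensor 541 on that day)
```

Passing the category list as second parameter is what keeps missing sensors aligned with the right output column instead of shifting values to the left.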