How to get average values for time intervals in Postgres

Question

I'm using PostgreSQL 9.6. I have a table like this:

mac   sn         loc   time     date      vin1    vin2    vin3
1a34 4as11111111 aaaa  7:06:18  1/1/2018  447.42  472.32  682.59
1a34 4as11111111 aaaa  7:06:43  1/1/2018  455.97  476.25  682.59
1a34 4as11111111 aaaa  7:07:35  1/1/2018  470.88  484.2   682.5

I need to calculate the averages of vin1, vin2 and vin3 within time intervals of 300 sec (5 min), starting from the first time (7:06:18 - 7:11:18), for the dates in range. I can select the data I need with this query:

select * from tbl
where sn='4as11111111' and date between '2018-01-01' and '2018-01-02';

But I don't know how to group it by 300-second time intervals and calculate the averages of the vin1, vin2 and vin3 columns for those 5-minute intervals, to get something like this:

mac  sn          loc     time     date      vin1_av  vin2_av  vin3_av
1a34 4as11111111 aaaa   7:06:18  1/1/2018  450.0    480.32   600.59
1a34 4as11111111 aaaa   7:11:18  1/1/2018  460.0    490.25   782.59
1a34 4as11111111 aaaa   7:16:18  1/1/2018  470.88   500.2    600.5

Any help would be greatly appreciated.

Solution

DB design

While you can work with separate date and time columns, there is really no advantage over a single timestamp column. I would convert the table:

ALTER TABLE tbl ADD column ts timestamp;
UPDATE tbl SET ts = date + time;  -- assuming actual date and time types
ALTER TABLE tbl DROP column date, DROP column time;

If date and time are not actual date and time data types, use to_timestamp().
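A minimal sketch of the to_timestamp() route, assuming (hypothetically) that date and time are stored as text in the formats shown in the question ('1/1/2018', '7:06:18'). The table name tbl_text and the trimmed-down column list are placeholders for illustration only.

```sql
-- Hypothetical table: date and time stored as text, e.g. '1/1/2018' and '7:06:18'
CREATE TEMP TABLE tbl_text (
   sn   text
 , date text
 , time text
 , vin1 float8
);

INSERT INTO tbl_text (sn, date, time, vin1) VALUES
   ('4as11111111', '1/1/2018', '7:06:18', 447.42)
 , ('4as11111111', '1/1/2018', '7:06:43', 455.97);

ALTER TABLE tbl_text ADD COLUMN ts timestamp;

-- to_timestamp() parses the text in the session time zone and returns timestamptz;
-- casting back to timestamp restores the original wall-clock value
UPDATE tbl_text
SET    ts = to_timestamp(date || ' ' || time, 'MM/DD/YYYY HH24:MI:SS')::timestamp;

ALTER TABLE tbl_text DROP COLUMN date, DROP COLUMN time;

SELECT sn, ts, vin1 FROM tbl_text ORDER BY ts;
```

The format template has to match how the values are actually stored; if your text uses a different layout (for example ISO dates), adjust the template accordingly.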

Query

Then the query is a bit simpler:

SELECT *
FROM  (
   SELECT sn, generate_series(min(ts), max(ts), interval '5 min') AS ts
   FROM   tbl
   WHERE  sn = '4as11111111'
   AND    ts >= '2018-01-01'
   AND    ts <  '2018-01-02'
   GROUP  BY 1
   ) grid
CROSS  JOIN LATERAL (
   SELECT round(avg(vin1), 2) AS vin1_av
        , round(avg(vin2), 2) AS vin2_av
        , round(avg(vin3), 2) AS vin3_av
   FROM   tbl
   WHERE  sn =  grid.sn
   AND    ts >= grid.ts
   AND    ts <  grid.ts + interval '5 min'
   ) avg;


Generate a grid of start times in the first subquery grid, running from the first to the last qualifying row in the given time frame.

Join to the rows that fall into each time slot with a LATERAL join and aggregate the averages right away in the subquery avg. Because of the aggregates (and the absence of GROUP BY), it always returns exactly one row, even if no entries are found. The averages are NULL in this case.
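To see the grid and the LATERAL join at work without creating a table, the sketch below feeds the same query a VALUES list, assuming numeric vin* columns and the single ts column from the DB design section. The first three rows are the question's sample data; the fourth row (07:17:00) is made up purely to produce an empty slot in between.

```sql
-- Hypothetical sample: the question's three rows plus one made-up row at 07:17:00
WITH tbl(sn, ts, vin1, vin2, vin3) AS (
   VALUES
     ('4as11111111', timestamp '2018-01-01 07:06:18', 447.42, 472.32, 682.59)
   , ('4as11111111', timestamp '2018-01-01 07:06:43', 455.97, 476.25, 682.59)
   , ('4as11111111', timestamp '2018-01-01 07:07:35', 470.88, 484.20, 682.50)
   , ('4as11111111', timestamp '2018-01-01 07:17:00', 460.00, 480.00, 600.00)
   )
SELECT grid.ts AS slot_start, a.vin1_av, a.vin2_av, a.vin3_av
FROM  (
   -- one start time every 5 minutes, from the first to the last qualifying row
   SELECT sn, generate_series(min(ts), max(ts), interval '5 min') AS ts
   FROM   tbl
   WHERE  sn = '4as11111111'
   AND    ts >= '2018-01-01'
   AND    ts <  '2018-01-02'
   GROUP  BY 1
   ) grid
CROSS  JOIN LATERAL (
   -- an aggregate without GROUP BY returns exactly one row, even for an empty slot
   SELECT round(avg(vin1), 2) AS vin1_av
        , round(avg(vin2), 2) AS vin2_av
        , round(avg(vin3), 2) AS vin3_av
   FROM   tbl
   WHERE  sn =  grid.sn
   AND    ts >= grid.ts
   AND    ts <  grid.ts + interval '5 min'
   ) a
ORDER  BY grid.ts;
```

This should return three slots: 07:06:18 with 458.09 / 477.59 / 682.56 (the three sample rows), 07:11:18 with NULL averages (no rows), and 07:16:18 with 460.00 / 480.00 / 600.00 (the made-up row).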

The result includes all time slots between the first and last qualifying row in the given time frame. Various other result compositions would make sense, too, like including all time slots in the given time frame, or just time slots with actual values. All are possible; I had to pick one interpretation.
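For the "only time slots with actual values" interpretation, one option (a swapped-in technique, not the query above) is to compute each row's slot arithmetically and GROUP BY it, which never produces empty slots. A sketch on hypothetical sample data (the question's three rows plus a made-up row at 07:17:00), with slots anchored at the first qualifying row like the grid above; the CTE names sel and b are placeholders.

```sql
-- Hypothetical sample: the question's three rows plus one made-up row at 07:17:00
WITH tbl(sn, ts, vin1, vin2, vin3) AS (
   VALUES
     ('4as11111111', timestamp '2018-01-01 07:06:18', 447.42, 472.32, 682.59)
   , ('4as11111111', timestamp '2018-01-01 07:06:43', 455.97, 476.25, 682.59)
   , ('4as11111111', timestamp '2018-01-01 07:07:35', 470.88, 484.20, 682.50)
   , ('4as11111111', timestamp '2018-01-01 07:17:00', 460.00, 480.00, 600.00)
   )
, sel AS (  -- qualifying rows
   SELECT *
   FROM   tbl
   WHERE  sn = '4as11111111'
   AND    ts >= '2018-01-01'
   AND    ts <  '2018-01-02'
   )
, b AS (    -- slots are anchored at the first qualifying row
   SELECT min(ts) AS t0 FROM sel
   )
-- slot number = whole 300-second steps since t0; slot start = t0 + n * 5 min
SELECT b.t0 + interval '5 min'
            * floor(extract(epoch FROM s.ts - b.t0) / 300) AS slot_start
     , round(avg(s.vin1), 2) AS vin1_av
     , round(avg(s.vin2), 2) AS vin2_av
     , round(avg(s.vin3), 2) AS vin3_av
FROM   sel s
CROSS  JOIN b
GROUP  BY 1
ORDER  BY 1;
```

Compared with the grid approach, this returns only the 07:06:18 and 07:16:18 slots for the sample; the empty 07:11:18 slot simply does not appear.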

Index

At least have this multicolumn index:

CREATE INDEX foo_idx ON tbl (sn, ts);

Or an index on (sn, ts, vin1, vin2, vin3) to allow index-only scans, if certain preconditions are met, especially if table rows are much wider than in the demo.
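One of those preconditions is an up-to-date visibility map, which VACUUM maintains. A sketch for checking whether the covering index is actually used, on a hypothetical temp table tbl_demo filled with made-up readings; tbl_demo and tbl_demo_cov_idx are placeholder names, and the planner may still choose a different plan. Note that VACUUM cannot run inside a transaction block.

```sql
-- Hypothetical demo table with the same columns as tbl after the DB design change
CREATE TEMP TABLE tbl_demo (
   sn   text
 , ts   timestamp
 , vin1 numeric
 , vin2 numeric
 , vin3 numeric
);

-- Made-up filler data: one reading every 25 seconds for a single sn
INSERT INTO tbl_demo (sn, ts, vin1, vin2, vin3)
SELECT '4as11111111', timestamp '2018-01-01' + g * interval '25 sec', 450, 480, 680
FROM   generate_series(0, 9999) g;

-- Covering index: every column the per-slot query reads is in the index
CREATE INDEX tbl_demo_cov_idx ON tbl_demo (sn, ts, vin1, vin2, vin3);

-- Updates the visibility map (and planner statistics)
VACUUM ANALYZE tbl_demo;

-- If the planner picks it, the plan shows "Index Only Scan using tbl_demo_cov_idx"
-- together with a "Heap Fetches" count
EXPLAIN (ANALYZE, BUFFERS)
SELECT avg(vin1), avg(vin2), avg(vin3)
FROM   tbl_demo
WHERE  sn = '4as11111111'
AND    ts >= '2018-01-01 07:06:18'
AND    ts <  '2018-01-01 07:11:18';
```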


Based on your original table

As requested and clarified in the comments, the question was later updated again to include the columns mac and loc. I assume you want separate averages per (mac, loc).

Assuming date and time are still separate columns and the vin* columns are of type float, this query also excludes time slots without rows:

SELECT t.mac, grid.sn, t.loc, grid.ts::time AS time, grid.ts::date AS date
     , t.vin1_av, t.vin2_av, t.vin3_av
FROM  (
   SELECT sn, generate_series(min(date+time)  -- generate timestamp on the fly
                            , max(date+time)
                            , interval '5 min') AS ts
   FROM   tbl
   WHERE  sn = '4as11111111'
   AND    date+time >= '2018-01-01 0:0'
   AND    date+time <  '2018-01-02 0:0'
   GROUP  BY 1
   ) grid
CROSS JOIN LATERAL (
   SELECT mac, loc
        , round(avg(vin1)::numeric, 2) AS vin1_av  -- cast to numeric for round()
        , round(avg(vin2)::numeric, 2) AS vin2_av  -- but rounding is optional
        , round(avg(vin3)::numeric, 2) AS vin3_av
   FROM   tbl
   WHERE  sn = grid.sn
   AND    date+time >= grid.ts
   AND    date+time <  grid.ts + interval '5 min'
   GROUP  BY mac, loc
   HAVING count(*) > 0  -- exclude empty slots (GROUP BY alone already returns no row for them)
   ) t;

Create a multicolumn expression index to support this:

CREATE INDEX bar_idx ON tbl (sn, (date+time));


But I would just use timestamp all along.
