How to get average values for time intervals in Postgres
Problem description
I'm using PostgreSQL 9.6. I have a table like this:
mac sn loc time date vin1 vin2 vin3
1a34 4as11111111 aaaa 7:06:18 1/1/2018 447.42 472.32 682.59
1a34 4as11111111 aaaa 7:06:43 1/1/2018 455.97 476.25 682.59
1a34 4as11111111 aaaa 7:07:35 1/1/2018 470.88 484.2 682.5
I need to calculate the averages of vin1, vin2 and vin3 within time intervals of 300 sec (5 min), starting from the first time (7:06:18 - 7:11:18, and so on) for the dates in range. I can select the data I need with this query:
select * from table
where sn='4as11111111' and date between '2018-01-01' and '2018-01-02';
But I don't know how to group it into 300-second time intervals and calculate the averages of the vin1, vin2 and vin3 columns for those 5-minute intervals, to get something like this:
mac sn loc time date vin1_av vin2_av vin3_av
1a34 4as11111111 aaaa 7:06:18 1/1/2018 450.0 480.32 600.59
1a34 4as11111111 aaaa 7:11:18 1/1/2018 460.0 490.25 782.59
1a34 4as11111111 aaaa 7:16:18 1/1/2018 470.88 500.2 600.5
Any help would be greatly appreciated.
Solution
DB design
While you can work with separate date and time columns, there is really no advantage over a single timestamp column. I would adapt:
ALTER TABLE tbl ADD column ts timestamp;
UPDATE tbl SET ts = date + time; -- assuming actual date and time types
ALTER TABLE tbl DROP column date, DROP column time;
If date and time are not actual date and time data types, use to_timestamp() instead.
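To illustrate what the conversion does, here is a small Python model, not Postgres code. With native types, combining a date and a time yields a timestamp, much like `date + time` in Postgres. For text columns the value has to be parsed first. The M/D/YYYY and H:MM:SS formats are assumptions read off the sample rows, which are ambiguous between month-first and day-first.

```python
from datetime import date, datetime, time

# Native types: combining a date and a time gives one timestamp,
# analogous to "date + time" in Postgres.
ts_native = datetime.combine(date(2018, 1, 1), time(7, 6, 18))

def to_ts(date_text: str, time_text: str) -> datetime:
    """Parse text values like '1/1/2018' and '7:06:18' into one timestamp.

    Assumption: dates are M/D/YYYY and times are H:MM:SS, as in the sample rows.
    """
    return datetime.strptime(f"{date_text} {time_text}", "%m/%d/%Y %H:%M:%S")

print(ts_native)                       # 2018-01-01 07:06:18
print(to_ts("1/1/2018", "7:06:18"))    # 2018-01-01 07:06:18
```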
Query
Then the query is a bit simpler:
SELECT *
FROM (
SELECT sn, generate_series(min(ts), max(ts), interval '5 min') AS ts
FROM tbl
WHERE sn = '4as11111111'
AND ts >= '2018-01-01'
AND ts < '2018-01-02'
GROUP BY 1
) grid
CROSS JOIN LATERAL (
SELECT round(avg(vin1), 2) AS vin1_av
, round(avg(vin2), 2) AS vin2_av
, round(avg(vin3), 2) AS vin3_av
FROM tbl
WHERE sn = grid.sn
AND ts >= grid.ts
AND ts < grid.ts + interval '5 min'
) avg;
Generate a grid of start times in the first subquery grid, running from the first to the last qualifying row in the given time frame.
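The grid subquery can be modeled in a few lines of Python. This is a sketch of what `generate_series(min(ts), max(ts), interval '5 min')` produces, not a replacement for it. The third timestamp in the usage example is hypothetical and was added so the grid has more than one slot.

```python
from datetime import datetime, timedelta

def time_grid(timestamps, step=timedelta(minutes=5)):
    """Python model of generate_series(min(ts), max(ts), interval '5 min').

    Returns slot start times from the first qualifying timestamp onward,
    stopping at the last value that is still <= max(ts).
    """
    if not timestamps:
        # No qualifying rows: GROUP BY sn forms no group, so the grid is empty.
        return []
    start, end = min(timestamps), max(timestamps)
    grid = []
    t = start
    while t <= end:
        grid.append(t)
        t += step
    return grid

ts = [datetime(2018, 1, 1, 7, 6, 18), datetime(2018, 1, 1, 7, 7, 35),
      datetime(2018, 1, 1, 7, 17, 0)]  # last value is hypothetical
for g in time_grid(ts):
    print(g.time())  # 07:06:18, then 07:11:18, then 07:16:18
```

Note that the grid is anchored at the first row's time (7:06:18), not at a round clock boundary, which matches the slots the question asks for.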
Join to the rows that fall into each slot with a LATERAL join, and aggregate the averages right away in the subquery avg. Because the subquery aggregates without GROUP BY, it always returns exactly one row, even if no entries are found. The averages are NULL in that case.
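A Python model of the LATERAL subquery helps show the two key details. The first is the half-open slot boundary `ts >= grid.ts AND ts < grid.ts + interval '5 min'`, which prevents a row from being counted in two slots. The second is that an empty slot still produces a row with NULL (here `None`) averages. The fourth sample row is hypothetical. Python's `round()` can differ from Postgres `round(numeric)` on exact ties, so treat the rounding as approximate.

```python
from datetime import datetime, timedelta
from statistics import mean

STEP = timedelta(minutes=5)

def slot_averages(rows, grid, step=STEP):
    """Python model of the LATERAL subquery: one output row per slot start.

    rows: (ts, vin1, vin2, vin3) tuples. A row belongs to slot g when
    g <= ts < g + step (half-open, so no row is counted twice).
    An empty slot still yields a row with None for each average,
    just as avg() over zero rows returns NULL.
    """
    out = []
    for g in grid:
        hits = [r for r in rows if g <= r[0] < g + step]
        if hits:
            avgs = tuple(round(mean(r[i] for r in hits), 2) for i in (1, 2, 3))
        else:
            avgs = (None, None, None)
        out.append((g, *avgs))
    return out

d = datetime(2018, 1, 1)
rows = [
    (d.replace(hour=7, minute=6, second=18), 447.42, 472.32, 682.59),
    (d.replace(hour=7, minute=6, second=43), 455.97, 476.25, 682.59),
    (d.replace(hour=7, minute=7, second=35), 470.88, 484.2, 682.5),
    (d.replace(hour=7, minute=17), 460.0, 490.0, 700.0),  # hypothetical extra row
]
grid = [d.replace(hour=7, minute=6, second=18) + k * STEP for k in range(3)]
for row in slot_averages(rows, grid):
    print(row)
```

With the three sample rows from the question, the first slot averages to roughly 458.09, 477.59 and 682.56, and the middle slot comes back with `None` values.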
The result includes all time slots between the first and the last qualifying row in the given time frame. Other result compositions would make sense, too, such as including all time slots in the given time frame, or only time slots that contain actual values. All of these are possible; I had to pick one interpretation.
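The three interpretations differ only in how the grid is built. This Python sketch contrasts them. For the "whole frame" variant, it assumes the slots are aligned to the frame start (midnight) rather than to the first row, which is one possible reading. The third timestamp is hypothetical.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=5)

def grid_first_to_last(timestamps, step=STEP):
    """The answer's choice: slots from the first to the last qualifying row."""
    grid = []
    if not timestamps:
        return grid
    t, end = min(timestamps), max(timestamps)
    while t <= end:
        grid.append(t)
        t += step
    return grid

def grid_whole_frame(frame_start, frame_end, step=STEP):
    """Alternative: every slot in [frame_start, frame_end).

    Assumption: slots are aligned to frame_start, not to the first row.
    """
    grid = []
    t = frame_start
    while t < frame_end:
        grid.append(t)
        t += step
    return grid

def grid_non_empty(timestamps, grid, step=STEP):
    """Alternative: keep only slots that contain at least one row."""
    return [g for g in grid if any(g <= ts < g + step for ts in timestamps)]

d = datetime(2018, 1, 1)
ts = [d.replace(hour=7, minute=6, second=18),
      d.replace(hour=7, minute=7, second=35),
      d.replace(hour=7, minute=17)]  # hypothetical third row
print(len(grid_first_to_last(ts)))                      # 3
print(len(grid_whole_frame(d, d + timedelta(days=1))))  # 288
print(len(grid_non_empty(ts, grid_first_to_last(ts))))  # 2
```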
Index
At least have this multicolumn index:
CREATE INDEX foo_idx ON tbl (sn, ts);
Or on (sn, ts, vin1, vin2, vin3) to allow index-only scans, if some preconditions are met, and especially if the table rows are much wider than in the demo.
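Why (sn, ts) in that order? Because sn is the leading column, the filter `sn = X AND ts >= lo AND ts < hi` selects one contiguous run of index entries. The Python sketch below models this with a sorted list and two binary searches. It is a conceptual illustration only, not how Postgres implements B-trees. Because each entry also carries the vin values, the slice alone is enough to compute the averages, which is the idea behind the covering (sn, ts, vin1, vin2, vin3) index. The second sn value is hypothetical.

```python
from bisect import bisect_left
from datetime import datetime

def range_scan(index, sn, lo, hi):
    """Model of a B-tree range scan on (sn, ts, ...).

    index is a list of tuples sorted by (sn, ts, ...). The filter
    'sn = X AND ts >= lo AND ts < hi' maps to one contiguous slice.
    The 2-tuple (sn, lo) sorts before every entry (sn, lo, ...), so
    ts == lo is included and ts == hi is excluded (half-open).
    """
    start = bisect_left(index, (sn, lo))
    end = bisect_left(index, (sn, hi))
    return index[start:end]

d = datetime(2018, 1, 1)
index = sorted([
    ("4as11111111", d.replace(hour=7, minute=6, second=18), 447.42, 472.32, 682.59),
    ("4as11111111", d.replace(hour=7, minute=6, second=43), 455.97, 476.25, 682.59),
    ("4as11111111", d.replace(hour=7, minute=12), 470.88, 484.2, 682.5),
    ("4as22222222", d.replace(hour=7, minute=7), 1.0, 2.0, 3.0),  # hypothetical sn
])
hits = range_scan(index, "4as11111111",
                  d.replace(hour=7, minute=6, second=18),
                  d.replace(hour=7, minute=11, second=18))
print(len(hits))  # 2
```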
Closely related:
- Slow LEFT JOIN on CTE with time intervals
- Best way to count records by arbitrary time intervals in Rails+Postgres
Based on your original table
This version was added as requested in the comments, after the question was updated to include the columns mac and loc. I assume you want separate averages per (mac, loc).
Here date and time are still separate columns, and the vin* columns are of type float. This version also excludes time slots without rows:
SELECT t.mac, grid.sn, t.loc, grid.ts::time AS time, grid.ts::date AS date
, t.vin1_av, t.vin2_av, t.vin3_av
FROM (
SELECT sn, generate_series(min(date+time) -- generate timestamp on the fly
, max(date+time)
, interval '5 min') AS ts
FROM tbl
WHERE sn = '4as11111111'
AND date+time >= '2018-01-01 0:0'
AND date+time < '2018-01-02 0:0'
GROUP BY 1
) grid
CROSS JOIN LATERAL (
SELECT mac, loc
, round(avg(vin1)::numeric, 2) AS vin1_av -- cast to numeric for round()
, round(avg(vin2)::numeric, 2) AS vin2_av -- but rounding is optional
, round(avg(vin3)::numeric, 2) AS vin3_av
FROM tbl
WHERE sn = grid.sn
AND date+time >= grid.ts
AND date+time < grid.ts + interval '5 min'
GROUP BY mac, loc
HAVING count(*) > 0 -- exclude empty slots
) t;
Create a multicolumn expression index to support this:
CREATE INDEX bar_idx ON tbl (sn, (date+time));
But I would just use timestamp all along.
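The per-(mac, loc) variant can be modeled in Python as well. The sketch groups the rows in each half-open slot by (mac, loc) and averages them. A slot with no rows forms no group, so it contributes no output, which matches how the query drops empty slots. The sn column is omitted because the rows are assumed to be pre-filtered to one sn. The sort order is added only to make the output deterministic, since the SQL has no ORDER BY. The "1b99"/"bbbb" row is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

STEP = timedelta(minutes=5)

def averages_per_mac_loc(rows, grid, step=STEP):
    """Python model of the per-(mac, loc) query.

    rows: (mac, loc, ts, vin1, vin2, vin3) tuples for one sn.
    For each slot start g, rows with g <= ts < g + step are grouped by
    (mac, loc) and averaged. Empty slots form no group and yield nothing.
    """
    out = []
    for g in grid:
        groups = defaultdict(list)
        for mac, loc, ts, *vins in rows:
            if g <= ts < g + step:
                groups[(mac, loc)].append(vins)
        # Sorted only for deterministic output; the SQL has no ORDER BY.
        for (mac, loc), members in sorted(groups.items()):
            avgs = [round(mean(m[i] for m in members), 2) for i in range(3)]
            out.append((mac, loc, g.time(), g.date(), *avgs))
    return out

d = datetime(2018, 1, 1)
rows = [
    ("1a34", "aaaa", d.replace(hour=7, minute=6, second=18), 447.42, 472.32, 682.59),
    ("1a34", "aaaa", d.replace(hour=7, minute=6, second=43), 455.97, 476.25, 682.59),
    ("1a34", "aaaa", d.replace(hour=7, minute=7, second=35), 470.88, 484.2, 682.5),
    ("1b99", "bbbb", d.replace(hour=7, minute=8), 1.0, 2.0, 3.0),  # hypothetical
]
grid = [d.replace(hour=7, minute=6, second=18) + k * STEP for k in range(2)]
for row in averages_per_mac_loc(rows, grid):
    print(row)
```

The second slot (7:11:18 to 7:16:18) contains no rows, so only the two groups from the first slot appear in the output.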