PostgreSQL SQL GROUP BY time interval with arbitrary accuracy (down to milliseconds)
Question
I have my measurement data stored in the following structure:
CREATE TABLE measurements(
measured_at TIMESTAMPTZ,
val INTEGER
);
I already know that by using (a) date_trunc('hour', measured_at) and (b) generate_series, I would be able to aggregate my data by microseconds, milliseconds, and so on.
But is it possible to aggregate the data by 5 minutes or, say, an arbitrary number of seconds? In other words, can measured data be aggregated by an arbitrary multiple of seconds?
I need the data aggregated at different time resolutions to feed into an FFT or an AR model in order to see possible seasonalities.
Recommended answer
You can generate a table of "buckets" by adding intervals created by generate_series(). This SQL statement will generate a table of five-minute buckets for the first day in your data (the date of min(measured_at)).
select
(select min(measured_at)::date from measurements) + ( n || ' minutes')::interval start_time,
(select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time
from generate_series(0, (24*60), 5) n
Wrap that statement in a common table expression, and you can join and group on it as if it were a base table.
with five_min_intervals as (
select
(select min(measured_at)::date from measurements) + ( n || ' minutes')::interval start_time,
(select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time
from generate_series(0, (24*60), 5) n
)
select f.start_time, f.end_time, avg(m.val) avg_val
from measurements m
right join five_min_intervals f
on m.measured_at >= f.start_time and m.measured_at < f.end_time
group by f.start_time, f.end_time
order by f.start_time
Grouping by an arbitrary number of seconds is similar: use date_trunc().
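As a complement to the bucket table above, here is a minimal sketch of a different technique that is not part of the original answer: snapping each timestamp to a bucket with epoch arithmetic. The sample rows and the 7.5-second width are made up for illustration; only the table layout comes from the question.

```sql
-- Hypothetical sample data; the table layout follows the question.
create temporary table measurements (
    measured_at timestamptz,
    val integer
);

insert into measurements (measured_at, val) values
    ('2024-01-01 00:00:01.250+00', 10),
    ('2024-01-01 00:00:06.900+00', 20),
    ('2024-01-01 00:00:07.100+00', 30),
    ('2024-01-01 00:00:15.000+00', 40);

-- 7.5 is an arbitrary bucket width in seconds (assumption for illustration).
-- floor(epoch / width) * width rounds each timestamp down to the start
-- of the bucket that contains it.
select
    to_timestamp(floor(extract(epoch from measured_at) / 7.5) * 7.5) as bucket_start,
    count(*) as n,
    avg(val) as avg_val
from measurements
group by bucket_start
order by bucket_start;
```

Unlike the right join against generated buckets, this form returns no row for a bucket that contains no measurements, which may matter if the result feeds an FFT that expects evenly spaced samples.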
A more general use of generate_series() lets you avoid guessing the upper limit for five-minute buckets. In practice, you'd probably build this as a view or a function. You might get better performance from a base table.
select
(select min(measured_at)::date from measurements) + ( n || ' minutes')::interval start_time,
(select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time
from generate_series(0, ((select max(measured_at)::date - min(measured_at)::date from measurements) + 1)*24*60, 5) n;
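One way the "view or function" idea might look is sketched below. The name time_buckets and its bucket_width parameter are hypothetical, the sample rows are invented, and the buckets start at min(measured_at) rather than at midnight of the first day, which is a simplification of the statement above. It relies on the timestamp form of generate_series(), which accepts an interval step and so also handles sub-second widths such as interval '250 milliseconds'.

```sql
-- Hypothetical sample data; the table layout follows the question.
create temporary table measurements (
    measured_at timestamptz,
    val integer
);

insert into measurements (measured_at, val) values
    ('2024-01-01 00:01:00+00', 10),
    ('2024-01-01 00:03:30+00', 20),
    ('2024-01-01 00:12:00+00', 30);

-- time_buckets is a hypothetical helper, not a built-in function.
-- It is created in pg_temp so it disappears with the session and must be
-- called with the schema prefix. It returns contiguous
-- [start_time, end_time) buckets from min(measured_at) to max(measured_at).
create function pg_temp.time_buckets(bucket_width interval)
returns table (start_time timestamptz, end_time timestamptz)
language sql
stable
as $$
    select g.ts, g.ts + bucket_width
    from generate_series(
        (select min(measured_at) from measurements),
        (select max(measured_at) from measurements),
        bucket_width
    ) as g(ts);
$$;

-- Usage: average per five-minute bucket, keeping empty buckets as NULL.
select b.start_time, b.end_time, avg(m.val) as avg_val
from pg_temp.time_buckets(interval '5 minutes') as b
left join measurements m
    on m.measured_at >= b.start_time and m.measured_at < b.end_time
group by b.start_time, b.end_time
order by b.start_time;
```

Passing the width as an interval keeps the caller free to choose any resolution without editing the query text.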