Group by data intervals
Question
I have a single table which stores bandwidth usage on the network over a period of time. One column will contain the date time (primary key) and another column will record the bandwidth. Data is recorded every minute. We will have other columns recording other data at that moment in time.
If the user requests the data in 15-minute intervals (within a 24-hour period between a given start and end date), is it possible to get the data I need with a single query, or would I have to write a stored procedure/cursor to do this? Users may then request data in 5-minute intervals, etc.
I will most likely be using Postgres, but are there other NoSQL options that would be better?
Any ideas?
Recommended answer
```sql
WITH t AS (  -- sample data only; use your own table instead
   SELECT ts, (random()*100)::int AS bandwidth
   FROM   generate_series('2012-09-01', '2012-09-04', '1 minute'::interval) ts
   )
SELECT date_trunc('hour', ts) AS hour_stump                -- start of the hour
      ,(extract(minute FROM ts)::int / 15) AS min15_slot   -- 0, 1, 2 or 3
      ,count(*) AS rows_in_timeslice                       -- optional
      ,sum(bandwidth) AS sum_bandwidth
FROM   t
WHERE  ts >= '2012-09-02 00:00:00+02'::timestamptz  -- user's time range
AND    ts <  '2012-09-03 00:00:00+02'::timestamptz  -- careful with borders
GROUP  BY 1, 2
ORDER  BY 1, 2;
```
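The CTE `t` provides data like your table might hold: one timestamp `ts` per minute with a `bandwidth` number. (You don't need that part; you work with your table instead.) The query truncates each timestamp to the hour with date_trunc(). It then assigns each row to one of four 15-minute slots (0 to 3) by integer division of the minute by 15, and counts and sums the rows per slot. The WHERE clause uses a half-open range (>= start, < end), so a row stamped exactly at a boundary belongs to only one of two adjacent ranges.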
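To make the grouping logic concrete, here is a minimal Python sketch of what the query above computes, run on in-memory rows instead of a table. It is an illustration only. It assumes naive datetime values (no time zones, unlike the timestamptz literals in the SQL), and the helper name group_by_15min and the constant sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_by_15min(rows, start, end):
    """Hypothetical helper: sum bandwidth per (hour, 15-minute slot) for start <= ts < end.

    rows: iterable of (ts, bandwidth) tuples with naive datetime timestamps (assumption).
    Returns a sorted list of (hour_stump, min15_slot, row_count, sum_bandwidth).
    """
    counts = defaultdict(int)
    sums = defaultdict(int)
    for ts, bandwidth in rows:
        # Half-open range: include start, exclude end ("careful with borders").
        if not (start <= ts < end):
            continue
        hour_stump = ts.replace(minute=0, second=0, microsecond=0)  # like date_trunc('hour', ts)
        min15_slot = ts.minute // 15                                # 0, 1, 2 or 3
        key = (hour_stump, min15_slot)
        counts[key] += 1
        sums[key] += bandwidth
    return [(h, s, counts[(h, s)], sums[(h, s)]) for h, s in sorted(sums)]


# Hypothetical sample data: one row per minute for three days, bandwidth always 1.
first = datetime(2012, 9, 1)
rows = [(first + timedelta(minutes=i), 1) for i in range(3 * 24 * 60)]

result = group_by_15min(rows, datetime(2012, 9, 2), datetime(2012, 9, 3))
print(len(result))  # 96 slots in one day
print(result[0])    # (datetime.datetime(2012, 9, 2, 0, 0), 0, 15, 15)
```

Each full slot holds 15 one-minute rows, so one day yields 24 × 4 = 96 groups.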
Here is a very similar solution to a very similar question, with a detailed explanation of how this particular aggregation works:
- date_trunc 5 minute interval in PostgreSQL
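Grouping by "minute / 15 within the hour" is simplest when the interval divides an hour evenly. A more general approach floors each timestamp to a multiple of the interval, measured from a fixed origin. With that approach, 5-, 15- or 30-minute buckets all use the same code, and PostgreSQL 14 and later offer date_bin() for this. The following Python sketch illustrates the idea under stated assumptions: naive datetimes, and a hypothetical ORIGIN and hypothetical helper names floor_to_interval and sum_by_interval.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical fixed origin; buckets are aligned to multiples of the interval from here.
ORIGIN = datetime(2000, 1, 1)


def floor_to_interval(ts, minutes):
    """Return the start of the `minutes`-long bucket that contains ts."""
    step = timedelta(minutes=minutes)
    # timedelta // timedelta gives a whole number of steps (floored).
    return ORIGIN + ((ts - ORIGIN) // step) * step


def sum_by_interval(rows, minutes, start, end):
    """Sum bandwidth per bucket of the given length for start <= ts < end."""
    sums = defaultdict(int)
    for ts, bandwidth in rows:
        if start <= ts < end:
            sums[floor_to_interval(ts, minutes)] += bandwidth
    return dict(sorted(sums.items()))


# Hypothetical data: one hour of per-minute rows with bandwidth 1.
rows = [(datetime(2012, 9, 2) + timedelta(minutes=i), 1) for i in range(60)]
five = sum_by_interval(rows, 5, datetime(2012, 9, 2), datetime(2012, 9, 2, 1))

print(len(five))                                              # 12
print(floor_to_interval(datetime(2012, 9, 2, 10, 7, 30), 5))  # 2012-09-02 10:05:00
```

The interval length is just a parameter here, so a user switching from 15-minute to 5-minute buckets only changes one argument.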
Here is a similar solution to a similar question about running sums, with a detailed explanation and links to the various functions used:
- PostgreSQL: running count of rows for a query 'by minute'
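A running (cumulative) total is computed after the per-bucket aggregation. In SQL this is commonly done with sum() used as a window function. The short Python sketch below shows the same idea with itertools.accumulate, using hypothetical per-bucket totals.

```python
from itertools import accumulate

# Hypothetical per-bucket totals, e.g. the sum_bandwidth column of the 15-minute query,
# already ordered by (hour_stump, min15_slot).
bucket_sums = [120, 80, 200, 50]

# Running total: each element is the sum of all buckets up to and including this one.
running = list(accumulate(bucket_sums))
print(running)  # [120, 200, 400, 450]
```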
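```sql
WITH t AS (  -- same sample data as above
   SELECT ts, (random()*100)::int AS bandwidth
   FROM   generate_series('2012-09-01', '2012-09-04', '1 minute'::interval) ts
   )
SELECT DISTINCT ON (1, 2)                                   -- one row per (hour, slot)
       date_trunc('hour', ts) AS hour_stump
      ,(extract(minute FROM ts)::int / 15) AS min15_slot
      ,bandwidth AS bandwidth_sample_at_min15
FROM   t
WHERE  ts >= '2012-09-02 00:00:00+02'::timestamptz
AND    ts <  '2012-09-03 00:00:00+02'::timestamptz
ORDER  BY 1, 2, ts DESC;                                    -- latest row in each slot wins
```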
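This query retrieves one unaggregated sample per 15-minute interval, taken from the last available row in each window. Normally that is the row for the 15th minute of the slot (minute 14, 29, 44 or 59). If that row is missing, the latest earlier row in the slot is used instead. The crucial parts are DISTINCT ON and ORDER BY. DISTINCT ON (1, 2) keeps only the first row of each (hour_stump, min15_slot) group, and ORDER BY ... ts DESC makes that first row the latest one.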
More information about the technique used here:
- Select first row in each GROUP BY group?
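The DISTINCT ON query keeps the first row per group in the given sort order, which here is the latest row in each 15-minute slot. As an illustration only, here is a Python sketch of the same selection on in-memory rows. The helper name last_sample_per_slot and the sample data (with minute 14 deliberately missing) are hypothetical, and timestamps are assumed to be naive datetimes.

```python
from datetime import datetime, timedelta


def last_sample_per_slot(rows):
    """Hypothetical helper: keep one un-aggregated row per (hour, 15-minute slot).

    Mirrors DISTINCT ON (hour_stump, min15_slot) ... ORDER BY hour_stump, min15_slot, ts DESC
    by keeping the row with the latest ts in each slot.
    rows: iterable of (ts, bandwidth) tuples with naive datetimes (assumption).
    """
    latest = {}
    for ts, bandwidth in rows:
        key = (ts.replace(minute=0, second=0, microsecond=0), ts.minute // 15)
        # Replace the stored row only if this one is later within the same slot.
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, bandwidth)
    return [(key[0], key[1], bw) for key, (_ts, bw) in sorted(latest.items())]


# Hypothetical data for one hour: bandwidth equals the minute number, minute 14 is missing.
base = datetime(2012, 9, 2, 0, 0)
rows = [(base + timedelta(minutes=m), m) for m in range(60) if m != 14]

results = last_sample_per_slot(rows)
for hour_stump, slot, bw in results:
    print(hour_stump, slot, bw)  # slots 0..3 -> bandwidth 13, 29, 44, 59
```

Because minute 14 is missing, slot 0 falls back to minute 13. The SQL version behaves the same way, returning the latest available row in the slot.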