Group by data intervals

Question

I have a single table which stores bandwidth usage on the network over a period of time. One column will contain the date time (primary key) and another column will record the bandwidth. Data is recorded every minute. We will have other columns recording other data at that moment in time.

If the user requests the data in 15-minute intervals (within a 24-hour period, given a start and end date), is it possible to get the data I require with a single query, or would I have to write a stored procedure/cursor to do this? Users may then request data in 5-minute intervals, etc.

I will most likely be using Postgres, but are there other NoSQL options that would be better?

Any ideas?

Answer

WITH t AS (
   SELECT ts, (random()*100)::int AS bandwidth
   FROM   generate_series('2012-09-01', '2012-09-04', '1 minute'::interval) ts
   )

SELECT date_trunc('hour', ts) AS hour_stump
      ,(extract(minute FROM ts)::int / 15) AS min15_slot
      ,count(*) AS rows_in_timeslice               -- optional
      ,sum(bandwidth) AS sum_bandwidth
FROM   t
WHERE  ts >= '2012-09-02 00:00:00+02'::timestamptz -- user's time range: start included
AND    ts <  '2012-09-03 00:00:00+02'::timestamptz -- end excluded (half-open range)
GROUP  BY 1, 2
ORDER  BY 1, 2;

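The CTE t provides data like your table might hold: one timestamp ts per minute with a bandwidth number. (You don't need that part; use your own table instead.) Each row is assigned to its hour by date_trunc('hour', ts) and to one of four 15-minute slots within that hour by integer division of the minute: minutes 0 to 14 give slot 0, 15 to 29 give slot 1, and so on. Grouping by both columns yields one row per 15-minute interval.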
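To check the slot arithmetic outside the database, here is a minimal Python sketch of the same grouping. It illustrates the idea and is not the answer's method. The data is synthetic: a hypothetical `make_samples` helper produces deterministic bandwidth values in place of random(). Timestamps are assumed to be UTC. The hour key and `minute // 15` stand in for `date_trunc('hour', ts)` and `extract(minute FROM ts)::int / 15`.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def make_samples(start, end):
    """Hypothetical stand-in for the CTE t: one row per minute from start to end inclusive."""
    samples = []
    ts, i = start, 0
    while ts <= end:
        samples.append((ts, i % 100))  # deterministic "bandwidth" instead of random()
        ts += timedelta(minutes=1)
        i += 1
    return samples


def group_by_15_min(samples, range_start, range_end):
    """Count rows and sum bandwidth per (hour, 15-minute slot) in [range_start, range_end)."""
    groups = defaultdict(lambda: [0, 0])  # key -> [row count, bandwidth sum]
    for ts, bandwidth in samples:
        if not (range_start <= ts < range_end):  # same borders as the WHERE clause
            continue
        hour_stump = ts.replace(minute=0, second=0, microsecond=0)  # like date_trunc('hour', ts)
        min15_slot = ts.minute // 15  # integer division: 0, 1, 2 or 3
        groups[(hour_stump, min15_slot)][0] += 1
        groups[(hour_stump, min15_slot)][1] += bandwidth
    # Sorted by hour, then slot, like ORDER BY 1, 2
    return [(h, s, count, total) for (h, s), (count, total) in sorted(groups.items())]


utc = timezone.utc
data = make_samples(datetime(2012, 9, 1, tzinfo=utc), datetime(2012, 9, 4, tzinfo=utc))
rows = group_by_15_min(data, datetime(2012, 9, 2, tzinfo=utc), datetime(2012, 9, 3, tzinfo=utc))
print(len(rows), rows[0])
```

With one row per minute and a half-open 24-hour range, every slot holds exactly 15 rows. There are 96 slots in total.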

Here is a very similar solution to a very similar question, with a detailed explanation of how this particular aggregation works:

  • date_trunc 5 minute interval in PostgreSQL
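The `minute / 15` trick only works for interval lengths that divide an hour evenly. A user-chosen interval such as 5 or 15 minutes can be handled by flooring each timestamp to a multiple of the interval counted from the Unix epoch, with the interval as a plain parameter. Below is a minimal Python sketch of that arithmetic. It assumes timezone-aware UTC timestamps, and the helper names `floor_to_interval` and `bucket_sums` are hypothetical. In SQL the same idea is commonly written with `extract(epoch FROM ts)`, and PostgreSQL 14 and later also offer a built-in `date_bin` function for this kind of binning.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_to_interval(ts, minutes):
    """Round an aware datetime down to the start of its N-minute bucket (aligned to the UTC epoch)."""
    offset = (ts - EPOCH) % timedelta(minutes=minutes)  # time elapsed since the bucket started
    return ts - offset


def bucket_sums(samples, minutes):
    """Sum bandwidth per N-minute bucket; the interval length is only a parameter."""
    sums = {}
    for ts, bandwidth in samples:
        bucket = floor_to_interval(ts, minutes)
        sums[bucket] = sums.get(bucket, 0) + bandwidth
    return dict(sorted(sums.items()))


utc = timezone.utc
start = datetime(2012, 9, 2, tzinfo=utc)
# Synthetic data: one sample per minute for one hour, each with bandwidth 1
samples = [(start + timedelta(minutes=m), 1) for m in range(60)]

five = bucket_sums(samples, 5)
fifteen = bucket_sums(samples, 15)
print(len(five), len(fifteen))
```

The buckets are aligned to the UTC epoch. Under a session time zone whose offset is not a multiple of the interval, the boundaries will not fall on local "round" times.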

Here is a similar solution to a similar question about running sums, with a detailed explanation and links to the various functions used:

  • PostgreSQL: running count of rows for a query 'by minute'
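
Alternatively, to take a single sample per interval instead of aggregating:

WITH t AS (  -- same CTE as above
   SELECT ts, (random()*100)::int AS bandwidth
   FROM   generate_series('2012-09-01', '2012-09-04', '1 minute'::interval) ts
   )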

SELECT DISTINCT ON (1,2)
       date_trunc('hour', ts) AS hour_stump
      ,(extract(minute FROM ts)::int / 15) AS min15_slot
      ,bandwidth AS bandwidth_sample_at_min15
FROM   t
WHERE  ts >= '2012-09-02 00:00:00+02'::timestamptz
AND    ts <  '2012-09-03 00:00:00+02'::timestamptz
ORDER  BY 1, 2, ts DESC;

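This retrieves one unaggregated sample per 15-minute interval, taken from the last available row in each slot. If no rows are missing, that is the 15th minute of the slot (minute 14, 29, 44 or 59). The crucial parts are DISTINCT ON and ORDER BY. Sorting by ts DESC puts the latest row first within each (hour, slot) group, and DISTINCT ON (1,2) keeps only that first row.

More information about the technique used here: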

  • Select first row in each GROUP BY group?
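The DISTINCT ON pattern sorts, then keeps the first row of each group. The Python sketch below mimics it so you can see which row survives when a minute is missing. It is only an illustration under assumptions: synthetic UTC data and a hypothetical helper `last_sample_per_slot`. It is not how PostgreSQL executes the query.

```python
from datetime import datetime, timedelta, timezone


def last_sample_per_slot(samples):
    """Keep the latest (ts, bandwidth) row per (hour, 15-minute slot),
    mimicking DISTINCT ON (1, 2) ... ORDER BY 1, 2, ts DESC."""
    def slot_key(ts):
        return (ts.replace(minute=0, second=0, microsecond=0), ts.minute // 15)

    # Sort by hour, slot, then newest first within each slot (ts DESC)
    ordered = sorted(samples, key=lambda row: (*slot_key(row[0]), -row[0].timestamp()))
    result = {}
    for ts, bandwidth in ordered:
        key = slot_key(ts)
        if key not in result:  # the first row per key wins, as with DISTINCT ON
            result[key] = (ts, bandwidth)
    return result


utc = timezone.utc
start = datetime(2012, 9, 2, tzinfo=utc)
# Synthetic data: bandwidth equals the minute number; minute 29 is deliberately missing
samples = [(start + timedelta(minutes=m), m) for m in range(60) if m != 29]

picked = last_sample_per_slot(samples)
for (hour, slot), (ts, bandwidth) in sorted(picked.items()):
    print(slot, ts.strftime("%H:%M"), bandwidth)
```

Slot 1 falls back to minute 28 because minute 29 is absent. This matches the "last available row" behaviour described above.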
