Select data for 15 minute windows - PostgreSQL

Question

Right so I have a table such as this in PostgreSQL:

timestamp              duration

2013-04-03 15:44:58    4
2013-04-03 15:56:12    2
2013-04-03 16:13:17    9
2013-04-03 16:16:30    3
2013-04-03 16:29:52    1
2013-04-03 16:38:25    1
2013-04-03 16:41:37    9
2013-04-03 16:44:49    1
2013-04-03 17:01:07    9
2013-04-03 17:07:48    1
2013-04-03 17:11:00    2
2013-04-03 17:11:16    2
2013-04-03 17:15:17    1
2013-04-03 17:16:53    4
2013-04-03 17:20:37    9
2013-04-03 17:20:53    3
2013-04-03 17:25:48    3
2013-04-03 17:29:26    1
2013-04-03 17:32:38    9
2013-04-03 17:36:55    4
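
To try the queries in this thread against the sample rows above, a setup along these lines should work. This is only a sketch: the table name tmp and the column name tstamp are borrowed from the answer further down, and the column types (timestamp, integer) are assumptions, since the question does not state them.

```sql
-- Assumed schema: names follow the answer (tmp, tstamp); types are a guess.
create table tmp (
  tstamp   timestamp not null,
  duration integer   not null
);

-- Sample rows copied from the table in the question.
insert into tmp (tstamp, duration) values
  ('2013-04-03 15:44:58', 4),
  ('2013-04-03 15:56:12', 2),
  ('2013-04-03 16:13:17', 9),
  ('2013-04-03 16:16:30', 3),
  ('2013-04-03 16:29:52', 1),
  ('2013-04-03 16:38:25', 1),
  ('2013-04-03 16:41:37', 9),
  ('2013-04-03 16:44:49', 1),
  ('2013-04-03 17:01:07', 9),
  ('2013-04-03 17:07:48', 1),
  ('2013-04-03 17:11:00', 2),
  ('2013-04-03 17:11:16', 2),
  ('2013-04-03 17:15:17', 1),
  ('2013-04-03 17:16:53', 4),
  ('2013-04-03 17:20:37', 9),
  ('2013-04-03 17:20:53', 3),
  ('2013-04-03 17:25:48', 3),
  ('2013-04-03 17:29:26', 1),
  ('2013-04-03 17:32:38', 9),
  ('2013-04-03 17:36:55', 4);
```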

I would like to get the following output:

timestampwindowstart = 2013-04-03 15:44:58

duration    count
1           0
2           1
3           0
4           1
9           0

timestampwindowstart = 2013-04-03 15:59:58

duration    count
1           0
2           0
3           0
4           0
9           1

timestampwindowstart = 2013-04-03 16:14:58

duration    count
1           1
2           0
3           1
4           0
9           0

timestampwindowstart = 2013-04-03 16:29:58

duration    count
1           2
2           0
3           0
4           0
9           1

etc...

So basically it cycles through the timestamps in 15 minute windows and outputs the distinct duration values along with their frequency (count) in each window. The timestampwindowstart value is the earliest timestamp of the window (i.e., timestampwindowfinish = timestampwindowstart + 15 minutes).
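
As a rough illustration of that window definition, each row can be assigned to its window by counting whole 15 minute steps (900 seconds) from the earliest timestamp in the table. The sketch assumes a table named tmp with columns tstamp (timestamp) and duration, and the alias first_ts is made up for the example.

```sql
-- Assumes tmp(tstamp timestamp, duration integer); first_ts is an illustrative alias.
select
  tmp.tstamp,
  tmp.duration,
  -- number of whole 15 minute steps since the first row, turned back into a timestamp
  m.first_ts
    + floor(extract(epoch from (tmp.tstamp - m.first_ts)) / 900) * interval '15 minutes'
    as timestampwindowstart
from tmp
cross join (select min(tstamp) as first_ts from tmp) as m
order by tmp.tstamp;
```

With the sample data, the row at 16:13:17 is 1699 seconds after the first row at 15:44:58, which is one whole step, so it lands in the window starting at 15:59:58. Note that this only labels rows that exist; it does not produce the zero counts for durations that are missing from a window.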

This is so I can then plot the 15 minute interval histograms...

I have tried reading up but it is a bit complicated for me to get my head around and I don't have much time...

Thanks for your help!

Recommended answer

Quick and dirty way: http://sqlfiddle.com/#!1/bd2f6/21 (I named my column tstamp instead of your timestamp.)

with t as (
  select
    generate_series(mitstamp,matstamp,'15 minutes') as int,
    duration
  from
    (select min(tstamp) mitstamp, max(tstamp) as matstamp from tmp) a,
    (select duration from tmp group by duration) b
)

select
  int as timestampwindowstart,
  t.duration,
  count(tmp.duration)
from
   t
   left join tmp on 
         (tmp.tstamp >= t.int and 
          tmp.tstamp < (t.int + interval '15 minutes') and 
          t.duration = tmp.duration)
group by
  int,
  t.duration
order by
  int,
  t.duration

Brief explanation:

  1. Calculate the minimum and maximum timestamp.
  2. Generate 15 minute intervals between the minimum and maximum.
  3. Cross join the result with the unique values of duration.
  4. Left join the original data. The left join is important because it keeps every possible combination in the output, with null where a duration does not exist for a given interval.
  5. Aggregate the data. count(null) = 0, so those combinations are reported with a count of 0.
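
For readers who prefer to see the five steps separately, the same idea can be written with one CTE per step. This is a sketch rather than the answer's exact query: the names bounds, windows, durations, grid and window_start are invented for illustration, and the explicit lateral join assumes PostgreSQL 9.3 or later.

```sql
with bounds as (
  -- step 1: minimum and maximum timestamp
  select min(tstamp) as first_ts, max(tstamp) as last_ts
  from tmp
),
windows as (
  -- step 2: one row per 15 minute window start
  select g.window_start
  from bounds
  cross join lateral generate_series(
    bounds.first_ts, bounds.last_ts, interval '15 minutes'
  ) as g(window_start)
),
durations as (
  -- unique duration values
  select distinct duration
  from tmp
),
grid as (
  -- step 3: every (window, duration) combination
  select w.window_start, d.duration
  from windows as w
  cross join durations as d
)
-- steps 4 and 5: left join the data and count matches per combination
select
  grid.window_start as timestampwindowstart,
  grid.duration,
  count(tmp.duration) as count
from grid
left join tmp
  on  tmp.tstamp >= grid.window_start
  and tmp.tstamp <  grid.window_start + interval '15 minutes'
  and tmp.duration = grid.duration
group by grid.window_start, grid.duration
order by grid.window_start, grid.duration;
```

With the sample data from the question, the windows CTE yields eight window starts, from 15:44:58 to 17:29:58, and each is paired with the five distinct durations.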

You may have several tables and need to apply the algorithm to their union. Suppose we have three tables tmp1, tmp2 and tmp3, all with columns tstamp and duration. Then we can extend the previous solution:

with 

tmpout as (
  select * from tmp1 union all
  select * from tmp2 union all
  select * from tmp3
)

,t as (
  select
    generate_series(mitstamp,matstamp,'15 minutes') as int,
    duration
  from
    (select min(tstamp) mitstamp, max(tstamp) as matstamp from tmpout) a,
    (select duration from tmpout group by duration) b
)

select
  int as timestampwindowstart,
  t.duration,
  count(tmp.duration)
from
   t
   left join tmpout as tmp on 
         (tmp.tstamp >= t.int and 
          tmp.tstamp < (t.int + interval '15 minutes') and 
          t.duration = tmp.duration)
group by
  int,
  t.duration
order by
  int,
  t.duration

You should really get to know the with clause in PostgreSQL. It is an invaluable concept for any data analysis in PostgreSQL.
