Select data for 15 minute windows - PostgreSQL
Question
Right, so I have a table such as this in PostgreSQL:
timestamp duration
2013-04-03 15:44:58 4
2013-04-03 15:56:12 2
2013-04-03 16:13:17 9
2013-04-03 16:16:30 3
2013-04-03 16:29:52 1
2013-04-03 16:38:25 1
2013-04-03 16:41:37 9
2013-04-03 16:44:49 1
2013-04-03 17:01:07 9
2013-04-03 17:07:48 1
2013-04-03 17:11:00 2
2013-04-03 17:11:16 2
2013-04-03 17:15:17 1
2013-04-03 17:16:53 4
2013-04-03 17:20:37 9
2013-04-03 17:20:53 3
2013-04-03 17:25:48 3
2013-04-03 17:29:26 1
2013-04-03 17:32:38 9
2013-04-03 17:36:55 4
I would like to get the following output:
timestampwindowstart = 2013-04-03 15:44:58
duration count
1 0
2 1
3 0
4 1
9 0
timestampwindowstart = 2013-04-03 15:59:58
duration count
1 0
2 0
3 0
4 0
9 1
timestampwindowstart = 2013-04-03 16:14:58
duration count
1 1
2 0
3 1
4 0
9 0
timestampwindowstart = 2013-04-03 16:29:58
duration count
1 2
2 0
3 0
4 0
9 1
etc...
So basically it cycles through the timestamps in 15 minute windows and outputs the distinct duration values along with their frequency (count). The timestampwindowstart value is the earliest timestamp for the window (i.e. timestampwindowfinish = timestampwindowstart + 15 minutes).
This is so I can then plot the 15 minute interval histograms...
I have tried reading up, but it is a bit complicated for me to get my head around and I don't have much time...
Thanks for your help!
Recommended answer
Quick and dirty way: http://sqlfiddle.com/#!1/bd2f6/21 I named my column tstamp instead of your timestamp.
with t as (
    -- one row per (15-minute window start, distinct duration) combination
    select
        generate_series(mitstamp, matstamp, interval '15 minutes') as window_start,
        duration
    from
        (select min(tstamp) as mitstamp, max(tstamp) as matstamp from tmp) a,
        (select duration from tmp group by duration) b
)
select
    t.window_start as timestampwindowstart,
    t.duration,
    count(tmp.duration)  -- counts matched rows only, so empty combinations give 0
from
    t
    left join tmp on
        tmp.tstamp >= t.window_start
        and tmp.tstamp < t.window_start + interval '15 minutes'
        and t.duration = tmp.duration
group by
    t.window_start,
    t.duration
order by
    t.window_start,
    t.duration
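To try the query locally, a table like the one in the question can be set up as sketched below. The column types (timestamp, integer) are an assumption based on the sample data, and only the first eight rows of the question's table are loaded.

```sql
-- Assumed schema: the column types are a guess from the sample data.
create table tmp (
    tstamp   timestamp not null,
    duration integer   not null
);

-- First eight rows of the table shown in the question.
insert into tmp (tstamp, duration) values
    ('2013-04-03 15:44:58', 4),
    ('2013-04-03 15:56:12', 2),
    ('2013-04-03 16:13:17', 9),
    ('2013-04-03 16:16:30', 3),
    ('2013-04-03 16:29:52', 1),
    ('2013-04-03 16:38:25', 1),
    ('2013-04-03 16:41:37', 9),
    ('2013-04-03 16:44:49', 1);
```

With just these rows, the query should return the four windows listed in the question (starting at 15:44:58, 15:59:58, 16:14:58 and 16:29:58), each with one line per distinct duration.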
Brief explanation:
- Calculate the minimum and maximum timestamp.
- Generate 15 minute intervals between the minimum and maximum.
- Cross join the result with the unique values of duration.
- Left join the original data. The left join is important because it keeps every possible combination in the output; where a duration does not exist for a given interval, the joined columns are null.
- Aggregate the data. count(null) = 0, so combinations without matching rows get a count of 0.
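The effect of the left join on the count can be seen in isolation with a small sketch. The inline data and the names combos, rows_in_window and window_id are made up for illustration; only the behaviour of count is the point.

```sql
-- Hypothetical inline data: two window/duration combinations,
-- only one of which has a matching row.
with combos(window_id, duration) as (
    values (1, 4), (1, 9)
),
rows_in_window(window_id, duration) as (
    values (1, 4)
)
select
    c.duration,
    count(r.duration) as matched_rows,  -- ignores nulls produced by the left join
    count(*)          as joined_rows    -- counts every joined row, matched or not
from combos c
left join rows_in_window r
    on r.window_id = c.window_id
   and r.duration  = c.duration
group by c.duration
order by c.duration;
```

For duration 9 there is no matching row, so matched_rows is 0 while joined_rows is 1. This is why the query counts tmp.duration rather than using count(*).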
In case you have more tables and the algorithm should be applied to their union: suppose we have three tables tmp1, tmp2, tmp3, all with columns tstamp and duration. Then we can extend the previous solution:
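with
tmpout as (
    -- combine all source tables; the columns are listed explicitly so the
    -- union does not depend on the tables having identical layouts
    select tstamp, duration from tmp1 union all
    select tstamp, duration from tmp2 union all
    select tstamp, duration from tmp3
)
,t as (
    -- one row per (15-minute window start, distinct duration) combination
    select
        generate_series(mitstamp, matstamp, interval '15 minutes') as window_start,
        duration
    from
        (select min(tstamp) as mitstamp, max(tstamp) as matstamp from tmpout) a,
        (select duration from tmpout group by duration) b
)
select
    t.window_start as timestampwindowstart,
    t.duration,
    count(tmpout.duration)
from
    t
    left join tmpout on
        tmpout.tstamp >= t.window_start
        and tmpout.tstamp < t.window_start + interval '15 minutes'
        and t.duration = tmpout.duration
group by
    t.window_start,
    t.duration
order by
    t.window_start,
    t.duration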
You should really get to know the with clause in PostgreSQL. It is an invaluable concept for any data analysis in PostgreSQL.
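As a minimal sketch of what the with clause does, the statement below names two intermediate results and lets the second one build on the first. The names samples, bounds, first_ts and last_ts are illustrative, and the three rows are taken from the question's table.

```sql
with
samples(tstamp, duration) as (
    -- inline rows standing in for a real table
    values
        (timestamp '2013-04-03 15:44:58', 4),
        (timestamp '2013-04-03 15:56:12', 2),
        (timestamp '2013-04-03 16:13:17', 9)
),
bounds as (
    -- a later entry in the with list can read from an earlier one
    select min(tstamp) as first_ts, max(tstamp) as last_ts
    from samples
)
select first_ts, last_ts, last_ts - first_ts as span
from bounds;
```

Each named query behaves like a temporary table for the duration of the statement, which is what lets the answer compute the window/duration grid once and then join against it.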