PostgreSQL: running count of rows for a query 'by minute'
Question
I need to query for each minute the total count of rows up to that minute.
The best I could achieve so far doesn't do the trick. It returns count per minute, not the total count up to each minute:
SELECT COUNT(id) AS count
, EXTRACT(hour from "when") AS hour
, EXTRACT(minute from "when") AS minute
FROM mytable
GROUP BY hour, minute
Recommended answer
Only minutes with activity
Shortest
Won't get much simpler than this:
SELECT DISTINCT
date_trunc('minute', "when") AS minute
,count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY 1;
- Use date_trunc().
- Don't include id in the query, since you want to GROUP BY minute slices.
- count() is mostly used as a plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition: you want a running count over all rows. By default, that counts from the first row to the last peer of the current row, as defined by ORDER BY. I quote the manual:
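  "The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; it sets the frame to be all rows from the partition start up through the current row's last peer in the ORDER BY ordering."
  That happens to be exactly what you need.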
- Use count(*) rather than count(id). It fits your question better ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, it has not been specified in the question, so count(id) is wrong, strictly speaking.
- You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way. You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.
- ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here. 1 serves as a positional reference to the 1st expression in the SELECT list.
- Use to_char() if you need to beautify the result. Like this:
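SELECT DISTINCT
       to_char(date_trunc('minute', "when"), 'DD.MM.YYYY HH24:MI') AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
-- With DISTINCT, ORDER BY may only use select-list expressions.
-- running_ct grows with every active minute, so it sorts chronologically.
ORDER  BY running_ct;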
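To make the "last peer" behaviour of the default frame concrete, here is a minimal sketch on made-up data. The four rows and their timestamps are hypothetical; the CTE merely stands in for mytable so the statement runs on its own.

```sql
-- Hypothetical sample data: three rows in minute 10:00, one row in minute 10:02.
WITH mytable(id, "when") AS (
   VALUES (1, timestamp '2024-01-01 10:00:05')
        , (2, timestamp '2024-01-01 10:00:20')
        , (3, timestamp '2024-01-01 10:00:59')
        , (4, timestamp '2024-01-01 10:02:10')
   )
SELECT id
     , date_trunc('minute', "when") AS minute
       -- No DISTINCT here, so every base row stays visible.
       -- Rows sharing a truncated minute are peers and get the same count.
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY id;
```

Rows 1 to 3 are peers in the 10:00 minute, so each should report a running_ct of 3, and row 4 should report 4. Dropping id from the select list and adding DISTINCT then collapses the duplicates to one row per active minute.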
最快
Fastest
SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) sub
ORDER  BY 1;
Much like the above, but:
- I use a subquery to fold and count rows per minute. This way we get distinct rows per minute in the outer query, and the DISTINCT step is not needed.
- Use sum() as window aggregate function now to add up the counts from the subquery.
I found this to be substantially faster with many rows per minute.
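One way to check the speed difference on your own data is to run both variants under EXPLAIN (ANALYZE). The following is a rough sketch, not a benchmark: the temporary table, its row count, and the spacing of timestamps are assumptions chosen so that each minute holds about a thousand rows. Note that EXPLAIN (ANALYZE) actually executes the query.

```sql
-- Hypothetical test table: 1,000,000 rows spread over roughly 1,000 minutes.
CREATE TEMP TABLE tbl AS
SELECT g AS id
     , timestamp '2024-01-01 00:00' + g * interval '0.06 sec' AS "when"
FROM   generate_series(1, 1000000) g;

ANALYZE tbl;

-- Variant 1: window function over all rows, then DISTINCT.
EXPLAIN (ANALYZE)
SELECT DISTINCT
       date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   tbl
ORDER  BY 1;

-- Variant 2: aggregate per minute first, then a running sum over few rows.
EXPLAIN (ANALYZE)
SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) sub
ORDER  BY 1;
```

Compare the reported execution times of the two plans. In the second variant the window function only has to process one row per minute, which is the likely reason for the difference described above.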
@GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurs (no row in base table):
SELECT DISTINCT
       m.minute, count(c.minute) OVER (ORDER BY m.minute) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        , max("when")   -- tbl has no column "minute"; use "when"
                        , '1 min') AS minute
   FROM   tbl
   ) m
LEFT   JOIN (SELECT date_trunc('minute', "when") AS minute FROM tbl) c USING (minute)
ORDER  BY 1;
- Generate a row for every minute in the time frame between the first and the last event with generate_series(). Combine generate_series() with aggregate functions in one subquery.
- LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.
With a CTE:
WITH cte AS (
   SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   )
SELECT m.minute
     , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   -- cte only exposes "minute" (already truncated), not "when"
   SELECT generate_series(min(minute), max(minute), '1 min') AS minute
   FROM   cte
   ) m
LEFT   JOIN cte c USING (minute)
ORDER  BY 1;
Much like the above, but:
- Again, fold and count rows per minute in a first step, which removes the need for a later DISTINCT.
- Unlike count(), sum() can return NULL. So I wrapped it in COALESCE to get 0 instead.
With many rows, few rows per minute, and an index on "when", this version with a subquery should be even faster:
SELECT m.minute
     , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        , max("when")
                        , '1 min') AS minute
   FROM   tbl
   ) m
LEFT   JOIN (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) c USING (minute)
ORDER  BY 1;
This was the fastest of a few variants I tested with Postgres 9.1 - 9.4.
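To see how minutes without activity behave in the LEFT JOIN variants, here is a small self-contained sketch. The three sample rows and the CTE name per_minute are hypothetical and only serve the illustration.

```sql
-- Hypothetical sample data: two events in 10:00, none in 10:01 and 10:02, one in 10:03.
WITH tbl(id, "when") AS (
   VALUES (1, timestamp '2024-01-01 10:00:05')
        , (2, timestamp '2024-01-01 10:00:40')
        , (3, timestamp '2024-01-01 10:03:10')
   )
, per_minute AS (
   SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   )
SELECT m.minute
     , c.minute_ct   -- NULL for minutes without a matching row
     , sum(c.minute_ct) OVER (ORDER BY m.minute) AS running_ct
FROM  (
   -- One row per minute between the first and the last active minute
   SELECT generate_series(min(minute), max(minute), interval '1 min') AS minute
   FROM   per_minute
   ) m
LEFT   JOIN per_minute c USING (minute)
ORDER  BY 1;
```

The minutes 10:01 and 10:02 should show NULL for minute_ct while running_ct stays at 2, then rises to 3 at 10:03. Because the series starts at the first event here, the running sum never sees only NULL values; COALESCE would matter if the series started before the first event.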