PostgreSQL: running count of rows for a query 'by minute'

Question

I need to query for each minute the total count of rows up to that minute.

The best I could achieve so far doesn't do the trick. It returns the count per minute, not the total count up to each minute:

SELECT COUNT(id) AS count
     , EXTRACT(hour from "when") AS hour
     , EXTRACT(minute from "when") AS minute
  FROM mytable
 GROUP BY hour, minute

Recommended answer

Only return minutes with activity

Shortest

SELECT DISTINCT
       date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY 1;

Use date_trunc(). It returns exactly what you need.

Don't include id in the query, since you want to GROUP BY minute slices.

count() is typically used as a plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition - you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by ORDER BY. The manual:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.

And that happens to be exactly what you need.
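To see what "up through the current row's last ORDER BY peer" means in practice, here is a minimal sketch. The inline VALUES list and the column name x are made up for illustration; they stand in for timestamps truncated to the minute. The second window uses an explicit ROWS frame only for contrast.

```sql
-- Hypothetical sample data: six rows, with duplicate values (peers) in x.
SELECT x
     , count(*) OVER (ORDER BY x)                          AS ct_range  -- default RANGE frame: includes all peers
     , count(*) OVER (ORDER BY x ROWS UNBOUNDED PRECEDING) AS ct_rows   -- ROWS frame: stops at the current row
FROM  (VALUES (1), (1), (2), (3), (3), (3)) AS t(x)
ORDER  BY x, ct_rows;
-- ct_range: 2, 2, 3, 6, 6, 6
-- ct_rows : 1, 2, 3, 4, 5, 6
```

With the default frame, every row of the same minute gets the same running count, which is why DISTINCT can later fold them into one row per minute.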

Use count(*) rather than count(id). It better fits your question ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, this has not been specified in the question, so count(id) is wrong, strictly speaking, because NULL values are not counted by count(id).
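The difference is easy to check with a throwaway query. This sketch assumes nothing about the real table; the three sample rows in the VALUES list are hypothetical.

```sql
-- Hypothetical sample data: three rows, one of them with a NULL id.
SELECT count(*)  AS ct_rows   -- counts rows, regardless of content
     , count(id) AS ct_ids    -- counts only rows where id IS NOT NULL
FROM  (VALUES (1), (NULL), (3)) AS t(id);
-- ct_rows = 3, ct_ids = 2
```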

You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way.
You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.
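The order of evaluation can be illustrated with a small sketch. The sample values are hypothetical and stand in for truncated timestamps: two rows in the first "minute", one in the second.

```sql
-- GROUP BY runs first, so the window function only sees one row per group:
SELECT x, count(*) OVER (ORDER BY x) AS running_ct
FROM  (VALUES (1), (1), (2)) AS t(x)
GROUP  BY x
ORDER  BY x;
-- x = 1 -> 1,  x = 2 -> 2   (a running count of groups, not of rows)

-- DISTINCT runs last, so the window function still sees every row:
SELECT DISTINCT x, count(*) OVER (ORDER BY x) AS running_ct
FROM  (VALUES (1), (1), (2)) AS t(x)
ORDER  BY x;
-- x = 1 -> 2,  x = 2 -> 3   (the running count of rows)
```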

ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here.
1 is a positional reference to the 1st expression in the SELECT list.

Use to_char() if you need to format the result. Like:

SELECT DISTINCT
       to_char(date_trunc('minute', "when"), 'DD.MM.YYYY HH24:MI') AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY date_trunc('minute', "when");

Fastest

SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) sub
ORDER  BY 1;

Much like the above, but:

I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without DISTINCT in the outer SELECT.

Use sum() as window aggregate function now to add up the counts from the subquery.

I found this to be substantially faster with many rows per minute.

@GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurred (no row in base table):

SELECT DISTINCT
       minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        ,                      max("when")
                        , interval '1 min')
   FROM   tbl
   ) m(minute)
LEFT   JOIN (SELECT date_trunc('minute', "when") FROM tbl) c(minute) USING (minute)
ORDER  BY 1;

Generate a row for every minute in the time frame between the first and the last event with generate_series() - here directly based on aggregated values from the subquery.

LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.
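As a rough illustration of the generate_series() step on its own, the following sketch uses made-up literal timestamps in place of min("when") and max("when"). Note that the upper bound is deliberately not truncated: the series simply stops at the last step that does not exceed it, which is why only the lower bound needs date_trunc() in the query above.

```sql
-- Hypothetical bounds: first event at 10:00:25, last event at 10:03:40.
SELECT generate_series(date_trunc('minute', timestamp '2024-01-01 10:00:25')
                     ,                      timestamp '2024-01-01 10:03:40'
                     , interval '1 min') AS minute;
-- Returns four rows: 10:00, 10:01, 10:02 and 10:03 on 2024-01-01.
```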

With a CTE:

WITH cte AS (
   SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) 
SELECT m.minute
     , COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(min(minute), max(minute), interval '1 min')
   FROM   cte
   ) m(minute)
LEFT   JOIN cte USING (minute)
ORDER  BY 1;

Again, aggregate and count rows per minute in the first step. This removes the need for a later DISTINCT.

Unlike count(), sum() can return NULL. Default to 0 with COALESCE.
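A minimal sketch of that difference, using a single hypothetical NULL value as input (the situation that arises for an aggregate that only sees NULLs from the LEFT JOIN):

```sql
-- Hypothetical input: one row holding NULL.
SELECT count(x)            AS ct             -- 0: count() never returns NULL
     , sum(x)              AS total          -- NULL: nothing to add up
     , COALESCE(sum(x), 0) AS total_or_zero  -- 0: NULL replaced by the default
FROM  (VALUES (NULL::int)) AS t(x);
```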

With many rows and an index on "when", this version with a subquery was fastest among a couple of variants I tested with Postgres 9.1 - 9.4:

SELECT m.minute
     , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        ,                      max("when")
                        , interval '1 min')
   FROM   tbl
   ) m(minute)
LEFT   JOIN (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) c USING (minute)
ORDER  BY 1;
