PostgreSQL: running count of rows for a query 'by minute'

Question

I need to query for each minute the total count of rows up to that minute.

The best I could achieve so far doesn't do the trick. It returns the count per minute, not the total count up to each minute:

SELECT COUNT(id) AS count
     , EXTRACT(hour from "when") AS hour
     , EXTRACT(minute from "when") AS minute
  FROM mytable
 GROUP BY hour, minute


Recommended answer

Only minutes with activity

Shortest

Won't get much simpler than this:

SELECT DISTINCT
       date_trunc('minute', "when") AS minute
      ,count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY 1;




  • Use date_trunc().

    Don't include id in the query, since you want to GROUP BY minute slices.

    count() is mostly used as a plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition: you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by ORDER BY. I quote the manual:

    The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; it sets the frame to be all rows from the partition start up through the current row's last peer in the ORDER BY ordering.

    That happens to be exactly what is needed here.

    Use count(*) rather than count(id). It fits your question better ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, that has not been specified in the question, so count(id) is wrong, strictly speaking.

    You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way.
    You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.

    ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here.
    1 serves as a positional reference to the 1st expression in the SELECT list.

    Use to_char() if you need to beautify the result. Like this:

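    SELECT DISTINCT
           to_char(date_trunc('minute', "when"), 'DD.MM.YYYY HH24:MI') AS minute
          ,count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
    FROM   mytable
    -- With SELECT DISTINCT, ORDER BY may only use expressions from the select list.
    -- running_ct grows with every listed minute, so it gives chronological order.
    ORDER  BY running_ct;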
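
To see the points above on concrete data (the default frame including peers, and why GROUP BY at the same query level gives a different result), here is a small sketch. The sample rows and timestamps are made up for illustration, and the CTE named mytable merely stands in for the real table.

```sql
-- Hypothetical sample data: 5 events spread over 3 distinct minutes.
WITH mytable(id, "when") AS (
   VALUES (1, timestamp '2024-01-01 10:00:05')
        , (2, timestamp '2024-01-01 10:00:40')
        , (3, timestamp '2024-01-01 10:01:10')
        , (4, timestamp '2024-01-01 10:03:00')
        , (5, timestamp '2024-01-01 10:03:59')
   )
SELECT id
     , date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY id;
-- Rows 1 and 2 are peers (same minute): both get running_ct = 2.
-- Row 3 gets 3; rows 4 and 5 are peers again and both get 5.
-- SELECT DISTINCT on (minute, running_ct) collapses this to one row per minute.

-- Same data, but grouped at the same query level:
WITH mytable(id, "when") AS (
   VALUES (1, timestamp '2024-01-01 10:00:05')
        , (2, timestamp '2024-01-01 10:00:40')
        , (3, timestamp '2024-01-01 10:01:10')
        , (4, timestamp '2024-01-01 10:03:00')
        , (5, timestamp '2024-01-01 10:03:59')
   )
SELECT date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
GROUP  BY 1
ORDER  BY 1;
-- The window function now sees one row per minute and returns 1, 2, 3:
-- a running count of minutes, not of rows.
```

Note that minute 10:02 does not appear in either result, since no row falls into it.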
    



    Fastest

    SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
    FROM  (
       SELECT date_trunc('minute', "when") AS minute
             ,count(*) AS minute_ct
       FROM   tbl
       GROUP  BY 1
       ) sub
    ORDER  BY 1;
    

    Much like the above, but:

    • I use a subquery to aggregate and count rows per minute. This way we get distinct rows per minute in the outer query and the DISTINCT step is not needed.

    • Use sum() as window aggregate function now to add up the counts from the subquery.

    I found this to be substantially faster with many rows per minute.

    @GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurs (no row in base table):

    SELECT DISTINCT
           m.minute, count(c.minute) OVER (ORDER BY m.minute) AS running_ct
    -- tbl has no column "minute" yet, so the upper bound is max("when")
    FROM  (SELECT generate_series(date_trunc('minute', min("when"))
                                , max("when"), interval '1 min') AS minute FROM tbl) m
    LEFT   JOIN (SELECT date_trunc('minute', "when") AS minute FROM tbl) c
                                                            USING (minute)
    ORDER  BY 1;
    




    • Generate a row for every minute in the time frame between the first and the last event with generate_series(). Combine generate_series() with aggregate functions in one subquery.

    • LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.
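
As a small illustration of the first step, the sketch below runs generate_series() on its own. The literal timestamps are made up and stand in for min("when") and max("when").

```sql
-- Start is truncated to the minute; the end can stay untruncated,
-- because the series simply stops at the last step that does not exceed it.
SELECT generate_series(date_trunc('minute', timestamp '2024-01-01 10:00:05')
                     , timestamp '2024-01-01 10:03:59'
                     , interval '1 min') AS minute;
-- Returns 4 rows: 10:00, 10:01, 10:02 and 10:03 on 2024-01-01.
```

Every generated minute then survives the LEFT JOIN, whether or not a matching event exists.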

        Using a CTE:

        WITH cte AS (
           SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
           FROM   tbl
           GROUP  BY 1
           )
        SELECT m.minute
             , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
        -- cte only exposes minute and minute_ct; minute is already truncated,
        -- so min(minute) and max(minute) can be used directly.
        FROM  (SELECT generate_series(min(minute), max(minute), interval '1 min') AS minute
               FROM   cte) m
        LEFT   JOIN cte c USING (minute)
        ORDER  BY 1;
        

        Much like the above, but:

        • Again, fold and count rows per minute in the first step, which removes the need for a later DISTINCT.

        • Unlike count(), sum() can return NULL. So I wrapped it in COALESCE to get 0 instead.
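
The difference between count() and sum() on NULL input can be checked in isolation. This is a minimal sketch; the single-row VALUES list and the column name x are made up for illustration.

```sql
-- One row whose only value is NULL, as produced by an unmatched LEFT JOIN.
SELECT count(x)            AS ct             -- 0
     , sum(x)              AS total          -- NULL
     , COALESCE(sum(x), 0) AS total_or_zero  -- 0
FROM  (VALUES (NULL::int)) t(x);
```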

        With many rows overall but few rows per minute, and with an index on "when", this version with a subquery should be even faster:

        SELECT m.minute
             , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
        FROM  (SELECT generate_series(date_trunc('minute', min("when"))
                                    , max("when"), '1 min') AS minute FROM tbl) m
        LEFT   JOIN (
           SELECT date_trunc('minute', "when") AS minute
                 ,count(*) AS minute_ct
           FROM   tbl
           GROUP  BY 1
           ) c USING (minute)
        ORDER  BY 1;
        




        • This was the fastest among a couple of variants I tested with Postgres 9.1 - 9.4.
