Calculating Cumulative Sum in PostgreSQL

Problem description

I want to compute the cumulative (running) total of a field and insert it from a staging table into a target table. My staging structure is something like this:

ea_month    id       amount    ea_year    circle_id
April       92570    1000      2014        1
April       92571    3000      2014        2
April       92572    2000      2014        3
March       92573    3000      2014        1
March       92574    2500      2014        2
March       92575    3750      2014        3
February    92576    2000      2014        1
February    92577    2500      2014        2
February    92578    1450      2014        3          

I want my target table to look something like this:

ea_month    id       amount    ea_year    circle_id    cum_amt
February    92576    1000      2014        1           1000 
March       92573    3000      2014        1           4000
April       92570    2000      2014        1           6000
February    92577    3000      2014        2           3000
March       92574    2500      2014        2           5500
April       92571    3750      2014        2           9250
February    92578    2000      2014        3           2000
March       92575    2500      2014        3           4500
April       92572    1450      2014        3           5950

I am really confused about how to achieve this result. I want to achieve it using PostgreSQL.

Can anyone suggest how to get this result set?

Recommended answer

Basically, you need a window function. That's a standard feature nowadays. In addition to genuine window functions, you can use any aggregate function as a window function in Postgres by appending an OVER clause.

The special difficulty here is to get partitions and sort order right:

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id
                         ORDER BY ea_year, ea_month) AS cum_amt
FROM   tbl
ORDER  BY circle_id, ea_year, ea_month;

And no GROUP BY.
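To see the running sum without a Postgres server at hand, here is a minimal sketch that runs the same kind of window query through Python's built-in sqlite3 module (SQLite 3.25 or later supports window functions). SQLite is only a stand-in here, and the numeric column month_no is an assumption that replaces ea_month, because month names do not sort chronologically.

```python
import sqlite3

# Rows for two circles from the question's staging data.
# month_no (hypothetical column) replaces the month name: 2 = February, etc.
rows = [
    (92576, 2, 2000, 2014, 1),
    (92573, 3, 3000, 2014, 1),
    (92570, 4, 1000, 2014, 1),
    (92577, 2, 2500, 2014, 2),
    (92574, 3, 2500, 2014, 2),
    (92571, 4, 3000, 2014, 2),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE tbl (id INTEGER, month_no INTEGER, amount INTEGER,"
    " ea_year INTEGER, circle_id INTEGER)"
)
con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?)", rows)

# The aggregate is applied per row over a window, so no GROUP BY is needed
# and every input row is kept in the output.
query = """
SELECT id, circle_id, month_no, amount
     , sum(amount) OVER (PARTITION BY circle_id
                         ORDER BY ea_year, month_no) AS cum_amt
FROM   tbl
ORDER  BY circle_id, ea_year, month_no
"""
result = con.execute(query).fetchall()
for r in result:
    print(r)
```

Each circle_id restarts its own total, and the row count stays the same as in the input.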

The sum for each row is calculated from the first row in the partition to the current row - or quoting the manual to be precise:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.

... which is the cumulative or running sum you are after. Bold emphasis mine.

Rows with the same (circle_id, ea_year, ea_month) are "peers" in this query. All of those show the same running sum with all peers added to the sum. But I assume your table is UNIQUE on (circle_id, ea_year, ea_month), then the sort order is deterministic and no row has peers.
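The effect of peers can be sketched with a tiny made-up table (t, grp, ord and amount are hypothetical names), again using SQLite through Python as a stand-in for Postgres. Rows 2 and 3 share the same ord value, so under the default RANGE frame they are peers. An explicit ROWS frame with a tie-breaker in ORDER BY gives a strictly row-by-row sum instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, grp INTEGER, ord INTEGER, amount INTEGER)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 10), (2, 1, 2, 20), (3, 1, 2, 30), (4, 1, 3, 40)],
)

# Default frame (RANGE UNBOUNDED PRECEDING): peers are summed together,
# so ids 2 and 3 show the same running total.
range_sums = con.execute("""
SELECT id, sum(amount) OVER (PARTITION BY grp ORDER BY ord)
FROM   t
ORDER  BY id
""").fetchall()

# Explicit ROWS frame plus a tie-breaker (id) in ORDER BY:
# each row adds only itself to the total.
rows_sums = con.execute("""
SELECT id, sum(amount) OVER (PARTITION BY grp ORDER BY ord, id
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM   t
ORDER  BY id
""").fetchall()

print(range_sums)
print(rows_sums)
```

With a unique sort key, as assumed for the table in the question, both variants return the same totals.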

Postgres 11 added tools to include / exclude peers with the new frame_exclusion options. See:

Now, ORDER BY ... ea_month won't work with strings for month names. Postgres would sort alphabetically according to the locale setting.
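A quick illustration of the problem, sketched in Python: sorting the three month names from the question as plain strings puts April first. Python compares by code point rather than by locale, but for these ASCII names the outcome is the same as an alphabetical sort. The month_no lookup is a hypothetical helper for comparison only.

```python
months = ["February", "March", "April"]

# Plain string sort: alphabetical, not chronological.
alphabetical = sorted(months)

# Hypothetical lookup from month name to month number.
month_names = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
month_no = {name: i + 1 for i, name in enumerate(month_names)}

# Sorting by month number restores calendar order.
chronological = sorted(months, key=lambda name: month_no[name])

print(alphabetical)
print(chronological)
```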

If you have actual date values stored in your table, you can sort properly. If not, I suggest replacing ea_year and ea_month with a single column mon of type date in your table.

  • Transform what you have with to_date():

      to_date(ea_year || ea_month , 'YYYYMonth') AS mon

  • For display, you can get the original strings with to_char():

      to_char(mon, 'Month') AS ea_month
      to_char(mon, 'YYYY') AS ea_year
    

  • While stuck with the unfortunate design, this will work:

    SELECT ea_month, id, amount, ea_year, circle_id
         , sum(amount) OVER (PARTITION BY circle_id ORDER BY mon) AS cum_amt
    FROM   (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS mon FROM tbl) sub
    ORDER  BY circle_id, mon;
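As a cross-check of what the last query should return for the staging rows in the question, here is a rough Python equivalent. It is not how Postgres evaluates the query: datetime.strptime with "%Y%B" plays the role of to_date(..., 'YYYYMonth') and assumes English month names in the active locale, and month_start is a hypothetical helper.

```python
from datetime import datetime
from itertools import accumulate, groupby

# Staging rows from the question: (ea_month, id, amount, ea_year, circle_id)
staging = [
    ("April",    92570, 1000, 2014, 1),
    ("April",    92571, 3000, 2014, 2),
    ("April",    92572, 2000, 2014, 3),
    ("March",    92573, 3000, 2014, 1),
    ("March",    92574, 2500, 2014, 2),
    ("March",    92575, 3750, 2014, 3),
    ("February", 92576, 2000, 2014, 1),
    ("February", 92577, 2500, 2014, 2),
    ("February", 92578, 1450, 2014, 3),
]

def month_start(ea_year, ea_month):
    """Hypothetical helper: first day of the month as a date (like to_date)."""
    return datetime.strptime(f"{ea_year}{ea_month}", "%Y%B").date()

# ORDER BY circle_id, mon
ordered = sorted(staging, key=lambda r: (r[4], month_start(r[3], r[0])))

# PARTITION BY circle_id, running sum within each partition
target = []
for circle_id, group in groupby(ordered, key=lambda r: r[4]):
    group = list(group)
    running = accumulate(r[2] for r in group)
    for row, cum_amt in zip(group, running):
        target.append(row + (cum_amt,))

for row in target:
    print(row)
```

The last element of each printed tuple is cum_amt, restarting for every circle_id.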
    
