Calculating Cumulative Sum in PostgreSQL
Question
I want to compute the cumulative (running) amount of a field and insert it from a staging table into a target table. My staging table looks something like this:
ea_month  id     amount  ea_year  circle_id
April     92570  1000    2014     1
April     92571  3000    2014     2
April     92572  2000    2014     3
March     92573  3000    2014     1
March     92574  2500    2014     2
March     92575  3750    2014     3
February  92576  2000    2014     1
February  92577  2500    2014     2
February  92578  1450    2014     3
I want my target table to look something like this:
ea_month  id     amount  ea_year  circle_id  cum_amt
February  92576  1000    2014     1          1000
March     92573  3000    2014     1          4000
April     92570  2000    2014     1          6000
February  92577  3000    2014     2          3000
March     92574  2500    2014     2          5500
April     92571  3750    2014     2          9250
February  92578  2000    2014     3          2000
March     92575  2500    2014     3          4500
April     92572  1450    2014     3          5950
I am quite confused about how to achieve this result, and I want to do it in PostgreSQL. Can anyone suggest how to produce this result set?
Recommended answer
Basically, you need a window function here. That's a standard feature nowadays. In addition to genuine window functions, you can use any aggregate function as a window function in Postgres by appending an OVER clause.
The special difficulty here is to get partitions and sort order right:
SELECT ea_month, id, amount, ea_year, circle_id
, sum(amount) OVER (PARTITION BY circle_id ORDER BY ea_year, ea_month) AS cum_amt
FROM tbl
ORDER BY circle_id, ea_year, ea_month;
And no GROUP BY here.
The sum for each row is calculated from the first row in the partition to the current row - or quoting the manual to be precise:
"The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer."
... which is the cumulative or running sum you are after.
Rows with the same (circle_id, ea_year, ea_month) are "peers" in this query. All peers show the same running sum, with every peer's amount already added to it. But assuming your table is UNIQUE on (circle_id, ea_year, ea_month), the sort order is deterministic and no row has peers.
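To make the peer behavior concrete, here is a small sketch using a hypothetical temporary table demo that deliberately holds two rows for the same (circle_id, mon). The table, its columns, and the output names running_range and running_rows are made up for illustration. The first sum uses the default RANGE frame; the second uses an explicit ROWS frame with id as a tiebreaker.

```sql
-- Hypothetical demo table; not part of the original question.
CREATE TEMP TABLE demo (id int, circle_id int, mon date, amount int);

INSERT INTO demo VALUES
  (1, 1, '2014-02-01', 2000),
  (2, 1, '2014-03-01', 3000),
  (3, 1, '2014-03-01',  500),  -- peer of id 2: same (circle_id, mon)
  (4, 1, '2014-04-01', 1000);

SELECT id, mon, amount
     -- Default frame: RANGE UNBOUNDED PRECEDING, so peers share one total.
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY mon) AS running_range
     -- Explicit ROWS frame with a tiebreaker: the total advances row by row.
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY mon, id
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_rows
FROM   demo
ORDER  BY circle_id, mon, id;
```

Rows 2 and 3 are peers, so running_range is 5500 for both, while running_rows steps through 5000 and then 5500.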
Now, ORDER BY ... ea_month won't work with strings for month names. Postgres would sort alphabetically according to the locale setting, so April would come before February and March.
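A quick way to see the problem is to sort the three month names from the question as plain text. This is only an illustrative sketch; the alias t(m) is made up here.

```sql
-- Month names compared as text: the order is alphabetical, not chronological.
SELECT m
FROM  (VALUES ('February'), ('March'), ('April')) AS t(m)
ORDER  BY m;
```

With the usual collations this lists April first, then February, then March, which is not calendar order.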
If you have actual date values stored in your table, you can sort properly. If not, I suggest replacing ea_year and ea_month with a single column mon of type date in your table.
Transform what you have with to_date():
to_date(ea_year || ea_month , 'YYYYMonth') AS mon
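If changing the table is an option, the conversion could look roughly like the sketch below. It runs against a hypothetical stand-in table tbl_demo whose column types are assumptions (the question does not show them), and it assumes ea_month holds full English month names. On a real table it would be sensible to try this on a copy first and wrap it in a transaction.

```sql
-- Hypothetical stand-in for the real table (column types are assumptions).
CREATE TEMP TABLE tbl_demo (ea_month text, id int, amount int, ea_year int, circle_id int);
INSERT INTO tbl_demo VALUES
  ('April',    92570, 1000, 2014, 1),
  ('February', 92576, 2000, 2014, 1);

-- Add the new date column; the name mon follows the suggestion above.
ALTER TABLE tbl_demo ADD COLUMN mon date;

-- Fill it from the existing columns; each value is the first day of its month.
UPDATE tbl_demo
SET    mon = to_date(ea_year || ea_month, 'YYYYMonth');

-- Once the values look right, drop the now redundant columns.
ALTER TABLE tbl_demo
  DROP COLUMN ea_year,
  DROP COLUMN ea_month;
```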
For display, you can get the original strings back with to_char():
to_char(mon, 'Month') AS ea_month
to_char(mon, 'YYYY') AS ea_year
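One detail that may matter when comparing against the original strings: the plain 'Month' pattern blank-pads the name to 9 characters, and the FM prefix suppresses that padding. A short sketch with a literal date in place of the mon column:

```sql
SELECT to_char(DATE '2014-04-01', 'Month')   AS padded    -- 'April' followed by 4 blanks
     , to_char(DATE '2014-04-01', 'FMMonth') AS trimmed   -- 'April'
     , to_char(DATE '2014-04-01', 'YYYY')    AS ea_year;  -- '2014'
```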
While stuck with the unfortunate layout, this will work:
SELECT ea_month, id, amount, ea_year, circle_id
, sum(amount) OVER (PARTITION BY circle_id ORDER BY mon) AS cum_amt
FROM (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS mon FROM tbl) sub
ORDER BY circle_id, mon;
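The question also asks for the result to be inserted into a target table. A minimal sketch of that step follows, assuming hypothetical tables staging_demo and target_demo with guessed column types; the real table names and types are not given in the question. The same SELECT simply feeds an INSERT.

```sql
-- Hypothetical stand-ins for the staging and target tables (types are assumptions).
CREATE TEMP TABLE staging_demo (ea_month text, id int, amount int, ea_year int, circle_id int);
CREATE TEMP TABLE target_demo  (ea_month text, id int, amount int, ea_year int, circle_id int, cum_amt bigint);

INSERT INTO staging_demo VALUES
  ('April',    92570, 1000, 2014, 1),
  ('March',    92573, 3000, 2014, 1),
  ('February', 92576, 2000, 2014, 1);

-- Compute the running sum in the SELECT and feed it straight into the INSERT.
INSERT INTO target_demo (ea_month, id, amount, ea_year, circle_id, cum_amt)
SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY mon) AS cum_amt
FROM  (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS mon FROM staging_demo) sub;
```

For the three sample rows of circle 1, cum_amt comes out as 2000 for February, 5000 for March, and 6000 for April.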