Postgres window function and group by exception
Problem description
I'm trying to put together a query that retrieves a user's statistics (profit/loss) as a cumulative result over a period of time.
Here's the query I have so far:
SELECT p.name, e.date,
       sum(sp.payout) OVER (ORDER BY e.date)
     - sum(s.buyin) OVER (ORDER BY e.date) AS "Profit/Loss"
FROM   result r
JOIN   game g ON r.game_id = g.game_id
JOIN   event e ON g.event_id = e.event_id
JOIN   structure s ON g.structure_id = s.structure_id
JOIN   structure_payout sp ON g.structure_id = sp.structure_id
                          AND r.position = sp.position
JOIN   player p ON r.player_id = p.player_id
WHERE  p.player_id = 17
GROUP  BY p.name, e.date, e.event_id, sp.payout, s.buyin
ORDER  BY p.name, e.date ASC
The query runs. However, the result is slightly incorrect. The reason is that an event can have multiple games (with different sp.payouts). Therefore, the above comes out with multiple rows if a user has 2 results in an event with different payouts (e.g. there are 4 games per event, and a user gets £20 from one and £40 from another).
The obvious solution would be to amend the GROUP BY to:
GROUP BY p.name, e.date, e.event_id
However, Postgres complains about this, as it doesn't appear to recognize that sp.payout and s.buyin are inside an aggregate function. I get the error:
column "sp.payout" must appear in the GROUP BY clause or be used in an aggregate function
I'm running PostgreSQL 9.1 on an Ubuntu Linux server.
Am I missing something, or could this be a genuine defect in Postgres?
Recommended answer
You are not, in fact, using aggregate functions. You are using window functions. That's why PostgreSQL demands that sp.payout and s.buyin be included in the GROUP BY clause.
By appending an OVER clause, the aggregate function sum() is turned into a window function, which aggregates values per partition while keeping all rows.
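The difference between the two forms can be seen on a toy table. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server, and the table payout with its three rows is made up for illustration.

```python
import sqlite3

# Hypothetical toy data: three payouts spread over two events.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payout (event_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO payout VALUES (?, ?)", [(1, 20), (1, 40), (2, 10)])

# Plain aggregate: GROUP BY collapses the input to one row per event.
aggregated = conn.execute(
    "SELECT event_id, sum(amount) FROM payout "
    "GROUP BY event_id ORDER BY event_id"
).fetchall()

# Same function with OVER: every input row is kept and carries
# the total of its partition alongside its own amount.
windowed = conn.execute(
    "SELECT event_id, amount, sum(amount) OVER (PARTITION BY event_id) "
    "FROM payout ORDER BY event_id, amount"
).fetchall()

print(aggregated)  # one row per event
print(windowed)    # one row per input row
```

With GROUP BY the three input rows shrink to two; with OVER all three rows remain, which is why a window function alone does not satisfy the GROUP BY rule.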
You can combine window functions and aggregate functions. Aggregations are applied first. I did not understand from your description how you want to handle multiple payouts / buy-ins per event. As a guess, I calculate a sum of them per event. Now I can remove sp.payout and s.buyin from the GROUP BY clause and get one row per player and event:
SELECT p.name
     , e.event_id
     , e.date
     , sum(sum(sp.payout)) OVER w
     - sum(sum(s.buyin  )) OVER w AS "Profit/Loss"
FROM   player            p
JOIN   result            r ON r.player_id     = p.player_id
JOIN   game              g ON g.game_id       = r.game_id
JOIN   event             e ON e.event_id      = g.event_id
JOIN   structure         s ON s.structure_id  = g.structure_id
JOIN   structure_payout sp ON sp.structure_id = g.structure_id
                          AND sp.position     = r.position
WHERE  p.player_id = 17
GROUP  BY e.event_id
WINDOW w AS (ORDER BY e.date, e.event_id)
ORDER  BY e.date, e.event_id;
In this expression: sum(sum(sp.payout)) OVER w, the outer sum() is a window function and the inner sum() is an aggregate function.
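The nesting can be tried on a single flattened table. This is a sketch under stated assumptions, not the original schema: game_result, its columns, the dates and the amounts are all invented, and sqlite3 stands in for PostgreSQL (both accept the nested form and the WINDOW clause used here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per game a single player took part in.
conn.execute(
    "CREATE TABLE game_result "
    "(event_id INTEGER, event_date TEXT, payout INTEGER, buyin INTEGER)"
)
conn.executemany(
    "INSERT INTO game_result VALUES (?, ?, ?, ?)",
    [
        (1, "2012-01-01", 20, 10),  # event 1 has two games
        (1, "2012-01-01", 40, 10),
        (2, "2012-01-02", 0, 10),   # events 2 and 3 share a date
        (3, "2012-01-02", 30, 10),
    ],
)

# Inner sum(): aggregate, one value per event (GROUP BY).
# Outer sum() OVER w: window function, running total over those per-event values.
running = conn.execute(
    """
    SELECT event_id,
           sum(payout) - sum(buyin) AS event_profit,
           sum(sum(payout)) OVER w - sum(sum(buyin)) OVER w AS running_profit
    FROM   game_result
    GROUP  BY event_id, event_date
    WINDOW w AS (ORDER BY event_date, event_id)
    ORDER  BY event_date, event_id
    """
).fetchall()

for row in running:
    print(row)
```

Each event yields exactly one row even though event 1 has two games, and running_profit accumulates the per-event totals in window order.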
This assumes that p.player_id and e.event_id are the PRIMARY KEY of their respective tables.
I added e.event_id to the ORDER BY of the WINDOW clause to arrive at a deterministic sort order. (There could be multiple events on the same date.) I also included event_id in the result to distinguish multiple events per day.
As long as the query is restricted to a single player (WHERE p.player_id = 17), we don't need to add p.name or p.player_id to GROUP BY and ORDER BY. If one of the joins multiplied rows unduly, the resulting sum would be incorrect (partly or completely multiplied). Grouping by p.name could not repair the query then.
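How a join can multiply rows and inflate a sum is easy to reproduce. The sketch below is illustrative only: the two tables borrow the names structure and structure_payout from the question, but their contents are invented, and sqlite3 is used in place of PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE structure (structure_id INTEGER PRIMARY KEY, buyin INTEGER);
    CREATE TABLE structure_payout (structure_id INTEGER, position INTEGER, payout INTEGER);
    INSERT INTO structure VALUES (1, 10);
    INSERT INTO structure_payout VALUES (1, 1, 50), (1, 2, 30), (1, 3, 20);
    """
)

# Joining on structure_id alone repeats the single structure row
# once per payout position, so the buy-in is counted three times.
inflated = conn.execute(
    "SELECT sum(s.buyin) FROM structure s "
    "JOIN structure_payout sp ON sp.structure_id = s.structure_id"
).fetchone()[0]

# Also matching on position keeps the join one-to-one.
correct = conn.execute(
    "SELECT sum(s.buyin) FROM structure s "
    "JOIN structure_payout sp ON sp.structure_id = s.structure_id "
    "AND sp.position = 1"
).fetchone()[0]

print(inflated, correct)
```

No choice of GROUP BY columns brings the inflated value back down; the join condition itself has to be fixed.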
I also removed e.date from the GROUP BY clause. Since PostgreSQL 9.1, grouping by the primary key e.event_id covers all columns of the event table.
If you change the query to return multiple players at once, adapt:
...
WHERE  p.player_id < 17  -- example - multiple players
GROUP  BY p.name, p.player_id, e.date, e.event_id  -- e.date and p.name redundant
WINDOW w AS (PARTITION BY p.name, p.player_id  -- restart the running total for each player
             ORDER BY e.date, e.event_id)
ORDER  BY p.name, p.player_id, e.date, e.event_id;
Unless p.name is defined as unique (?), additionally group and order by player_id to get correct results in a deterministic sort order.
I only kept e.date and p.name in GROUP BY to have an identical sort order in all clauses, hoping for a performance benefit. Otherwise, you can remove those columns there. (The same goes for e.date in the first query.)
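One point that may be worth checking in the multi-player variant is where the running total restarts. A window that only has ORDER BY keeps accumulating from one player into the next, while PARTITION BY starts over for each player. The sketch below assumes a made-up table event_profit holding one already-aggregated profit per player and event, and again uses sqlite3 rather than PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-event profits for two players.
conn.execute(
    "CREATE TABLE event_profit (player_id INTEGER, event_id INTEGER, profit INTEGER)"
)
conn.executemany(
    "INSERT INTO event_profit VALUES (?, ?, ?)",
    [(1, 1, 40), (1, 2, -10), (2, 1, 5), (2, 2, 15)],
)

# ORDER BY only: a single running total that carries over between players.
across_players = conn.execute(
    "SELECT player_id, event_id, "
    "sum(profit) OVER (ORDER BY player_id, event_id) "
    "FROM event_profit ORDER BY player_id, event_id"
).fetchall()

# PARTITION BY player_id: the running total restarts for every player.
per_player = conn.execute(
    "SELECT player_id, event_id, "
    "sum(profit) OVER (PARTITION BY player_id ORDER BY event_id) "
    "FROM event_profit ORDER BY player_id, event_id"
).fetchall()

print(across_players)
print(per_player)
```

In the first result, player 2 starts from player 1's final total; in the second, each player's cumulative profit begins from zero.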