Postgres 窗口函数和按异常分组 [英] Postgres window function and group by exception

查看:17
本文介绍了Postgres 窗口函数和按异常分组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试整理一个查询,该查询将在一段时间内检索用户的统计数据(利润/亏损)作为累积结果.

I'm trying to put together a query that will retrieve the statistics of a user (profit/loss) as a cumulative result, over a period of time.

这是我目前的查询:

SELECT p.name, e.date, 
    sum(sp.payout) OVER (ORDER BY e.date)
    - sum(s.buyin) OVER (ORDER BY e.date) AS "Profit/Loss" 
FROM result r 
    JOIN game g ON r.game_id = g.game_id 
    JOIN event e ON g.event_id = e.event_id 
    JOIN structure s ON g.structure_id = s.structure_id 
    JOIN structure_payout sp ON g.structure_id = sp.structure_id
                            AND r.position = sp.position 
    JOIN player p ON r.player_id = p.player_id 
WHERE p.player_id = 17 
GROUP BY p.name, e.date, e.event_id, sp.payout, s.buyin
ORDER BY p.name, e.date ASC

查询将运行.但是,结果略有错误.原因是一个 event 可以有多个游戏(具有不同的 sp.payouts).因此,如果用户在具有不同支出的事件中有 2 个结果(即每个事件有 4 场比赛,并且用户从一场比赛中获得 20 英镑,从另一个获得 40 英镑),则上述内容会出现多行.

The query will run. However, the result is slightly incorrect. The reason is that an event can have multiple games (with different sp.payouts). Therefore, the above comes out with multiple rows if a user has 2 results in an event with different payouts (i.e. there are 4 games per event, and a user gets £20 from one, and £40 from another).

显而易见的解决方案是将 GROUP BY 修改为:

The obvious solution would be to amend the GROUP BY to:

GROUP BY p.name, e.date, e.event_id

然而,Postgres 对此表示不满,因为它似乎没有认识到 sp.payouts.buyin 位于聚合函数内.我收到错误:

However, Postgres complains at this as it doesn't appear to be recognizing that sp.payout and s.buyin are inside an aggregate function. I get the error:

列sp.payout"必须出现在 GROUP BY 子句中或用于聚合函数

column "sp.payout" must appear in the GROUP BY clause or be used in an aggregate function

我在 Ubuntu Linux 服务器上运行 9.1.
我是否遗漏了什么,或者这可能是 Postgres 中的真正缺陷?

I'm running 9.1 on Ubuntu Linux server.
Am I missing something, or could this be a genuine defect in Postgres?

推荐答案

不是,事实上,使用聚合函数.您正在使用窗口函数.这就是为什么 PostgreSQL 要求 sp.payouts.buyin 包含在 GROUP BY 子句中的原因.

You are not, in fact, using aggregate functions. You are using window functions. That's why PostgreSQL demands sp.payout and s.buyin to be included in the GROUP BY clause.

通过附加一个 OVER 子句,聚合函数 sum() 变成了一个窗口函数,它聚合每个分区的值,同时保持所有行.

By appending an OVER clause, the aggregate function sum() is turned into a window function, which aggregates values per partition while keeping all rows.

您可以组合窗口函数和聚合函数.首先应用聚合.从您的描述中,我不明白您希望如何处理每个事件的多次支付/买入.作为猜测,我计算了每个事件的总和.现在我可以从 GROUP BY 子句中删除 sp.payouts.buyin 并且每个 <代码>玩家和<代码>事件:

You can combine window functions and aggregate functions. Aggregations are applied first. I did not understand from your description how you want to handle multiple payouts / buyins per event. As a guess, I calculate a sum of them per event. Now I can remove sp.payout and s.buyin from the GROUP BY clause and get one row per player and event:

SELECT p.name
     , e.event_id
     , e.date
     , sum(sum(sp.payout)) OVER w
     - sum(sum(s.buyin  )) OVER w AS "Profit/Loss" 
FROM   player            p
JOIN   result            r ON r.player_id     = p.player_id  
JOIN   game              g ON g.game_id       = r.game_id 
JOIN   event             e ON e.event_id      = g.event_id 
JOIN   structure         s ON s.structure_id  = g.structure_id 
JOIN   structure_payout sp ON sp.structure_id = g.structure_id
                          AND sp.position     = r.position
WHERE  p.player_id = 17 
GROUP  BY e.event_id
WINDOW w AS (ORDER BY e.date, e.event_id)
ORDER  BY e.date, e.event_id;

在这个表达式中:sum(sum(sp.payout)) OVER w,外面的sum()是一个窗口函数,里面的sum() 是一个聚合函数.

In this expression: sum(sum(sp.payout)) OVER w, the outer sum() is a window function, the inner sum() is an aggregate function.

假设 p.player_ide.event_id 在它们各自的表中是 PRIMARY KEY.

Assuming p.player_id and e.event_id are PRIMARY KEY in their respective tables.

我将 e.event_id 添加到 WINDOW 子句的 ORDER BY 以达到确定性的排序顺序.(同一日期可能有多个事件.)还在结果中包含 event_id 以区分每天多个事件.

I added e.event_id to the ORDER BY of the WINDOW clause to arrive at a deterministic sort order. (There could be multiple events on the same date.) Also included event_id in the result to distinguish multiple events per day.

虽然查询限制为单个玩家(WHERE p.player_id = 17),但我们不需要添加p.namep.player_idGROUP BYORDER BY.如果连接之一会过度地乘以行,则结果总和将不正确(部分或完全相乘).按 p.name 分组无法修复查询.

While the query restricts to a single player (WHERE p.player_id = 17), we don't need to add p.name or p.player_id to GROUP BY and ORDER BY. If one of the joins would multiply rows unduly, the resulting sum would be incorrect (partly or completely multiplied). Grouping by p.name could not repair the query then.

我还从 GROUP BY 子句中删除了 e.date.主键 e.event_id 覆盖输入行 自 PostgreSQL 9.1 起.

I also removed e.date from the GROUP BY clause. The primary key e.event_id covers all columns of the input row since PostgreSQL 9.1.

如果您将查询更改为一次返回多个玩家,请进行调整:

If you change the query to return multiple players at once, adapt:

...
WHERE  p.player_id < 17  -- example - multiple players
GROUP  BY p.name, p.player_id, e.date, e.event_id  -- e.date and p.name redundant
WINDOW w AS (ORDER BY p.name, p.player_id, e.date, e.event_id)
ORDER  BY p.name, p.player_id, e.date, e.event_id;

除非 p.name 被定义为唯一的 (?),否则按 player_id 分组和排序以获得确定性排序顺序的正确结果.

Unless p.name is defined unique (?), group and order by player_id additionally to get correct results in a deterministic sort order.

我只在GROUP BY中保留了e.datep.name,以便在所有子句中具有相同的排序顺序,希望有性能益处.否则,您可以删除那里的列.(类似于第一个查询中的 e.date.)

I only kept e.date and p.name in GROUP BY to have identical sort order in all clauses, hoping for a performance benefit. Else, you can remove the columns there. (Similar for just e.date in the first query.)

这篇关于Postgres 窗口函数和按异常分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆