How to use a SQL window function to calculate a percentage of an aggregate


Question

I need to calculate percentages of various dimensions in a table. I'd like to simplify things by using window functions to calculate the denominator, but I've run into a problem because the numerator has to be an aggregate as well.

As a simple example, take the following table:

create temp table test (d1 text, d2 text, v numeric);
insert into test values ('a','x',5), ('a','y',5), ('a','y',10), ('b','x',20);

If I just want to calculate the share of each individual row out of d1, then windowing functions work fine:

select d1, d2, v/sum(v) over (partition by d1)
from test;

"b";"x";1.00
"a";"x";0.25
"a";"y";0.25
"a";"y";0.50

However, what I need is the share of each d2 total within its d1 group. The output I am looking for is this:

"b";"x";1.00
"a";"x";0.25
"a";"y";0.75

So I try this:

select d1, d2, sum(v)/sum(v) over (partition by d1)
from test
group by d1, d2;

However, now I get an error:

ERROR:  column "test.v" must appear in the GROUP BY clause or be used in an aggregate function

I'm assuming it is complaining that the window function is not accounted for in the grouping clause, but window functions cannot be put in the grouping clause anyway.

This is using Greenplum 4.1, which is a fork of PostgreSQL 8.4 and shares the same window functions. Note that Greenplum cannot do correlated subqueries.

Solution

I think what you are actually looking for is this:

SELECT d1, d2, sum(v)/sum(sum(v)) OVER (PARTITION BY d1) AS share
FROM   test
GROUP  BY d1, d2;

This produces the requested result.

Window functions are applied after aggregate functions. The outer sum() in sum(sum(v)) is a window function in this example and is attached to the OVER ... clause, while the inner sum() is an aggregate.
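To make the evaluation order concrete, here is a minimal Python sketch that mimics the two phases by hand: first the GROUP BY aggregate (the inner sum), then the per-partition window sum over the already grouped rows (the outer sum). It only illustrates the logic and is not how the database executes the query; the names `grouped`, `partition_total` and `shares` are made up for this example.

```python
from collections import defaultdict
from decimal import Decimal

# Same rows as the example table test(d1, d2, v).
rows = [
    ("a", "x", Decimal(5)),
    ("a", "y", Decimal(5)),
    ("a", "y", Decimal(10)),
    ("b", "x", Decimal(20)),
]

# Phase 1: GROUP BY d1, d2 with the inner aggregate sum(v).
grouped = defaultdict(Decimal)
for d1, d2, v in rows:
    grouped[(d1, d2)] += v

# Phase 2: the window function runs over the grouped rows:
# sum(<grouped sum>) OVER (PARTITION BY d1).
partition_total = defaultdict(Decimal)
for (d1, _d2), sv in grouped.items():
    partition_total[d1] += sv

# share = sum(v) / sum(sum(v)) OVER (PARTITION BY d1)
shares = {key: sv / partition_total[key[0]] for key, sv in grouped.items()}

for (d1, d2), share in sorted(shares.items()):
    print(d1, d2, share)
```

Because the window sum in phase 2 only ever sees the output of phase 1, the bare column v is no longer available at that point, which is why the original attempt with sum(v) OVER (...) was rejected.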

Effectively the same as:

WITH x AS (
    SELECT d1, d2, sum(v) AS sv
    FROM   test
    GROUP  BY d1, d2
    )
SELECT d1, d2, sv/sum(sv) OVER (PARTITION BY d1) AS share
FROM   x;

Or (without CTE):

SELECT d1, d2, sv/sum(sv) OVER (PARTITION BY d1) AS share
FROM   (
    SELECT d1, d2, sum(v) AS sv
    FROM   test
    GROUP  BY d1, d2
    ) x;
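For readers who want to try the derived-table form without a Greenplum or PostgreSQL instance, the following sketch runs it against an in-memory SQLite database from Python. It rests on several assumptions that are not part of the original answer: SQLite 3.25 or later (the first version with window functions), a REAL column in place of numeric so that the division is not integer division in SQLite, and an added ORDER BY to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: REAL instead of numeric, so SQLite divides as floating point.
conn.execute("CREATE TABLE test (d1 TEXT, d2 TEXT, v REAL)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [("a", "x", 5), ("a", "y", 5), ("a", "y", 10), ("b", "x", 20)],
)

# Derived-table form from the answer, plus ORDER BY for stable output.
query = """
SELECT d1, d2, sv / sum(sv) OVER (PARTITION BY d1) AS share
FROM  (
    SELECT d1, d2, sum(v) AS sv
    FROM   test
    GROUP  BY d1, d2
    ) x
ORDER  BY d1, d2
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)

conn.close()
```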

Or @Mu's variant.

Aside: Greenplum introduced correlated subqueries with version 4.2. See release notes.
