SQL Server-具有关联的条件聚合 [英] SQL Server - conditional aggregation with correlation

查看:108
本文介绍了SQL Server-具有关联的条件聚合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

背景:



原始案例非常简单。计算从最高收入到最低收入的每位用户总运行量:

 创建表t(客户整数非空主键
, 用户 VARCHAR(5)非空
,收入整数不为空);

插入t(Customer, User,Revenue)值
(001,'James',500),(002,'James',750),(003,'James ',450),
(004,'Sarah',100),(005,'Sarah',500),(006,'Sarah',150),
(007,'Sarah', 600),(008,'James',150),(009,'James',100);

查询:

  SELECT *,
1.0 *收入/总和(收入)超过(按用户划分)AS百分比,
1.0 *总和(收入)超过(按(用户)划分)ORDER BY收入DESC)
/总和(收入)超过(按用户划分的部分)AS running_percentage
FROM t;

LiveDemo



输出:

 ╔════╦═══════╦═════════╦══ ══════════╦════════════════════╗
║ID║用户║收入║百分比║running_percentage║$ b $ ╠════╬═══════╬═════════╬════════════╬════════════ ════════╣
║2║詹姆斯║750║0.38║0.38║
║1║詹姆斯║500║0.26║0.64║
║3║詹姆斯║450║ 0.23║0.87║
║8║James║150║0.08║0.95║
║9║James║100║0.05║1║
║7║Sarah║600║0.44║0.44║
║5║Sarah║500║0.37║0.81║
║6║Sarah║150║0.11║0.93║
║4║Sarah║100║0.07║1║
╚═ ═══牛皮═══════牛皮═ b牛皮════════════牛皮════════════════════╝

可以使用特定的窗口函数来进行不同的计算。






现在让我们假设我们不能使用窗口化的 SUM 并重写它:

  SELECT c。客户,c。用户,c。收入 
,1.0 *收入/ NULLIF(c3.s,0)AS百分比
,1.0 * c2.s / NULLIF(c3.s,0)AS running_percentage
从tc
交叉应用
(SELECT c。用户 = c2。用户
AND c2。收入> = c。收入)AS c2
从t c2
的SUM(Revenue)AS s
交叉应用
(选择SUM(Revenue)AS s
,从t c2
c。 User = c2。 User)作为c3
ORDER BY User,收入DESC;

LiveDemo



我有使用交叉应用是因为我不喜欢 SELECT 列列表和 c3 中的相关子查询code>使用两次。



一切正常。但是当我们仔细观察时, c2 c3 非常相似。那么,为什么不合并它们并使用简单的条件聚合呢?

  SELECT c.Customer,c。 User,c。 Revenue  
,1.0 *收入/ NULLIF(c2.sum_total,0)AS百分比
,1.0 * c2.sum_running / NULLIF(c2.sum_total,0)AS running_percentage
FROM tc
交叉应用
(选择总和(收入)AS sum_total,
总和(以c2.Revenue> = c.Revenue THESE Revenue ELSE 0 END为例)
AS sum_running
FROM t c2
在哪里c。用户 = c2。用户)AS c2
按用户订购,收入DESC;

不幸的是,这是不可能的。


在包含外部引用的聚合表达式中指定了多个列。如果要聚合的表达式包含外部引用,则该外部引用必须是表达式中引用的唯一列。


可能会绕过它与另一个子查询的包装,但是会变得有些丑陋:

  SELECT c.Customer,c。用户,c。收入 
,1.0 *收入/ NULLIF(c2.sum_total,0)AS百分比
,1.0 * c2.sum_running / NULLIF(c2.sum_total,0)AS running_percentage
FROM tc
交叉应用
(SELECT SUM(Revenue)AS sum_total,
SUM(running_revenue)AS sum_running
FROM(SELECT Revenue,
在c2.Revenue的情况下> = c。收入然后其他收入0结束
AS running_revenue
从t c2
在哪里c。用户 = c2。用户)AS sub
)AS c2
按用户订购,收入DESC

LiveDemo






PostgreSQL 版本。唯一的区别是 LATERAL 而不是 CROSS APPLY

  SELECT c。客户,c。用户,c。收入
,1.0 *收入/ NULLIF(c2.sum_total,0)AS百分比
,1.0 * c2.running_sum / NULLIF(c2.sum_total,0)AS running_percentage
从tc
,横向(SELECT SUM(Revenue)AS sum_total,
SUM(在c2.Revenue> = c的情况下。收入然后c2。收入其他0结束)
AS running_sum
从t c2
在哪里c。用户 = c2。用户)c2
按用户订购,收益DESC;

SqlFiddleDemo



它非常好用。






SQLite / MySQL 版本(这就是为什么我更喜欢横向/交叉应用)的原因:

 选择c.Customer,c。 User,c.Revenue,
1.0 *收入/(SELECT SUM(Revenue)
FROM t c2
W. c。用户 = c2。用户)AS百分比,
1.0 *(选择总和(在c2.Revenue> = c.Revenue THEN c2.Revenue ELSE 0 END的情况下) )
从t c2
在哪里c。用户 = c2。用户)/
(SELECT SUM(c2.Revenue)
从t c2
在哪里c。用户 = c2。用户)AS running_percentage
从tc
按用户订购,收益DESC;

SQLFiddleDemo-SQLite < a href = http://sqlfiddle.com/#!9/c85e83/5/0 rel = nofollow noreferrer> SQLFiddleDemo-MySQL






我已阅读具有外部引用的聚合


限制在 SQL-92 标准中,并且 SQL Server Sybase继承代码库。问题是SQL Server需要弄清楚哪个查询将计算聚合。


我不搜索显示如何规避它。



问题是:


  1. 标准的哪个部分不允许或干扰它?

  2. 为什么其他RDBMS对此外部依赖项没有问题?

  3. 它们扩展 SQL Standard SQL Server 的行为或 SQL Server 不能完全实现(正确吗?)?。

我非常感谢您引用:




  • ISO标准(92或更高版本)

  • SQL Server标准支持

  • 任何解释它的RDBMS的官方文档( SQL Server / Postgresql / Oracle /...)。



编辑:



我知道 SQL-92 没有 LATERAL 的概念。但是带有子查询的版本(例如在 SQLite / MySQL 中)也不起作用。



LiveDemo p>

编辑2:



为了简化一点,让我们仅检查相关子查询:

  SELECT c.Customer,c。 User,c.Revenue,
1.0 *(SELECT SUM(在c2.Revenue> = c.Revenue THEN c2.Revenue ELSE 0 END的情况下c。 User = c2。 User)
/(SELECT SUM(c2.Revenue)
FROM t c2
c。 User = c2。 User)AS running_percentage
FROM tc
ORDER BY User,收益DESC ;

以上版本在 MySQL / SQLite / Postgresql



SQL Server 中,我们会收到错误消息。将其与子查询一起包装以将其展平到一个级别后,它就会起作用:

  SELECT c.Customer,c。用户 ,c.Revenue,
1.0 *(
SELECT SUM(r1> = r2然后r1 ELSE 0 END的情况)
FROM(SELECT c2.Revenue AS r1,c.Revenue r2
从t c2
在哪里c。用户 = c2。用户)AS S)/
(SELECT SUM(c2.Revenue)
从t c2
c。用户 = c2。用户)AS running_percentage
从tc
按用户订购,收入DESC;

这个问题的重点是 SQL标准调节它。



LiveDemo

解决方案

LATERAL 的SQL标准。 CROSS APPLY 是Microsoft的特定于供应商的扩展(Oracle后来为了兼容性而采用了该扩展),并且由于MS功能,其限制显然不是由于ISO / IEC SQL标准引起的早于标准。



LATERAL 基本上只是联接的修饰符,允许联接树中的横向引用。可以引用的列数没有限制。



我看不出出现奇数限制的原因。也许是因为 CROSS APPLY 最初旨在允许使用表值函数,后来又扩展为允许子 SELECT s。



Postgres 手册解释 LATERAL 像这样:


LATERAL 关键字可以在子< SELECT FROM 项之前。这允许
SELECT 引用出现在$ b $之前的 FROM 项的列b在 FROM 列表中。 (在没有 LATERAL 的情况下,每个子< SELECT 都会被独立评估
,因此不能交叉引用任何其他 FROM 项目。)


查询的Postgres版本(没有更优雅的样式)窗口函数)可以更简单:

  SELECT c。* 
,round(revenue / c2.sum_total,2) END AS百分比
,round(c2.running_sum / c2.sum_total,2)END AS running_percentage
从tc,横向(
SELECT NULLIF(SUM(revenue),0):: numeric AS sum_total-NULLIF,强制转换一次
,SUM(收入)过滤器(收入> c =收入)AS running_sum
FROM t
WHERE User = c。 User
)c2
订单由c。用户,c.revenue DESC;




  • Postgres 9.4+具有更优雅的总体 FILTER 表示条件聚合。


  • NULLIF 很有意义,我只建议稍作简化。


  • sum_total 转换为数字一次。


  • 舍入结果与您想要的结果相匹配。



Background:

The original case was very simple. Calculate running total per user from highest revenue to lowest:

CREATE TABLE t(Customer INTEGER  NOT NULL PRIMARY KEY 
              ,"User"   VARCHAR(5) NOT NULL
              ,Revenue  INTEGER  NOT NULL);

INSERT INTO t(Customer,"User",Revenue) VALUES
(001,'James',500),(002,'James',750),(003,'James',450),
(004,'Sarah',100),(005,'Sarah',500),(006,'Sarah',150),
(007,'Sarah',600),(008,'James',150),(009,'James',100);

Query:

SELECT *,
    1.0 * Revenue/SUM(Revenue) OVER(PARTITION BY "User") AS percentage,
    1.0 * SUM(Revenue) OVER(PARTITION BY "User" ORDER BY Revenue DESC)
         /SUM(Revenue) OVER(PARTITION BY "User") AS running_percentage
FROM t;

LiveDemo

Output:

╔════╦═══════╦═════════╦════════════╦════════════════════╗
║ ID ║ User  ║ Revenue ║ percentage ║ running_percentage ║
╠════╬═══════╬═════════╬════════════╬════════════════════╣
║  2 ║ James ║     750 ║ 0.38       ║ 0.38               ║
║  1 ║ James ║     500 ║ 0.26       ║ 0.64               ║
║  3 ║ James ║     450 ║ 0.23       ║ 0.87               ║
║  8 ║ James ║     150 ║ 0.08       ║ 0.95               ║
║  9 ║ James ║     100 ║ 0.05       ║ 1                  ║
║  7 ║ Sarah ║     600 ║ 0.44       ║ 0.44               ║
║  5 ║ Sarah ║     500 ║ 0.37       ║ 0.81               ║
║  6 ║ Sarah ║     150 ║ 0.11       ║ 0.93               ║
║  4 ║ Sarah ║     100 ║ 0.07       ║ 1                  ║
╚════╩═══════╩═════════╩════════════╩════════════════════╝

It could be calculated differently using specific windowed functions.


Now let's assume that we cannot use windowed SUM and rewrite it:

SELECT c.Customer, c."User", c."Revenue"
    ,1.0 * Revenue / NULLIF(c3.s,0) AS percentage
    ,1.0 * c2.s    / NULLIF(c3.s,0) AS running_percentage
FROM t c
CROSS APPLY
        (SELECT SUM(Revenue) AS s
        FROM t c2
        WHERE c."User" = c2."User"
            AND c2.Revenue >= c.Revenue) AS c2
CROSS APPLY
        (SELECT SUM(Revenue) AS s
        FROM t c2
        WHERE c."User" = c2."User") AS c3
ORDER BY "User", Revenue DESC;

LiveDemo

I have used CROSS APPLY because I do not like correlated subqueries in SELECT colums list and c3 is used twice.

Everything work as it should. But when we look closer c2 and c3 are very similiar. So why not combine them and use simple conditional aggregation:

SELECT c.Customer, c."User", c."Revenue"
    ,1.0 * Revenue        / NULLIF(c2.sum_total,0) AS percentage
    ,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
CROSS APPLY
        (SELECT SUM(Revenue) AS sum_total,
                SUM(CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END) 
                AS sum_running
        FROM t c2
        WHERE c."User" = c2."User") AS c2
ORDER BY "User", Revenue DESC;

Unfortunately it is not possible.

Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.

Of course I could circumvent it wrapping with another subquery, but it becomes a bit "ugly":

SELECT c.Customer, c."User", c."Revenue"
    ,1.0 * Revenue        / NULLIF(c2.sum_total,0) AS percentage
    ,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
CROSS APPLY
(   SELECT SUM(Revenue) AS sum_total,
           SUM(running_revenue) AS sum_running
     FROM (SELECT Revenue,
                  CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END 
                  AS running_revenue
           FROM t c2
           WHERE c."User" = c2."User") AS sub
) AS c2
ORDER BY "User", Revenue DESC

LiveDemo


Postgresql version. The only difference is LATERAL instead of CROSS APPLY.

SELECT c.Customer, c."User", c.Revenue
    ,1.0 * Revenue        / NULLIF(c2.sum_total,0) AS percentage 
    ,1.0 * c2.running_sum / NULLIF(c2.sum_total,0) AS running_percentage 
FROM t c
,LATERAL (SELECT SUM(Revenue) AS sum_total,
                 SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END) 
                 AS running_sum
        FROM t c2
        WHERE c."User" = c2."User") c2
ORDER BY "User", Revenue DESC;

SqlFiddleDemo

It works very nice.


SQLite/MySQL version (that is why I preferLATERAL/CROSS APPLY):

SELECT c.Customer, c."User", c.Revenue,
    1.0 * Revenue / (SELECT SUM(Revenue)
                     FROM t c2
                     WHERE c."User" = c2."User") AS percentage,
    1.0 * (SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
           FROM t c2
          WHERE c."User" = c2."User")  / 
          (SELECT SUM(c2.Revenue)
           FROM t c2
           WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;

SQLFiddleDemo-SQLite SQLFiddleDemo-MySQL


I've read Aggregates with an Outer Reference:

The source for the restriction is in the SQL-92 standard, and SQL Server inherited it from the Sybase codebase. The problem is that SQL Server needs to figure out which query will compute the aggregate.

I do not search for answers that only show how to circumvent it.

The questions are:

  1. Which part of standard disallow or interfere with it?
  2. Why other RDBMSes do not have problem with this kind of outer dependency?
  3. Do they extend SQL Standard and SQL Server behaves as it should or SQL Server does not implement it fully(correctly?)?.

I would be very grateful for references to:

  • ISO standard (92 or newer)
  • SQL Server Standards Support
  • official documenation from any RDBMS that explains it (SQL Server/Postgresql/Oracle/...).

EDIT:

I know that SQL-92 does not have concept of LATERAL. But version with subqueries (like in SQLite/MySQL) does not work too.

LiveDemo

EDIT 2:

To simplify it a bit, let's check only correlated subquery only:

SELECT c.Customer, c."User", c.Revenue,
       1.0*(SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
              FROM t c2
              WHERE c."User" = c2."User") 
       / (SELECT SUM(c2.Revenue)
          FROM t c2
          WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;

The version above works fine in MySQL/SQLite/Postgresql.

In SQL Server we get error. After wraping it with subquery to "flatten" it to one level it works:

SELECT c.Customer, c."User", c.Revenue,
      1.0 * (
              SELECT SUM(CASE WHEN r1 >= r2 THEN r1 ELSE 0 END)
              FROM (SELECT c2.Revenue AS r1, c.Revenue r2
                    FROM t c2
                    WHERE c."User" = c2."User") AS S)  / 
             (SELECT SUM(c2.Revenue)
              FROM t c2
              WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;

The point of this question is how does SQL standard regulate it.

LiveDemo

解决方案

There is no such limitation in the SQL standard for LATERAL. CROSS APPLY is a vendor-specific extension from Microsoft (Oracle adopted it later for compatibility) and its limitations are obviously not owed to the ISO/IEC SQL standard, since the MS feature pre-dates the standard.

LATERAL in standard SQL is basically just a modifier for joins to allow lateral references in the join tree. There is no limit to the number of columns that can be referenced.

I wouldn't see a reason for the odd restriction to begin with. Maybe it's because CROSS APPLY was originally intended to allow table-valued functions, which was later extended to allow sub-SELECTs.

The Postgres manual explains LATERAL like this:

The LATERAL key word can precede a sub-SELECT FROM item. This allows the sub-SELECT to refer to columns of FROM items that appear before it in the FROM list. (Without LATERAL, each sub-SELECT is evaluated independently and so cannot cross-reference any other FROM item.)

The Postgres version of your query (without the more elegant window functions) can be simpler:

SELECT c.*
     , round(revenue        / c2.sum_total, 2) END AS percentage 
     , round(c2.running_sum / c2.sum_total, 2) END AS running_percentage 
FROM   t c, LATERAL (
   SELECT NULLIF(SUM(revenue), 0)::numeric AS sum_total  -- NULLIF, cast once
        , SUM(revenue) FILTER (WHERE revenue >= c.revenue) AS running_sum
   FROM   t
   WHERE  "User" = c."User"
   ) c2
ORDER  BY c."User", c.revenue DESC;

  • Postgres 9.4+ has the more elegant aggregate FILTER for conditional aggregates.

  • NULLIF makes sense, I only suggest a minor simplification.

  • Cast sum_total to numeric once.

  • Round result to match your desired result.

这篇关于SQL Server-具有关联的条件聚合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆