子查询或 leftjoin with group by 哪个更快? [英] subquery or leftjoin with group by which one is faster?

查看:47
本文介绍了子查询或 leftjoin with group by 哪个更快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我必须在我的应用程序中显示总列的运行总和......所以我已使用以下查询来查找运行总数...我发现两者都按我的需要工作.在一个中,我使用了左联接和 group by,在另一个中,我使用了子查询.

i have to show running total with the total column in my application ... so i have used the following queries for finding the running total... and i find that both are working as per my need . in one i used the left join with group by and in another one i used the sub query .

现在我的问题是,当我的数据每天增长数千时,哪一种更快,如果数据将限制在 1000 或 2000 行范围内,那么哪一种更好……以及任何其他比这些更快的方法两个????

and now my question is which one is faster when my data grow in thousands daily and if data will be in limit of 1000 or 2000 rows then which one is better ... and any other method by which is more faster then these two ????

declare @tmp table(ind int identity(1,1),col1 int)
insert into @tmp
select 2
union
select 4
union
select 7
union 

select 5
union
select 8
union 
select 10



SELECT t1.col1,sum( t2.col1)
FROM @tmp AS t1 LEFT JOIN @tmp t2 ON t1.ind>=t2.ind
group by t1.ind,t1.col1


select t1.col1,(select sum(col1) from  @tmp as t2 where t2.ind<=t1.ind)
from @tmp as t1

推荐答案

在 SQL Server 中计算运行总计的一个很好的资源是 这个文档 由 Itzik Ben Gan 提交,作为他的活动的一部分提交给 SQL Server 团队,以使 OVER 子句从其最初的 SQL Server 2005 实现进一步扩展.在其中他展示了一旦​​进入数万行游标如何执行基于集合的解决方案.SQL Server 2012 确实扩展了 OVER 子句,使这种查询更容易.

A great resource on calculating running totals in SQL Server is this document by Itzik Ben Gan that was submitted to the SQL Server Team as part of his campaign to have the OVER clause extended further from its initial SQL Server 2005 implementation. In it he shows how once you get into tens of thousands of rows cursors out perform set based solutions. SQL Server 2012 did indeed extend the OVER clause making this sort of query much easier.

SELECT col1,
       SUM(col1) OVER (ORDER BY ind ROWS UNBOUNDED PRECEDING)
FROM   @tmp 

因为您使用的是 SQL Server 2005,但是您无法使用它.

As you are on SQL Server 2005 however this is not available to you.

Adam Machanic 在这里展示如何使用 CLR 来提高标准 TSQL 游标的性能.

Adam Machanic shows here how the CLR can be used to improve on the performance of standard TSQL cursors.

对于这个表定义

CREATE TABLE RunningTotals
(
ind int identity(1,1) primary key,
col1 int
)

我在 ALLOW_SNAPSHOT_ISOLATION ON 的数据库中创建了包含 2,000 行和 10,000 行的表,其中一个关闭了此设置(原因是因为我的初始结果是在一个设置了该设置的数据库中导致结果令人费解).

I create tables with both 2,000 and 10,000 rows in a database with ALLOW_SNAPSHOT_ISOLATION ON and one with this setting off (The reason for this is because my initial results were in a DB with the setting on that led to a puzzling aspect of the results).

所有表的聚集索引只有 1 个根页.每个叶子页面的数量如下所示.

The clustered indexes for all tables just had 1 root page. The number of leaf pages for each is shown below.

+-------------------------------+-----------+------------+
|                               | 2,000 row | 10,000 row |
+-------------------------------+-----------+------------+
| ALLOW_SNAPSHOT_ISOLATION OFF  |         5 |         22 |
| ALLOW_SNAPSHOT_ISOLATION ON   |         8 |         39 |
+-------------------------------+-----------+------------+

我测试了以下案例(链接显示执行计划)

I tested the following cases (Links show execution plans)

  1. 左加入和分组方式
  2. 相关子查询2000 行计划,10000行计划
  3. 来自 Mikael(更新)答案的 CTE
  4. CTE 下面

包含附加 CTE 选项的原因是为了提供一个 CTE 解决方案,如果 ind 列不能保证连续.

The reason for inclusion of the additional CTE option was in order to provide a CTE solution that would still work if the ind column was not guaranteed sequential.

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
DECLARE @col1 int, @sumcol1 bigint;

WITH    RecursiveCTE
AS      (
        SELECT TOP 1 ind, col1, CAST(col1 AS BIGINT) AS Total
        FROM RunningTotals
        ORDER BY ind
        UNION   ALL
        SELECT  R.ind, R.col1, R.Total
        FROM    (
                SELECT  T.*,
                        T.col1 + Total AS Total,
                        rn = ROW_NUMBER() OVER (ORDER BY T.ind)
                FROM    RunningTotals T
                JOIN    RecursiveCTE R
                        ON  R.ind < T.ind
                ) R
        WHERE   R.rn = 1
        )
SELECT  @col1 =col1, @sumcol1=Total
FROM    RecursiveCTE
OPTION  (MAXRECURSION 0);

所有查询都添加了 CAST(col1 AS BIGINT) 以避免在运行时出现溢出错误.此外,对于所有这些,我将结果分配给上述变量,以消除将结果发回考虑所花费的时间.

All of the queries had a CAST(col1 AS BIGINT) added in order to avoid overflow errors at runtime. Additionally for all of them I assigned the results to variables as above in order to eliminate time spent sending back results from consideration.

+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
|                  |          |        |          Base Table        |         Work Table         |     Time        |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
|                  | Snapshot | Rows   | Scan count | logical reads | Scan count | logical reads | cpu   | elapsed |
| Group By         | On       | 2,000  | 2001       | 12709         |            |               | 1469  | 1250    |
|                  | On       | 10,000 | 10001      | 216678        |            |               | 30906 | 30963   |
|                  | Off      | 2,000  | 2001       | 9251          |            |               | 1140  | 1160    |
|                  | Off      | 10,000 | 10001      | 130089        |            |               | 29906 | 28306   |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
| Sub Query        | On       | 2,000  | 2001       | 12709         |            |               | 844   | 823     |
|                  | On       | 10,000 | 2          | 82            | 10000      | 165025        | 24672 | 24535   |
|                  | Off      | 2,000  | 2001       | 9251          |            |               | 766   | 999     |
|                  | Off      | 10,000 | 2          | 48            | 10000      | 165025        | 25188 | 23880   |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
| CTE No Gaps      | On       | 2,000  | 0          | 4002          | 2          | 12001         | 78    | 101     |
|                  | On       | 10,000 | 0          | 20002         | 2          | 60001         | 344   | 342     |
|                  | Off      | 2,000  | 0          | 4002          | 2          | 12001         | 62    | 253     |
|                  | Off      | 10,000 | 0          | 20002         | 2          | 60001         | 281   | 326     |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
| CTE Alllows Gaps | On       | 2,000  | 2001       | 4009          | 2          | 12001         | 47    | 75      |
|                  | On       | 10,000 | 10001      | 20040         | 2          | 60001         | 312   | 413     |
|                  | Off      | 2,000  | 2001       | 4006          | 2          | 12001         | 94    | 90      |
|                  | Off      | 10,000 | 10001      | 20023         | 2          | 60001         | 313   | 349     |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+

相关子查询和 GROUP BY 版本都使用由 RunningTotals 表(T1) 并且对于该扫描返回的每一行,在 T2.ind<=T1.ind 上搜索表 (T2) 自加入.

Both the correlated subquery and the GROUP BY version use "triangular" nested loop joins driven by a clustered index scan on the RunningTotals table (T1) and, for each row returned by that scan, seeking back into the table (T2) self joining on T2.ind<=T1.ind.

这意味着重复处理相同的行.当处理 T1.ind=1000 行时,自联接检索并求和具有 ind <= 1000 的所有行,然后对于 T1 的下一行.ind=1001再次检索相同的 1000 行,并与另外一行相加,以此类推.

This means that the same rows get processed repeatedly. When the T1.ind=1000 row is processed the self join retrieves and sums all rows with an ind <= 1000, then for the next row where T1.ind=1001 the same 1000 rows are retrieved again and summed along with one additional row and so on.

对于 2,000 行的表,此类操作的总数为 2,001,000,对于 10k 行,通常为 50,005,000 或更多(n² + n)/2 明显呈指数增长.

The total number of such operations for a 2,000 row table is 2,001,000, for 10k rows 50,005,000 or more generally (n² + n) / 2 which clearly grows exponentially.

在 2,000 行的情况下,GROUP BY 和子查询版本之间的主要区别在于前者在连接之后有流聚合,因此有三列馈入其中(T1.indT2.col1T2.col1) 和 T1.indGROUP BY 属性code> 而后者被计算为一个标量聚合,在连接之前的流聚合,只有 T2.col1 馈入它并且没有 GROUP BY 属性设置为全部.可以看出,这种更简单的安排在减少 CPU 时间方面具有明显的好处.

In the 2,000 row case the main difference between the GROUP BY and the subquery versions is that the former has the stream aggregate after the join and so has three columns feeding into it (T1.ind, T2.col1, T2.col1) and a GROUP BY property of T1.ind whereas the latter is calculated as a scalar aggregate, with the stream aggregate before the join, only has T2.col1 feeding into it and has no GROUP BY property set at all. This simpler arrangement can be seen to have a measurable benefit in terms of reduced CPU time.

对于 10,000 行的情况,子查询计划存在额外差异.它添加了一个 eager spool 复制所有 ind,cast(col1 as bigint) 值到 tempdb.在快照隔离的情况下,它比聚集索引结构更紧凑,最终效果是减少了大约 25% 的读取次数(因为基表为版本信息保留了相当多的空白空间),当此选项关闭时,它会变得不那么紧凑(可能是由于 bigintint 的差异)和更多的读取结果.这减少了子查询和按版本分组之间的差距,但子查询仍然获胜.

For the 10,000 row case there is an additional difference in the sub query plan. It adds an eager spool which copies all the ind,cast(col1 as bigint) values into tempdb. In the case that snapshot isolation is on this works out more compact than the clustered index structure and the net effect is to reduce the number of reads by about 25% (as the base table preserves quite a lot of empty space for versioning info), when this option is off it works out less compact (presumably due to the bigint vs int difference) and more reads result. This reduces the gap between the sub query and group by versions but the sub query still wins.

然而,明显的赢家是递归 CTE.对于无间隙"版本,从基表读取的逻辑现在是 2 x (n + 1) 反映 n 索引搜索到 2 级索引以检索所有行加上末尾的附加行,不返回任何内容并终止递归.然而,这仍然意味着要处理 22 页表需要 20,002 次读取!

The clear winner however was the Recursive CTE. For the "no gaps" version logical reads from the base table are now 2 x (n + 1) reflecting the n index seeks into the 2 level index to retrieve all of the rows plus the additional one at the end that returns nothing and terminates the recursion. That still meant 20,002 reads to process a 22 page table however!

递归 CTE 版本的逻辑工作表读取非常高.每个源行似乎可以读取 6 个工作表.这些来自存储前一行输出的索引假脱机,然后在下一次迭代中再次读取(Umachandar Jayachandran 对此进行了很好的解释 这里).尽管数量众多,但它仍然是表现最好的.

Logical work table reads for the recursive CTE version are very high. It seems to work out at 6 worktable reads per source row. These come from the index spool that stores the output of the previous row then is read from again in the next iteration (good explanation of this by Umachandar Jayachandran here). Despite the high number this is still the best performer.

这篇关于子查询或 leftjoin with group by 哪个更快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆