子查询或 leftjoin 与组哪个更快? [英] subquery or leftjoin with group by which one is faster?

查看:105
本文介绍了子查询或 leftjoin 与组哪个更快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我必须用我的应用程序中的总列显示运行总计..​​....所以我已使用以下查询来查找运行总数......我发现两者都按我的需要工作.在一个我使用左连接和 group by 而在另一个我使用子查询.

i have to show running total with the total column in my application ... so i have used the following queries for finding the running total... and i find that both are working as per my need . in one i used the left join with group by and in another one i used the sub query .

现在我的问题是,当我的数据每天增长数千时,哪个更快,如果数据限制在 1000 或 2000 行,那么哪个更好......以及任何其他方法比这些更快两个????

and now my question is which one is faster when my data grow in thousands daily and if data will be in limit of 1000 or 2000 rows then which one is better ... and any other method by which is more faster then these two ????

declare @tmp table(ind int identity(1,1),col1 int)
insert into @tmp
select 2
union
select 4
union
select 7
union 

select 5
union
select 8
union 
select 10



SELECT t1.col1,sum( t2.col1)
FROM @tmp AS t1 LEFT JOIN @tmp t2 ON t1.ind>=t2.ind
group by t1.ind,t1.col1


select t1.col1,(select sum(col1) from  @tmp as t2 where t2.ind<=t1.ind)
from @tmp as t1

推荐答案

在 SQL Server 中计算运行总计的一个很好的资源是 本文档 由 Itzik Ben Gan 提交给 SQL Server 团队,作为其活动的一部分,使 OVER 子句从其最初的 SQL Server 2005 实现进一步扩展.在其中,他展示了一旦​​您进入数万行游标,如何执行基于集合的解决方案.SQL Server 2012 确实扩展了 OVER 子句,使此类查询更加容易.

A great resource on calculating running totals in SQL Server is this document by Itzik Ben Gan that was submitted to the SQL Server Team as part of his campaign to have the OVER clause extended further from its initial SQL Server 2005 implementation. In it he shows how once you get into tens of thousands of rows cursors out perform set based solutions. SQL Server 2012 did indeed extend the OVER clause making this sort of query much easier.

SELECT col1,
       SUM(col1) OVER (ORDER BY ind ROWS UNBOUNDED PRECEDING)
FROM   @tmp 

因为您使用的是 SQL Server 2005,但是这对您不可用.

As you are on SQL Server 2005 however this is not available to you.

Adam Machanic 此处显示如何使用 CLR 来改进标准 TSQL 游标的性能.

Adam Machanic shows here how the CLR can be used to improve on the performance of standard TSQL cursors.

对于这个表定义

CREATE TABLE RunningTotals
(
ind int identity(1,1) primary key,
col1 int
)

我在具有 ALLOW_SNAPSHOT_ISOLATION ON 的数据库中创建了包含 2,000 行和 10,000 行的表,其中一个表关闭了此设置(原因是因为我的初始结果是在一个数据库中,设置为导致结果的一个令人费解的方面).

I create tables with both 2,000 and 10,000 rows in a database with ALLOW_SNAPSHOT_ISOLATION ON and one with this setting off (The reason for this is because my initial results were in a DB with the setting on that led to a puzzling aspect of the results).

所有表的聚集索引只有 1 个根页.每个叶页的数量如下所示.

The clustered indexes for all tables just had 1 root page. The number of leaf pages for each is shown below.

+-------------------------------+-----------+------------+
|                               | 2,000 row | 10,000 row |
+-------------------------------+-----------+------------+
| ALLOW_SNAPSHOT_ISOLATION OFF  |         5 |         22 |
| ALLOW_SNAPSHOT_ISOLATION ON   |         8 |         39 |
+-------------------------------+-----------+------------+

我测试了以下案例(链接显示执行计划)

I tested the following cases (Links show execution plans)

  1. 左加入和分组
  2. 相关子查询2000行计划,10000 行计划
  3. 来自 Mikael(更新)答案的 CTE
  4. 下面的 CTE

包含额外 CTE 选项的原因是为了提供一个 CTE 解决方案,如果 ind 列不能保证连续,该解决方案仍然有效.

The reason for inclusion of the additional CTE option was in order to provide a CTE solution that would still work if the ind column was not guaranteed sequential.

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
DECLARE @col1 int, @sumcol1 bigint;

WITH    RecursiveCTE
AS      (
        SELECT TOP 1 ind, col1, CAST(col1 AS BIGINT) AS Total
        FROM RunningTotals
        ORDER BY ind
        UNION   ALL
        SELECT  R.ind, R.col1, R.Total
        FROM    (
                SELECT  T.*,
                        T.col1 + Total AS Total,
                        rn = ROW_NUMBER() OVER (ORDER BY T.ind)
                FROM    RunningTotals T
                JOIN    RecursiveCTE R
                        ON  R.ind < T.ind
                ) R
        WHERE   R.rn = 1
        )
SELECT  @col1 =col1, @sumcol1=Total
FROM    RecursiveCTE
OPTION  (MAXRECURSION 0);

所有查询都添加了 CAST(col1 AS BIGINT) 以避免在运行时出现溢出错误.此外,对于所有这些,我都将结果分配给了上述变量,以消除从考虑发送回结果所花费的时间.

All of the queries had a CAST(col1 AS BIGINT) added in order to avoid overflow errors at runtime. Additionally for all of them I assigned the results to variables as above in order to eliminate time spent sending back results from consideration.

+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
|                  |          |        |          Base Table        |         Work Table         |     Time        |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
|                  | Snapshot | Rows   | Scan count | logical reads | Scan count | logical reads | cpu   | elapsed |
| Group By         | On       | 2,000  | 2001       | 12709         |            |               | 1469  | 1250    |
|                  | On       | 10,000 | 10001      | 216678        |            |               | 30906 | 30963   |
|                  | Off      | 2,000  | 2001       | 9251          |            |               | 1140  | 1160    |
|                  | Off      | 10,000 | 10001      | 130089        |            |               | 29906 | 28306   |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
| Sub Query        | On       | 2,000  | 2001       | 12709         |            |               | 844   | 823     |
|                  | On       | 10,000 | 2          | 82            | 10000      | 165025        | 24672 | 24535   |
|                  | Off      | 2,000  | 2001       | 9251          |            |               | 766   | 999     |
|                  | Off      | 10,000 | 2          | 48            | 10000      | 165025        | 25188 | 23880   |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
| CTE No Gaps      | On       | 2,000  | 0          | 4002          | 2          | 12001         | 78    | 101     |
|                  | On       | 10,000 | 0          | 20002         | 2          | 60001         | 344   | 342     |
|                  | Off      | 2,000  | 0          | 4002          | 2          | 12001         | 62    | 253     |
|                  | Off      | 10,000 | 0          | 20002         | 2          | 60001         | 281   | 326     |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
| CTE Alllows Gaps | On       | 2,000  | 2001       | 4009          | 2          | 12001         | 47    | 75      |
|                  | On       | 10,000 | 10001      | 20040         | 2          | 60001         | 312   | 413     |
|                  | Off      | 2,000  | 2001       | 4006          | 2          | 12001         | 94    | 90      |
|                  | Off      | 10,000 | 10001      | 20023         | 2          | 60001         | 313   | 349     |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+

相关子查询和GROUP BY 版本都使用三角"嵌套循环连接,由RunningTotals 表(T1code>) 并且,对于该扫描返回的每一行,在 T2.ind<=T1.ind 上自行加入表 (T2).

Both the correlated subquery and the GROUP BY version use "triangular" nested loop joins driven by a clustered index scan on the RunningTotals table (T1) and, for each row returned by that scan, seeking back into the table (T2) self joining on T2.ind<=T1.ind.

这意味着重复处理相同的行.当处理 T1.ind=1000 行时,自连接检索并求和具有 ind <= 1000 的所有行,然后对于 T1 的下一行.ind=1001再次检索相同的 1000 行,并与另外一行相加,依此类推.

This means that the same rows get processed repeatedly. When the T1.ind=1000 row is processed the self join retrieves and sums all rows with an ind <= 1000, then for the next row where T1.ind=1001 the same 1000 rows are retrieved again and summed along with one additional row and so on.

对于 2,000 行的表,此类操作的总数为 2,001,000,对于 10k 行,通常为 50,005,000 或更多(n² + n)/2 显然呈指数增长.

The total number of such operations for a 2,000 row table is 2,001,000, for 10k rows 50,005,000 or more generally (n² + n) / 2 which clearly grows exponentially.

在 2,000 行的情况下,GROUP BY 和子查询版本之间的主要区别在于前者在连接后具有流聚合,因此有三列馈入其中(T1.ind, T2.col1, T2.col1) 和 T1.ind 的一个 GROUP BY 属性code> 而后者被计算为标量聚合,在加入之前的流聚合,只有 T2.col1 馈入其中并且没有 GROUP BY 属性设置在全部.可以看出,这种更简单的安排在减少 CPU 时间方面具有显着的好处.

In the 2,000 row case the main difference between the GROUP BY and the subquery versions is that the former has the stream aggregate after the join and so has three columns feeding into it (T1.ind, T2.col1, T2.col1) and a GROUP BY property of T1.ind whereas the latter is calculated as a scalar aggregate, with the stream aggregate before the join, only has T2.col1 feeding into it and has no GROUP BY property set at all. This simpler arrangement can be seen to have a measurable benefit in terms of reduced CPU time.

对于 10,000 行的情况,子查询计划存在额外的差异.它添加了一个 eager spool 来复制所有的 ind,cast(col1 as bigint) 值转换为 tempdb.在快照隔离的情况下,它比聚集索引结构更紧凑,最终效果是将读取次数减少了约 25%(因为基表为版本信息保留了相当多的空白空间),当这个选项关闭时,它会变得不那么紧凑(大概是由于 bigintint 的差异)和更多的读取结果.这减少了子查询和按版本分组之间的差距,但子查询仍然获胜.

For the 10,000 row case there is an additional difference in the sub query plan. It adds an eager spool which copies all the ind,cast(col1 as bigint) values into tempdb. In the case that snapshot isolation is on this works out more compact than the clustered index structure and the net effect is to reduce the number of reads by about 25% (as the base table preserves quite a lot of empty space for versioning info), when this option is off it works out less compact (presumably due to the bigint vs int difference) and more reads result. This reduces the gap between the sub query and group by versions but the sub query still wins.

然而,明显的赢家是递归 CTE.对于无间隙"版本,从基表中读取的逻辑现在是 2 x (n + 1) 反映了 n 索引寻找到 2 级索引以检索所有行加上末尾的附加行,不返回任何内容并终止递归.然而,这仍然意味着要处理 22 个页表需要 20,002 次读取!

The clear winner however was the Recursive CTE. For the "no gaps" version logical reads from the base table are now 2 x (n + 1) reflecting the n index seeks into the 2 level index to retrieve all of the rows plus the additional one at the end that returns nothing and terminates the recursion. That still meant 20,002 reads to process a 22 page table however!

递归 CTE 版本的逻辑工作表读取非常高.每个源行的工作表读取次数似乎为 6 次.这些来自存储前一行输出的索引假脱机,然后在下一次迭代中再次读取(Umachandar Jayachandran 对此进行了很好的解释 此处).尽管数量众多,但这仍然是表现最好的.

Logical work table reads for the recursive CTE version are very high. It seems to work out at 6 worktable reads per source row. These come from the index spool that stores the output of the previous row then is read from again in the next iteration (good explanation of this by Umachandar Jayachandran here). Despite the high number this is still the best performer.

这篇关于子查询或 leftjoin 与组哪个更快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆