Subquery or LEFT JOIN with GROUP BY: which one is faster?


Question

I have to show a running total alongside the total column in my application, so I have used the following queries to calculate the running total, and I find that both work as I need. In one I used a LEFT JOIN with GROUP BY, and in the other I used a subquery.

Now my question is: which one is faster as my data grows by thousands of rows daily? And if the data stays within 1,000 or 2,000 rows, which one is better? Is there any other method faster than these two?

declare @tmp table(ind int identity(1,1), col1 int)

insert into @tmp
select 2 union
select 4 union
select 7 union
select 5 union
select 8 union
select 10



SELECT t1.col1, SUM(t2.col1)
FROM @tmp AS t1
LEFT JOIN @tmp AS t2 ON t1.ind >= t2.ind
GROUP BY t1.ind, t1.col1


SELECT t1.col1, (SELECT SUM(col1) FROM @tmp AS t2 WHERE t2.ind <= t1.ind)
FROM @tmp AS t1

Answer

A great resource on calculating running totals in SQL Server is this document by Itzik Ben-Gan, which was submitted to the SQL Server team as part of his campaign to have the OVER clause extended beyond its initial SQL Server 2005 implementation. In it he shows how, once you get into tens of thousands of rows, cursors out-perform set-based solutions. SQL Server 2012 did indeed extend the OVER clause, making this sort of query much easier:

SELECT col1,
       SUM(col1) OVER (ORDER BY ind ROWS UNBOUNDED PRECEDING)
FROM   @tmp 

As you are on SQL Server 2005, however, this is not available to you.
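As an aside, the windowed SUM above amounts to a single-pass prefix sum over the rows in ind order. The same semantics can be sketched in Python (an illustration only, using the six sample values from the question in sorted order):

```python
from itertools import accumulate

# col1 values in ind order, as in ORDER BY ind ROWS UNBOUNDED PRECEDING
col1 = [2, 4, 5, 7, 8, 10]

# Each output element is the sum of the current row and all preceding rows
running = list(accumulate(col1))
print(running)  # [2, 6, 11, 18, 26, 36]
```

Each row is visited exactly once, which is why the windowed form scales linearly where the self-join forms do not.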

Adam Machanic shows here how the CLR can be used to improve on the performance of standard T-SQL cursors.

For this table definition:

CREATE TABLE RunningTotals
(
ind int identity(1,1) primary key,
col1 int
)

I created tables with both 2,000 and 10,000 rows in a database with ALLOW_SNAPSHOT_ISOLATION ON, and in one with this setting off (the reason being that my initial results were from a DB with the setting on, which led to a puzzling aspect of the results).

The clustered indexes for all tables had just one root page. The number of leaf pages for each is shown below.

+-------------------------------+-----------+------------+
|                               | 2,000 row | 10,000 row |
+-------------------------------+-----------+------------+
| ALLOW_SNAPSHOT_ISOLATION OFF  |         5 |         22 |
| ALLOW_SNAPSHOT_ISOLATION ON   |         8 |         39 |
+-------------------------------+-----------+------------+

I tested the following cases (links show execution plans):

  1. Left join and GROUP BY
  2. Correlated subquery (2,000 row plan, 10,000 row plan)
  3. CTE from Mikael's (updated) answer
  4. CTE below

The additional CTE option was included in order to provide a CTE solution that would still work if the ind column were not guaranteed to be sequential.

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
DECLARE @col1 int, @sumcol1 bigint;

WITH    RecursiveCTE
AS      (
        SELECT TOP 1 ind, col1, CAST(col1 AS BIGINT) AS Total
        FROM RunningTotals
        ORDER BY ind
        UNION   ALL
        SELECT  R.ind, R.col1, R.Total
        FROM    (
                SELECT  T.*,
                        T.col1 + Total AS Total,
                        rn = ROW_NUMBER() OVER (ORDER BY T.ind)
                FROM    RunningTotals T
                JOIN    RecursiveCTE R
                        ON  R.ind < T.ind
                ) R
        WHERE   R.rn = 1
        )
SELECT  @col1 =col1, @sumcol1=Total
FROM    RecursiveCTE
OPTION  (MAXRECURSION 0);
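The recursion above can be sketched procedurally: anchor on the lowest ind, then at each step take the next-highest ind and add its col1 to the running total. A minimal Python sketch of that logic (an illustration only, not how SQL Server executes the plan):

```python
def running_totals(rows):
    """rows: list of (ind, col1) pairs; ind need not be gap-free,
    mirroring the 'allows gaps' recursive CTE."""
    ordered = sorted(rows)        # anchor member: SELECT TOP 1 ... ORDER BY ind
    results = []
    total = 0
    for ind, col1 in ordered:     # recursive member: the single row with the next ind
        total += col1             # Total = T.col1 + R.Total
        results.append((ind, col1, total))
    return results

print(running_totals([(1, 2), (3, 4), (7, 7)]))
# [(1, 2, 2), (3, 4, 6), (7, 7, 13)]
```

Each source row is visited once, which is what makes this approach linear rather than triangular.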

All of the queries had a CAST(col1 AS BIGINT) added in order to avoid overflow errors at runtime. Additionally, for all of them I assigned the results to variables, as above, in order to remove the time spent sending results back from consideration.

+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
|                  |          |        |         Base Table         |         Work Table         |      Time       |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
|                  | Snapshot | Rows   | Scan count | logical reads | Scan count | logical reads | cpu   | elapsed |
| Group By         | On       | 2,000  | 2001       | 12709         |            |               | 1469  | 1250    |
|                  | On       | 10,000 | 10001      | 216678        |            |               | 30906 | 30963   |
|                  | Off      | 2,000  | 2001       | 9251          |            |               | 1140  | 1160    |
|                  | Off      | 10,000 | 10001      | 130089        |            |               | 29906 | 28306   |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
| Sub Query        | On       | 2,000  | 2001       | 12709         |            |               | 844   | 823     |
|                  | On       | 10,000 | 2          | 82            | 10000      | 165025        | 24672 | 24535   |
|                  | Off      | 2,000  | 2001       | 9251          |            |               | 766   | 999     |
|                  | Off      | 10,000 | 2          | 48            | 10000      | 165025        | 25188 | 23880   |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
| CTE No Gaps      | On       | 2,000  | 0          | 4002          | 2          | 12001         | 78    | 101     |
|                  | On       | 10,000 | 0          | 20002         | 2          | 60001         | 344   | 342     |
|                  | Off      | 2,000  | 0          | 4002          | 2          | 12001         | 62    | 253     |
|                  | Off      | 10,000 | 0          | 20002         | 2          | 60001         | 281   | 326     |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
| CTE Allows Gaps  | On       | 2,000  | 2001       | 4009          | 2          | 12001         | 47    | 75      |
|                  | On       | 10,000 | 10001      | 20040         | 2          | 60001         | 312   | 413     |
|                  | Off      | 2,000  | 2001       | 4006          | 2          | 12001         | 94    | 90      |
|                  | Off      | 10,000 | 10001      | 20023         | 2          | 60001         | 313   | 349     |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+

Both the correlated subquery and the GROUP BY versions use "triangular" nested loop joins, driven by a clustered index scan on the RunningTotals table (T1); for each row returned by that scan, they seek back into the table (T2), self-joining on T2.ind <= T1.ind.

This means that the same rows get processed repeatedly. When the T1.ind = 1000 row is processed, the self join retrieves and sums all rows with ind <= 1000; then for the next row, where T1.ind = 1001, the same 1,000 rows are retrieved again and summed along with one additional row, and so on.

The total number of such operations is 2,001,000 for a 2,000 row table and 50,005,000 for 10k rows, or more generally (n² + n) / 2, which clearly grows quadratically.
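Those figures are easy to verify by simulating the triangular join's row touches directly (a minimal sketch; n is the row count):

```python
def triangular_ops(n):
    # Outer row i re-reads every row with ind <= i, so the total
    # number of row touches is 1 + 2 + ... + n = (n^2 + n) / 2
    return sum(i for i in range(1, n + 1))

print(triangular_ops(2_000))   # 2001000
print(triangular_ops(10_000))  # 50005000
```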

In the 2,000 row case, the main difference between the GROUP BY and subquery versions is that the former has the stream aggregate after the join, so three columns feed into it (T1.ind, T2.col1, T2.col1) and it has a GROUP BY property of T1.ind. The latter is calculated as a scalar aggregate, with the stream aggregate before the join: only T2.col1 feeds into it and no GROUP BY property is set at all. This simpler arrangement can be seen to have a measurable benefit in terms of reduced CPU time.

For the 10,000 row case there is an additional difference in the subquery plan. It adds an eager spool which copies all the (ind, CAST(col1 AS BIGINT)) values into tempdb. When snapshot isolation is on, this works out more compact than the clustered index structure, and the net effect is to reduce the number of reads by about 25% (as the base table preserves quite a lot of empty space for versioning info); when the option is off, it works out less compact (presumably due to the bigint vs. int difference) and more reads result. This narrows the gap between the subquery and GROUP BY versions, but the subquery still wins.

The clear winner, however, was the recursive CTE. For the "no gaps" version, logical reads from the base table are now 2 × (n + 1), reflecting the n index seeks into the 2-level index to retrieve all of the rows, plus one additional seek at the end that returns nothing and terminates the recursion. That still meant 20,002 reads to process a 22-page table, however!
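That read count can be checked arithmetically (a sketch, assuming each seek into a 2-level index reads two pages):

```python
def base_table_reads(n, index_levels=2):
    # n seeks to fetch each row, plus one final seek that finds
    # nothing and terminates the recursion; each seek reads one
    # page per index level
    return index_levels * (n + 1)

print(base_table_reads(2_000))   # 4002
print(base_table_reads(10_000))  # 20002
```

These match the 4,002 and 20,002 base-table logical reads reported for "CTE No Gaps" in the table above.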

Logical work-table reads for the recursive CTE version are very high, working out at about six work-table reads per source row. These come from the index spool that stores the output of the previous row, which is then read again in the next iteration (a good explanation of this is given by Umachandar Jayachandran here). Despite the high number, this is still the best performer.
