SQL为什么SELECT COUNT(*),MIN(col),MAX(col)比SELECT MIN(col),MAX(col)更快 [英] SQL why is SELECT COUNT(*) , MIN(col), MAX(col) faster then SELECT MIN(col), MAX(col)

查看:112
本文介绍了SQL为什么SELECT COUNT(*),MIN(col),MAX(col)比SELECT MIN(col),MAX(col)更快的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们发现这些查询之间存在巨大差异。

We're seeing a huge difference between these queries.

慢速查询

The slow query

SELECT MIN(col) AS Firstdate, MAX(col) AS Lastdate 
FROM table WHERE status = 'OK' AND fk = 4193

表 table。扫描计数2,逻辑读2458969,物理读0,预读0,lob逻辑读0,lob物理读0,lob预读0。

Table 'table'. Scan count 2, logical reads 2458969, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server执行时间: CPU时间= 1966毫秒,经过的时间= 1955毫秒。

SQL Server Execution Times: CPU time = 1966 ms, elapsed time = 1955 ms.

快速查询

The fast query

SELECT count(*), MIN(col) AS Firstdate, MAX(col) AS Lastdate 
FROM table WHERE status = 'OK' AND fk = 4193

表'table'。扫描计数1,逻辑读5803,物理读0,预读0,lob逻辑读0,lob物理读0,lob预读0。

Table 'table'. Scan count 1, logical reads 5803, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server执行时间: CPU时间= 0毫秒,经过的时间= 9毫秒。

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 9 ms.

问题

Question

查询之间的巨大性能差异是什么原因?

What is the reason between the huge performance difference between the queries?

更新
根据注释中给出的问题进行一点更新:

Update A little update based on questions given as comments:

执行顺序或重复执行不会改变性能。
没有使用额外的参数,并且(test)数据库在执行过程中没有做其他任何事情。

The order of execution or repeated execution changes nothing performance wise. There are no extra parameters used and the (test)database is not doing anything else during execution.

慢速查询

|--Nested Loops(Inner Join)
 |--Stream Aggregate(DEFINE:([Expr1003]=MIN([DBTest].[dbo].[table].[startdate])))
   |    |--Top(TOP EXPRESSION:((1)))
   |         |--Nested Loops(Inner Join, OUTER REFERENCES:([DBTest].[dbo].[table].[id], [Expr1008]) WITH ORDERED PREFETCH)
   |              |--Index Scan(OBJECT:([DBTest].[dbo].[table].[startdate]), ORDERED FORWARD)
   |              |--Clustered Index Seek(OBJECT:([DBTest].[dbo].[table].[PK_table]), SEEK:([DBTest].[dbo].[table].[id]=[DBTest].[dbo].[table].[id]),  WHERE:([DBTest].[dbo].[table].[FK]=(5806) AND [DBTest].[dbo].[table].[status]<>'A') LOOKUP ORDERED FORWARD)
   |--Stream Aggregate(DEFINE:([Expr1004]=MAX([DBTest].[dbo].[table].[startdate])))
        |--Top(TOP EXPRESSION:((1)))
             |--Nested Loops(Inner Join, OUTER REFERENCES:([DBTest].[dbo].[table].[id], [Expr1009]) WITH ORDERED PREFETCH)
                  |--Index Scan(OBJECT:([DBTest].[dbo].[table].[startdate]), ORDERED BACKWARD)
                  |--Clustered Index Seek(OBJECT:([DBTest].[dbo].[table].[PK_table]), SEEK:([DBTest].[dbo].[table].[id]=[DBTest].[dbo].[table].[id]),  WHERE:([DBTest].[dbo].[table].[FK]=(5806) AND [DBTest].[dbo].[table].[status]<>'A') LOOKUP ORDERED FORWARD)

快速查询

 |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1012],0)))
   |--Stream Aggregate(DEFINE:([Expr1012]=Count(*), [Expr1004]=MIN([DBTest].[dbo].[table].[startdate]), [Expr1005]=MAX([DBTest].[dbo].[table].[startdate])))
        |--Nested Loops(Inner Join, OUTER REFERENCES:([DBTest].[dbo].[table].[id], [Expr1011]) WITH UNORDERED PREFETCH)
             |--Index Seek(OBJECT:([DBTest].[dbo].[table].[FK]), SEEK:([DBTest].[dbo].[table].[FK]=(5806)) ORDERED FORWARD)
             |--Clustered Index Seek(OBJECT:([DBTest].[dbo].[table].[PK_table]), SEEK:([DBTest].[dbo].[table].[id]=[DBTest].[dbo].[table].[id]),  WHERE:([DBTest].[dbo].[table].[status]<'A' OR [DBTest].[dbo].[table].[status]>'A') LOOKUP ORDERED FORWARD)

答案

Answer

马丁·史密斯(Martin Smith)在下面给出的答案似乎可以解释问题。超级简短的版本是MS-SQL查询分析器在慢速查询中错误地使用了查询计划,从而导致了完整的表扫描。

The answer given below by Martin Smith seems to explain the problem. The super short version is that the MS-SQL query-analyser wrongly uses a query plan in the slow query which causes a complete table scan.

在开始日期,FK和状态列上添加一个Count(*),带有(FORCESCAN)的查询提示或组合索引可解决性能问题。

Adding a Count(*), the query hint with(FORCESCAN) or a combined index on the startdate,FK and status columns fixes the performance issue.

推荐答案

SQL Server基数估计器做出各种建模假设,例如

The SQL Server cardinality estimator makes various modelling assumptions such as



  • 独立性:除非可获得相关信息,否则不同列上的数据分布是独立的。

  • 均匀性:在每个统计对象直方图步骤中,不同的值是均匀的传播,每个值具有相同的频率。

表中有810,064行。

There are 810,064 rows in the table.

您有查询

SELECT COUNT(*),
       MIN(startdate) AS Firstdate,
       MAX(startdate) AS Lastdate
FROM   table
WHERE  status <> 'A'
       AND fk = 4193 

1,893(0.23%)行满足 fk = 4193 谓词,并且其中两个未通过 status<> A 部分,因此总共需要1,891个匹配项,因此需要进行汇总。

1,893 (0.23%) rows meet the fk = 4193 predicate, and of those two fail the status <> 'A' part so overall 1,891 match and need to be aggregated.

您还有两个索引,都无法覆盖整个查询。

You also have two indexes neither of which cover the whole query.

对于您的快速查询,它使用 fk 上的索引直接查找 fk所在的行= 4193 然后需要执行1,893 密钥查找查找聚簇索引中的每一行,以检查状态谓词并检索开始日期进行聚合。

For your fast query it uses an index on fk to directly find rows where fk = 4193 then needs to do 1,893 key lookups to find each row in the clustered index to check the status predicate and retrieve the startdate for aggregation.

SELECT 列表中删除 COUNT(*) SQL Server不再具有来处理每个符合条件的行。因此,它考虑了另一种选择。

When you remove the COUNT(*) from the SELECT list SQL Server no longer has to process every qualifying row. As a result it considers another option.

您在 startdate 上有一个索引,因此它可以从首先,将键查找返回基表,并在找到第一个匹配的行停靠点后立即找到 MIN(startdate),与此类似, MAX 可以在索引的另一端开始并向后工作的另一次扫描中找到。

You have an index on startdate so it could start scanning that from the beginning, doing key lookups back to the base table and as soon as it finds the first matching row stop as it has found the MIN(startdate), Similarly the MAX can be found with another scan starting the other end of the index and working backwards.

SQL Server估计每个扫描最终将处理590行,然后再命中与谓词匹配的行。总共要进行1,180次查找,而要进行1,893次查找,因此它选择了此计划。

SQL Server estimates that each of these scans will end up processing 590 rows before they hit upon one that matches the predicate. Giving 1,180 total lookups vs 1,893 so it chooses this plan.

590的数字只是 table_size /估算的_rows_that_match 。也就是说,基数估计量假设匹配的行将均匀分布在整个表中。

The 590 figure is just table_size / estimated_number_of_rows_that_match. i.e. the cardinality estimator assumes that the matching rows will be evenly distributed throughout the table.

不幸的是,满足谓词的1,891行不是不是关于开始日期进行分配。实际上,它们都被压缩到索引末尾的单个8,205行段中,这意味着到达 MIN(startdate)的扫描最终在进行801,859键查找之前它可以停止。

Unfortunately the 1,891 rows that meet the predicate are not randomly distributed with respect to startdate. In fact they are all condensed into a single 8,205 row segment towards the end of the index meaning that the scan to get to the MIN(startdate) ends up doing 801,859 key lookups before it can stop.

下面可以复制。

CREATE TABLE T
(
id int identity(1,1) primary key,
startdate datetime,
fk int,
[status] char(1),
Filler char(2000)
)

CREATE NONCLUSTERED INDEX ix ON T(startdate)

INSERT INTO T
SELECT TOP 810064 Getdate() - 1,
                  4192,
                  'B',
                  ''
FROM   sys.all_columns c1,
       sys.all_columns c2  


UPDATE T 
SET fk = 4193, startdate = GETDATE()
WHERE id BETWEEN 801859 and 803748 or id = 810064

UPDATE T 
SET  startdate = GETDATE() + 1
WHERE id > 810064


/*Both queries give the same plan. 
UPDATE STATISTICS T WITH FULLSCAN
makes no difference*/

SELECT MIN(startdate) AS Firstdate, 
       MAX(startdate) AS Lastdate 
FROM T
WHERE status <> 'A' AND fk = 4192


SELECT MIN(startdate) AS Firstdate, 
       MAX(startdate) AS Lastdate 
FROM T
WHERE status <> 'A' AND fk = 4193

您可以考虑使用查询提示来强制计划使用索引在 fk 而不是 startdate 上添加,或者在( fk,status)包含(开始日期)可以避免此问题。

You could consider using query hints to force the plan to use the index on fk rather than startdate or add the suggested missing index highlighted in the execution plan on (fk,status) INCLUDE (startdate) to avoid this issue.

这篇关于SQL为什么SELECT COUNT(*),MIN(col),MAX(col)比SELECT MIN(col),MAX(col)更快的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆