同一索引操作上不同的估计行? [英] different estimated rows on same index operation?

查看:101
本文介绍了同一索引操作上不同的估计行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

简介和背景

我不得不优化一个简单的查询(下面的示例).重写几次后,我认识到一个索引和相同索引操作的估计行数根据查询的编写方式而有所不同.

I had to optimize a simple query (example below). After rewriting it several times I recognized that the estimated row count on the one and same index operation differs depending on the way the query is written.

该查询最初进行了聚集索引扫描,因为生产中的表包含一个二进制列,所以该表非常大(约100 GB),并且全表扫描需要太多时间来执行.

Originally the query did a clustered index scan, as the table in production contains a binary column the table is quite large (about 100 GB) and the full table scan takes too much time to execute.

问题

为什么在同一索引操作上估计的行数不同(将显示示例)?优化器在这里做什么?

Why is the estimated row count different on the same index operation (example will show)? What is the optimizer doing here?

示例数据库-我正在使用SQL Server 2008 R2

我试图为生产表创建一个非常简单的版本,以显示该行为.

I tried to create a very simplyfied version of my production tables that shows the behaviour.

-- CREATE THE SAMPLE TABLES
----------------------------
CREATE TABLE dbo.MasterTable(
    MasterId    smallint NOT NULL,
    Name        varchar(5) NOT NULL,
    CONSTRAINT PK_MasterTable PRIMARY KEY CLUSTERED (MasterId ASC)
) ON  [PRIMARY]

GO

CREATE TABLE dbo.DetailTable(
    DetailId    bigint IDENTITY(1,1) NOT NULL,
    MasterId    smallint NOT NULL,
    Name        nvarchar(50) NOT NULL,
    CreateDate  datetime NOT NULL,
    CONSTRAINT PK_DetailTable PRIMARY KEY CLUSTERED (DetailId ASC)
) ON  [PRIMARY]

GO

ALTER TABLE dbo.DetailTable
    ADD  CONSTRAINT FK1
    FOREIGN KEY(MasterId) REFERENCES dbo.MasterTable (MasterId)

GO

CREATE NONCLUSTERED INDEX IX_DetailTable
    ON dbo.DetailTable( MasterId ASC, Name ASC )

GO

-- INSERT SOME SAMPLE DATA
----------------------------
SET NOCOUNT ON
GO

-- These are some Codes. In our system we always use these codes to search for "types" of data.

INSERT INTO dbo.MasterTable (MasterId, Name)
VALUES (1, 'N1'), (2, 'N2'), (3, 'N3'), (4, 'N4'), (5, 'N5'), (6, 'N6'), (7, 'N7'), (8, 'N8')

GO

-- ADD ROWS TO THE DETAIL TABLE
-- Takes about 1 minute to run
-- Don't care about the logic, it's just to get a distribution similar to production system
----------------------------
declare @x int = 1
DECLARE @MasterID INT
while (@x <= 400000)
begin
    SET @MasterID = ABS(CHECKSUM(NEWID())) % 8 + 1

    INSERT INTO dbo.DetailTable(MasterId,Name,CreateDate)
    VALUES(
        CASE
            WHEN @MasterID IN (1, 3, 4) AND @x % 20 != 0 THEN 2
            WHEN @MasterID IN (5, 6) AND @x % 20 != 0 THEN 7
            WHEN @MasterID = 8 AND @x % 100 != 0 THEN 7
            ELSE @MasterID
        END,
        NEWID(),
        DATEADD(DAY, - ABS(CHECKSUM(NEWID())) % 1000, GETDATE())
)

SET @x = @x + 1
end

go
-- DO THE INDEX AND STATISTIC MAINTENANCE
----------------------------
alter index all on dbo.DetailTable reorganize
alter index all on dbo.MasterTable reorganize
update statistics dbo.DetailTable WITH FULLSCAN
update statistics dbo.MasterTable WITH FULLSCAN
go

准备工作已经完成,让我们从查询开始

首先让我们看一下统计数据,再来看RANGE_HI_KEY=8,有489个EQ_ROWS

Let's have a look at the statistics first, look at RANGE_HI_KEY=8, there are 489 EQ_ROWS

-- CHECK THE STATISTICS
----------------------------
dbcc show_statistics ('dbo.DetailTable', IX_DetailTable)
GO

现在,我们进行查询.第一个是我必须优化的原始查询. 执行时请激活当前的执行计划. 看一下索引查找(非聚集)[DetailTable].[IX_DetailTable]"操作

Now we do the query. The first one is the original query I had to optimize. Please activate the current execution plan when executing. Have a look at the operation "index seek (nonclustered) [DetailTable].[IX_DetailTable]"

-- ORIGINAL QUERY
----------------------------
SELECT d.DetailId
FROM dbo.DetailTable d
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'

GO

-- FORCESEEK
----------------------------
SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'

GO

-- Actual: 489, Estimated 50.000


-- TABLE VARIABLE
----------------------------
DECLARE @MasterId AS TABLE( MasterId SMALLINT )
INSERT INTO @MasterId (MasterId)
SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'
SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN @MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'

GO

-- Actual: 489, Estimated 40.000

-- TEMP TABLE
----------------------------
CREATE TABLE #MasterId( MasterId SMALLINT )
INSERT INTO #MasterId (MasterId)
    SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'

SELECT d.DetailId
FROM dbo.DetailTable d --WITH (FORCESEEK)
INNER JOIN #MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'

-- Actual 489, Estimated 489

DROP TABLE #MasterId

GO

分析和最终问题

请查看索引查找(非聚集)[DetailTable].[IX_DetailTable]"操作

Please have a look at the operation "index seek (nonclustered) [DetailTable].[IX_DetailTable]"

上面脚本中的注释向您显示了我获得的估计行数和实际行数的值.

The comments in the script above show you the values I got for estimated and actual row count.

在我们的生产环境中,该表有3,300万行,上述查询中的估计行从300万到1600万不等.

In our production environment this table has 33 million rows, the estimated rows in the queries above differ from 3 million to 16 million.

总结:

  1. 在DetailTable和MasterTable之间建立连接时,估计的行计数为12,5%(主表中有8个值,这很有意义,有点...)

  1. when a join between the DetailTable and the MasterTable is made, the estimated rowcount is 12,5% (there are 8 values in the master table, it makes sense, kind of...)

在DetailTable和table变量之间建立连接时,估计的行数为10%

when a join between the DetailTable and the table variable is made, the estimated rowcount is 10%

在DetailTable和temp表之间建立连接时,估计的行数与实际的行数完全相同

when a join between the DetailTable and the temp table is made, the estimated rowcount is exactly the same as the actual row count

问题是为什么这些值不同?

The question is why do these values differ?

统计信息是最新的,估计起来确实很容易.

The statistics are up to date and making an estimation should really be easy.

我只是想了解这一点.

推荐答案

由于没人回答,我会尝试给出答案:

请不要强迫优化器关注您

Please don`t force optimizer to follow you

(1)关于您原始查询的说明:

SELECT d.DetailId
FROM dbo.DetailTable d
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'

为什么查询速度慢?

此查询速度很慢,因为您的索引未涵盖此查询, 这两个查询都使用索引扫描,而不是使用哈希连接"进行连接:

this query is slow because your indexes are not covering this query, both query are using index scan and than joining with "Hash join":

为什么要扫描整行以获取可控表? 因为主表上的索引在MasterId列上,而不在Name列上.

WHY scanning entire row for mastertable ? Because index on Master table is on column MasterId , not on column Name.

为什么要扫描整行以获取Detailtable?因为这里索引也打开了 (详细信息)聚集"和(MasterId ASC,名称ASC)非聚集"
不在创建日期"列上.

WHY scanning entire row for Detailtable? Because here as well index is on (DETAILID) "CLUSTERED" AND ( MasterId ASC, Name ASC ) "NON CLUSTERED"
not on Createdate column.

具有一个NONCLUSTERED索引将有助于此查询针对此特定查询的ON列(CREATEDATE,MasterId).

having one NONCLUSTERED index will help this query ON column (CREATEDATE,MasterId ) for this particular Query.

如果主表也很大,则可以在(名称)列上创建NONCLUSTERED索引.

If your Master table is huge as well you can create NONCLUSTERED index on (Name) column.

(2)关于FORCESEEK的说明:

SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'
GO

为什么优化器估计有50,000行?

Why optimizer estimated 50,000 row ?

这里您要加入d.MasterId = m.MasterId列,并且您正在FORFORING优化器中选择在Detail表上查找. 选择器使用INDEX IX_DetailTable()通过LOOP join加入您的Mastertable.

Here you are joining on column d.MasterId = m.MasterId and you are FORCING optimizer to choose seek on Detail table, so optizer using INDEX IX_DetailTable () to join your Mastertable using LOOP join .

由于Optimizer选择了Loop联接,因此将MAster表的所有行(实际上是1个)联接到Detail表 因此它将从主表中选择一个键,然后寻找整个索引,然后将匹配的值传递给进一步的迭代器.

Since Optimizer chooosing Loop join to join all rows (Actually ONE) of MAster table to Detail table so it will choose one key from master table then seek for entire index and then pass the matching value to further iterator.

因此优化器选择每个值的平均行数. 列40000表的基数(行)中有8个唯一值,因此 40000/8估计有50,000行(足够公平).

so optimizer chooses Average of rows per value . 8 unique values in column 40000 table cardinality (rows) so 40000 / 8 Is 50,000 rows estimated (fair enough).

这是您的查询:

DECLARE @MasterId AS TABLE( MasterId SMALLINT )
INSERT INTO @MasterId (MasterId)
SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'
SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN @MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'

GO

Statistictic不会对表变量进行维护,因此optimzer不会发现该gonaa要处理一个好的计划要处理多少行(因此它估计为1行), 这里还有估计的行是1,而实际的第1行也是congr !!

Statatictic does not maintain on table variable so optimzer has no idia how many rows( so it estimate 1 row )it gonaa deal with to produce a good plan, here as well estimated rows are 1 and actual row 1 aswell congrates!!

但是优化程序如何估算"40.000" ROWS

but how optimizer Estimated "40.000" ROWS

我个人从来没有检查过这个问题,由于这个问题,我没有进行过服务测试,但是我不知道优化程序如何计算估计的行数,所以如果有人来启发我们,那将是很好的.

Personally i never checked this and because of this question i did servels testing, but have no idia how optimzer calculating estimated rows, so it will be great if someone come up and enlight us.

(4)-温度表

您的查询

CREATE TABLE #MasterId( MasterId SMALLINT )
INSERT INTO #MasterId (MasterId)
    SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'

SELECT d.DetailId
FROM dbo.DetailTable d --WITH (FORCESEEK)
INNER JOIN #MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'

-- Actual 489, Estimated 489
DROP TABLE #MasterId

在这里,优化器选择的查询计划与表变量中选择的查询计划相同,但差异是 统计信息确实会保留在临时表上,因此在查询优化器中,这里有一个公平的提示,它实际上要加入哪一行. "N8"键在dbo中具有8和8个估计行.DetailTable为489.

here as well optimizer is choosing same query plan as was choosing in table variable but diffrence is Statistics does maintain on temp tables, So Here in query optimizer has a fair idia what row it actually going to join. "N8" key has 8, and 8`s estimated rows in dbo.DetailTable is 489.

这篇关于同一索引操作上不同的估计行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆