当WHERE子句包含参数化值时,为什么SQL Server使用索引扫描而不是索引查找 [英] Why is SQL Server using index scan instead of index seek when WHERE clause contains parameterized values

查看:146
本文介绍了当WHERE子句包含参数化值时,为什么SQL Server使用索引扫描而不是索引查找的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果 where 子句包含参数化值而不是字符串文字,我们发现SQL Server正在使用索引扫描而不是索引搜索。

We have found that SQL Server is using an index scan instead of an index seek if the where clause contains parametrized values instead of string literal.

以下是一个示例:

SQL Server在以下情况下执行索引扫描(where子句中的参数)

SQL Server performs index scan in following case (parameters in where clause)

declare @val1 nvarchar(40), @val2 nvarchar(40);
set @val1 = 'val1';
set @val2 = 'val2';

select 
    min(id) 
from 
    scor_inv_binaries 
where 
    col1 in (@val1, @val2) 
group by 
    col1

另一方面,以下查询执行索引搜索:

On the other hand, the following query performs an index seek:

select 
    min(id) 
from 
    scor_inv_binaries 
where 
    col1 in ('val1', 'val2') 
group by 
    col1

有没有人观察到类似的行为,以及他们如何解决这个问题以确保查询执行索引搜索而不是索引扫描?

Has any one observed similar behavior, and how they have fixed this to ensure that query performs index seek instead of index scan?

我们无法使用forceseek表提示,因为SQL Sserver 2005支持forceseek。

We are not able to use forceseek table hint, because forceseek is supported on SQL Sserver 2005.

我也更新了统计信息。
非常感谢您的帮助。

I have updated the statistics as well. Thank you very much for help.

推荐答案

很好地回答您的问题为什么SQL Server正在这样做,答案是不是按逻辑顺序编译查询,每个语句都是根据它自己的优点编译的,
所以当生成select语句的查询计划时,优化器不知道@ val1和@Val2会分别成为'val1'和'val2'。

Well to answer your question why SQL Server is doing this, the answer is that the query is not compiled in a logical order, each statement is compiled on it's own merit, so when the query plan for your select statement is being generated, the optimiser does not know that @val1 and @Val2 will become 'val1' and 'val2' respectively.

当SQL Server不知道该值时,它必须最好地猜测该变量出现的次数。该表有时会导致次优计划。我的主要观点是,具有不同值的相同查询可以生成不同的计划。想象一下这个简单的例子:

When SQL Server does not know the value, it has to make a best guess about how many times that variable will appear in the table, which can sometimes lead to sub-optimal plans. My main point is that the same query with different values can generate different plans. Imagine this simple example:

IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
    DROP TABLE #T;

CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);
INSERT #T (Val)
SELECT  TOP 991 1
FROM    sys.all_objects a
UNION ALL
SELECT  TOP 9 ROW_NUMBER() OVER(ORDER BY a.object_id) + 1
FROM    sys.all_objects a;

CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);

我在这里所做的只是创建一个简单的表,并添加值为1-10的1000行列 val ,但是1出现991次,而另外9只出现一次。前提是这个查询:

All I have done here is create a simple table, and add 1000 rows with values 1-10 for the column val, however 1 appears 991 times, and the other 9 only appear once. The premise being this query:

SELECT  COUNT(Filler)
FROM    #T
WHERE   Val = 1;

仅扫描整个表会比使用索引进行搜索更有效,然后执行991书签查找以获取 Filler 的值,但是只有1行以下查询:

Would be more efficient to just scan the entire table, than use the index for a seek, then do 991 bookmark lookups to get the value for Filler, however with only 1 row the following query:

SELECT  COUNT(Filler)
FROM    #T
WHERE   Val = 2;

进行索引搜索会更有效率,单个书签查找可以获得填充(运行这两个查询将批准这个)

will be more efficient to do an index seek, and a single bookmark lookup to get the value for Filler (and running these two queries will ratify this)

我很确定搜索和书签的截止查找实际上取决于具体情况,但它相当低。使用示例表,通过一些试验和错误,我发现我需要 Val 列,在优化器获取完整表之前需要38行值2扫描索引查找和书签查找:

I am pretty certain the cut off for a seek and bookmark lookup actually varies depending on the situation, but it is fairly low. Using the example table, with a bit of trial and error, I found that I needed the Val column to have 38 rows with the value 2 before the optimiser went for a full table scan over an index seek and bookmark lookup:

IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
    DROP TABLE #T;

DECLARE @I INT = 38;

CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);
INSERT #T (Val)
SELECT  TOP (991 - @i) 1
FROM    sys.all_objects a
UNION ALL
SELECT  TOP (@i) 2
FROM    sys.all_objects a
UNION ALL
SELECT  TOP 8 ROW_NUMBER() OVER(ORDER BY a.object_id) + 2
FROM    sys.all_objects a;

CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);

SELECT  COUNT(Filler), COUNT(*)
FROM    #T
WHERE   Val = 2;

因此,对于此示例,限制是匹配行的3.7%。

So for this example the limit is 3.7% of matching rows.

由于查询在使用变量时不知道将匹配多少行,因此最简单的方法是查找总行数,并将其除以列中不同值的总数,因此在此示例中, WHERE val = @Val 的估计行数为1000/10 = 100,实际算法比这更复杂,但是例如这样做。因此,当我们查看执行计划时:

Since the query does not know the how many rows will match when you are using a variable it has to guess, and the simplest way is by finding out the total number rows, and dividing this by the total number of distinct values in the column, so in this example the estimated number of rows for WHERE val = @Val is 1000 / 10 = 100, The actual algorithm is more complex than this, but for example's sake this will do. So when we look at the execution plan for:

DECLARE @i INT = 2;
SELECT  COUNT(Filler)
FROM    #T
WHERE   Val = @i;

我们可以在此处看到(使用原始数据)估计的行数为100,但实际行数为1。从前面的步骤我们知道,超过38行,优化器将选择对索引搜索进行聚簇索引扫描,因此,由于对行数的最佳猜测高于此,因此未知变量的计划是聚簇的索引扫描。

We can see here (with the original data) that the estimated number of rows is 100, but the actual rows is 1. From the previous steps we know that with more than 38 rows the optimiser will opt for a clustered index scan over an index seek, so since the best guess for the number of rows is higher than this, the plan for an unknown variable is a clustered index scan.

为了进一步证明理论,如果我们创建的表格中包含1000行数字1-27均匀分布(因此估计的行数大约为1000 / 27 = 37.037)

Just to further prove the theory, if we create the table with 1000 rows of numbers 1-27 evenly distributed (so the estimated row count will be approximately 1000 / 27 = 37.037)

IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
    DROP TABLE #T;

CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);
INSERT #T (Val)
SELECT  TOP 27 ROW_NUMBER() OVER(ORDER BY a.object_id)
FROM    sys.all_objects a;

INSERT #T (val)
SELECT  TOP 973 t1.Val
FROM    #T AS t1
        CROSS JOIN #T AS t2
        CROSS JOIN #T AS t3
ORDER BY t2.Val, t3.Val;

CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);

然后再次运行查询,我们得到一个索引搜索计划:

Then run the query again, we get a plan with an index seek:

DECLARE @i INT = 2;
SELECT  COUNT(Filler)
FROM    #T
WHERE   Val = @i;

所以希望能够全面涵盖您获得该计划的原因。现在我想下一个问题是你如何强制一个不同的计划,答案是,使用查询提示 OPTION(RECOMPILE),强制查询在参数值已知时的执行时间。恢复到原始数据,其中 Val = 2 的最佳计划是查找,但使用变量产生带有索引扫描的计划,我们可以运行:

So hopefully that pretty comprehensively covers why you get that plan. Now I suppose the next question is how do you force a different plan, and the answer is, to use the query hint OPTION (RECOMPILE), to force the query to compile at execution time when the value of the parameter is known. Reverting to the original data, where the best plan for Val = 2 is a lookup, but using a variable yields a plan with an index scan, we can run:

DECLARE @i INT = 2;
SELECT  COUNT(Filler)
FROM    #T
WHERE   Val = @i;

GO

DECLARE @i INT = 2;
SELECT  COUNT(Filler)
FROM    #T
WHERE   Val = @i
OPTION (RECOMPILE);

我们可以看到后者使用索引搜索和键查找,因为它在执行时检查了变量的值,并且选择最适合该特定值的计划。 OPTION(RECOMPILE)的问题在于这意味着您无法利用缓存的查询计划,因此每次编译查询都需要额外的费用。

We can see that the latter uses the index seek and key lookup because it has checked the value of variable at execution time, and the most appropriate plan for that specific value is chosen. The trouble with OPTION (RECOMPILE) is that means you can't take advantage of cached query plans, so there is an additional cost of compiling the query each time.

这篇关于当WHERE子句包含参数化值时,为什么SQL Server使用索引扫描而不是索引查找的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆