How does SQL Server work out the estimated number of rows?

Question

I'm trying to debug a fairly complex stored procedure that joins across many tables (10-11). For one part of the plan tree, the estimated number of rows differs drastically from the actual number of rows: at its worst, SQL Server estimates that 1 row will be returned when in fact 55,000 rows are returned!

I'm trying to work out why this is. All of my statistics are up to date, and I've updated statistics with a FULLSCAN on several tables. I'm not using any user-defined functions or table variables. As far as I can see, SQL Server should be able to estimate exactly how many rows are going to be returned, but it continues to choose a plan which causes it to perform tens of thousands of RID lookups (when it expects to perform only 1 or 2).

What can I do to understand why the estimated number of rows is out by so much?

UPDATE: Looking at the plan, I've found one node in particular which seems suspicious. It's a table scan on a table using the following predicate:

status <> 5
AND [type] = 1
OR [type] = 2

This predicate returns the entire table (630 rows; the table scan itself is NOT the source of the poor performance), yet SQL Server has the estimated number of rows at just 37. SQL Server then goes on to join this result, through several nested loops, onto RID lookups, index scans and index seeks. Could this be the source of my massive miscalculation? How do I get it to estimate a more sensible number of rows?

Recommended answer

SQL Server splits each index's statistics into up to 200 ranges (histogram steps), each described by the following data (from here):

  • RANGE_HI_KEY

    A key value showing the upper boundary of a histogram step.

  • RANGE_ROWS

    Specifies how many rows are inside the range (they are smaller than this RANGE_HI_KEY, but bigger than the previous, smaller RANGE_HI_KEY).

  • EQ_ROWS

    Specifies how many rows are exactly equal to RANGE_HI_KEY.

  • AVG_RANGE_ROWS

    Average number of rows per distinct value inside the range.

  • DISTINCT_RANGE_ROWS

    Specifies how many distinct key values are inside this range (not including the previous RANGE_HI_KEY or this RANGE_HI_KEY itself).
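As a rough illustration of how these fields could feed an estimate for an equality predicate, here is a minimal Python sketch. It is not SQL Server's actual algorithm: `HistogramStep` and `estimate_equal_rows` are hypothetical names, the sample numbers are invented, and the lookup rule (EQ_ROWS when the value equals a RANGE_HI_KEY, AVG_RANGE_ROWS when it falls strictly inside a range) is a simplification based only on the field descriptions above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistogramStep:
    """One histogram step; field names mirror the columns described above."""
    range_hi_key: int
    range_rows: int
    eq_rows: int
    avg_range_rows: float
    distinct_range_rows: int


def estimate_equal_rows(steps: List[HistogramStep], value: int) -> float:
    """Simplified row estimate for `column = value` (hypothetical helper).

    Assumes `steps` is sorted by range_hi_key in ascending order.
    """
    for step in steps:
        if value == step.range_hi_key:
            # The value is a step boundary, so its exact row count is known.
            return float(step.eq_rows)
        if value < step.range_hi_key:
            # The value is inside the range: only an average is available.
            return step.avg_range_rows
    # Assumption: values above the last boundary are treated as absent.
    return 0.0


# Invented sample histogram with two steps.
sample_steps = [
    HistogramStep(range_hi_key=10, range_rows=0, eq_rows=5,
                  avg_range_rows=1.0, distinct_range_rows=0),
    HistogramStep(range_hi_key=20, range_rows=90, eq_rows=400,
                  avg_range_rows=10.0, distinct_range_rows=9),
]

boundary_estimate = estimate_equal_rows(sample_steps, 20)  # uses EQ_ROWS
inside_estimate = estimate_equal_rows(sample_steps, 15)    # uses AVG_RANGE_ROWS
```

The point of the sketch is that a boundary value gets its own exact count, while every value inside a range shares one average, whatever its real frequency.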

Usually, the most populated values become RANGE_HI_KEY values.

However, they can end up inside a range, and this skews the estimated distribution.

Imagine this data (among other rows):

key        rows
1          1
2          1
3          10000
4          1

SQL Server usually builds two ranges: 1 to 3, and 4 to the next populated value, which yields these statistics:

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
3             2           10000    1               2

This means that when searching for, say, 2, the estimate is just 1 row, and index access is the better choice.

But if 3 ends up inside a range, the statistics look like this:

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
4             10002       1        3334            3

The optimizer now thinks there are 3334 rows for key 2 and considers index access too expensive.
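Both AVG_RANGE_ROWS figures in the tables above are consistent with dividing RANGE_ROWS by DISTINCT_RANGE_ROWS. The short Python check below reproduces that arithmetic; the function name `avg_range_rows` is made up for illustration, and the fallback of 1 for an empty range is an assumption of this sketch.

```python
def avg_range_rows(range_rows: int, distinct_range_rows: int) -> float:
    """Average rows per distinct key inside a histogram range."""
    if distinct_range_rows == 0:
        # Assumption: an empty range is reported as 1 row per value.
        return 1.0
    return range_rows / distinct_range_rows


# Step ending at key 3: only keys 1 and 2 are inside the range, one row each.
good = avg_range_rows(range_rows=2, distinct_range_rows=2)

# Step ending at key 4: keys 1, 2 and 3 are inside, and key 3 alone has 10000 rows.
skewed = avg_range_rows(range_rows=10002, distinct_range_rows=3)

print(good)    # 1.0
print(skewed)  # 3334.0
```

The heavy key 3 inflates the average for its whole range, so keys 1 and 2 inherit an estimate of 3334 rows although each has a single row.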
