为什么SQL Server执行计划取决于比较顺序 [英] Why SQL Server execution plan depends on comparison order

查看:88
本文介绍了为什么SQL Server执行计划取决于比较顺序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在优化SQL Server上的查询,然后遇到了我没想到的事情.数据库中有一个表tblEvent,其中的其他列还有IntegrationEventStateIdModifiedDateUtc.这些列有一个索引:

I was optimising a query on SQL Server and ran into something I was not expecting. There is a table tblEvent in the database, among other columns it has IntegrationEventStateId and ModifiedDateUtc. There is an index by these columns:

create index IX_tblEvent_IntegrationEventStateId_ModifiedDateUtc
on dbo.tblEvent (
    IntegrationEventStateId,
    ModifiedDateUtc
)

当我执行以下语句时:

select *
from dbo.tblEvent e
where
    e.IntegrationEventStateId = 1
    or e.IntegrationEventStateId = 2
    or e.IntegrationEventStateId = 5
    or (e.IntegrationEventStateId = 4 and e.ModifiedDateUtc >= dateadd(minute, -5, getutcdate()))

我得到了这个执行计划(注意索引没有被使用):

I get this execution plan (note the index does NOT get used):

但是当我执行以下语句时:

But when I execute this statement:

select *
from dbo.tblEvent e
where
    1 = e.IntegrationEventStateId
    or 2 = e.IntegrationEventStateId
    or 5 = e.IntegrationEventStateId
    or (4 = e.IntegrationEventStateId and e.ModifiedDateUtc >= dateadd(minute, -5, getutcdate()))

我得到了这个执行计划(请注意确实使用了索引):

I get this execution plan (note the index DOES get used):

这两个语句之间的唯一区别是where子句中的比较顺序.谁能解释为什么我得到不同的执行计划?

The only difference between the two statements is the order of comparisons in the where clause. Can anyone please explain why I get different execution plans?

更新1-下面是完整的复制脚本

CREATE TABLE dbo.tblEvent
(
   EventId                 INT IDENTITY PRIMARY KEY,
   IntegrationEventStateId INT,
   ModifiedDateUtc         DATETIME,
   OtherCol                CHAR(1),
   index IX_tblEvent_IntegrationEventStateId_ModifiedDateUtc(IntegrationEventStateId, ModifiedDateUtc)
);

INSERT INTO dbo.tblEvent
SELECT TOP 356525 3,
                  DATEADD(SECOND, ROW_NUMBER() OVER (ORDER BY @@SPID)%63424, GETUTCDATE()),
                  'A'
FROM   sys.all_objects o1,
       sys.all_objects o2;

UPDATE STATISTICS dbo.tblEvent WITH FULLSCAN


select *
from dbo.tblEvent e 
where
    e.IntegrationEventStateId = 1
    or e.IntegrationEventStateId = 2
    or e.IntegrationEventStateId = 5
    or (e.IntegrationEventStateId = 4 and e.ModifiedDateUtc >= dateadd(minute, -5, getutcdate()))


select *
from dbo.tblEvent e
where
    1 = e.IntegrationEventStateId
    or 2 = e.IntegrationEventStateId
    or 5 = e.IntegrationEventStateId
    or (4 = e.IntegrationEventStateId and e.ModifiedDateUtc >= dateadd(minute, -5, getutcdate()))

更新2-原始表的DDL

CREATE TABLE [dbo].[tblEvent]
(
[EventId] [int] NOT NULL IDENTITY(1, 1),
[EventTypeId] [int] NOT NULL,
[ScorecardId] [int] NULL,
[ScorecardAreaId] [int] NULL,
[AreaId] [int] NULL,
[ScorecardTopicId] [int] NULL,
[TopicId] [int] NULL,
[ScorecardRequirementId] [int] NULL,
[RequirementId] [int] NULL,
[DocumentId] [int] NULL,
[FileId] [int] NULL,
[TopicTitle] [nvarchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[ScorecardTopicStatus] [nvarchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[RequirementText] [nvarchar] (500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[ScorecardRequirementStatus] [nvarchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DocumentName] [nvarchar] (260) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CreatedByUserId] [int] NOT NULL,
[CreatedByUserSessionId] [int] NOT NULL,
[CreatedDateUtc] [datetime2] (4) NOT NULL CONSTRAINT [DF__tblEvent__Create__0737E4A2] DEFAULT (sysutcdatetime()),
[CreatedDateLocal] [datetime2] (4) NOT NULL CONSTRAINT [DF__tblEvent__Create__082C08DB] DEFAULT (sysdatetime()),
[ModifiedByUserId] [int] NOT NULL,
[ModifiedByUserSessionId] [int] NOT NULL,
[ModifiedDateUtc] [datetime2] (4) NOT NULL CONSTRAINT [DF__tblEvent__Modifi__09202D14] DEFAULT (sysutcdatetime()),
[ModifiedDateLocal] [datetime2] (4) NOT NULL CONSTRAINT [DF__tblEvent__Modifi__0A14514D] DEFAULT (sysdatetime()),
[IsDeleted] [bit] NOT NULL,
[RowVersion] [timestamp] NOT NULL,
[ScorecardRequirementPriority] [nvarchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[AffectedUserId] [int] NULL,
[UserId] [int] NULL,
[CorrelationId] [nvarchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[IntegrationEventStateId] [int] NULL,
[IntegrationEventId] [nvarchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[IntegrationEventContent] [nvarchar] (max) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[IntegrationEventType] [nvarchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[IntegrationEventTryCount] [int] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[tblEvent] ADD CONSTRAINT [PK_dbo.tblEvent] PRIMARY KEY CLUSTERED ([EventId]) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_tblEvent_IntegrationEventStateId_ModifiedDateUtc] ON [dbo].[tblEvent] ([IntegrationEventStateId], [ModifiedDateUtc]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[tblEvent] ADD CONSTRAINT [FK_dbo.tblEvent_dbo.tblEventType_EventTypeId] FOREIGN KEY ([EventTypeId]) REFERENCES [dbo].[tblEventType] ([EventTypeId])
GO
ALTER TABLE [dbo].[tblEvent] ADD CONSTRAINT [FK_dbo.tblEvent_dbo.tblIntegrationEventState_IntegrationEventStateId] FOREIGN KEY ([IntegrationEventStateId]) REFERENCES [dbo].[tblIntegrationEventState] ([IntegrationEventStateId])
GO

推荐答案

这里有很多问题,但是最重要的是基数估计(CE).

There are a number of issues here, but the most important is cardinality estimation (CE).

较新的(默认")CE模型很难在没有匹配步骤的情况下尝试针对直方图计算选择性时使用谓词.

The newer ("default") CE model has a hard time with the predicates when it tries to compute selectivity against the histogram with no matching steps.

例如,初始基数估计对以下项返回1的选择性:

For example, the initial cardinality estimate returns a selectivity of 1 for:

select *
from dbo.tblEvent e
where
    1 = e.IntegrationEventStateId
    or 2 = e.IntegrationEventStateId
    or 5 = e.IntegrationEventStateId
    or (4 = e.IntegrationEventStateId and e.ModifiedDateUtc >= dateadd(minute, -5, getutcdate()))

如使用跟踪标志3604和2363所示:

as shown using trace flags 3604 and 2363:

Begin selectivity computation

Input tree:

  LogOp_Select
      CStCollBaseTable(ID=1, CARD=356525 TBL: dbo.tblEvent AS TBL: e)
      ScaOp_Logical x_lopOr
          ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [e].IntegrationEventStateId
              ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)
          ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [e].IntegrationEventStateId
              ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)
          ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [e].IntegrationEventStateId
              ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=5)
          ScaOp_Logical x_lopAnd
              ScaOp_Comp x_cmpGe
                  ScaOp_Identifier QCOL: [e].ModifiedDateUtc
                  ScaOp_Identifier COL: ConstExpr1001 
              ScaOp_Comp x_cmpEq
                  ScaOp_Identifier QCOL: [e].IntegrationEventStateId
                  ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=4)

Plan for computation:

  CSelCalcCombineFilters_ExponentialBackoff (OR)
      CSelCalcCombineFilters_ExponentialBackoff (AND)
          CSelCalcColumnInInterval
              Column: QCOL: [e].ModifiedDateUtc
          CSelCalcColumnInInterval
              Column: QCOL: [e].IntegrationEventStateId
      CSelCalcColumnInInterval
          Column: QCOL: [e].IntegrationEventStateId

Loaded histogram for column QCOL: [e].ModifiedDateUtc from stats with id 3
Loaded histogram for column QCOL: [e].IntegrationEventStateId from stats with id 2

Selectivity: 1

Stats collection generated: 

  CStCollFilter(ID=2, CARD=356525)
      CStCollBaseTable(ID=1, CARD=356525 TBL: dbo.tblEvent AS TBL: e)

End selectivity computation

当基于成本的优化开始并且输入树的形式略有不同时,要求CE计算更简单的谓词的选择性:

When cost-based optimization starts, and the input tree is in a slightly different form, the CE is asked to compute the selectivity of the simpler predicates:

Begin selectivity computation

Input tree:

  LogOp_Select
      CStCollBaseTable(ID=1, CARD=356525 TBL: dbo.tblEvent AS TBL: e)
      ScaOp_Logical x_lopOr
          ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [e].IntegrationEventStateId
              ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)
          ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [e].IntegrationEventStateId
              ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)
          ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [e].IntegrationEventStateId
              ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=5)

Plan for computation:

  CSelCalcColumnInInterval
      Column: QCOL: [e].IntegrationEventStateId

Selectivity: 1

Stats collection generated: 

  CStCollFilter(ID=3, CARD=356525)
      CStCollBaseTable(ID=1, CARD=356525 TBL: dbo.tblEvent AS TBL: e)

End selectivity computation

这等效于:

SELECT *
FROM dbo.tblEvent AS TE 
WHERE TE.IntegrationEventStateId IN (1, 2, 5);

在两种情况下,CE均会评估100%的行将匹配,尽管没有针对值1、2或5的直方图步长(样本数据的值仅为3).试图将CSelCalcColumnInInterval计算器归咎于此,因为它似乎将{1、2、5}视为单个间隔{1:5}.

In both cases, the CE assesses that 100% of the rows will match, despite there being no histogram steps for the values 1, 2, or 5 (the sample data has values of 3 only). It is tempting to blame the CSelCalcColumnInInterval calculator for this, as it seems to treat {1, 2, 5} as a single interval {1:5}.

通常,旧式" CE在这里做得更好(更详细),因此您应该找到以下提示会产生更好的计划:

As is often the case, the "legacy" CE does a better (more detailed) job here, so you should find the following hint will produce much better plans:

OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

有了repro数据,这将产生一个希望的单次查找和关键查找.

With the repro data, this produces a single seek and key lookup as one would hope.

请注意,搜索执行四个搜索操作,每个不连续谓词一个.

Note that the seek performs four seeking operations, one for each disjoint predicate.

[1] Seek Keys[1]: Prefix: IntegrationEventStateId = 1
[2] Seek Keys[1]: Prefix: IntegrationEventStateId = 2
[3] Seek Keys[1]: Prefix: IntegrationEventStateId = 4, Start: ModifiedDateUtc >= dateadd(minute,(-5),getutcdate())
[4] Seek Keys[1]: Prefix: IntegrationEventStateId = 5

与原始CE相比,新CE的设计更具可预测性,并且易于维护/扩展. 传统"产品经过长时间的磨合,并在上面进行了细化.这种复杂性既有好处也有陷阱.较新的CE可能会带来回归和较低质量的估计.随着时间的流逝,这种情况应该会有所改善,但是我们还没有到那儿.我认为这里显示的行为是计算器的局限性.也许他们会解决它.

The new CE is designed to be more predictable, and easier to maintain/extend than the original CE. The "legacy" one had bits bolted on to it and refinements made over a long period of time. That complexity has benefits and pitfalls. Regressions and lower-quality estimates are somewhat expected with the newer CE. This ought to improve over time, but we are not there yet. I would view the behaviour shown here as a limitation of the calculator. Perhaps they will fix it.

请参见使用SQL优化查询计划Server 2014基数估计器.

为什么计划形状取决于文本表示形式的问题更多是附带问题.编译过程确实包含用于将谓词重写为规范化形式的逻辑(例如,规则SelPredNorm),并且两个repro查询都已成功重写为同一棵树.这样做是出于各种内部目的,包括索引和计算列匹配,并使逻辑简化更易于处理.

The question of why the plan shape depends on textual representation is more of a side issue. The compilation process does contain logic (e.g. rule SelPredNorm) to rewrite predicates into a normalized form, and both repro queries are successfully rewritten to the same tree. This is done for various internal purposes, including index and computed column matching, and to make logical simplification easier to process.

不幸的是,重写的表格仅在基于成本的优化之前使用.基于成本的优化器的输入将保留原始查询中存在的文本顺序差异.我相信这是有意的,并且这样做是为了防止意外的计划更改.人们有时以略微不同且不寻常的方式编写查询,以实现特定的计划形状.如果优化器突然开始以逻辑上多余的方式击败那些尝试,那么人们会很沮丧.可以说,这与查询存储之类的问题无关紧要,而对计划的强制执行则更为有效,但这是相对较新的创新.

Unfortunately, the rewritten form is only used prior to cost-based optimization. The input to the cost-based optimizer retains the differences in textual order present in the original query. I believe this is intentional, and done to prevent unexpected plan changes. People sometimes write queries in slightly different, and unusual, ways to achieve a particular plan shape. If the optimizer suddenly started defeating those attempts as logically redundant, people would be upset. This is arguably less of an issue with things like query store, and more efficient plan forcing, but these are relatively recent innovations.

换句话说,计划是不同的,因为人们过去依靠不同的文本来产生不同的计划,而现在更改它会带来破坏性的后果.

In other words, the plans are different because people have relied on different text producing different plans in the past, and changing that now would be too disruptive.

这篇关于为什么SQL Server执行计划取决于比较顺序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆