Why did SQL Server suddenly decide to use such a terrible execution plan?


Question

Background

We recently had an issue with the query plans SQL Server was using on one of our larger tables (around 175,000,000 rows). The column and index structure of the table has not changed in more than five years.

The table and indexes look like this:

create table responses (
    response_uuid uniqueidentifier not null,
    session_uuid uniqueidentifier not null,
    create_datetime datetime not null,
    create_user_uuid uniqueidentifier not null,
    update_datetime datetime not null,
    update_user_uuid uniqueidentifier not null,
    question_id int not null,
    response_data varchar(4096) null,
    question_type_id varchar(3) not null,
    question_length tinyint null,
    constraint pk_responses primary key clustered (response_uuid),
    constraint idx_responses__session_uuid__question_id unique nonclustered (session_uuid asc, question_id asc) with (fillfactor=80),
    constraint fk_responses_sessions__session_uuid foreign key(session_uuid) references dbo.sessions (session_uuid),
    constraint fk_responses_users__create_user_uuid foreign key(create_user_uuid) references dbo.users (user_uuid),
    constraint fk_responses_users__update_user_uuid foreign key(update_user_uuid) references dbo.users (user_uuid)
)

create nonclustered index idx_responses__session_uuid_fk on responses(session_uuid) with (fillfactor=80)

The query that was performing poorly (about 2.5 minutes instead of the normal sub-second response) looks like this:

SELECT 
[Extent1].[response_uuid] AS [response_uuid], 
[Extent1].[session_uuid] AS [session_uuid], 
[Extent1].[create_datetime] AS [create_datetime], 
[Extent1].[create_user_uuid] AS [create_user_uuid], 
[Extent1].[update_datetime] AS [update_datetime], 
[Extent1].[update_user_uuid] AS [update_user_uuid], 
[Extent1].[question_id] AS [question_id], 
[Extent1].[response_data] AS [response_data], 
[Extent1].[question_type_id] AS [question_type_id], 
[Extent1].[question_length] AS [question_length]
FROM [dbo].[responses] AS [Extent1]
WHERE [Extent1].[session_uuid] = @f6_p__linq__0;

(The query is generated by Entity Framework and executed using sp_executesql.)
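Because the statement is parameterized and sent through sp_executesql, SQL Server compiles a plan once and reuses it for later calls with different parameter values. The sketch below shows roughly what such a call looks like. The exact batch text Entity Framework sends, the shortened column list, and the all-zero GUID are assumptions for illustration only.

```sql
-- Hypothetical session id; substitute any session_uuid that exists in the table.
DECLARE @session uniqueidentifier = '00000000-0000-0000-0000-000000000000';

-- Parameterized execution: the cached plan is keyed on the statement text,
-- not on the parameter value, so every session_uuid shares the same plan.
EXEC sys.sp_executesql
    N'SELECT [Extent1].[response_uuid], [Extent1].[question_id], [Extent1].[response_data]
      FROM [dbo].[responses] AS [Extent1]
      WHERE [Extent1].[session_uuid] = @f6_p__linq__0;',
    N'@f6_p__linq__0 uniqueidentifier',
    @f6_p__linq__0 = @session;
```

Running the same call in a test window with the actual execution plan enabled is one way to see which plan the server currently picks for this statement.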

The execution plan during the period of poor performance looked like this:

Some background on the data: the query above never returns more than 400 rows. In other words, filtering on session_uuid really pares down the result set.

Some background on scheduled maintenance: a scheduled job runs weekly to rebuild the database's statistics and rebuild the table's indexes. The job runs a script that looks like this:

alter index all on responses rebuild with (fillfactor=80)


The resolution for the performance problem was to run the index rebuild script (above) on this table.
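Rebuilding an index also refreshes the statistics attached to that index, and a statistics update invalidates cached plans that depend on them. This may be why the rebuild appeared to fix the plan, though that is only one possible explanation. A minimal sketch for checking how fresh the statistics on the table are, assuming a SQL Server version that provides sys.dm_db_stats_properties:

```sql
-- When was each statistics object on dbo.responses last updated,
-- and how many rows have been modified since then?
SELECT s.name                  AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.responses');

-- Histogram the optimizer uses to estimate rows per session_uuid.
DBCC SHOW_STATISTICS ('dbo.responses', idx_responses__session_uuid_fk) WITH HISTOGRAM;
```

Comparing last_updated and modification_counter between a healthy database and the affected one could show whether stale or sampled statistics played a part.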

Other possibly relevant tidbits: the data distribution did not change at all since the last index rebuild. There are no joins in the query. We are a SaaS shop with 50 to 100 live production databases that share exactly the same schema, some with more data and some with less, all with the same queries executing against them, spread across a few SQL Servers.

Question:

What could have happened to make SQL Server start using this terrible execution plan in this particular database?

Keep in mind that the problem was solved by simply rebuilding the indexes on the table.

Maybe a better question is "under what circumstances would SQL Server stop using an index?"

Another way of looking at it is "why would the optimizer stop using an index that was rebuilt a few days ago, and then start using it again after we did an emergency rebuild once we noticed the bad query plan?"
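One way to investigate a question like this while the problem is still happening is to look at the plan that is actually cached for the statement and when it was compiled. The following is a sketch using the standard plan-cache DMVs. The LIKE filter on the statement text is an assumption about how to locate this particular query.

```sql
-- Find cached plans for statements that touch dbo.responses by session_uuid.
SELECT TOP (20)
       qs.creation_time,          -- when this plan was compiled
       qs.last_execution_time,
       qs.execution_count,
       qs.total_elapsed_time / qs.execution_count AS avg_elapsed_microseconds,
       st.text                    AS statement_text,
       qp.query_plan              -- XML plan; opens graphically in SSMS
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE st.text LIKE N'%responses%session_uuid%'
  AND st.text NOT LIKE N'%dm_exec_query_stats%'   -- exclude this diagnostic query itself
ORDER BY qs.last_execution_time DESC;
```

A creation_time that falls between the weekly rebuild and the start of the slowdown would suggest that the plan was recompiled in that window rather than carried over from before.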

Recommended answer

This is too long for a comment.

The reason is simple: the optimizer changes its mind about what the best plan is. This can be due to subtle changes in the distribution of the data (or other reasons, such as a type incompatibility in a join key). I wish there were a tool that not only gave the execution plan for a query but also showed how close you are to the threshold for switching to another execution plan. Or a tool that would let you stash an execution plan and raise an alert if the same query starts using a different plan.
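Later versions of SQL Server (2016 and up) include Query Store, which records plans per query over time and covers part of that wish. The thread does not say which version was in use, so the sketch below assumes a version with Query Store enabled on the database. It lists queries that have been compiled with more than one plan.

```sql
-- Queries with more than one recorded plan (requires Query Store to be enabled).
SELECT q.query_id,
       qt.query_sql_text,
       COUNT(DISTINCT p.plan_id)   AS plan_count,
       MAX(p.last_execution_time)  AS last_execution_time
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
GROUP BY q.query_id, qt.query_sql_text
HAVING COUNT(DISTINCT p.plan_id) > 1
ORDER BY plan_count DESC;
```

Scheduling a query like this and comparing its output from run to run is one rough way to get an alert when a query switches plans.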

I've asked myself this exact same question on more than one occasion. You have a system that runs nightly, for months on end. It processes lots of data using really complicated queries. Then, one day, you come in in the morning and the job that normally finishes by 11:00 p.m. is still running. Arrrggg!

The solution we came up with was to use explicit join hints for the failing joins (option (merge join, hash join)). We also started saving the execution plans for all our complex queries so we could compare changes from one night to the next. In the end, this was of more academic than practical interest: by the time the plans changed, we were already suffering from a bad execution plan.
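For readers unfamiliar with the syntax, a query-level join hint goes in an OPTION clause at the end of the statement. Listing two join types restricts the optimizer to those two, which here rules out nested loops. The query below is a hypothetical example that reuses the responses and sessions tables from the question. It is not one of the answerer's actual queries.

```sql
-- Hypothetical cutoff; the answer does not show its real queries.
DECLARE @since datetime = DATEADD(DAY, -1, GETDATE());

SELECT r.response_uuid,
       r.session_uuid,
       r.question_id
FROM dbo.responses AS r
INNER JOIN dbo.sessions AS s
    ON s.session_uuid = r.session_uuid
WHERE r.create_datetime >= @since
OPTION (MERGE JOIN, HASH JOIN);   -- optimizer may use only merge or hash joins
```

Hints like this trade flexibility for predictability: the plan can no longer flip to a nested-loops join, but it also cannot adapt if that would later become the better choice.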
