Ways to avoid eager spool operations on SQL Server

Problem Description

I have an ETL process that involves a stored procedure that makes heavy use of SELECT INTO statements (minimally logged, and therefore faster because they generate less log traffic). Of the work done in one particular stored procedure, several of the most expensive operations are eager spools that appear simply to buffer the query results and then copy them into the table being created.

The MSDN documentation on eager spools is quite sparse. Does anyone have deeper insight into whether these are really necessary, and under what circumstances? I have a few theories that may or may not make sense, but I have had no success in eliminating these spools from the queries.

The .sqlplan files are quite large (160 KB), so it's probably not reasonable to post them directly to a forum.
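
Since the plans are too big to post, one way to narrow them down is to extract only the eager spool operators and their estimated costs. The sketch below is a minimal Python example. It assumes the standard showplan XML namespace and the `RelOp` attributes (`NodeId`, `PhysicalOp`, `LogicalOp`, `EstimateRows`, `EstimatedTotalSubtreeCost`) that saved `.sqlplan` files normally carry. `eager_spools` is a hypothetical helper, and the embedded plan is a hand-written, heavily trimmed fragment rather than real SQL Server output.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, List, Tuple, Union

# Namespace used by SQL Server showplan XML (.sqlplan files).
NS = {"sp": "http://schemas.microsoft.com/sqlserver/2004/07/showplan"}


def eager_spools(source: Union[str, BinaryIO]) -> List[Tuple[str, str, float, float]]:
    """List every Eager Spool operator in a showplan, most expensive first.

    `source` is a file path or a binary file object. ET.parse reads raw bytes,
    so the file's BOM / XML declaration decides the encoding (e.g. UTF-16).
    Each tuple is (NodeId, PhysicalOp, EstimateRows, EstimatedTotalSubtreeCost).
    """
    root = ET.parse(source).getroot()
    spools = []
    for relop in root.iterfind(".//sp:RelOp", NS):
        if relop.get("LogicalOp") == "Eager Spool":
            spools.append((
                relop.get("NodeId", "?"),
                relop.get("PhysicalOp", "?"),
                float(relop.get("EstimateRows", "0")),
                float(relop.get("EstimatedTotalSubtreeCost", "0")),
            ))
    # Heaviest spools (by estimated subtree cost) first.
    return sorted(spools, key=lambda s: s[3], reverse=True)


# Hand-written, trimmed plan fragment for illustration only.
SAMPLE_PLAN = b"""<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence><Batch><Statements><StmtSimple><QueryPlan>
    <RelOp NodeId="0" PhysicalOp="Table Insert" LogicalOp="Insert"
           EstimateRows="1000" EstimatedTotalSubtreeCost="12.5">
      <RelOp NodeId="1" PhysicalOp="Table Spool" LogicalOp="Eager Spool"
             EstimateRows="1000" EstimatedTotalSubtreeCost="9.75">
        <RelOp NodeId="2" PhysicalOp="Clustered Index Scan" LogicalOp="Clustered Index Scan"
               EstimateRows="1000" EstimatedTotalSubtreeCost="2.25"/>
      </RelOp>
    </RelOp>
  </QueryPlan></StmtSimple></Statements></Batch></BatchSequence>
</ShowPlanXML>"""

result = eager_spools(io.BytesIO(SAMPLE_PLAN))
print(result)  # [('1', 'Table Spool', 1000.0, 9.75)]
# For a real plan: eager_spools("my_query.sqlplan")  (hypothetical file name)
```

The output is small enough to paste into a forum post in place of the full plan.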

So, here are some theories that may be amenable to specific answers:

  • The query uses some UDFs for data transformation, such as parsing formatted dates. Does this data transformation require eager spools so that sensible types (e.g. varchar lengths) can be assigned to the table's columns before the table is created?
  • As an extension of the question above, does anyone have a deeper view of what does or does not drive this operation in a query?

Solution

My understanding of spooling is that it's a bit of a red herring in your execution plan. Yes, it accounts for a lot of your query cost, but it's actually an optimization that SQL Server applies automatically so that it can avoid costly rescans. If you were to avoid spooling, the cost of the execution tree it sits on would go up, and the cost of the whole query would almost certainly increase. I don't have any particular insight into what might cause the query optimizer to build the plan that way, especially without seeing the SQL code, but you're probably better off trusting its choice.
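
To make the "avoid costly rescans" point concrete, here is a toy Python model of the general idea behind an eager spool. It drains its entire input into a buffer before returning the first row, which is why it appears as a blocking and expensive step. Later rewinds replay the buffer instead of re-running the input. This is a conceptual sketch under those assumptions, not SQL Server's implementation; `EagerSpool` and `expensive_scan` are hypothetical names.

```python
from typing import Callable, Generic, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


class EagerSpool(Generic[T]):
    """Toy model of an eager spool (not SQL Server's actual code).

    The first call to rows() consumes the child input completely into a
    buffer; every later call (a "rewind") replays that buffer instead of
    executing the child again.
    """

    def __init__(self, child: Callable[[], Iterable[T]]) -> None:
        self._child = child
        self._buffer: Optional[List[T]] = None
        self.child_executions = 0

    def rows(self) -> Iterator[T]:
        if self._buffer is None:
            # Blocking step: the whole input is read before any row is returned.
            self.child_executions += 1
            self._buffer = list(self._child())
        # Every pass, first read or rewind, is served from the buffer.
        return iter(self._buffer)


# Hypothetical "expensive" child that records every row it reads.
rows_read: List[int] = []


def expensive_scan() -> Iterator[int]:
    for row in range(3):
        rows_read.append(row)
        yield row


spool = EagerSpool(expensive_scan)
it = spool.rows()                   # first call drains the whole child input
read_before_first_row = len(rows_read)
first_pass = list(it)
second_pass = list(spool.rows())    # rewind: replayed from the buffer, no rescan
print(read_before_first_row, first_pass, second_pass, spool.child_executions)
# 3 [0, 1, 2] [0, 1, 2] 1
```

All three input rows are read before the first row comes out, and the second pass costs nothing on the input side. In a real plan the buffer is a worktable in tempdb, so that up-front cost is what shows up in the plan's cost percentages.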

However, that doesn't mean your execution plan can't be optimized, depending on exactly what you're doing and how volatile your source data is. When you're doing a SELECT INTO, you'll often see spool operators in your execution plan, and they can be related to read isolation. If it's appropriate for your situation, you might try lowering the transaction isolation level to something less costly and/or using the NOLOCK hint. I've found in complicated, performance-critical queries that NOLOCK, if safe and appropriate for your data, can vastly increase query speed even when there doesn't seem to be any reason it should.

In this situation, if you try READ UNCOMMITTED or the NOLOCK hint, you may be able to eliminate some of the spools. (Obviously you don't want to do this if it's likely to leave you in an inconsistent state, but everyone's data isolation requirements are different.) The TOP operator and the OR operator can occasionally cause spooling, but I doubt you're using either of those in an ETL process...
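
If the ETL runs from a script, one way to try the READ UNCOMMITTED experiment without affecting later work is to wrap the statement and restore the level afterwards. This is a minimal sketch, assuming a DB-API 2.0 connection (for example from pyodbc) and assuming the session was previously on SQL Server's default READ COMMITTED level. `run_read_uncommitted`, `RecordingConnection`, and the table names are hypothetical. The stub connection only records the SQL it receives, so the example runs without a server.

```python
from typing import Any, List


def run_read_uncommitted(conn: Any, statement: str) -> None:
    """Run one statement (e.g. a SELECT ... INTO ...) under READ UNCOMMITTED.

    `conn` is any DB-API 2.0 connection. Only use this where dirty reads
    are acceptable for the data involved.
    """
    cur = conn.cursor()
    try:
        # SET TRANSACTION ISOLATION LEVEL is a session setting: it stays in
        # effect on this connection until it is changed again.
        cur.execute("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")
        cur.execute(statement)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        # Assumption: the session was on the default READ COMMITTED level.
        cur.execute("SET TRANSACTION ISOLATION LEVEL READ COMMITTED")
        cur.close()


class RecordingConnection:
    """Stand-in for a real connection that just records the SQL it receives."""

    def __init__(self) -> None:
        self.executed: List[str] = []
        self.commits = 0

    def cursor(self) -> "RecordingConnection":
        return self

    def execute(self, sql: str) -> None:
        self.executed.append(sql)

    def commit(self) -> None:
        self.commits += 1

    def rollback(self) -> None:
        pass

    def close(self) -> None:
        pass


stub = RecordingConnection()
# Hypothetical tables. The per-table alternative is FROM dbo.Orders WITH (NOLOCK).
run_read_uncommitted(stub, "SELECT * INTO dbo.OrdersStage FROM dbo.Orders")
for sql in stub.executed:
    print(sql)
```

Resetting the level in `finally` matters because a pooled or reused connection would otherwise keep running every later statement with dirty reads.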

You're right that your UDFs could also be the culprit. If you use each UDF only once, it would be an interesting experiment to inline them and see whether you get a large performance benefit. (And if you can't find a way to write them inline in the query, that's probably why they might be causing spooling.)
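
Before inlining anything, it may help to see which UDFs the plan actually evaluates underneath each eager spool. This minimal Python sketch assumes that scalar UDF calls appear in the showplan XML as `UserDefinedFunction` elements with a `FunctionName` attribute. `udfs_under_eager_spools` and the function and database names are hypothetical, and the sample plan is a hand-written, trimmed fragment. A match is only a lead, not proof. A Compute Scalar placed above the spool would not be counted.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, List, Union

# Namespace used by SQL Server showplan XML (.sqlplan files).
NS = {"sp": "http://schemas.microsoft.com/sqlserver/2004/07/showplan"}


def udfs_under_eager_spools(source: Union[str, BinaryIO]) -> Dict[str, List[str]]:
    """Map each Eager Spool's NodeId to the scalar UDFs referenced beneath it."""
    root = ET.parse(source).getroot()
    found: Dict[str, List[str]] = {}
    for relop in root.iterfind(".//sp:RelOp", NS):
        if relop.get("LogicalOp") != "Eager Spool":
            continue
        # Search the spool's whole subtree for UDF call sites.
        names = {
            udf.get("FunctionName", "")
            for udf in relop.iterfind(".//sp:UserDefinedFunction", NS)
        }
        found[relop.get("NodeId", "?")] = sorted(n for n in names if n)
    return found


# Hand-written, trimmed plan fragment for illustration only.
SAMPLE_PLAN = b"""<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <RelOp NodeId="1" PhysicalOp="Table Spool" LogicalOp="Eager Spool">
    <RelOp NodeId="2" PhysicalOp="Compute Scalar" LogicalOp="Compute Scalar">
      <ScalarOperator ScalarString="[dbo].[fn_ParseDate]([src].[OrderDate])">
        <UserDefinedFunction FunctionName="[Staging].[dbo].[fn_ParseDate]"/>
      </ScalarOperator>
    </RelOp>
  </RelOp>
</ShowPlanXML>"""

spool_udfs = udfs_under_eager_spools(io.BytesIO(SAMPLE_PLAN))
print(spool_udfs)  # {'1': ['[Staging].[dbo].[fn_ParseDate]']}
```

The functions that show up here are the natural first candidates for the inlining experiment.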

One last thing I would look at: if you're doing any joins that can be reordered, try using a hint such as OPTION (FORCE ORDER), which makes the optimizer keep the join order as written in the query, so that the joins happen in what you know to be the most selective order. That's a bit of a reach, but it doesn't hurt to try if you're already stuck optimizing.
