Is SQL IN bad for performance?

Question

I have a query similar to this:

SELECT FieldX, FieldY FROM A
WHERE FieldW IN (108, 109, 113, 138, 146, 160,
307, 314, 370, 371, 441, 454, 457, 458, 479, 480,
485, 488, 490, 492, 519, 523, 525, 534, 539, 543,
546, 547, 550, 564, 573, 629, 642, 643, 649, 650,
651, 694, 698, 699, 761, 762, 768, 772, 773, 774,
775, 778, 784, 843, 844, 848, 851, 852, 853, 854,
855, 856, 857, 858, 859, 860, 861, 862, 863, 864,
865, 868, 869, 871, 872, 873, 891) 

Having an IN clause with so many options, is it bad for query performance? I'm experiencing many timeouts in my application, and I believe it could be a source of this kind of problem. Can I optimize the query without removing the numbers, using any good SQL Hint?

@KM these are keys in a different table. This is a forum application. Briefly: C# gets all forums from the database and stores them in the app cache. Before C# calls a procedure that gets the threads for these forums and for this user, it filters the "all forums" collection according to permissions and some business logic. The timeout happens in the database, not in the application itself. Doing all this logic in the query would require a lot of inner joins, and I'm not 100% sure I can do all of it inside the procedure.

I'm using SQL Server 2000.

Recommended answer

There are several considerations when writing a query using the IN operator that can have an effect on performance.

First, most databases internally rewrite IN clauses to use the OR logical connective. So col IN ('a','b','c') is rewritten to: (col = 'a') OR (col = 'b') OR (col = 'c'). The execution plans for both queries will likely be equivalent, assuming that you have an index on col.
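
The equivalence of the two forms can be checked on a small scale. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server 2000. The table and column names come from the question; the index name and the sample rows are made up for illustration. It only confirms that both forms return the same rows and says nothing about the plans SQL Server would actually choose.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the question itself uses SQL Server 2000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (FieldX TEXT, FieldY TEXT, FieldW INTEGER)")
conn.execute("CREATE INDEX idx_a_fieldw ON A (FieldW)")  # hypothetical index name

# Made-up sample rows: FieldW runs from 100 to 119.
conn.executemany(
    "INSERT INTO A (FieldX, FieldY, FieldW) VALUES (?, ?, ?)",
    [(f"x{w}", f"y{w}", w) for w in range(100, 120)],
)

# The IN form, as written in the question (shortened list).
in_rows = conn.execute(
    "SELECT FieldX, FieldY FROM A "
    "WHERE FieldW IN (108, 109, 113) ORDER BY FieldW"
).fetchall()

# The OR form that the answer says IN is rewritten to.
or_rows = conn.execute(
    "SELECT FieldX, FieldY FROM A "
    "WHERE FieldW = 108 OR FieldW = 109 OR FieldW = 113 ORDER BY FieldW"
).fetchall()

same_result = in_rows == or_rows
print(same_result)
```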

Second, when using either IN or OR with a variable number of arguments, you force the database to re-parse the query and rebuild an execution plan each time the arguments change. Building the execution plan for a query can be an expensive step. Most databases cache the execution plans for the queries they run, using the EXACT query text as the key. If you execute a similar query with different argument values in the predicate, the database will most likely spend a significant amount of time parsing and building execution plans. This is why bind variables are strongly recommended as a way to ensure optimal query performance.
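
One way to keep the query text constant for any number of ids is to load the ids into a helper table with bound parameters and join against it. This is a sketch of that idea, not something the answer prescribes. It uses Python's sqlite3 module as a stand-in for SQL Server; the helper table wanted_ids, the function fetch_for_ids, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (FieldX TEXT, FieldY TEXT, FieldW INTEGER)")
conn.executemany(
    "INSERT INTO A (FieldX, FieldY, FieldW) VALUES (?, ?, ?)",
    [(f"x{w}", f"y{w}", w) for w in range(100, 120)],  # made-up sample rows
)

# Hypothetical helper table that holds the ids to filter on.
conn.execute("CREATE TEMP TABLE wanted_ids (id INTEGER PRIMARY KEY)")

# The query text is identical for every call, whatever the number of ids.
QUERY = (
    "SELECT A.FieldX, A.FieldY FROM A "
    "JOIN wanted_ids ON wanted_ids.id = A.FieldW "
    "ORDER BY A.FieldW"
)

def fetch_for_ids(ids):
    """Load the ids as bound parameters, then run the fixed query."""
    conn.execute("DELETE FROM wanted_ids")
    # OR IGNORE skips ids that appear more than once in the input.
    conn.executemany(
        "INSERT OR IGNORE INTO wanted_ids (id) VALUES (?)",
        [(i,) for i in ids],
    )
    return conn.execute(QUERY).fetchall()

short_list = fetch_for_ids([108, 109])
long_list = fetch_for_ids([108, 109, 113, 118])
print(short_list, len(long_list))
```

Because only the contents of the helper table change, a plan cache keyed on the query text sees the same statement every time.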

Third, many databases have a limit on the complexity of the queries they can execute, and one of those limits is the number of logical connectives that can be included in the predicate. In your case, a few dozen values are unlikely to reach the database's built-in limit, but if you expect to pass hundreds or thousands of values to an IN clause, it can definitely happen. In that case the database will simply cancel the query request.
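
If a very long list cannot be avoided, one possible workaround is to split it into batches so that no single statement carries too many values. The sketch below assumes the ids are distinct and uses Python's sqlite3 module as a stand-in; the helpers chunked and select_in_batches and the batch_size default are hypothetical, and the real limit depends on the database in use.

```python
import sqlite3

def chunked(values, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

def select_in_batches(conn, ids, batch_size=500):
    """Run one IN query per batch and concatenate the rows."""
    rows = []
    for batch in chunked(list(ids), batch_size):
        # One "?" placeholder per value in this batch.
        placeholders = ", ".join("?" for _ in batch)
        sql = f"SELECT FieldX, FieldY FROM A WHERE FieldW IN ({placeholders})"
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (FieldX TEXT, FieldY TEXT, FieldW INTEGER)")
conn.executemany(
    "INSERT INTO A (FieldX, FieldY, FieldW) VALUES (?, ?, ?)",
    [(f"x{w}", f"y{w}", w) for w in range(2000)],  # made-up sample rows
)

# 1000 even ids, sent in batches of at most 300 values each.
rows = select_in_batches(conn, range(0, 2000, 2), batch_size=300)
print(len(rows))
```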

Fourth, queries that include IN and OR in the predicate cannot always be optimally rewritten in a parallel environment. There are various cases where parallel server optimizations are not applied; MSDN has a decent introduction to optimizing queries for parallelism. Generally, though, queries that use the UNION ALL operator are trivially parallelizable in most databases, and are preferred to logical connectives (like OR and IN) when possible.
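
For illustration, here is what a UNION ALL rewrite of a short IN list could look like. This is a sketch using Python's sqlite3 module as a stand-in, with a hypothetical helper union_all_query and made-up sample rows. It only checks that the rewrite returns the same rows as the IN form; SQLite cannot demonstrate the parallelism claim. Note that UNION ALL returns a row once per branch, so a repeated id would produce duplicate rows, unlike IN.

```python
import sqlite3

def union_all_query(n_values):
    """Build one single-value SELECT per id, combined with UNION ALL."""
    branch = "SELECT FieldX, FieldY FROM A WHERE FieldW = ?"
    return " UNION ALL ".join([branch] * n_values)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (FieldX TEXT, FieldY TEXT, FieldW INTEGER)")
conn.executemany(
    "INSERT INTO A (FieldX, FieldY, FieldW) VALUES (?, ?, ?)",
    [(f"x{w}", f"y{w}", w) for w in range(100, 120)],  # made-up sample rows
)

ids = [108, 109, 113]  # distinct ids, shortened from the question's list

# One branch per id, each with its own bound parameter.
union_rows = conn.execute(union_all_query(len(ids)), ids).fetchall()

# The original IN form, for comparison.
in_rows = conn.execute(
    "SELECT FieldX, FieldY FROM A WHERE FieldW IN (108, 109, 113)"
).fetchall()

# Compare as sorted lists because neither query guarantees row order.
same_rows = sorted(union_rows) == sorted(in_rows)
print(same_rows)
```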
