INNER JOIN with complex condition dramatically increases the execution time


Problem description

I have two tables with several identical fields that need to be linked in the JOIN condition. For example, each table has the fields P1 and P2. I want to write the following join query:

SELECT ... FROM Table1
   INNER JOIN
   Table2
      ON    Table1.P1 = Table2.P1
         OR Table1.P2 = Table2.P2
         OR Table1.P1 = Table2.P2
         OR Table1.P2 = Table2.P1

With huge tables, this query takes a very long time to execute.

I tried to test how long a query with only one condition would take. First, I modified the tables so that all data from P1 and P2 was copied as new rows into Table1 and Table2, under a single field P. So my query is simple:

SELECT ... FROM Table1 INNER JOIN Table2 ON Table1.P = Table2.P

The result was more than surprising: the execution time dropped from many hours (the first case) to 2-3 seconds!

Why is it so different? Does it mean that complex conditions always reduce performance? How can I improve this? Would indexing P1 and P2 help? I want to keep the first DB schema and not move to a single field P.

Recommended answer

The queries behave differently because of the join strategies the optimizer uses. There are basically four ways that two tables can be joined:

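  1. "Hash join": creates a hash table on one of the tables and uses it to look up values from the second table.
  2. "Merge join": sorts both tables on the key and then reads them sequentially to produce the join.
  3. "Index lookup": uses an index to look up values in one table.
  4. "Nested loop": compares each value in each table to all the values in the other table.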

(And there are variations on these, such as using an index instead of a table, working with partitions, and handling multiple processors.) Unfortunately, in SQL Server Management Studio both (3) and (4) are shown as nested loop joins. If you look more closely, you can tell the difference from the parameters in the node.
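To make the contrast between strategies (1) and (4) concrete, here is a minimal Python sketch of a hash join and a nested loop join on a single key. The row layout (dicts with `id` and `P`) and the function names are assumptions made for illustration; a real database engine is far more elaborate.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Compare every left row with every right row: len(left) * len(right) checks."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Build a hash table on one input, then probe it once per row of the other."""
    buckets = defaultdict(list)
    for r in right:
        buckets[r[key]].append(r)
    return [(l, r) for l in left for r in buckets.get(l[key], [])]


# Hypothetical toy data standing in for Table1 and Table2.
table1 = [{"id": 1, "P": "a"}, {"id": 2, "P": "b"}, {"id": 3, "P": "c"}]
table2 = [{"id": 10, "P": "b"}, {"id": 20, "P": "c"}, {"id": 30, "P": "d"}]

pairs_nested = sorted((l["id"], r["id"]) for l, r in nested_loop_join(table1, table2, "P"))
pairs_hash = sorted((l["id"], r["id"]) for l, r in hash_join(table1, table2, "P"))
print(pairs_nested == pairs_hash)  # both strategies return the same matches
```

Both functions return the same pairs; the difference is only in how much work they do to find them. Note that the hash table is keyed on one value per row, so this shortcut relies on the join condition being an equality on that key.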

In any case, your single-condition join uses one of the first three, and it goes fast. These strategies can basically only be used on "equi-joins", that is, when the condition joining the two tables includes an equality operator.

When you switch from a single equality to an "in" or a set of "or" conditions, the join condition changes from an equijoin to a non-equijoin. My observation is that SQL Server does a lousy job of optimization in this case (and, to be fair, I think other databases do pretty much the same thing). Your performance hit is the cost of going from a good join algorithm to the nested loops algorithm.
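The size of that hit can be sketched by counting work in a toy model. The snippet below is an illustration under assumptions, not a measurement of SQL Server: the data is synthetic, the function names are hypothetical, and the counters only track predicate evaluations and hash-table operations.

```python
from collections import defaultdict


def nested_loop_or_join(left, right):
    """Join on the four-way OR condition; returns (pairs, predicate evaluations)."""
    evaluations = 0
    pairs = []
    for l in left:
        for r in right:
            evaluations += 1
            if (l["P1"] == r["P1"] or l["P2"] == r["P2"]
                    or l["P1"] == r["P2"] or l["P2"] == r["P1"]):
                pairs.append((l["id"], r["id"]))
    return pairs, evaluations


def hash_equi_join(left, right, key):
    """Join on a single equality; returns (pairs, hash-table operations)."""
    buckets = defaultdict(list)
    operations = 0
    for r in right:
        buckets[r[key]].append(r["id"])
        operations += 1
    pairs = []
    for l in left:
        operations += 1
        pairs.extend((l["id"], rid) for rid in buckets.get(l[key], []))
    return pairs, operations


# Synthetic rows: row i of table1 matches row i of table2 on P1 only.
n = 500
table1 = [{"id": i, "P1": f"a{i}", "P2": f"b{i}"} for i in range(n)]
table2 = [{"id": i, "P1": f"a{i}", "P2": f"c{i}"} for i in range(n)]

or_pairs, or_cost = nested_loop_or_join(table1, table2)
eq_pairs, eq_cost = hash_equi_join(table1, table2, "P1")
print(or_cost, eq_cost)  # 250000 predicate evaluations vs 1000 hash operations
```

With 500 rows per table the nested loop already does 250 times more work, and the gap grows with the product of the table sizes, which is consistent with hours versus seconds on huge tables.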

Without testing, I might suggest some of the following strategies.

  1. Build an index on P1 and P2 in both tables. SQL Server might use the index even for a non-equijoin.
  2. Use the union query suggested in another solution. Each query should be correctly optimized.
  3. Assuming these are 1-1 joins, you can also do this as a set of multiple joins:

from table1 t1
   left outer join table2 t2_11 on t1.p1 = t2_11.p1
   left outer join table2 t2_12 on t1.p1 = t2_12.p2
   left outer join table2 t2_21 on t1.p2 = t2_21.p1
   left outer join table2 t2_22 on t1.p2 = t2_22.p2

Then use case/coalesce logic in the SELECT to get the value that you actually want. Although this may look more complicated, it should be quite efficient.
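As a sanity check on strategies 2 and 3, the sketch below runs the original OR join, a UNION rewrite, and the multiple-join version with COALESCE against a tiny in-memory SQLite database and compares the results. Everything here is an assumption for illustration: the `id` columns, the sample rows, and the use of SQLite in place of SQL Server. It shows only that the rewrites return the same rows on 1-1 data, not how any of them performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, p1 TEXT, p2 TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, p1 TEXT, p2 TEXT);
    -- Hypothetical 1-1 data: each table1 row matches at most one table2 row.
    INSERT INTO table1 VALUES (1,'a','b'), (2,'c','d'), (3,'e','f'), (4,'x','y'), (5,'m','n');
    INSERT INTO table2 VALUES (10,'a','zz'), (20,'q','d'), (30,'r','e'), (40,'y','w'), (50,'u','v');
""")

# The original query shape: one join with four OR-ed equalities.
or_sql = """
    SELECT t1.id, t2.id FROM table1 t1 INNER JOIN table2 t2
       ON t1.p1 = t2.p1 OR t1.p2 = t2.p2 OR t1.p1 = t2.p2 OR t1.p2 = t2.p1
"""

# Strategy 2: one equi-join per condition; UNION removes duplicate pairs.
union_sql = """
    SELECT t1.id, t2.id FROM table1 t1 INNER JOIN table2 t2 ON t1.p1 = t2.p1
    UNION
    SELECT t1.id, t2.id FROM table1 t1 INNER JOIN table2 t2 ON t1.p2 = t2.p2
    UNION
    SELECT t1.id, t2.id FROM table1 t1 INNER JOIN table2 t2 ON t1.p1 = t2.p2
    UNION
    SELECT t1.id, t2.id FROM table1 t1 INNER JOIN table2 t2 ON t1.p2 = t2.p1
"""

# Strategy 3: four left joins, then COALESCE picks whichever one matched.
coalesce_sql = """
    SELECT id, t2_id FROM (
        SELECT t1.id AS id,
               COALESCE(t2_11.id, t2_12.id, t2_21.id, t2_22.id) AS t2_id
        FROM table1 t1
           LEFT OUTER JOIN table2 t2_11 ON t1.p1 = t2_11.p1
           LEFT OUTER JOIN table2 t2_12 ON t1.p1 = t2_12.p2
           LEFT OUTER JOIN table2 t2_21 ON t1.p2 = t2_21.p1
           LEFT OUTER JOIN table2 t2_22 ON t1.p2 = t2_22.p2
    ) WHERE t2_id IS NOT NULL
"""


def run(sql):
    """Execute a query and return its rows in a stable order for comparison."""
    return sorted(conn.execute(sql).fetchall())


or_rows, union_rows, coalesce_rows = run(or_sql), run(union_sql), run(coalesce_sql)
print(or_rows == union_rows == coalesce_rows)
```

The outer `WHERE t2_id IS NOT NULL` is what turns the left joins back into inner-join semantics. If the joins are not 1-1, the COALESCE version keeps only the first match per left-join combination and can also multiply rows, so the equivalence shown here no longer holds.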
