性能差异:条件置于INNER JOIN与WHERE子句之间 [英] Performance difference: condition placed at INNER JOIN vs WHERE clause

查看:545
本文介绍了性能差异:条件置于INNER JOIN与WHERE子句之间的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

说我有一张表 order as

id | clientid | type | amount | itemid | date
---|----------|------|--------|--------|-----------
23 | 258      | B    | 150    | 14     | 2012-04-03
24 | 258      | S    | 69     | 14     | 2012-04-03
25 | 301      | S    | 10     | 20     | 2012-04-03
26 | 327      | B    | 54     | 156    | 2012-04-04




  • clientid 是返回到 client 表的外键

  • itemid 是返回表的外键

  • type B S

  • amount 是整数

    • clientid is a foreign-key back to the client table
    • itemid is a foreign key back to an item table
    • type is only B or S
    • amount is an integer
    • 和一个表处理

      id | orderid | processed | date
      ---|---------|-----------|---------
      41 | 23      | true      | 2012-04-03
      42 | 24      | true      | 2012-04-03
      43 | 25      | false     | <NULL>
      44 | 26      | true      | 2012-04-05     
      

      我需要从 order <中获取所有行/ code>对于相同日期的相同 clientid 具有相反的类型值。请记住,类型只能具有两个值之一- B S 。在上面的示例中,这将是行 23 24

      I need to get all the rows from order that for the same clientid on the same date have opposing type values. Keep in mind type can only have one of two values - B or S. In the example above this would be rows 23 and 24.

      另一个约束是,对于已处理中的对应行必须为 true 。 orderid

      The other constraint is that the corresponding row in processed must be true for the orderid.

      到目前为止,我的查询

      SELECT c1.clientid,
             c1.date,
             c1.type,
             c1.itemid,
             c1.amount,
             c2.date,
             c2.type,
             c2.itemid,
             c2.amount
      
      FROM   order c1
      INNER JOIN order c2 ON c1.itemid    =  c2.itemid AND
                             c1.date      =  c2.date   AND
                             c1.clientid  =  c2.clientid AND
                             c1.type     <>  c2.type AND
                             c1.id        <  c2.id
      
      INNER JOIN processed p1 ON p1.orderid   =  c1.id AND
                               p1.processed =  true
      INNER JOIN processed p2 ON p2.orderid   =  c2.id AND
                               p2.processed =  true
      

      问题:保持 processed = true 作为join子句的一部分正在减慢查询速度。如果将其移至WHERE子句,则性能会更好。这引起了我的兴趣,并且 我想知道为什么

      QUESTION: Keeping the processed = true as part of the join clause is slowing the query down. If I move it to the WHERE clause then the performance is much better. This has piqued my interest and I'd like to know why.

      主键和相应的外键列被索引,而值列(已处理等)没有索引。

      The primary keys and respective foreign key columns are indexed while the value columns (value, processed etc) aren't.

      免责声明:我继承了此数据库结构,性能差异大约为6秒。

      推荐答案

      之所以会看到差异,是因为计划者将执行计划放在一起,这显然取决于查询(可以说,应该将两个查询优化为相同,这可能是一个错误)。这意味着计划者认为必须以特定的方式工作才能获得每个语句中的结果。

      The reason that you're seeing a difference is due to the execution plan that the planner is putting together, this is obviously different depending on the query (arguably, it should be optimising the 2 queries to be the same and this may be a bug). This means that the planner thinks it has to work in a particular way to get to the result in each statement.

      当您在JOIN中进行操作时,计划者可能会必须从表中选择,按 True部分过滤,然后加入结果集。我会想象这是一个大表,因此要浏览大量数据,并且不能有效地使用索引。

      When you do it within the JOIN, the planner will probably have to select from the table, filter by the "True" part, then join the result sets. I would imagine this is a large table, and therefore a lot of data to look through, and it can't use the indexes as efficiently.

      我怀疑如果您在WHERE子句中执行此操作,计划者将选择一条效率更高的路由(即基于索引的数据集或预先过滤的数据集)。

      I suspect that if you do it in a WHERE clause, the planner is choosing a route that is more efficient (ie. either index based, or pre filtered dataset).

      通过在两列上添加索引(如果不确定Postgres尚不支持包含的列和多列索引)来以最快的速度(如果不是更快的话)进行连接工作。

      You could probably make the join work as fast (if not faster) by adding an index on the two columns (not sure if included columns and multiple column indexes are supported on Postgres yet).

      In简而言之,计划者面临的问题是正在选择2条不同的路线来获得结果集,其中一条路线的效率不如另一条路线。没有完整的表格信息和EXPLAIN ANALYZE信息,我们不可能知道原因是什么。

      In short, the planner is the problem it is choosing 2 different routes to get to the result sets, and one of those is not as efficient as the other. It's impossible for us to know what the reasons are without the full table information and the EXPLAIN ANALYZE information.

      如果您想知道为什么特定查询会这样做,请提供更多信息。但是,原因是计划者选择了不同的路线。

      If you want specifics on why your specific query is doing this, you'll need to provide more information. However the reason is the planner choosing different routes.

      其他阅读材料:

      http://www.postgresql.org/docs/current/static/ explicit-joins.html

      只是略读了一下,似乎postgres规划器并没有对连接进行重新排序以对其进行优化。尝试更改语句中的联接顺序,以查看是否达到相同的性能……只是一个想法。

      Just skimmed, seems that the postgres planner doesn't re-order joins to optimise it. try changing the order of the joins in your statement to see if you then get the same performance... just a thought.

      这篇关于性能差异:条件置于INNER JOIN与WHERE子句之间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆