SQL: Speed Improvement - Left Join on cond1 or cond2


Question

SELECT DISTINCT  a.*, b.*
FROM             current_tbl a
LEFT JOIN        import_tbl  b 
                 ON ( a.user_id = b.user_id 
                   OR ( a.f_name||' '||a.l_name = b.f_name||' '||b.l_name)
                 )

  • Two tables that are basically the same
  • I don't have access to the table structure or data input (so I can't clean up the primary keys)
  • Sometimes the user_id is populated in one table and not the other
  • Sometimes the names are equal, sometimes they are not
I've found that I can get the most data by matching on user_id or on the first/last names. I put the ' ' between the names to avoid the case where one user's first name is the same as another user's last name and each is missing the other field (unlikely, but plausible).
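To make the separator argument concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two people are hypothetical, and empty strings stand in for the missing fields (how `||` treats NULL differs between databases, so this only illustrates the collision itself).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical rows: one person has only a first name, the other only a last name.
# Empty strings stand in for the missing field.
a_first, a_last = "Lee", ""
b_first, b_last = "", "Lee"

no_sep, with_sep = conn.execute(
    """
    SELECT (? || ?) = (? || ?),
           (? || ' ' || ?) = (? || ' ' || ?)
    """,
    (a_first, a_last, b_first, b_last, a_first, a_last, b_first, b_last),
).fetchone()

print("match without separator:", bool(no_sep))    # True: 'Lee' = 'Lee'
print("match with separator:   ", bool(with_sep))  # False: 'Lee ' vs ' Lee'
```

Without the separator the two different people compare as equal; with it they do not.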

This query runs in 33000 ms, whereas with each join condition on its own it takes about 200 ms.

  • I've been up late and can't think straight right now
  • I'm thinking that I could do a UNION and only query by name where a user_id does not exist (the default join is on user_id; if a user_id doesn't exist, I want to join by name)
  • Here are some free points for anyone who wants to help
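The UNION idea in the list above could look roughly like the following sketch, run against SQLite through Python's sqlite3 module. The sample rows and the `email` column are made up for illustration. Note that this is a "user_id first, name as fallback" query, so it is not strictly equivalent to the original OR join, which returns both kinds of match for the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_tbl (user_id INTEGER, f_name TEXT, l_name TEXT);
    CREATE TABLE import_tbl  (user_id INTEGER, f_name TEXT, l_name TEXT, email TEXT);
    INSERT INTO current_tbl VALUES
        (1, 'Ann', 'Lee'), (NULL, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
    INSERT INTO import_tbl VALUES
        (1, 'Ann', 'Lee', 'ann@example.com'),
        (2, 'Bob', 'Ray', 'bob@example.com');
""")

sql = """
    SELECT a.f_name, b.email          -- default: rows that match on user_id
    FROM current_tbl a
      JOIN import_tbl b ON a.user_id = b.user_id
    UNION ALL
    SELECT a.f_name, b.email          -- fallback: no user_id match, join by name
    FROM current_tbl a
      LEFT JOIN import_tbl b
        ON a.f_name = b.f_name AND a.l_name = b.l_name
    WHERE NOT EXISTS (
        SELECT 1 FROM import_tbl b2 WHERE b2.user_id = a.user_id
    )
"""
email_by_name = dict(conn.execute(sql).fetchall())
print(email_by_name)
```

Ann is matched by user_id, Bob (who has no user_id) is matched by name, and Cy is kept with a NULL email because the fallback branch is still a LEFT JOIN.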

Please don't ask for the execution plan.

Recommended answer

If people's suggestions don't provide a major speed increase, your real problem may be that the best query plan is different for each of your two join conditions. In that situation you would want to run two queries and merge the results in some way. This is likely to make your query much, much uglier.

One obscure trick that I have used in that kind of situation is to do a GROUP BY on top of a UNION ALL query. The idea looks like this:

SELECT a_field1, a_field2, ...,
  MAX(b_field1) AS b_field1, MAX(b_field2) AS b_field2, ...
FROM (
      -- Branch 1: match on user_id only
      SELECT a.field_1 AS a_field1, ..., b.field1 AS b_field1, ...
      FROM current_tbl a
        LEFT JOIN import_tbl b
          ON a.user_id = b.user_id
    UNION ALL
      -- Branch 2: match on first and last name only
      SELECT a.field_1 AS a_field1, ..., b.field1 AS b_field1, ...
      FROM current_tbl a
        LEFT JOIN import_tbl b
          ON a.f_name = b.f_name AND a.l_name = b.l_name
  ) merged  -- many databases require an alias on a derived table
GROUP BY a_field1, a_field2, ...
      

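Now the database can run each of the two joins with its own most efficient plan. The GROUP BY with MAX then collapses the two copies of each current_tbl row, keeping the non-NULL import_tbl values from whichever branch matched.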
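As a rough, runnable illustration of the GROUP BY over UNION ALL trick, the sketch below uses SQLite through Python's sqlite3 module. The sample rows and the `email` column are assumptions made for the example, not part of the original tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_tbl (user_id INTEGER, f_name TEXT, l_name TEXT);
    CREATE TABLE import_tbl  (user_id INTEGER, f_name TEXT, l_name TEXT, email TEXT);
    INSERT INTO current_tbl VALUES
        (1, 'Ann', 'Lee'), (NULL, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
    INSERT INTO import_tbl VALUES
        (1, 'Ann', 'Lee', 'ann@example.com'),
        (2, 'Bob', 'Ray', 'bob@example.com');
""")

sql = """
    SELECT a_user_id, a_f_name, a_l_name, MAX(b_email) AS b_email
    FROM (
        SELECT a.user_id AS a_user_id, a.f_name AS a_f_name,
               a.l_name AS a_l_name, b.email AS b_email
        FROM current_tbl a
          LEFT JOIN import_tbl b ON a.user_id = b.user_id
        UNION ALL
        SELECT a.user_id, a.f_name, a.l_name, b.email
        FROM current_tbl a
          LEFT JOIN import_tbl b
            ON a.f_name = b.f_name AND a.l_name = b.l_name
    ) merged
    GROUP BY a_user_id, a_f_name, a_l_name
    ORDER BY a_f_name
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Each current_tbl row appears once per branch before grouping. MAX ignores NULLs, so Bob picks up his email from the name branch even though the user_id branch found nothing, and Cy stays NULL because neither branch matched.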

(A warning about a drawback of this approach: if a row in current_tbl joins to multiple rows in import_tbl, you'll wind up merging data in a very odd way, because each MAX is computed per column and can pick its value from a different import_tbl row.)
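The drawback can be reproduced with a small sketch, again using SQLite via Python's sqlite3 module. The data and the `email` and `phone` columns are hypothetical: one current_tbl row matches two different import_tbl rows, and the per-column MAX stitches them together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_tbl (user_id INTEGER, f_name TEXT, l_name TEXT);
    CREATE TABLE import_tbl  (user_id INTEGER, f_name TEXT, l_name TEXT,
                              email TEXT, phone TEXT);
    INSERT INTO current_tbl VALUES (1, 'Ann', 'Lee');
    INSERT INTO import_tbl VALUES
        (1,    'Ann', 'Lee', 'zed@example.com', '111'),
        (NULL, 'Ann', 'Lee', 'amy@example.com', '999');
""")

merged_row = conn.execute("""
    SELECT MAX(b_email), MAX(b_phone)
    FROM (
        SELECT a.user_id AS a_user_id, b.email AS b_email, b.phone AS b_phone
        FROM current_tbl a
          LEFT JOIN import_tbl b ON a.user_id = b.user_id
        UNION ALL
        SELECT a.user_id, b.email, b.phone
        FROM current_tbl a
          LEFT JOIN import_tbl b
            ON a.f_name = b.f_name AND a.l_name = b.l_name
    ) merged
    GROUP BY a_user_id
""").fetchone()

import_rows = conn.execute("SELECT email, phone FROM import_tbl").fetchall()
print(merged_row)                 # email from one import row, phone from the other
print(merged_row in import_rows)  # False: no such row exists in import_tbl
```

The result pairs the email of the first import row with the phone of the second, a combination that never existed in import_tbl.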

An incidental performance tip: unless you have reason to believe there may be duplicate rows, avoid DISTINCT. It forces an implicit GROUP BY, which can be expensive.
