Optimize MySQL self-join query

Question

I have a c_regs table that contains duplicate rows. I've created an index on the form_number and property_name columns. Unfortunately, this query still takes far too long to complete, especially since I added the t10 and t11 joins. Is there a way to optimize it? Thanks.

select 
    ifnull(x.form_datetime,'') reg_date,
    ifnull(x.property_value,'') amg_id,
    x.form_number,
    x.form_name,
    x.form_version,
    ifnull(t1.property_value,'') first_name,
    ifnull(t2.property_value,'') last_name,
    ifnull(t3.property_value,'') address, 
    ifnull(t4.property_value,'') address_2,
    ifnull(t5.property_value,'') city,
    ifnull(t6.property_value,'') state_code,
    ifnull(t7.property_value,'') zip,
    ifnull(t8.property_value,'') phone,
    ifnull(t9.property_value,'') email,
    ifnull(t10.property_value,'') registrant_type,
    t11.property_value auth_type_code
from 
    (select distinct form_datetime, form_number, form_name, form_version, property_value  from c_regs where property_name = 'field.frm_personID') as x
    inner join (select distinct * from c_regs) as t1 on t1.form_number = x.form_number and t1.property_name = 'field.frm_firstName'
    inner join (select distinct * from c_regs) as t2 on t2.form_number = x.form_number and t2.property_name = 'field.frm_lastName'
    inner join (select distinct * from c_regs) as t3 on t3.form_number = x.form_number and t3.property_name = 'field.frm_address'
    left join (select distinct * from c_regs) as t4 on t4.form_number = x.form_number and t4.property_name = 'field.frm_address2'
    inner join (select distinct * from c_regs) as t5 on t5.form_number = x.form_number and t5.property_name = 'field.frm_city'
    inner join (select distinct * from c_regs) as t6 on t6.form_number = x.form_number and t6.property_name = 'field.frm_state'
    inner join (select distinct * from c_regs) as t7 on t7.form_number = x.form_number and t7.property_name = 'field.frm_zip'
    inner join (select distinct * from c_regs) as t8 on t8.form_number = x.form_number and t8.property_name = 'field.frm_phone'
    inner join (select distinct * from c_regs) as t9 on t9.form_number = x.form_number and t9.property_name = 'field.frm_emailAddress'
    left join (select distinct * from c_regs) as t10 on t10.form_number = x.form_number and t10.property_name = 'field.frm_youAre'
    inner join (select distinct * from c_regs) as t11 on t11.form_number = x.form_number and t11.property_name = 'field.frm_authType'
;

Recommended answer

You should not use SELECT DISTINCT all the time. Keep in mind that DISTINCT is bound to be a no-op if your select-list includes the columns of a unique constraint, so it is often unnecessary. When there are duplicates, DISTINCT is costly because the server has to sort or otherwise group the rows to bring duplicates together before removing them.

You also shouldn't do lots of self-joins for this kind of data. Each of the subqueries in your self-join reads the whole table.

SELECT form_number,
  MAX(form_datetime) AS reg_date,
  MAX(form_name) AS form_name,
  MAX(form_version) AS form_version,
  MAX(CASE property_name WHEN 'field.frm_personID' THEN property_value END) AS amg_id,
  MAX(CASE property_name WHEN 'field.frm_firstName' THEN property_value END) AS first_name,
  MAX(CASE property_name WHEN 'field.frm_lastName' THEN property_value END) AS last_name,
  MAX(CASE property_name WHEN 'field.frm_address' THEN property_value END) AS address,
  MAX(CASE property_name WHEN 'field.frm_address2' THEN property_value END) AS address_2,
  MAX(CASE property_name WHEN 'field.frm_city' THEN property_value END) AS city,
  MAX(CASE property_name WHEN 'field.frm_state' THEN property_value END) AS state_code,
  MAX(CASE property_name WHEN 'field.frm_zip' THEN property_value END) AS zip,
  MAX(CASE property_name WHEN 'field.frm_phone' THEN property_value END) AS phone,
  MAX(CASE property_name WHEN 'field.frm_emailAddress' THEN property_value END) AS email,
  MAX(CASE property_name WHEN 'field.frm_youAre' THEN property_value END) AS registrant_type,
  MAX(CASE property_name WHEN 'field.frm_authType' THEN property_value END) AS auth_type_code
FROM c_regs
GROUP BY form_number;

Explanation: The GROUP BY causes all rows for a given form_number to be treated as one group, and the result will have one row per group.

All other columns that are not named in the GROUP BY must be wrapped in aggregate functions. I chose MAX(). I assume there is only one distinct value per group for the form datetime, name, and version.
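If you are not sure that assumption holds, it can be checked before relying on MAX(). The snippet below is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL so it can run anywhere, and the table layout and sample rows are invented for illustration. The SELECT itself uses only standard GROUP BY / HAVING / COUNT(DISTINCT ...) and should carry over to MySQL unchanged.

```python
import sqlite3

# SQLite stands in for MySQL here; the column types and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE c_regs (
        form_number    INTEGER,
        form_datetime  TEXT,
        form_name      TEXT,
        form_version   TEXT,
        property_name  TEXT,
        property_value TEXT
    )
""")
conn.executemany(
    "INSERT INTO c_regs VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2013-01-05 10:00:00", "reg", "1.0", "field.frm_firstName", "Ann"),
        (1, "2013-01-05 10:00:00", "reg", "1.0", "field.frm_lastName", "Lee"),
        # Form 2 carries two different versions, which breaks the assumption.
        (2, "2013-01-06 11:30:00", "reg", "1.0", "field.frm_firstName", "Bob"),
        (2, "2013-01-06 11:30:00", "reg", "1.1", "field.frm_lastName", "Ray"),
    ],
)

# List every form_number whose header columns are not constant within its group.
inconsistent_forms = [
    row[0]
    for row in conn.execute("""
        SELECT form_number
        FROM c_regs
        GROUP BY form_number
        HAVING COUNT(DISTINCT form_datetime) > 1
            OR COUNT(DISTINCT form_name) > 1
            OR COUNT(DISTINCT form_version) > 1
        ORDER BY form_number
    """)
]
print(inconsistent_forms)  # [2]
```

An empty list means MAX() is just picking the single value each group has; any form_number that shows up needs a closer look before you trust the aggregated row.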

For the properties, we put a CASE expression inside MAX() that returns property_value only on rows where property_name matches the wanted property. On all other rows the expression is NULL, which MAX() ignores.
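To see the MAX(CASE ...) trick on something small, here is a minimal sketch with a cut-down c_regs table. It runs against SQLite through Python's sqlite3 module purely for convenience (an assumption, not the asker's setup), and the sample rows are made up; the SELECT is the same shape as the query above, reduced to two properties.

```python
import sqlite3

# SQLite stands in for MySQL; the three-column table and its rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c_regs (form_number INTEGER, property_name TEXT, property_value TEXT)"
)
conn.executemany(
    "INSERT INTO c_regs VALUES (?, ?, ?)",
    [
        (1, "field.frm_firstName", "Ann"),
        (1, "field.frm_firstName", "Ann"),  # exact duplicate row
        (1, "field.frm_lastName", "Lee"),
        (2, "field.frm_firstName", "Bob"),  # form 2 has no last name
    ],
)

# Each CASE yields property_value on matching rows and NULL elsewhere;
# MAX() skips the NULLs, so each group collapses to one value per property.
pivoted = conn.execute("""
    SELECT form_number,
      MAX(CASE property_name WHEN 'field.frm_firstName' THEN property_value END) AS first_name,
      MAX(CASE property_name WHEN 'field.frm_lastName' THEN property_value END) AS last_name
    FROM c_regs
    GROUP BY form_number
    ORDER BY form_number
""").fetchall()
print(pivoted)  # [(1, 'Ann', 'Lee'), (2, 'Bob', None)]
```

Note that the duplicate row for form 1 disappears without any DISTINCT, and that a missing property simply comes back as NULL.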

In this way, you get the result you want without any self-joins or DISTINCT modifiers. The query scans through the table just once, so it should be much faster.
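One difference from the original query is worth keeping in mind: it used inner joins for most properties (so a form missing one of them was dropped), left joins for address2 and youAre, and ifnull(..., '') on most output columns. If that behaviour matters, a HAVING clause and IFNULL() can bring it back. The following is a rough sketch under assumptions: SQLite via Python's sqlite3 stands in for MySQL, the rows are invented, and only two "required" properties and one "optional" property are shown.

```python
import sqlite3

# SQLite stands in for MySQL; table layout and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c_regs (form_number INTEGER, property_name TEXT, property_value TEXT)"
)
conn.executemany(
    "INSERT INTO c_regs VALUES (?, ?, ?)",
    [
        (1, "field.frm_firstName", "Ann"),
        (1, "field.frm_lastName", "Lee"),
        (1, "field.frm_address2", "Apt 4"),
        (2, "field.frm_firstName", "Bob"),
        (2, "field.frm_lastName", "Ray"),  # no address2: optional, form is kept
        (3, "field.frm_firstName", "Cy"),  # no last name: required, form is dropped
    ],
)

# HAVING keeps only groups that contain a row for each required property
# (the inner-join behaviour); IFNULL turns a missing optional value into ''.
rows = conn.execute("""
    SELECT form_number,
      MAX(CASE property_name WHEN 'field.frm_firstName' THEN property_value END) AS first_name,
      MAX(CASE property_name WHEN 'field.frm_lastName' THEN property_value END) AS last_name,
      IFNULL(MAX(CASE property_name WHEN 'field.frm_address2' THEN property_value END), '') AS address_2
    FROM c_regs
    GROUP BY form_number
    HAVING COUNT(CASE property_name WHEN 'field.frm_firstName' THEN 1 END) > 0
       AND COUNT(CASE property_name WHEN 'field.frm_lastName' THEN 1 END) > 0
    ORDER BY form_number
""").fetchall()
print(rows)  # [(1, 'Ann', 'Lee', 'Apt 4'), (2, 'Bob', 'Ray', '')]
```

The COUNT(CASE ...) test checks that a row for the property exists at all, which is what the inner join required, rather than checking that its value is non-NULL.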
