优化一个奇怪的 MySQL 查询 [英] Optimizing a strange MySQL Query

查看:61
本文介绍了优化一个奇怪的 MySQL 查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

希望有人能帮忙解决这个问题.我有一个查询,它从 PHP 应用程序中提取数据并将其转换为用于 Ruby on Rails 应用程序的视图.PHP 应用程序的表是一个 E-A-V 样式的表,具有以下业务规则:

给定字段:名字、姓氏、电子邮件地址、电话号码和手机运营商:

  • 每个属性都定义了两个自定义字段:一个是必需的,一个不是必需的.客户端可以使用任何一种,不同的客户端根据自己的规则使用不同的(例如,客户端 A 可能不关心名字和姓氏,但客户端 B 可能)
  • RoR 应用必须将每对属性"视为一个属性.

现在,这是查询.问题是它运行良好,大约有 11,000 条记录.然而,真实的数据库有超过40,000个,查询速度极慢,大约需要125秒才能运行,这从业务角度来看是完全不可接受的.我们绝对需要拉取这些数据,并且需要与现有系统进行交互.

UserID 部分是伪造与 Rails 表相关的 Rails 风格的外键.我是 SQL Server 人员,而不是 MySQL 人员,所以也许有人可以指出如何改进此查询?他们(企业)要求加快速度,但我不知道如何加快速度,因为需要各种 group_concat 和 ifnull 调用,因为我需要每个客户端的每个字段,然后必须合并数据.

select `ls`.`subscriberid` AS `id`,left(`l`.`name`,(locate(_utf8'_',`l`.`name`) - 1)) AS`user_id`,ifnull(min((case when (`s`.`fieldid` in (2,35)) then `s`.`data` else NULL end)),_utf8'') AS `first_name`,ifnull(min((case when (`s`.`fieldid` in (3,36)) then `s`.`data` else NULL end)),_utf8'') AS `last_name`,ifnull(`ls`.`emailaddress`,_utf8'') AS `email_address`,ifnull(group_concat((case when (`s`.`fieldid` = 81) then `s`.`data` when (`s`.`fieldid` = 154) then `s`.`data` else NULL end)分隔符 ''),_utf8'') AS `mobile_phone`,ifnull(group_concat((case when (`s`.`fieldid` = 100) then `s`.`data` else NULL end) separator ','),_utf8'') AS `sms_only`,ifnull(group_concat((case when (`s`.`fieldid` = 34) then `s`.`data` else NULL end) separator ','),_utf8'') AS `mobile_carrier`from ((`list_subscribers``ls`加入 `lists` `l` on((`ls`.`listid` = `l`.`listid`)))left join `subscribers_data` `s` on((`ls`.`subscriberid` = `s`.`subscriberid`)))where (left(`l`.`name`,(locate(_utf8'_',`l`.`name`) - 1)) regexp _utf8'[[:digit:]]+')按`ls`.`subscriberid`、`l`.`name`、`ls`.`emailaddress`分组

编辑我删除了正则表达式,这将查询速度提高到大约 20 秒,而不是将近 120 秒.如果我可以在那时删除该组,它会更快,但我不能因为删除它会导致它为每个字段复制带有空白数据的行,而不是聚合它们.例如:

通过分组

<前>id user_id first_name last_name email_address mobile_phone sms_only mobile_carrier1 1 约翰·多伊 jdoe@example.com 5551234567 0 冲刺

无分组

<前>id user_id first_name last_name email_address mobile_phone sms_only mobile_carrier1 1 约翰 jdoe@xample.com1 1 Doe jdoe@example.com1 1 jdoe@example.com1 1 jdoe@example.com 5551234567

等等.我们需要的是第一个结果.

编辑 #2

查询似乎仍然需要很长时间,但今天早些时候它在生产数据库上只运行了大约 20 秒.在不改变任何事情的情况下,相同的查询现在再次花费 60 多秒.这仍然是不可接受的..关于如何改进这一点还有其他想法吗?

解决方案

毫无疑问,这是我见过的第二个最可怕的 SQL 查询 :-)

我的建议是用存储要求换取速度.当您发现查询有很多每行函数(ifnullcase 等)时,这是一个常用的技巧.随着表格变大,这些每行函数永远不会很好地扩展.

在表中创建新字段,用于保存您要提取的值,然后在插入/更新(使用触发器)而不是选择时计算这些值.这在技术上不会破坏 3NF,因为触发器保证了列之间的数据一致性.

绝大多数数据库表的读取频率远高于写入频率,因此这将分摊多次选择的计算成本.此外,几乎所有报告的数据库问题都是速度问题,而不是存储问题.

我的意思的一个例子.您可以替换:

case when (`s`.`fieldid` in (2,35)) then `s`.`data` else NULL end

与:

`s`.`data_2_35`

在您的查询中,如果您的插入/更新触发器只是根据 的值将 data_2_35 列设置为 dataNULL字段标识.然后你索引 data_2_35 并且,瞧,以一点存储为代价立即提高速度.

这个技巧可以对五个 case 子句、left/regexp 位和裸"ifnull 函数以及(包含 mingroup_concatifnull 函数可能更难做到.

Hoping someone can help with this. I have a query that pulls data from a PHP application and turns it into a view for use in a Ruby on Rails application. The PHP app's table is an E-A-V style table, with the following business rules:

Given fields: First Name, Last Name, Email Address, Phone Number and Mobile Phone Carrier:

  • Each property has two custom fields defined: one being required, one being not required. Clients can use either one, and different clients use different ones based on their own rules (e.g. Client A may not care about First and Last Name, but client B might)
  • The RoR app must treat each "pair" of properties as only a single property.

Now, here is the query. The problem is it runs beautifully with around 11,000 records. However, the real database has over 40,000 and the query is extremely slow, taking roughly 125 seconds to run which is totally unacceptable from a business perspective. It's absolutely required that we pull this data, and we need to interface with the existing system.

The UserID part is to fake out a Rails-esque foreign key which relates to a Rails table. I'm a SQL Server guy, not a MySQL guy, so maybe someone can point out how to improve this query? They (the business) demand that it be sped up but I'm not sure how to since the various group_concat and ifnull calls are required due to the fact that I need every field for every client and then have to combine the data.

select `ls`.`subscriberid` AS `id`,left(`l`.`name`,(locate(_utf8'_',`l`.`name`) - 1)) AS `user_id`,
ifnull(min((case when (`s`.`fieldid` in (2,35)) then `s`.`data` else NULL end)),_utf8'') AS `first_name`,
ifnull(min((case when (`s`.`fieldid` in (3,36)) then `s`.`data` else NULL end)),_utf8'') AS `last_name`,
ifnull(`ls`.`emailaddress`,_utf8'') AS `email_address`,
ifnull(group_concat((case when (`s`.`fieldid` = 81) then `s`.`data` when (`s`.`fieldid` = 154) then `s`.`data` else NULL end) separator ''),_utf8'') AS `mobile_phone`,
ifnull(group_concat((case when (`s`.`fieldid` = 100) then `s`.`data` else NULL end) separator ','),_utf8'') AS `sms_only`,
ifnull(group_concat((case when (`s`.`fieldid` = 34) then `s`.`data` else NULL end) separator ','),_utf8'') AS `mobile_carrier` 
from ((`list_subscribers` `ls` 
    join `lists` `l` on((`ls`.`listid` = `l`.`listid`)))
    left join `subscribers_data` `s` on((`ls`.`subscriberid` = `s`.`subscriberid`)))  
where (left(`l`.`name`,(locate(_utf8'_',`l`.`name`) - 1)) regexp _utf8'[[:digit:]]+') 
group by `ls`.`subscriberid`,`l`.`name`,`ls`.`emailaddress`

EDIT I removed the regexp and that sped the query up to about 20 seconds, instead of nearly 120 seconds. If I could remove the group by then it would be faster, but I cannot as removing this causes it to duplicate rows with blank data for each field, instead of aggregating them. For instance:

With group by

id     user_id     first_name     last_name     email_address     mobile_phone     sms_only     mobile_carrier
1         1          John           Doe        jdoe@example.com    5551234567       0          Sprint

Without group by

id      user_id      first_name      last_name      email_address      mobile_phone      sms_only      mobile_carrier
1       1            John                           jdoe@xample.com
1       1                             Doe           jdoe@example.com
1       1                                           jdoe@example.com
1       1                                           jdoe@example.com   5551234567

And so on. What we need is the first result.

EDIT #2

The query still seems to take a long time, but earlier today it was running in only about 20 seconds on the production database. Without changing a thing, the same query is now once again taking over 60 seconds. This is still unacceptable.. any other ideas on how to improve this?

解决方案

That is, without a doubt, the second most hideous SQL query I have ever laid my eyes on :-)

My advice is to trade storage requirements for speed. This is a common trick used when you find your queries have a lot of per-row functions (ifnull, case and so forth). These per-row functions never scale very well as the table becomes larger.

Create new fields in the table which will hold the values you want to extract and then calculate those values on insert/update (with a trigger) rather than select. This doesn't technically break 3NF since the triggers guarantee data consistency between columns.

The vast majority of database tables are read far more often than they're written so this will amortise the cost of the calculation across many selects. In addition, just about every reported problem with databases is one of speed, not storage.

An example of what I mean. You can replace:

case when (`s`.`fieldid` in (2,35)) then `s`.`data` else NULL end

with:

`s`.`data_2_35`

in your query if your insert/update trigger simply sets the data_2_35 column to data or NULL depending on the value of fieldid. Then you index data_2_35 and, voila, instant speed improvement at the cost of a little storage.

This trick can be done to the five case clauses, the left/regexp bit and the "naked" ifnull function as well (the ifnull functions containing min and group_concat may be harder to do).

这篇关于优化一个奇怪的 MySQL 查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆