为什么MySQL(InnoDB)的性能有很大的变化? [英] Why there's a lot of variation in MySQL (InnoDB) performance?

查看:94
本文介绍了为什么MySQL(InnoDB)的性能有很大的变化?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我开始调查为什么Django管理员中的某些搜索真的很慢(见此处)。进一步挖掘我发现MySQL(5.1,InnoDB表)性能从一个查询到另一个类似的很多。例如:

I started to investigate why some searches in the Django admin where really slow (see here). Digging further I found that MySQL (5.1, InnoDB tables) performance vary a lot from one query to another one similar. For example:

这个查询(在4个字段中查找'c','d'和'e',2个相关)由Django生成需要89 ms并返回3093 rows:

This query (looking for 'c', 'd' and 'e' in 4 fields, 2 related) generated by Django take 89 ms and return 3093 rows:

SELECT DISTINCT `donnees_artiste`.`id`
    FROM `donnees_artiste`
LEFT OUTER JOIN `donnees_artiste_evenements`
    ON (`donnees_artiste`.`id` = `donnees_artiste_evenements`.`artiste_id`)
LEFT OUTER JOIN `donnees_evenement`
    ON (`donnees_artiste_evenements`.`evenement_id` = `donnees_evenement`.`id`)
LEFT OUTER JOIN `donnees_artiste_evenements` T4
    ON (`donnees_artiste`.`id` = T4.`artiste_id`)
LEFT OUTER JOIN `donnees_evenement` T5
    ON (T4.`evenement_id` = T5.`id`)
LEFT OUTER JOIN `donnees_artiste_evenements` T6
    ON (`donnees_artiste`.`id` = T6.`artiste_id`)
LEFT OUTER JOIN `donnees_evenement` T7
    ON (T6.`evenement_id` = T7.`id`)

WHERE (
    (`donnees_artiste`.`nom` LIKE '%c%'
  OR `donnees_artiste`.`prenom` LIKE '%c%'
  OR `donnees_evenement`.`cote` LIKE '%c%'
  OR `donnees_evenement`.`titre` LIKE '%c%' )
AND (`donnees_artiste`.`nom` LIKE '%d%'
  OR `donnees_artiste`.`prenom` LIKE '%d%'
  OR T5.`cote` LIKE '%d%'
  OR T5.`titre` LIKE '%d%' )
AND (`donnees_artiste`.`nom` LIKE '%e%'
  OR `donnees_artiste`.`prenom` LIKE '%e%'
  OR T7.`cote` LIKE '%e%'
  OR T7.`titre` LIKE '%e%' )
);

如果我用'k'替换'e',所以它大体上是相同的查询,

If I replace the 'e' by a 'k' so it's mostly the same query, it take 8720 ms (100x increase) and return 931 rows.

SELECT DISTINCT `donnees_artiste`.`id`
    FROM `donnees_artiste`
LEFT OUTER JOIN `donnees_artiste_evenements`
    ON (`donnees_artiste`.`id` = `donnees_artiste_evenements`.`artiste_id`)
LEFT OUTER JOIN `donnees_evenement`
    ON (`donnees_artiste_evenements`.`evenement_id` = `donnees_evenement`.`id`)
LEFT OUTER JOIN `donnees_artiste_evenements` T4
    ON (`donnees_artiste`.`id` = T4.`artiste_id`)
LEFT OUTER JOIN `donnees_evenement` T5
    ON (T4.`evenement_id` = T5.`id`)
LEFT OUTER JOIN `donnees_artiste_evenements` T6
    ON (`donnees_artiste`.`id` = T6.`artiste_id`)
LEFT OUTER JOIN `donnees_evenement` T7
    ON (T6.`evenement_id` = T7.`id`)

WHERE (
    (`donnees_artiste`.`nom` LIKE '%c%'
  OR `donnees_artiste`.`prenom` LIKE '%c%'
  OR `donnees_evenement`.`cote` LIKE '%c%'
  OR `donnees_evenement`.`titre` LIKE '%c%' )
AND (`donnees_artiste`.`nom` LIKE '%d%'
  OR `donnees_artiste`.`prenom` LIKE '%d%'
  OR T5.`cote` LIKE '%d%'
  OR T5.`titre` LIKE '%d%' )
AND (`donnees_artiste`.`nom` LIKE '%k%'
  OR `donnees_artiste`.`prenom` LIKE '%k%'
  OR T7.`cote` LIKE '%k%'
  OR T7.`titre` LIKE '%k%' )
);

这两个查询给出相同的 EXPLAIN ,所以没有线索。

Both of these query give the same EXPLAIN, so no clue there.

ID  SELECT_TYPE     TABLE   TYPE    POSSIBLE_KEYS   KEY     KEY_LEN     REF     ROWS    EXTRA
1   SIMPLE  donnees_artiste     ALL     None    None    None    None    4368    Using temporary; Using filesort
1   SIMPLE  donnees_artiste_evenements  ref     artiste_id,donnees_artiste_evenements_eb99df11  artiste_id  4   mmac.donnees_artiste.id     1   Using index; Distinct
1   SIMPLE  donnees_evenement   eq_ref  PRIMARY,donnees_evenements_id_index     PRIMARY     4   mmac.donnees_artiste_evenements.evenement_id    1   Using where; Distinct
1   SIMPLE  T4  ref     artiste_id,donnees_artiste_evenements_eb99df11  artiste_id  4   mmac.donnees_artiste.id     1   Using index; Distinct
1   SIMPLE  T5  eq_ref  PRIMARY,donnees_evenements_id_index     PRIMARY     4   mmac.T4.evenement_id    1   Using where; Distinct
1   SIMPLE  T6  ref     artiste_id,donnees_artiste_evenements_eb99df11  artiste_id  4   mmac.donnees_artiste.id     1   Using index; Distinct
1   SIMPLE  T7  eq_ref  PRIMARY,donnees_evenements_id_index     PRIMARY     4   mmac.T6.evenement_id    1   Using where; Distinct

此外,如果我在 COUNT

SELECT COUNT(DISTINCT `donnees_artiste`.`id`)
    FROM `donnees_artiste`
LEFT OUTER JOIN `donnees_artiste_evenements`
    ON (`donnees_artiste`.`id` = `donnees_artiste_evenements`.`artiste_id`)
LEFT OUTER JOIN `donnees_evenement`
    ON (`donnees_artiste_evenements`.`evenement_id` = `donnees_evenement`.`id`)
LEFT OUTER JOIN `donnees_artiste_evenements` T4
    ON (`donnees_artiste`.`id` = T4.`artiste_id`)
LEFT OUTER JOIN `donnees_evenement` T5
    ON (T4.`evenement_id` = T5.`id`)
LEFT OUTER JOIN `donnees_artiste_evenements` T6
    ON (`donnees_artiste`.`id` = T6.`artiste_id`)
LEFT OUTER JOIN `donnees_evenement` T7
    ON (T6.`evenement_id` = T7.`id`)

WHERE (
    (`donnees_artiste`.`nom` LIKE '%c%'
  OR `donnees_artiste`.`prenom` LIKE '%c%'
  OR `donnees_evenement`.`cote` LIKE '%c%'
  OR `donnees_evenement`.`titre` LIKE '%c%' )
AND (`donnees_artiste`.`nom` LIKE '%d%'
  OR `donnees_artiste`.`prenom` LIKE '%d%'
  OR T5.`cote` LIKE '%d%'
  OR T5.`titre` LIKE '%d%' )
AND (`donnees_artiste`.`nom` LIKE '%e%'
  OR `donnees_artiste`.`prenom` LIKE '%e%'
  OR T7.`cote` LIKE '%e%'
  OR T7.`titre` LIKE '%e%' )
);

我的 innodb_buffer_pool_size 设置为高。我有所有相关字段和主键上的索引,我已经优化了我的表。

My innodb_buffer_pool_size is set high. I have indexes on all relevant fields and on primary keys and I already optimized my tables.

那么,为什么第一个查询是这么快,其他2个这么慢?这3个查询只是例子。很多时候,我只是更改或删除一个字符从查询,它在查询时间有很大的不同。但我看不到任何模式。

So, why the first query is so fast and the 2 others so slow? These 3 queries are just examples. Many time I'm just changing or removing one character from a query and it made big difference on the query time. But I can't see any pattern.

UPDATE

性能问题绝对来自Django如何生成这些查询。所有这些冗余 LEFT OUTER JOIN 链接在一起杀死了性能。目前还不完全清楚,如果它是Django SQL生成器中的一个错误,如何为搜索字段构建查询的错误,或者如果所有的工作正如Django开发人员所预期的。我仍然在调查,但至少有一个奇怪的事情在Django行为...

The performance problem definitely come from how Django generate these queries. All these redundant LEFT OUTER JOIN chained together kill the performance. At the moment it's not totally clear to me if it's a bug in the Django SQL generator, a bug in how the query is built for the search field or if all that work as expected by the Django developers. I'm still investigating but there's, at least, one strange thing in the Django behavior...

如果我运行这个查询(这不一定等同于第二一个,但不远),结果相当快(161毫秒,没有缓存):

If I run this query (that is not necessarily equivalent to the second one, but not far) the results come pretty fast (161 ms, no cache):

SELECT DISTINCT `donnees_artiste`.`id`
    FROM `donnees_artiste`
LEFT OUTER JOIN `donnees_artiste_evenements`
    ON (`donnees_artiste`.`id` = `donnees_artiste_evenements`.`artiste_id`)
LEFT OUTER JOIN `donnees_evenement`
    ON (`donnees_artiste_evenements`.`evenement_id` = `donnees_evenement`.`id`)

WHERE (
    (`donnees_artiste`.`nom` LIKE '%c%'
  OR `donnees_artiste`.`prenom` LIKE '%c%'
  OR `donnees_evenement`.`cote` LIKE '%c%'
  OR `donnees_evenement`.`titre` LIKE '%c%' )
AND (`donnees_artiste`.`nom` LIKE '%d%'
  OR `donnees_artiste`.`prenom` LIKE '%d%'
  OR `donnees_evenement`.`cote` LIKE '%d%'
  OR `donnees_evenement`.`titre` LIKE '%d%' )
AND (`donnees_artiste`.`nom` LIKE '%k%'
  OR `donnees_artiste`.`prenom` LIKE '%k%'
  OR `donnees_evenement`.`cote` LIKE '%k%'
  OR `donnees_evenement`.`titre` LIKE '%k%' )
);

第二个更新

最后,这不是Django中的错误,我很确定这是所需的行为。这个想法是,在多项搜索上,下一项的搜索是在上一项的子集返回上完成的,所以对于相关字段,所有项不必在同一行中具有匹配。为此,数据库必须创建临时表与每个子集并扫描它。这解释了为什么会有很多变化,因为如果第一项只匹配几行,临时表将很小,并且后续术语的搜索将是快速的(因为它们将在一个小表上完成)。这两个查询之间的区别是微妙的,但Django查询可以返回更多的匹配一般。

Finally that's not a bug in Django, I'm pretty sure it's the desired behavior. The idea is, on a multi-terms search, the search of the next term is done on the subset return by the previous term so, for the related fields, all the terms don't have to be in the same row to have a match. For this, the DB have to create temporary table with each subsets and scan it. That explain why there can be a lot a variation because if the first term match only a few rows, the temporary table will be small, and search of subsequent term will be fast (because they will be done on a small table). The difference between the two queries is subtle but the Django query can return more matches in general.

推荐答案

在大多数情况下, e 位于扫描的字符串的开头和第一个搜索的字符串中,允许缩短OR条件,而匹配 k 发生在最后条件和字符串末尾的某处。由于 k 的行数显着减少,因此应更多字符串进行完整扫描,而无任何匹配。

I think, the answer is that e in most cases is located at the beginning of the scanned strings and in the first searched string, allowing to short cirquit the OR conditions, while matches for k happen in the last conditions and somewhere in the end of the string. And since there are significantly less rows with k, more strings should be full scanned without any matches.

这篇关于为什么MySQL(InnoDB)的性能有很大的变化?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆