MySQL“在何处"是慢的? [英] MySQL "in-where" is slow?
问题描述
我有两个表:Posts
和 Tags
,它们存储用户发布的文章和他们附加到文章的标签.表格PostTags
用于表示文章ID和标签ID的关系.结构如下:
I have two tables: Posts
and Tags
, which stores user's posting articles and the tags they attached to the articles. A table PostTags
is used to indicate the relationship of article IDs and tag IDs. The structures are as below:
帖子:
id | title | author_id | create_time | update_time | ... #(title, author_id, create_time) is unique
标签:
id | tag_text | create_time #tag_text is unique and index
帖子标签:
id | post_id | tag_id #(post_id, tag_id) is unique
我现在使用下面的sql来获取带有相应标签的文章(使用group_concat).
I now use the following sql to get the articles with corresponding tags (using group_concat).
SELECT p.id, p.title, t.tag AS Tags FROM Posts p
LEFT JOIN Tags t on t.id IN
(SELECT tag_id FROM PostTags WHERE post_id=s.id)
GROUP BY p.id ORDER BY p.update_time DESC LIMIT 0, 10
但我发现它很慢(对于 2.5k 行文章和 600 个标签,它需要 >3s).如何提高性能?
But I find it very slow (for 2.5k rows of articles and 600 tags, it takes >3s). How can I improve the performance?
EXPLAIN 结果如下:
The EXPLAIN result is as follows:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | p | ALL | NULL | NULL | NULL | NULL | 2569 | Using temporary; Using filesort
1 | PRIMARY | t | ALL | NULL | NULL | NULL | NULL | 616
2 | DEPENDENT SUBQUERY | PostTags | index_subquery | unique_index,tag_id,post_id |
tag_id | 4 | func | 1 | Using where
PS,我原来的sql是(用group_concat)
PS, my original sql is (with group_concat)
SELECT p.id, p.title, group_concat(DINSTINCT t.tag) AS Tags FROM Posts p
LEFT JOIN Tags t on t.id IN
(SELECT tag_id FROM PostTags WHERE post_id=s.id)
GROUP BY p.id ORDER BY p.update_time DESC LIMIT 0, 10
但是没有 group_concat 的情况是一样的.
But the case without group_concat is same.
推荐答案
IN 子查询性能不佳的典型情况是当子查询返回少量行但外部查询返回大量行以与子查询结果进行比较时.
A typical case for poor IN subquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.
问题在于,对于使用 IN 子查询的语句,优化器将其重写为相关子查询.[..] 如果内部查询和外部查询分别返回 M 行和 N 行,则执行时间将变为 O(M×N) 的数量级,而不是 O(M+N)不相关的子查询.
The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. [..] If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(M×N), rather than O(M+N) as it would be for an uncorrelated subquery.
使用另一个连接而不是子查询将是一个更好的解决方案:
Using another join instead of a subquery would be a more optimal solution:
SELECT p.id, p.title, t.tag AS Tags FROM Posts p
LEFT JOIN PostTags pt on pt.post_id = p.id
LEFT JOIN Tags t on t.id = pt.tag_id
GROUP BY p.id ORDER BY p.update_time DESC LIMIT 0, 10
这篇关于MySQL“在何处"是慢的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!