Can it be executed faster with a big amount of data? [MySQL]
Is there any way to optimize the following query?
EXPLAIN EXTENDED SELECT keyword_id, ck.keyword, COUNT( article_id ) AS cnt
FROM career_article_keyword
LEFT JOIN career_keywords ck
USING ( keyword_id )
WHERE keyword_id
IN (
SELECT keyword_id
FROM career_article_keyword
LEFT JOIN career_keywords ck
USING ( keyword_id )
WHERE article_id
IN (
SELECT article_id
FROM career_article_keyword
WHERE keyword_id =9
)
AND keyword_id <>9
)
GROUP BY keyword_id
ORDER BY cnt DESC
The main task: given a particular keyword_id (CURRENT_KID), I need to find all keywords that have ever belonged to any article together with CURRENT_KID, and sort the result by how often those keywords are used.
The tables are defined as:
mysql> show create table career_article_keyword;
CREATE TABLE `career_article_keyword` (
  `article_id` int(11) unsigned NOT NULL,
  `keyword_id` int(11) NOT NULL,
  UNIQUE KEY `article_id` (`article_id`,`keyword_id`),
  CONSTRAINT `career_article_keyword_ibfk_1` FOREIGN KEY (`article_id`) REFERENCES `career` (`menu_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
mysql> show create table career_keywords;
CREATE TABLE `career_keywords` (
  `keyword_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `keyword` varchar(250) NOT NULL,
  PRIMARY KEY (`keyword_id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8
The output of EXPLAIN scares me.
On big data this query could kill everything :) Can I make it faster somehow?
Thanks.
Looking at your EXPLAIN output, I was concerned that your use of subqueries had resulted in a suboptimal use of indexes. I felt (without any justification, and on this I may very well be wrong) that rewriting using JOIN might lead to a more optimised query.
To do that, we need to understand what it is your query is intended to do. It would have helped if your question had articulated it, but after a little head-scratching I decided your query was trying to fetch a list of all other keywords that appear in any article that contains some given keyword, together with a count of all articles in which those keywords appear.
Now let's rebuild the query in stages:
Fetch "any article that contains some given keyword" (not worrying about duplicates):
SELECT ca2.article_id FROM career_article_keyword AS ca2 WHERE ca2.keyword_id = 9;
Fetch "all other keywords that appear in [the above]"
SELECT ca1.keyword_id FROM career_article_keyword AS ca1 JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id) WHERE ca1.keyword_id <> 9 AND ca2.keyword_id = 9 GROUP BY ca1.keyword_id;
Fetch "[the above], together with a count of all articles in which those keywords appear"
SELECT ca1.keyword_id, COUNT(DISTINCT ca0.article_id) AS cnt FROM career_article_keyword AS ca0 JOIN career_article_keyword AS ca1 USING (keyword_id) JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id) WHERE ca1.keyword_id <> 9 AND ca2.keyword_id = 9 GROUP BY ca1.keyword_id ORDER BY cnt DESC;
Finally, we want to add to the output the matching keyword itself from the career_keywords table:
SELECT ck.keyword_id, ck.keyword, COUNT(DISTINCT ca0.article_id) AS cnt
FROM career_keywords AS ck
JOIN career_article_keyword AS ca0 USING (keyword_id)
JOIN career_article_keyword AS ca1 USING (keyword_id)
JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id)
WHERE ca1.keyword_id <> 9 AND ca2.keyword_id = 9
GROUP BY ck.keyword_id -- equal to ca1.keyword_id due to join conditions
ORDER BY cnt DESC;
One thing that is immediately clear is that your original query referenced career_keywords twice, whereas this rewritten query references that table only once. This alone might explain the performance difference: try removing the second reference to it (i.e. where it appears in your first subquery), as it's entirely redundant there.
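As a rough sketch of that suggestion, here is the original query with only the inner LEFT JOIN on career_keywords dropped; nothing else is changed, and whether this alone helps is something to check with EXPLAIN rather than assume:

```sql
-- Original query, minus the redundant join inside the first subquery.
-- The subquery only needs keyword_id, which career_article_keyword already has.
SELECT keyword_id, ck.keyword, COUNT(article_id) AS cnt
FROM career_article_keyword
LEFT JOIN career_keywords ck USING (keyword_id)
WHERE keyword_id IN (
    SELECT keyword_id
    FROM career_article_keyword
    WHERE article_id IN (
        SELECT article_id
        FROM career_article_keyword
        WHERE keyword_id = 9
    )
    AND keyword_id <> 9
)
GROUP BY keyword_id
ORDER BY cnt DESC;
```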
Looking back over this query, we can see that joins are being performed on the following columns:
- career_keywords.keyword_id in ck JOIN ca0
  This table defines PRIMARY KEY (`keyword_id`), so there is a good index which can be used for this join.
- career_article_keyword.article_id in ca1 JOIN ca2
  This table defines UNIQUE KEY `article_id` (`article_id`,`keyword_id`) and, since article_id is the leftmost column in this index, there is a good index which can be used for this join.
- career_article_keyword.keyword_id in ck JOIN ca0 and ca0 JOIN ca1
  There is no index that can be used for this join: the only index defined in this table has another column, article_id, to the left of keyword_id, so MySQL cannot find keyword_id entries in the index without first knowing the article_id. I suggest you create a new index which has keyword_id as its leftmost column.
(The need for this index could equally have been ascertained directly from looking at your original query, where your two outermost queries perform joins on that column.)
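One possible way to add such an index is sketched below. The index name idx_keyword_article is hypothetical, and including article_id as a second column is an assumption on my part (so the lookups in the rewritten query can be answered from the index); an index on keyword_id alone would also satisfy the "leftmost column" requirement.

```sql
-- Hypothetical index name; keyword_id is the leftmost column as suggested above.
-- article_id as the second column is an assumption, not part of the original advice.
ALTER TABLE career_article_keyword
    ADD INDEX idx_keyword_article (keyword_id, article_id);

-- Re-check the plan afterwards to see whether the new index is actually chosen.
EXPLAIN
SELECT ck.keyword_id, ck.keyword, COUNT(DISTINCT ca0.article_id) AS cnt
FROM career_keywords AS ck
JOIN career_article_keyword AS ca0 USING (keyword_id)
JOIN career_article_keyword AS ca1 USING (keyword_id)
JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id)
WHERE ca1.keyword_id <> 9 AND ca2.keyword_id = 9
GROUP BY ck.keyword_id
ORDER BY cnt DESC;
```

Comparing the EXPLAIN output before and after adding the index shows whether the optimizer uses it for the keyword_id joins.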