Can it be executed faster with a big amount of data? [MySQL]


Problem description

Is there any way to optimize the following query?

EXPLAIN EXTENDED
SELECT keyword_id, ck.keyword, COUNT(article_id) AS cnt
FROM career_article_keyword
LEFT JOIN career_keywords ck USING (keyword_id)
WHERE keyword_id IN (
    SELECT keyword_id
    FROM career_article_keyword
    LEFT JOIN career_keywords ck USING (keyword_id)
    WHERE article_id IN (
        SELECT article_id
        FROM career_article_keyword
        WHERE keyword_id = 9
    )
    AND keyword_id <> 9
)
GROUP BY keyword_id
ORDER BY cnt DESC

The main task: given a particular keyword_id (CURRENT_KID), I need to find all keywords that have ever belonged to any article together with CURRENT_KID, and sort the result by how often those keywords are used.

The tables are defined as:

mysql> show create table career_article_keyword;

CREATE TABLE `career_article_keyword` (
  `article_id` int(11) unsigned NOT NULL,
  `keyword_id` int(11) NOT NULL,
  UNIQUE KEY `article_id` (`article_id`,`keyword_id`),
  CONSTRAINT `career_article_keyword_ibfk_1` FOREIGN KEY (`article_id`) REFERENCES `career` (`menu_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8

1 row in set (0.00 sec)

mysql> show create table career_keywords;

CREATE TABLE `career_keywords` (
  `keyword_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `keyword` varchar(250) NOT NULL,
  PRIMARY KEY (`keyword_id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8

1 row in set (0.00 sec)

The output of EXPLAIN scares me:

http://o7.no/J6ThIs

On a large data set this query can kill everything :) Can I make it faster somehow?

Thanks.

Solution

Looking at your EXPLAIN output, I was concerned that your use of subqueries had resulted in a suboptimal use of indexes. I felt (without any justification - and on this I may very well be wrong) that rewriting using JOIN might lead to a more optimised query.

To do that, we need to understand what your query is intended to do. It would have helped if your question had articulated it, but after a little head-scratching I decided your query was trying to fetch a list of all other keywords that appear in any article that contains some given keyword, together with a count of all articles in which those keywords appear.

Now let's rebuild the query in stages:

  1. Fetch "any article that contains some given keyword" (not worrying about duplicates):

    SELECT ca2.article_id
    FROM
           career_article_keyword AS ca2
    WHERE
          ca2.keyword_id = 9;
    

  2. Fetch "all other keywords that appear in [the above]"

    SELECT ca1.keyword_id
    FROM
           career_article_keyword AS ca1
      JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id)
    WHERE
          ca1.keyword_id <> 9
      AND ca2.keyword_id =  9
    GROUP BY ca1.keyword_id;
    

  3. Fetch "[the above], together with a count of all articles in which those keywords appear"

    SELECT ca1.keyword_id, COUNT(DISTINCT ca0.article_id) AS cnt
    FROM
           career_article_keyword AS ca0
      JOIN career_article_keyword AS ca1 USING (keyword_id)
      JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id)
    WHERE
          ca1.keyword_id <> 9
      AND ca2.keyword_id =  9
    GROUP BY ca1.keyword_id
    ORDER BY cnt DESC;
    

  4. Finally, we want to add to the output the matching keyword itself from the career_keywords table:

    SELECT ck.keyword_id, ck.keyword, COUNT(DISTINCT ca0.article_id) AS cnt
    FROM
           career_keywords        AS ck 
      JOIN career_article_keyword AS ca0 USING (keyword_id)
      JOIN career_article_keyword AS ca1 USING (keyword_id)
      JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id)
    WHERE
          ca1.keyword_id <> 9
      AND ca2.keyword_id =  9
    GROUP BY ck.keyword_id -- equal to ca1.keyword_id due to join conditions
    ORDER BY cnt DESC;
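The question calls the keyword of interest CURRENT_KID, while the staged queries hard-code 9. As a minimal sketch, assuming the statement is run from an ordinary MySQL session, a user variable keeps that value in one place. The name @current_kid is my own and does not come from the question; it is worth confirming with EXPLAIN that the plan matches the hard-coded version.

```sql
-- @current_kid is an illustrative name; set it to the keyword you care about.
SET @current_kid := 9;

-- Same query as stage 4, with the literal 9 replaced by the variable.
SELECT ck.keyword_id, ck.keyword, COUNT(DISTINCT ca0.article_id) AS cnt
FROM
       career_keywords        AS ck
  JOIN career_article_keyword AS ca0 USING (keyword_id)
  JOIN career_article_keyword AS ca1 USING (keyword_id)
  JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id)
WHERE
      ca1.keyword_id <> @current_kid
  AND ca2.keyword_id =  @current_kid
GROUP BY ck.keyword_id
ORDER BY cnt DESC;
```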
    

One thing that is immediately clear is that your original query referenced career_keywords twice, whereas this rewritten query references that table only once; this alone might explain the performance difference - try removing the second reference to it (i.e. where it appears in your first subquery), as it's entirely redundant there.
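As a sketch of that suggestion, here is your original query with only the LEFT JOIN on career_keywords dropped from the first subquery; nothing else is changed. Dropping it should not alter the result: career_keywords.keyword_id is a primary key, so that join neither added nor removed rows there, and no column of career_keywords was selected. Whether this alone is enough is something to check against your own data with EXPLAIN.

```sql
SELECT keyword_id, ck.keyword, COUNT(article_id) AS cnt
FROM career_article_keyword
LEFT JOIN career_keywords ck USING (keyword_id)
WHERE keyword_id IN (
    -- The LEFT JOIN on career_keywords that used to sit here is gone:
    -- this subquery only ever returned keyword_id.
    SELECT keyword_id
    FROM career_article_keyword
    WHERE article_id IN (
        SELECT article_id
        FROM career_article_keyword
        WHERE keyword_id = 9
    )
    AND keyword_id <> 9
)
GROUP BY keyword_id
ORDER BY cnt DESC;
```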

Looking back over this query, we can see that joins are being performed on the following columns:

  • career_keywords.keyword_id in ck JOIN ca0

    This table defines PRIMARY KEY (`keyword_id`), so there is a good index which can be used for this join.

  • career_article_keyword.article_id in ca1 JOIN ca2

    This table defines UNIQUE KEY `article_id` (`article_id`,`keyword_id`) and, since article_id is the leftmost column in this index, there is a good index which can be used for this join.

  • career_article_keyword.keyword_id in ck JOIN ca0 and ca0 JOIN ca1

    There is no index that can be used for this join: the only index defined in this table has another column, article_id, to the left of keyword_id - so MySQL cannot find keyword_id entries in the index without first knowing the article_id. I suggest you create a new index which has keyword_id as its leftmost column.

    (The need for this index could equally have been ascertained directly from looking at your original query, where your two outermost queries perform joins on that column.)
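A minimal sketch of that index suggestion follows. The index name keyword_id_article_id is my own, and adding article_id as a second column is an assumption on my part rather than a requirement: a single-column index on keyword_id would also meet the "leftmost column" condition described above.

```sql
-- Index name is illustrative; what matters is that keyword_id comes first.
ALTER TABLE career_article_keyword
  ADD INDEX keyword_id_article_id (keyword_id, article_id);

-- Re-run EXPLAIN on the rewritten query and compare the "key" column
-- for ca0 and ca1 with the plan you had before adding the index.
EXPLAIN
SELECT ck.keyword_id, ck.keyword, COUNT(DISTINCT ca0.article_id) AS cnt
FROM
       career_keywords        AS ck
  JOIN career_article_keyword AS ca0 USING (keyword_id)
  JOIN career_article_keyword AS ca1 USING (keyword_id)
  JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id)
WHERE
      ca1.keyword_id <> 9
  AND ca2.keyword_id =  9
GROUP BY ck.keyword_id
ORDER BY cnt DESC;
```

The optimizer is free to ignore the new index, so treat the EXPLAIN output as the deciding evidence.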
