使用JOIN的MySQL查询不使用INDEX [英] MySQL query with JOIN not using INDEX

查看:153
本文介绍了使用JOIN的MySQL查询不使用INDEX的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在MySQL(简体)中有以下两个表。

I have the following two tables in MySQL (Simplified).


  • 点击次数(InnoDB)


    • 包含大约70,000,000条记录

    • date_added上有一个索引

    • 有一列 link_id ,它引用链接中的记录 table

    • clicks (InnoDB)
      • Contains around about 70,000,000 records
      • Has an index on the date_added column
      • Has a column link_id which refers to a record in the links table

      • 包含的记录少得多,大约65,000

      我正在尝试使用这些表运行一些分析查询。我需要提取一些数据,关于在两个指定日期内发生的点击,同时使用其他表格将一些其他用户选择的过滤器加入到链接表中。

      I'm trying to run some analytical queries using these tables. I need to pull out some data, about clicks that occurred inside of two specified dates while applying some other user selected filters using other tables and joining them into the links table.

      然而,我的问题围绕索引的使用。当我运行以下查询时:

      My question revolves around the use of indexes however. When I run the following query:

      SELECT
          COUNT(1)
      FROM
          clicks
      WHERE
          date_added >= '2016-11-01 00:00:00'
      AND date_added <= '2016-11-03 23:59:59';
      

      我在1.40秒内收到回复。使用 EXPLAIN 我发现MySQL按预期使用 date_added 列上的索引。

      I get a response back in 1.40 sec. Using EXPLAIN I find that the MySQL uses the index on the date_added column as expected.

      EXPLAIN SELECT COUNT(1) FROM clicks WHERE date_added >= '2016-11-01 00:00:00' AND date_added <= '2016-11-16 23:59:59';
      +----+-------------+--------+-------+---------------+------------+---------+------+---------+--------------------------+
      | id | select_type | table  | type  | possible_keys | key        | key_len | ref  | rows    | Extra                    |
      +----+-------------+--------+-------+---------------+------------+---------+------+---------+--------------------------+
      |  1 | SIMPLE      | clicks | range | date_added    | date_added | 4       | NULL | 1559288 | Using where; Using index |
      +----+-------------+--------+-------+---------------+------------+---------+------+---------+--------------------------+
      

      但是,当我链接表中的 LEFT JOIN 时,我发现查询执行的时间要长得多:

      However, when I LEFT JOIN in my links table I find that the query takes much longer to execute:

      SELECT
          COUNT(1) AS clicks
      FROM
          clicks AS c
      LEFT JOIN links AS l ON l.id = c.link_id
      WHERE
          c.date_added >= '2016-11-01 00:00:00'
      AND c.date_added <= '2016-11-16 23:59:59';
      

      以6.50秒完成。使用 EXPLAIN 我发现该索引未在 date_added 列中使用:

      Which completed in 6.50 sec. Using EXPLAIN I find that the index was not used on the date_added column:

      EXPLAIN SELECT COUNT(1) AS clicks FROM clicks AS c LEFT JOIN links AS l ON l.id = c.link_id WHERE c.date_added >= '2016-11-01 00:00:00' AND c.date_added <= '2016-11-16 23:59:59';
      +----+-------------+-------+--------+---------------+------------+---------+---------------+---------+-------------+
      | id | select_type | table | type   | possible_keys | key        | key_len | ref           | rows    | Extra       |
      +----+-------------+-------+--------+---------------+------------+---------+---------------+---------+-------------+
      |  1 | SIMPLE      | c     | range  | date_added    | date_added | 4       | NULL          | 6613278 | Using where |
      |  1 | SIMPLE      | l     | eq_ref | PRIMARY       | PRIMARY    | 4       | c.link_id     |       1 | Using index |
      +----+-------------+-------+--------+---------------+------------+---------+---------------+---------+-------------+
      

      As您可以看到索引未用于较大表中的 date_added 列,并且似乎需要更长时间。当我加入其他表时,这似乎变得更糟。

      As you can see the index isn't being used for the date_added column in the larger table and seems to take far longer. This seems to get even worse when I join in other tables.

      有谁知道为什么会发生这种情况,或者我有什么办法可以让它使用索引点击表中的 date_added 列?

      Does anyone know why this is happening or if there's anything I can do to get it to use the index on the date_added column in the clicks table?

      编辑

      我刚尝试使用其他方法从数据库中获取统计数据。我的方法的第一步涉及从点击表中提取一组不同的 link_id 。我发现我在这里再次看到同样的问题,没有加入。索引未被使用:

      I've just attempted to get my stats out of the database using a different method. The first step in my method involves pulling out a distinct set of link_ids from the clicks table. I've found that I'm seeing the same problem here again, without a JOIN. The index is not being used:

      我的查询:

      SELECT
          DISTINCT(link_id) AS link_id
      FROM
          clicks
      WHERE
          date_added >= '2016-11-01 00:00:00'
      AND date_added <= '2016-12-05 10:16:00'
      

      此查询几乎占用了分钟完成。我对此运行了一个 EXPLAIN ,发现查询没有像我预期的那样使用索引:

      This query took almost a minute to complete. I ran an EXPLAIN on this and found that the query is not using the index as I expected it would:

      +----+-------------+---------+-------+---------------+----------+---------+------+----------+-------------+
      | id | select_type | table   | type  | possible_keys | key      | key_len | ref  | rows     | Extra       |
      +----+-------------+---------+-------+---------------+----------+---------+------+----------+-------------+
      |  1 | SIMPLE      | clicks  | index | date_added    | link_id  | 4       | NULL | 79786609 | Using where |
      +----+-------------+---------+-------+---------------+----------+---------+------+----------+-------------+
      

      我预计它将使用 date_added 过滤结果集,然后提取不同的 link_id 值。知道为什么会这样吗?我有 link_id 的索引以及 date_added

      I expected that it would use the index on date_added to filter down the result set and then pull out the distinct link_id values. Any idea why this is happening? I have an index on link_id as well as date_added.

      推荐答案

      您是否想使用普通的 JOIN 代替 LEFT JOIN LEFT JOIN 保留右边的所有行,因此它将产生与 COUNT()相同的值作为unjoined表。如果您只想计算右侧表中左侧表中具有匹配行的行,请使用 JOIN ,而不是 LEFT JOIN

      Do you want to use an ordinary JOIN in place of the LEFT JOIN? LEFT JOIN preserves all the rows on the right, so it will yield the same value of COUNT() as the unjoined table. If you want to count only the rows from your right-hand table that have matching rows in the left-hand table, use JOIN, not LEFT JOIN.

      尝试在 date_added 上删除索引并将其替换为复合索引(date_added,link_id)此类索引称为覆盖独立 x。当查询规划器知道它可以从索引中获取所需的所有内容时,它不必退回到表中。在这种情况下,查询计划程序可以随机访问索引到日期范围的开头,然后执行索引范围扫描到范围的末尾。但它仍然需要引用另一个表格。

      Try dropping your index on date_added and replacing it with a compound index on (date_added, link_id). This sort of index is called a covering index. When the query planner knows it can get everything it needs from an index, it doesn't have to bounce back to the table. In this case the query planner can random-access the index to the beginning of your date range, then do an index range scan to the end of the range. It's still going to have to refer to the other table, though.

      编辑)为了实验,尝试更窄的日期范围。查看 EXPLAIN 是否发生了变化。在这种情况下,查询计划程序可能会猜测您的date_added列的基数错误。

      (Edit) For the sake of experimentation, try a narrower date range. See if EXPLAIN changes. In that case, the query planner might be guessing your date_added column's cardinality wrong.

      您可以尝试索引提示。例如,尝试

      You might try an index hint. For example, try

      SELECT COUNT(1) AS clicks
        FROM clicks AS c USE INDEX (date_added)
        LEFT JOIN links AS l ON l.id = c.link_id
       WHERE etc
      

      但是,从您的 EXPLAIN 输出判断,您已经在 date_added 上进行了范围扫描。无论你喜欢与否,你的下一步是复合覆盖指数。

      But, judging from your EXPLAIN output, you're already doing a range scan on date_added. Your next step, like it or not, is the compound covering index.

      确保链接(id)上有索引。可能有,因为它可能是PK。

      Make sure there's an index on links(id). There probably is, because it's probably the PK.

      尝试使用 COUNT(*)而不是 COUNT(1)。它可能不会有所作为,但值得一试。 COUNT(*)只计算行而不是为它计算的每一行计算一些东西。

      Try using COUNT(*) instead of COUNT(1). It probably won't make a difference, but it's worth a try. COUNT(*) simply counts rows rather than evaluating something for each row it counts.

      (Nitpick )你的日期范围闻起来很有趣。使用< 作为范围的最后结果,以获得最佳效果。

      (Nitpick) Your date range smells funny. Use < for the end of your range for best results, like so.

       WHERE c.date_added >= '2016-11-01'
         AND c.date_added <  '2016-11-17';
      

      编辑:看,MySQL查询规划器使用了很多关于内部知识的知识表的结构如何。并且,截至2016年底,每个表只能使用一个索引来满足查询。这是一个限制。

      Edit: Look, the MySQL query planner uses lots of internal knowledge about how tables are structured. And, it can only use one index per table to satisfy a query as of late 2016. That's a limitation.

      SELECT DISTINCT列实际上是一个相当复杂的查询,因为它必须对<$ c $进行重复数据删除c>列有问题。如果该列上有索引,则查询计划程序可能会使用它。选择该索引意味着它无法选择其他索引。

      SELECT DISTINCT column is actually a fairly complex query, because it has to de-dupe the column in question. If there's an index on that column, the query planner is likely to use it. Choosing that index means it could not choose some other index.

      复合索引(覆盖索引)有时但不总是解决这种索引选择困境,并允许索引双重使用。您可以在 http://use-the-index-luke.com/

      Compound indexes (covering indexes) sometimes but not always resolve this kind of index-selection dilemma, and allow index dual usage. You can read about all this at http://use-the-index-luke.com/

      但如果您的操作限制阻止添加复合索引,则需要使用一秒钟查询。这并不坏。

      But if your operational constraints prevent the adding of compound indexes, you'll need to live with the one-second query. It isn't that bad.

      当然,说你无法添加复合索引来完成工作是这样的:

      Of course, saying you can't add compound indexes to get your job done is like this:

      A :东西从高速公路上的卡车上掉下来。

      A: stuff is falling off my truck on the freeway.

      B :把东西盖上盖子把它绑起来。

      B: put a tarp over the stuff and tie it down.

      A :我的老板不会让我把卡车放在卡车上。

      A: my boss won't let me put a tarp on the truck.

      B :那么,开车很慢。

      这篇关于使用JOIN的MySQL查询不使用INDEX的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆