使用JOIN的MySQL查询不使用INDEX [英] MySQL query with JOIN not using INDEX
问题描述
我在MySQL(简体)中有以下两个表。
I have the following two tables in MySQL (Simplified).
-
点击次数
(InnoDB)
- 包含大约70,000,000条记录
- 在
date_added上有一个索引
列 - 有一列
link_id
,它引用链接中的记录
table
clicks
(InnoDB)- Contains around about 70,000,000 records
- Has an index on the
date_added
column - Has a column
link_id
which refers to a record in thelinks
table
- 包含的记录少得多,大约65,000
我正在尝试使用这些表运行一些分析查询。我需要提取一些数据,关于在两个指定日期内发生的点击,同时使用其他表格将一些其他用户选择的过滤器加入到链接表中。
I'm trying to run some analytical queries using these tables. I need to pull out some data, about clicks that occurred inside of two specified dates while applying some other user selected filters using other tables and joining them into the links table.
然而,我的问题围绕索引的使用。当我运行以下查询时:
My question revolves around the use of indexes however. When I run the following query:
SELECT COUNT(1) FROM clicks WHERE date_added >= '2016-11-01 00:00:00' AND date_added <= '2016-11-03 23:59:59';
我在1.40秒内收到回复。使用
EXPLAIN
我发现MySQL按预期使用date_added
列上的索引。I get a response back in 1.40 sec. Using
EXPLAIN
I find that the MySQL uses the index on thedate_added
column as expected.EXPLAIN SELECT COUNT(1) FROM clicks WHERE date_added >= '2016-11-01 00:00:00' AND date_added <= '2016-11-16 23:59:59'; +----+-------------+--------+-------+---------------+------------+---------+------+---------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+-------+---------------+------------+---------+------+---------+--------------------------+ | 1 | SIMPLE | clicks | range | date_added | date_added | 4 | NULL | 1559288 | Using where; Using index | +----+-------------+--------+-------+---------------+------------+---------+------+---------+--------------------------+
但是,当我
链接
表中的LEFT JOIN
时,我发现查询执行的时间要长得多:However, when I
LEFT JOIN
in mylinks
table I find that the query takes much longer to execute:SELECT COUNT(1) AS clicks FROM clicks AS c LEFT JOIN links AS l ON l.id = c.link_id WHERE c.date_added >= '2016-11-01 00:00:00' AND c.date_added <= '2016-11-16 23:59:59';
以6.50秒完成。使用
EXPLAIN
我发现该索引未在date_added
列中使用:Which completed in 6.50 sec. Using
EXPLAIN
I find that the index was not used on thedate_added
column:EXPLAIN SELECT COUNT(1) AS clicks FROM clicks AS c LEFT JOIN links AS l ON l.id = c.link_id WHERE c.date_added >= '2016-11-01 00:00:00' AND c.date_added <= '2016-11-16 23:59:59'; +----+-------------+-------+--------+---------------+------------+---------+---------------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+------------+---------+---------------+---------+-------------+ | 1 | SIMPLE | c | range | date_added | date_added | 4 | NULL | 6613278 | Using where | | 1 | SIMPLE | l | eq_ref | PRIMARY | PRIMARY | 4 | c.link_id | 1 | Using index | +----+-------------+-------+--------+---------------+------------+---------+---------------+---------+-------------+
As您可以看到索引未用于较大表中的
date_added
列,并且似乎需要更长时间。当我加入其他表时,这似乎变得更糟。As you can see the index isn't being used for the
date_added
column in the larger table and seems to take far longer. This seems to get even worse when I join in other tables.有谁知道为什么会发生这种情况,或者我有什么办法可以让它使用索引点击表中的
date_added
列?Does anyone know why this is happening or if there's anything I can do to get it to use the index on the
date_added
column in the clicks table?编辑
我刚尝试使用其他方法从数据库中获取统计数据。我的方法的第一步涉及从点击表中提取一组不同的
link_id
。我发现我在这里再次看到同样的问题,没有加入。索引未被使用:I've just attempted to get my stats out of the database using a different method. The first step in my method involves pulling out a distinct set of
link_id
s from the clicks table. I've found that I'm seeing the same problem here again, without a JOIN. The index is not being used:我的查询:
SELECT DISTINCT(link_id) AS link_id FROM clicks WHERE date_added >= '2016-11-01 00:00:00' AND date_added <= '2016-12-05 10:16:00'
此查询几乎占用了分钟完成。我对此运行了一个
EXPLAIN
,发现查询没有像我预期的那样使用索引:This query took almost a minute to complete. I ran an
EXPLAIN
on this and found that the query is not using the index as I expected it would:+----+-------------+---------+-------+---------------+----------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------+-------+---------------+----------+---------+------+----------+-------------+ | 1 | SIMPLE | clicks | index | date_added | link_id | 4 | NULL | 79786609 | Using where | +----+-------------+---------+-------+---------------+----------+---------+------+----------+-------------+
我预计它将使用
date_added
过滤结果集,然后提取不同的link_id
值。知道为什么会这样吗?我有link_id
的索引以及date_added
。I expected that it would use the index on
date_added
to filter down the result set and then pull out the distinctlink_id
values. Any idea why this is happening? I have an index onlink_id
as well asdate_added
.推荐答案
您是否想使用普通的
JOIN
代替LEFT JOIN
?LEFT JOIN
保留右边的所有行,因此它将产生与COUNT()
相同的值作为unjoined表。如果您只想计算右侧表中左侧表中具有匹配行的行,请使用JOIN
,而不是LEFT JOIN
。Do you want to use an ordinary
JOIN
in place of theLEFT JOIN
?LEFT JOIN
preserves all the rows on the right, so it will yield the same value ofCOUNT()
as the unjoined table. If you want to count only the rows from your right-hand table that have matching rows in the left-hand table, useJOIN
, notLEFT JOIN
.尝试在
date_added
上删除索引并将其替换为复合索引(date_added,link_id)
。 此类索引称为覆盖独立 x。当查询规划器知道它可以从索引中获取所需的所有内容时,它不必退回到表中。在这种情况下,查询计划程序可以随机访问索引到日期范围的开头,然后执行索引范围扫描到范围的末尾。但它仍然需要引用另一个表格。Try dropping your index on
date_added
and replacing it with a compound index on(date_added, link_id)
. This sort of index is called a covering index. When the query planner knows it can get everything it needs from an index, it doesn't have to bounce back to the table. In this case the query planner can random-access the index to the beginning of your date range, then do an index range scan to the end of the range. It's still going to have to refer to the other table, though.(编辑)为了实验,尝试更窄的日期范围。查看
EXPLAIN
是否发生了变化。在这种情况下,查询计划程序可能会猜测您的date_added列的基数错误。(Edit) For the sake of experimentation, try a narrower date range. See if
EXPLAIN
changes. In that case, the query planner might be guessing your date_added column's cardinality wrong.您可以尝试索引提示。例如,尝试
You might try an index hint. For example, try
SELECT COUNT(1) AS clicks FROM clicks AS c USE INDEX (date_added) LEFT JOIN links AS l ON l.id = c.link_id WHERE etc
但是,从您的
EXPLAIN
输出判断,您已经在date_added
上进行了范围扫描。无论你喜欢与否,你的下一步是复合覆盖指数。But, judging from your
EXPLAIN
output, you're already doing a range scan ondate_added
. Your next step, like it or not, is the compound covering index.确保
链接(id)
上有索引。可能有,因为它可能是PK。Make sure there's an index on
links(id)
. There probably is, because it's probably the PK.尝试使用
COUNT(*)
而不是COUNT(1)
。它可能不会有所作为,但值得一试。COUNT(*)
只计算行而不是为它计算的每一行计算一些东西。Try using
COUNT(*)
instead ofCOUNT(1)
. It probably won't make a difference, but it's worth a try.COUNT(*)
simply counts rows rather than evaluating something for each row it counts.(Nitpick )你的日期范围闻起来很有趣。使用
<
作为范围的最后结果,以获得最佳效果。(Nitpick) Your date range smells funny. Use
<
for the end of your range for best results, like so.WHERE c.date_added >= '2016-11-01' AND c.date_added < '2016-11-17';
编辑:看,MySQL查询规划器使用了很多关于内部知识的知识表的结构如何。并且,截至2016年底,每个表只能使用一个索引来满足查询。这是一个限制。
Edit: Look, the MySQL query planner uses lots of internal knowledge about how tables are structured. And, it can only use one index per table to satisfy a query as of late 2016. That's a limitation.
SELECT DISTINCT列
实际上是一个相当复杂的查询,因为它必须对<$ c $进行重复数据删除c>列有问题。如果该列上有索引,则查询计划程序可能会使用它。选择该索引意味着它无法选择其他索引。SELECT DISTINCT column
is actually a fairly complex query, because it has to de-dupe thecolumn
in question. If there's an index on that column, the query planner is likely to use it. Choosing that index means it could not choose some other index.复合索引(覆盖索引)有时但不总是解决这种索引选择困境,并允许索引双重使用。您可以在 http://use-the-index-luke.com/阅读所有这些内容。 a>
Compound indexes (covering indexes) sometimes but not always resolve this kind of index-selection dilemma, and allow index dual usage. You can read about all this at http://use-the-index-luke.com/
但如果您的操作限制阻止添加复合索引,则需要使用一秒钟查询。这并不坏。
But if your operational constraints prevent the adding of compound indexes, you'll need to live with the one-second query. It isn't that bad.
当然,说你无法添加复合索引来完成工作是这样的:
Of course, saying you can't add compound indexes to get your job done is like this:
A :东西从高速公路上的卡车上掉下来。
A: stuff is falling off my truck on the freeway.
B :把东西盖上盖子把它绑起来。
B: put a tarp over the stuff and tie it down.
A :我的老板不会让我把卡车放在卡车上。
A: my boss won't let me put a tarp on the truck.
B :那么,开车很慢。
这篇关于使用JOIN的MySQL查询不使用INDEX的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!