How can I avoid a full table scan on this MySQL query?


Question

explain
select
    *
from
    zipcode_distances z 
inner join
    venues v    
    on z.zipcode_to=v.zipcode
inner join
    events e
    on v.id=e.venue_id
where
    z.zipcode_from='92108' and
    z.distance <= 5

I'm trying to find all events at venues within 5 miles of zipcode 92108, but I'm having a hard time optimizing this query.

Here is what the EXPLAIN output looks like:

id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra

1, SIMPLE, e, ALL, idx_venue_id, , , , 60024, 
1, SIMPLE, v, eq_ref, PRIMARY,idx_zipcode, PRIMARY, 4, comedyworld.e.venue_id, 1, 
1, SIMPLE, z, ref, idx_zip_from_distance,idx_zip_to_distance,idx_zip_from_to, idx_zip_from_to, 30, const,comedyworld.v.zipcode, 1, Using where; Using index

I'm getting a full table scan on the "e" table, and I can't figure out what index I need to create to make the query fast.

Any suggestions would be much appreciated.

Thanks.

Recommended answer

Based on the EXPLAIN output in your question, you already have all the indexes the query should be using, namely:

CREATE INDEX idx_zip_from_distance
  ON zipcode_distances (zipcode_from, distance, zipcode_to);
CREATE INDEX idx_zipcode ON venues (zipcode, id);
CREATE INDEX idx_venue_id ON events (venue_id);

(I can't tell from your index names whether idx_zip_from_distance really includes the zipcode_to column. If it doesn't, you should add it to make it a covering index. Also, I've included the venues.id column in idx_zipcode for completeness, but, assuming it's the primary key of the table and that you're using InnoDB, it will be included automatically anyway.)
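To check which columns an existing index actually contains, something like the following could be used. This is only a sketch: it assumes the table and index names shown in the EXPLAIN output, and rebuilding an index on a large table can take a while, so it is probably best tried on a copy first.

```sql
-- List every index on the table, one row per indexed column.
-- Within each Key_name, Seq_in_index gives the column's position.
SHOW INDEX FROM zipcode_distances;

-- If idx_zip_from_distance turns out to lack zipcode_to, rebuild it
-- as a covering index with the column order listed above.
ALTER TABLE zipcode_distances
  DROP INDEX idx_zip_from_distance,
  ADD INDEX idx_zip_from_distance (zipcode_from, distance, zipcode_to);
```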

However, it looks like MySQL is choosing a different, and possibly suboptimal, query plan: it scans through all events, finds their venues and zip codes, and only then filters the results on distance. That could be the optimal plan if the cardinality of the events table were low enough, but since you're asking this question I assume it isn't.

One reason for the suboptimal query plan could be that you have too many indexes, which confuses the planner. For instance, do you really need all three of those indexes on the zipcode_distances table, given that the data it stores is presumably symmetric? Personally, I'd suggest keeping only the index I described above, plus a unique index on (zipcode_to, zipcode_from), which can also be the primary key if you don't have an artificial one. That column order is preferable, so that any occasional queries on zipcode_to=? can make use of it.
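As a rough sketch of that cleanup, assuming the two other index names from the EXPLAIN output (idx_zip_to_distance and idx_zip_from_to) are plain secondary indexes that nothing else depends on, and using idx_zip_to_from as a hypothetical name for the new unique index:

```sql
-- Drop the two indexes the suggestion above considers redundant and
-- add a unique index on (zipcode_to, zipcode_from).
-- idx_zip_to_from is a made-up name; pick whatever fits your schema.
-- The ADD UNIQUE INDEX step fails if the table contains duplicate
-- (zipcode_to, zipcode_from) pairs.
ALTER TABLE zipcode_distances
  DROP INDEX idx_zip_to_distance,
  DROP INDEX idx_zip_from_to,
  ADD UNIQUE INDEX idx_zip_to_from (zipcode_to, zipcode_from);
```

Other queries may rely on the dropped indexes, so it is worth checking them with EXPLAIN before applying this to production data.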

However, based on some testing I did, I suspect the main reason MySQL is choosing the wrong query plan is simply the relative cardinalities of your tables. Presumably, your actual zipcode_distances table is huge, and MySQL isn't smart enough to realize quite how much the conditions in the WHERE clause narrow it down.

If so, the best and simplest fix may be to force MySQL to use the indexes you want:

select
    *
from
    zipcode_distances z 
    FORCE INDEX (idx_zip_from_distance)
inner join
    venues v    
    FORCE INDEX (idx_zipcode)
    on z.zipcode_to=v.zipcode
inner join
    events e
    FORCE INDEX (idx_venue_id)
    on v.id=e.venue_id
where
    z.zipcode_from='92108' and
    z.distance <= 5

With that query, you should indeed get the desired query plan. (You do need FORCE INDEX here: with just USE INDEX, the query planner could still decide to use a table scan instead of the suggested index, defeating the purpose. I had this happen when I first tested this.)
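One way to confirm the effect is to run EXPLAIN on the hinted query and check that the row for e no longer shows type ALL. The second statement below is a different technique, not part of the suggestion above: STRAIGHT_JOIN asks MySQL to join the tables in the order they are written, which may be enough on its own if the join order is the real problem. Both are sketches that assume the schema and index names from the question.

```sql
-- Check the plan produced with the index hints in place.
EXPLAIN
SELECT *
FROM zipcode_distances z FORCE INDEX (idx_zip_from_distance)
INNER JOIN venues v FORCE INDEX (idx_zipcode)
    ON z.zipcode_to = v.zipcode
INNER JOIN events e FORCE INDEX (idx_venue_id)
    ON v.id = e.venue_id
WHERE z.zipcode_from = '92108'
  AND z.distance <= 5;

-- Alternative: fix the join order (z, then v, then e) and let the
-- optimizer choose the indexes itself.
EXPLAIN
SELECT STRAIGHT_JOIN *
FROM zipcode_distances z
INNER JOIN venues v ON z.zipcode_to = v.zipcode
INNER JOIN events e ON v.id = e.venue_id
WHERE z.zipcode_from = '92108'
  AND z.distance <= 5;
```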

P.S. Here's a demo on SQLize, both with and without FORCE INDEX, demonstrating the issue.
