mysql没有使用索引? [英] mysql not using index?

查看:137
本文介绍了mysql没有使用索引?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含word,A_,E_,U_等列的表。这些带X_的列是tinyints,其值为字中存在特定字母的次数(以便稍后帮助优化通配符搜索查询)。

I have a table with columns like word, A_, E_, U_ .. these columns with X_ are tinyints having the value of how many times the specific letter exists in the word (to later help optimize the wildcard search query).

共有252k行。如果我像WHERE> 0那样进行搜索,我会获得60k行。但是,如果我做那个选择的解释,它说有225k行可以通过,没有索引可能。为什么?列被添加为索引。为什么它没有说有60k行要通过,可能的关键是U _?

There is totally 252k rows. If i make search like WHERE u_ > 0 i get 60k rows. But if i do the explain of that select, it says there is 225k rows to go through and no index possible. Why? Column was added as index. Why it doesn't say there is 60k rows to go through and that possible key is U_?

在表格中列出索引(也奇怪其他人在A_索引下分组)

listing the indexes on table (also strange that others are groupped under A_ index)

相比之下,如果我运行查询:其中id> 250000我得到2983结果,如果我解释那个选择它说有2982行和键使用主要。

In comparison if i run query: where id > 250000 i get 2983 results, and if i do explain of that select it says there is 2982 rows and key to be used primary.

顺便说一句,如果我按U_分组得到这个:(但可能并不重要,因为我已经说过查询返回60k结果)

Btw if i group by U_ i get this: (but probably doesnt matter much because i already said the query returns 60k results)

编辑:

如果我创建列U(varchar(1))并更新U ='U',其中U_> 0,然后如果我做选择WHERE U ='U'我也得到60k行(显然),但如果我解释我得到这个:

If i create column U (varchar(1)) and do the update U = 'U' where U_ > 0, then if i do the select WHERE U = 'U' i get also 60k rows (obviously), but if i do explain i get this:

仍然不太好(行120k而不是60k)但至少比先前情况下的行225k更好。虽然这个解决方案比第一个解决方案有点小,但可能更有效率。

Still not so good (rows 120k not 60k) but at least better than rows 225k in previous case. Although this solution is bit more piggy that than the first one, but maybe bit more efficient.

推荐答案

你问的是后端查询优化器。特别是你问:它如何选择一个访问路径?为什么要在这里进行索引,但是表可以在那里进行索引?

You're asking about the backend query optimizer. In particular you're asking: "how does it choose an access path? Why index here but tablescan there?"

让我们考虑一下优化器。什么是优化?经过的时间,在期待中。它具有连续读取和随机读取所用时间以及查询选择性(即查询返回的预期行数)的模型。从几个备用访问路径中,它选择看起来需要最少经过时间的路径。

Let's think about that optimizer. What is it optimizing? Elapsed time, in expectation. It has a model for how long sequential reads and random reads take, and for query selectivity, that is, expected number of rows returned by a query. From several alternative access paths it chooses the one that appears to require the least elapsed time.

您的 id> 250000 查询有一些事情要做:

Your id > 250000 query had a few things going for it:


  1. 良好的选择性,因此不到1%的行将会出现在结果集中

  2. id 是主键,因此在导航到btree中的正确位置时,所有列都可立即使用

  1. good selectivity, so less than 1% of rows will appear in the result set
  2. id is the Primary Key, so all columns are immediately available upon navigating to the right place in the btree

这导致优化器计算索引访问路径的预期运行时间,远小于表扫描的预期时间。

This caused the optimizer to compute an expected elapsed time for the indexed access path much smaller than expected time for tablescan.

另一方面,您的 u_> 0 查询的选择性非常差,将近四分之一的行拖入结果集。此外,索引不是 c> * 要求将所有列值复制到结果集中的覆盖索引。因此优化器预测它必须读取四分之一的索引块,然后基本上读取它们指向的所有数据行块。因此,与tablescan相比,我们必须从磁盘读取更多块,它们将是随机读取而不是顺序读取。这两个人都反对使用索引,因此选择tablescan是因为它最便宜。此外,请记住,通常多行将适合单个磁盘块或单个读取请求。如果它总是选择索引访问路径,我们会称它为悲观者,即使在索引磁盘I / O需要更长时间的情况下也是如此。

On the other hand, your u_ > 0 query has very poor selectivity, dragging nearly a quarter of the rows into the result set. Additionally, the index is not a covering index for your * demand of copying all column values into the result set. So the optimizer predicts it will have to read a quarter of the index blocks, and then essentially all of the data row blocks that they point to. So compared to tablescan, we'd have to read more blocks from disk, and they would be random reads instead of sequential reads. Both of those argue against using the index, so tablescan was selected because it was cheapest. Also, remember that often multiple rows will fit within a single disk block, or within a single read request. We would call it a pessimizer if it always chose the indexed access path, even in cases where indexed disk I/O would take longer.

当您的查询具有良好的选择性时,在单个列上使用索引,返回远小于关系行的1%。当您的查询选择性较差且您愿意制作时,请使用覆盖索引空间与时间的权衡。

Use an index on a single column when your queries have good selectivity, returning much less than 1% of a relation's rows. Use a covering index when your queries have poor selectivity and you're willing to make a space vs. time tradeoff.

这篇关于mysql没有使用索引?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆