尽管覆盖索引,MySQL MyISAM慢count()查询 [英] MySQL MyISAM slow count() query despite covering index

查看:280
本文介绍了尽管覆盖索引,MySQL MyISAM慢count()查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在拔头发试图找出我做错了什么. 该表非常简单:

I am pulling my hair out trying to figure out what I am doing wrong. The table is very simple:

CREATE TABLE `icd_index` (
  `icd` char(5) NOT NULL,
  `core_id` int(11) NOT NULL,
  `dx_order` tinyint(4) NOT NULL,
  PRIMARY KEY (`icd`,`dx_order`,`core_id`),
  KEY `core` (`core_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

如您所见,我创建了一个覆盖表的所有三列的覆盖索引,并在core_id上添加了一个用于潜在联接的附加索引.这是一对多链接表,每个core_id映射到一个或多个icd.该表包含6500万行.

As you can see, I've created a covering index that spans all three columns of the table, plus an additional index on the core_id for potential joins. This is a one-to-many link table with each core_id mapped to one or more icd. The table contains 65 million rows.

所以,这就是问题所在.假设我想知道有多少人的icd代码为"25000". [如果您想知道的话,那就是糖尿病].我写了一个看起来像这样的查询:

So, here's the issue. Let's say I want to know how many people have an icd code of '25000'. [That's diabetes, in case you were wondering]. I write a query that looks like this:

SELECT COUNT(core_id) FROM icd_index WHERE icd='25000'

这需要60秒钟以上的时间才能执行.我曾经以为,由于icd列在涵盖索引中排在第一位,因此算起来会很快.

This takes over 60 seconds to execute. I had thought that since the icd column was first in the covered index, counting it would be fast.

更令人困惑的是,一旦我运行了一次查询,它现在就会非常快速地运行.我以为那是因为查询被缓存了,但是即使我RESET QUERY CACHE,查询现在也可以在几分之一秒内运行.但是,如果我等待足够长的时间,它似乎又会变慢-而且我不知道为什么.

More confusing, once I've run the query once, it now runs very quickly. I assumed that was because the query was cached, but even if I RESET QUERY CACHE, the query now runs in fractions of a second. But, if I wait long enough, it seems to slow down again -- and I can't figure out why.

我缺少明显的东西.我是否仅需要icd上的索引?这是6500万行所能达到的最佳性能吗?为什么运行查询然后重置缓存会影响速度?结果是否存储在索引中?

I am missing something obvious. Do I need an index on icd alone? Is this the best performance I can get with 65M rows? Why does running the query then resetting the cache affect speed? Are the results being stored in the index?

我正在运行MySQL 5.6(以防万一).

I'm running MySQL 5.6 (in case that matters).

这是查询的EXPLAIN:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  icd_index   ref PRIMARY PRIMARY 15  const   910104  Using where; Using index

推荐答案

这是怎么回事.

The SELECT COUNT (...) icd_index where icd='25000'

将使用索引,该索引是与数据分开的BTree.但是它以这种方式扫描它:

will use the index, which is a BTree separate from the data. But it scans it in this way:

  1. 找到具有icd ='25000'的第一个条目.这几乎是瞬时的.
  2. 向前扫描,直到找到icd的变化.这将仅扫描索引,而不接触数据.根据EXPLAIN,将有大约910,104个索引条目要扫描.

现在让我们看一下该索引的BTree.根据索引中的字段,每行将恰好是22个字节,加上一些开销(估计为40%). MyISAM索引块为1KB(参见InnoDB的16KB).我估计每个块有33行. 910,104/33表示需要读取大约27K块来进行计数. (请注意,COUNT(core_id)需要检查core_id是否为空,COUNT(*)不需要;这是一个很小的差异.)在普通硬盘驱动器上读取27K块大约需要270秒.您很幸运能在60秒内完成.

Now let's look at the BTree for that index. Based on the fields in the index, each row will be exactly 22 bytes, plus there will be some overhead (estimate 40%). A MyISAM index block is 1KB (cf InnoDB's 16KB). I would estimate 33 rows per block. 910,104/33 says about 27K blocks need to be read to do the COUNT. (Note COUNT(core_id) needs to check core_id for being null, COUNT(*) does not; this is a minor difference.) Reading 27K blocks on a plain hard drive takes about 270 seconds. You were lucky to get it done in 60 seconds.

第二次运行在key_buffer中找到了所有这些块(假设key_buffer_size至少为27MB),因此不必等待磁盘.因此,它要快得多. (这忽略了查询缓存,您可以明智地刷新或使用SQL_NO_CACHE.)

The second run found all those blocks in the key_buffer (assuming key_buffer_size is at least 27MB), so it did not have to wait for the disk. Hence it was much faster. (This ignoring the Query cache, which you had the wisdom to flush or use SQL_NO_CACHE.)

5.6恰好无关紧要(但是感谢您提及它),因为此过程自4.0或更早版本以来没有发生变化(除了utf8不存在;更多内容请参见下文).

5.6 happens to be irrelevant (but thanks for mentioning it), since this process has not changed since 4.0 or before (except that utf8 did not exist; more on that below).

切换到InnoDB将以多种方式提供帮助.主键将与数据聚集"在一起,而不是存储为单独的BTree.因此,一旦数据或PK被缓存,另一个立即可用.块的数量将更像是5K,但它们将是16KB块.如果高速缓存很冷,这些文件可能会更快地加载.

Switching to InnoDB would help in a couple of ways. The PRIMARY KEY would be 'clustered' with the data, not stored as a separate BTree. Hence, once the data or the PK is cached, the other is immediately available. The number of blocks would be more like 5K, but they would be 16KB blocks. These might be loadable faster if the cache is cold.

您问我是否需要单独在icd上建立索引?"-嗯,这会使MyISAM BTree的大小缩小到每行约21个字节,因此BTree的大小约为21/27倍,没有太大改进(至少对于冷缓存情况如此.

You ask " Do I need an index on icd alone? " -- Well that would shrink the MyISAM BTree size to about 21 bytes per row, so the BTree would be about 21/27ths the size, not much improvement (at least for the cold-cache situation).

另一种想法是, if icd始终是数字,并且始终是数字,以使用MEDIUMINT UNSIGNED,并在ZEROFILL上加点(如果它可以有前导零).

Another thought is, if icd is always numeric and always numeric, to use MEDIUMINT UNSIGNED, and tack on ZEROFILL if it can have leading zeros.

糟糕,我没注意到字符集. (我已经固定了上面的数字,但让我详细说明.)

Oops, I failed to notice the CHARACTER SET. (I have fixed the numbers above, but let me elaborate.)

  • CHAR(5)允许5个字符.
  • ascii每个字符占用1个 byte .
  • 每个字符
  • utf8最多占用3个字节.
  • 因此,CHAR(5)字符集utf8占用15个字节 始终.
  • CHAR(5) allows for 5 characters.
  • ascii takes 1 byte per character.
  • utf8 takes up to 3 bytes per characters.
  • So, CHAR(5) CHARACTER SET utf8 takes 15 bytes always.

将列更改为CHAR(5) CHARACTER SET ascii会将其缩小到5个字节.

Changing the column to CHAR(5) CHARACTER SET ascii would shrink it to 5 bytes.

将其更改为MEDIUMINT UNSIGNED ZEROFILL会将其缩小为3个字节.

Changing it to MEDIUMINT UNSIGNED ZEROFILL would shrink it to 3 bytes.

收缩数据将使I/O速度大致成比例(在另外两个字段中再允许6个字节之后.

Shrinking the data would speed up I/O by a roughly proportional amount (after allowing another 6 bytes for the other two fields.

这篇关于尽管覆盖索引,MySQL MyISAM慢count()查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆