Why is the performance of MySQL queries so bad when using a CHAR/VARCHAR index?

Views: 134 | Posted: 2018/8/2 15:47:46 | Tags: mysql sql performance optimization indexing

Problem description

First, I will describe a simplified version of the problem domain.

There is a table named strings:

CREATE TABLE strings (
  value CHAR(3) COLLATE utf8_unicode_ci NOT NULL,
  INDEX(value)
) ENGINE=InnoDB;

As you can see, it has a non-unique index on a CHAR(3) column.

The table is populated using the following script:

CREATE TABLE a_variants (
  letter CHAR(1) COLLATE utf8_unicode_ci  NOT NULL
) ENGINE=MEMORY;

INSERT INTO a_variants VALUES -- 60 variants of letter 'A'
  ('A'),('a'),('À'),('Á'),('Â'),('Ã'),('Ä'),('Å'),('à'),('á'),('â'),('ã'),
  ('ä'),('å'),('Ā'),('ā'),('Ă'),('ă'),('Ą'),('ą'),('Ǎ'),('ǎ'),('Ǟ'),('ǟ'),
  ('Ǡ'),('ǡ'),('Ǻ'),('ǻ'),('Ȁ'),('ȁ'),('Ȃ'),('ȃ'),('Ȧ'),('ȧ'),('Ḁ'),('ḁ'),
  ('Ạ'),('ạ'),('Ả'),('ả'),('Ấ'),('ấ'),('Ầ'),('ầ'),('Ẩ'),('ẩ'),('Ẫ'),('ẫ'),
  ('Ậ'),('ậ'),('Ắ'),('ắ'),('Ằ'),('ằ'),('Ẳ'),('ẳ'),('Ẵ'),('ẵ'),('Ặ'),('ặ');

INSERT INTO strings
  SELECT CONCAT(a.letter, b.letter, c.letter) -- 60^3 variants of string 'AAA'
    FROM a_variants a, a_variants b, a_variants c
  UNION ALL SELECT 'BBB'; -- one variant of string 'BBB'

So it contains 216000 variants of the string "AAA" that are indistinguishable under the utf8_unicode_ci collation, plus one variant of the string "BBB":

SELECT value, COUNT(*) FROM strings GROUP BY value;

+-------+----------+
| value | COUNT(*) |
+-------+----------+
| AAA   |   216000 |
| BBB   |        1 |
+-------+----------+
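The claim that all 60 letters collapse into one value under the collation can be checked directly on the helper table. The following is a minimal sketch, assuming the a_variants table from the script above still exists and that the column uses the utf8 character set (as its utf8_unicode_ci collation implies); the utf8_bin comparison is added here for contrast and is not part of the original question.

```sql
-- Distinct letters as the column's own collation (utf8_unicode_ci) sees them.
SELECT COUNT(DISTINCT letter) AS distinct_by_collation
  FROM a_variants;

-- Distinct letters when compared by their binary code points instead.
SELECT COUNT(DISTINCT letter COLLATE utf8_bin) AS distinct_by_bytes
  FROM a_variants;
```

If the variants really are equal under utf8_unicode_ci, as the GROUP BY output above suggests, the first count should be much smaller than the second.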

Since value is indexed, I expect the following two queries to have similar performance:

SELECT SQL_NO_CACHE COUNT(*) FROM strings WHERE value = 'AAA';
SELECT SQL_NO_CACHE COUNT(*) FROM strings WHERE value = 'BBB';

But in practice the first one is more than 300 times slower than the second! See:

+----------+------------+---------------------------------------------------------------+
| Query_ID | Duration   | Query                                                         |
+----------+------------+---------------------------------------------------------------+
|        1 | 0.11749275 | SELECT SQL_NO_CACHE COUNT(*) FROM strings WHERE value = 'AAA' |
|        2 | 0.00033325 | SELECT SQL_NO_CACHE COUNT(*) FROM strings WHERE value = 'BBB' |
|        3 | 0.11718050 | SELECT SQL_NO_CACHE COUNT(*) FROM strings WHERE value = 'AAA' |
+----------+------------+---------------------------------------------------------------+

(I ran the 'AAA' query twice here just to be sure.)
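The question does not say how these timings were collected. The column layout (Query_ID, Duration, Query) matches the output of SHOW PROFILES, so one plausible way to reproduce the measurement, under that assumption, is:

```sql
-- Enable statement profiling for the current session.
-- (The profiling variable is deprecated in newer MySQL versions.)
SET profiling = 1;

SELECT SQL_NO_CACHE COUNT(*) FROM strings WHERE value = 'AAA';
SELECT SQL_NO_CACHE COUNT(*) FROM strings WHERE value = 'BBB';
SELECT SQL_NO_CACHE COUNT(*) FROM strings WHERE value = 'AAA';

-- List the recent statements with their durations in seconds.
SHOW PROFILES;
```

Absolute durations depend on hardware and server configuration, so only the ratio between the two queries is meaningful for comparison.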

If I change the size of the indexed column or change its type to VARCHAR, the performance problem still manifests itself. Meanwhile, in analogous situations where the non-unique index is not on a CHAR/VARCHAR column (e.g. INT), queries are as fast as expected.
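The INT comparison is described but not shown. A minimal sketch of what such an analogous test might look like follows; the table name ints and the values 1 and 2 are hypothetical stand-ins chosen here, and the a_variants table is reused only to generate the same number of rows.

```sql
-- Hypothetical INT analogue of the strings table.
CREATE TABLE ints (
  value INT NOT NULL,
  INDEX(value)
) ENGINE=InnoDB;

-- 60^3 = 216000 rows with value 1, plus a single row with value 2.
INSERT INTO ints
  SELECT 1
    FROM a_variants a, a_variants b, a_variants c
  UNION ALL SELECT 2;

-- Same pair of queries as for the CHAR(3) column.
SELECT SQL_NO_CACHE COUNT(*) FROM ints WHERE value = 1;
SELECT SQL_NO_CACHE COUNT(*) FROM ints WHERE value = 2;
```

Timing these two queries the same way as the CHAR(3) queries would give a like-for-like comparison between the two column types.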

So the question is: why is the performance of MySQL queries so bad when using a CHAR/VARCHAR index?

I have a strong feeling that MySQL performs a full linear scan of all the values matched by the index key. But why does it do so when it could just return the count of the matched rows? Am I missing something, and is that scan really needed? Or is it a sad shortcoming of the MySQL optimizer?
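One way to gather evidence for or against the linear-scan suspicion is to look at the execution plan for both queries. This is a diagnostic sketch only; it shows which access path the optimizer chooses and how many rows it expects to examine, not why that path is slow.

```sql
-- Compare the chosen access path for the slow and the fast query.
EXPLAIN SELECT SQL_NO_CACHE COUNT(*) FROM strings WHERE value = 'AAA';
EXPLAIN SELECT SQL_NO_CACHE COUNT(*) FROM strings WHERE value = 'BBB';
```

In the output, the type and key columns indicate whether the index on value is used, rows gives the estimated number of rows examined, and "Using index" in the Extra column means the query is answered from the index alone without reading table rows.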
