涉及范围时,索引中的第一个基数列更高? [英] Higher cardinality column first in an index when involving a range?

查看:81
本文介绍了涉及范围时,索引中的第一个基数列更高?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

CREATE TABLE `files` (
  `did` int(10) unsigned NOT NULL DEFAULT '0',
  `filename` varbinary(200) NOT NULL,
  `ext` varbinary(5) DEFAULT NULL,
  `fsize` double DEFAULT NULL,
  `filetime` datetime DEFAULT NULL,
  PRIMARY KEY (`did`,`filename`),
  KEY `fe` (`filetime`,`ext`),          -- This?
  KEY `ef` (`ext`,`filetime`)           -- or This?
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;

表格中有一百万行。文件时间大多不同。有一定数量的 ext 值。因此, filetime 具有较高的基数, ext 的基数较低。

There are a million rows in the table. The filetimes are mostly distinct. There are a finite number of ext values. So, filetimehas a high cardinality and ext has a much lower cardinality.

查询涉及 ext filetime

WHERE ext = '...'
  AND filetime BETWEEN ... AND ...

这两个指数中哪一个更好?为什么?

Which of those two indexes is better? And why?

推荐答案

首先,让我们尝试 FORCE INDEX 来选择 ef fe 。时间太短,无法清楚地了解哪个更快,但`EXPLAIN显示差异:

First, let's try FORCE INDEX to pick either ef or fe. The timings are too short to get a clear picture of which is faster, but `EXPLAIN shows a difference:

强制 filetime上的范围首先。 (注意: WHERE 中的顺序没有影响。)

Forcing the range on filetime first. (Note: The order in WHERE has no impact.)

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(fe)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
|  1 | SIMPLE      | files | range | fe            | fe   | 14      | NULL | 16684 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+

强制低基数 ext 首先:

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(ef)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|  1 | SIMPLE      | files | range | ef            | ef   | 14      | NULL |  538 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

显然, ef 更好。但是,让我们检查优化器跟踪。产量相当笨重;我只展示有趣的部分。不需要 FORCE ;跟踪将显示两个选项然后选择更好。

Clearly, the rows says ef is better. But let's check with the Optimizer trace. The output is rather bulky; I'll show only the interesting parts. No FORCE is needed; the trace will show both options then pick the better.

             ...
             "potential_range_indices": [
                ...
                {
                  "index": "fe",
                  "usable": true,
                  "key_parts": [
                    "filetime",
                    "ext",
                    "did",
                    "filename"
                  ]
                },
                {
                  "index": "ef",
                  "usable": true,
                  "key_parts": [
                    "ext",
                    "filetime",
                    "did",
                    "filename"
                  ]
                }
              ],

...

              "analyzing_range_alternatives": {
                "range_scan_alternatives": [
                  {
                    "index": "fe",
                    "ranges": [
                      "2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
                    ],
                    "index_dives_for_eq_ranges": true,
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 16684,
                    "cost": 20022,               <-- Here's the critical number
                    "chosen": true
                  },
                  {
                    "index": "ef",
                    "ranges": [
                      "gif <= ext <= gif AND 2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
                    ],
                    "index_dives_for_eq_ranges": true,
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 538,
                    "cost": 646.61,               <-- Here's the critical number
                    "chosen": true
                  }
                ],

...

          "attached_conditions_computation": [
            {
              "access_type_changed": {
                "table": "`files`",
                "index": "ef",
                "old_type": "ref",
                "new_type": "range",
                "cause": "uses_more_keyparts"   <-- Also interesting
              }
            }

fe (范围列第一),r可以使用ange,但估计通过16684行扫描 ext ='gif'

With fe (range column first), the range could be used, but it estimated scanning through 16684 rows fishing for ext='gif'.

ef (低基数 ext 首先),它可以使用索引的两列并在BTree中更有效地向下钻取。然后它找到了估计538行,所有这些行对查询都很有用 - 无需进一步过滤。

With ef (low cardinality ext first), it could use both columns of the index and drill down more efficiently in the BTree. Then it found an estimated 538 rows, all of which are useful for the query -- no further filtering needed.

结论:


  • INDEX(filetime,ext)仅使用第一列。

  • INDEX(ext,filetime)使用了两列。

  • 将列涉及 = 无论基数,都会在索引中测试

  • 查询计划不会超出第一个范围列。

  • 基数与复合索引和此类查询无关

  • INDEX(filetime, ext) used only the first column.
  • INDEX(ext, filetime) used both columns.
  • Put columns involved in = tests first in the index regardless of cardinality.
  • The query plan won't go beyond the first 'range' column.
  • "Cardinality" is irrelevant for composite indexes and this type of query.

(使用索引条件表示存储引擎(InnoDB)将使用超出用于过滤的索引的列。)

("Using index condition" means that the Storage Engine (InnoDB) will use columns of the index beyond the one used for filtering.")

这篇关于涉及范围时,索引中的第一个基数列更高?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆