Why aren't indexes used in the first case but work in the other?


Question

I'd like to verify that my assumptions are right. I have two tables, which only differ in index order.

They look like this:

CREATE TABLE `ipcountry` (
    `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `ipFROM` INT(10) UNSIGNED ZEROFILL NOT NULL DEFAULT '0000000000',
    `ipTO` INT(10) UNSIGNED ZEROFILL NOT NULL DEFAULT '0000000000',
    `countrySHORT` CHAR(2) NOT NULL DEFAULT '' COLLATE 'utf8_czech_ci',
    `countryLONG` VARCHAR(255) NOT NULL DEFAULT ' ' COLLATE 'utf8_czech_ci',
    PRIMARY KEY (`id`),
    INDEX `ipINDEX` (`ipTO`, `ipFROM`)
)
COLLATE='utf8_czech_ci'
ENGINE=InnoDB
AUTO_INCREMENT=2490331
;


CREATE TABLE `ipcountry2` (
    `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `ipFROM` INT(10) UNSIGNED ZEROFILL NOT NULL DEFAULT '0000000000',
    `ipTO` INT(10) UNSIGNED ZEROFILL NOT NULL DEFAULT '0000000000',
    `countrySHORT` CHAR(2) NOT NULL DEFAULT '' COLLATE 'utf8_czech_ci',
    `countryLONG` VARCHAR(255) NOT NULL DEFAULT ' ' COLLATE 'utf8_czech_ci',
    PRIMARY KEY (`id`),
    INDEX `ipINDEX` (`ipFROM`, `ipTO`)
)
COLLATE='utf8_czech_ci'
ENGINE=InnoDB
AUTO_INCREMENT=2490331
;

Both tables have exactly the same number of rows, roughly 2,500,000.

When performing EXPLAIN SELECT * FROM `ipcountry` WHERE ipFROM<=3548978221 AND ipTO>=3548978221 I get

{
    "table": "UnknownTable",
    "rows":
    [
        {
            "id": 1,
            "select_type": "SIMPLE",
            "table": "ipcountry",
            "partitions": null,
            "type": "range",
            "possible_keys": "ipINDEX",
            "key": "ipINDEX",
            "key_len": "4",
            "ref": null,
            "rows": 83260,
            "filtered": 33.33,
            "Extra": "Using index condition"
        }
    ]
}

When performing EXPLAIN SELECT * FROM `ipcountry2` WHERE ipFROM<=3548978221 AND ipTO>=3548978221 I get

{
    "table": "UnknownTable",
    "rows":
    [
        {
            "id": 1,
            "select_type": "SIMPLE",
            "table": "ipcountry2",
            "partitions": null,
            "type": "ALL",
            "possible_keys": "ipINDEX",
            "key": null,
            "key_len": null,
            "ref": null,
            "rows": 2515343,
            "filtered": 16.66,
            "Extra": "Using where"
        }
    ]
}

Is it because of the precedence of operators?

Recommended answer

Notice in the first EXPLAIN:

        "key_len": "4",

This shows that the query reads only the first INT in the index (4 bytes) for the lookup. You can see that this lookup narrows the search from 2.5M rows to about 83K, roughly a 30:1 selectivity.

        "rows": 83260,

When you have two range conditions as in your query, MySQL can't use both columns of the index for the B-tree search. It can do a B-tree search on the first column, but subsequent columns of the index can't be used in that search.
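
To make this concrete, here is a toy model in Python, offered only as a sketch: a composite index is represented as a sorted list of tuples, and `bisect` stands in for the B-tree search. The ranges, the `needle` value, and the helper `scan_leading` are made up for illustration and are not taken from the real table or from MySQL internals.

```python
from bisect import bisect_left, bisect_right

# Made-up, non-overlapping (ipFROM, ipTO) ranges; not data from the real table.
ranges = [(start, start + 9) for start in range(0, 100, 10)]
needle = 85  # deliberately high in the value range, like the IP in the question

def scan_leading(index, low=None, high=None):
    """Return the index entries whose leading column lies in [low, high].

    Only the leading column decides where the scan starts and ends,
    which is all a single B-tree range search can do.
    """
    start = 0 if low is None else bisect_left(index, (low,))
    end = len(index) if high is None else bisect_right(index, (high, float("inf")))
    return index[start:end]

# Index ordered as (ipTO, ipFROM), like table ipcountry.
index_to_from = sorted((ip_to, ip_from) for ip_from, ip_to in ranges)
scanned_a = scan_leading(index_to_from, low=needle)        # ipTO >= needle
matches_a = [(f, t) for t, f in scanned_a if f <= needle]  # ipFROM checked entry by entry

# Index ordered as (ipFROM, ipTO), like table ipcountry2.
index_from_to = sorted(ranges)
scanned_b = scan_leading(index_from_to, high=needle)       # ipFROM <= needle
matches_b = [(f, t) for f, t in scanned_b if t >= needle]  # ipTO checked entry by entry

print(len(scanned_a), len(scanned_b), matches_a == matches_b)  # 2 9 True
```

Both index orders return the same match, but with a high `needle` the (ipTO, ipFROM) order scans 2 of the 10 entries, while the (ipFROM, ipTO) order has to scan 9 of them and discard almost all.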

Your query also filters by the other column at the storage engine level with index condition pushdown, indicated by the Extra note:

        "Extra": "Using index condition"

This isn't part of the B-tree search, but it helps a little bit by filtering out rows before they're returned from the storage engine to the SQL layer.

The bottom line is that there's no way to use a B-tree index search to optimize two range conditions on different columns in the same table.

MySQL will also skip using the index entirely if it estimates the cost of reading the whole table would be approximately the same as using the index. The more rows match your condition, the more likely this is. It's extra work for InnoDB to read rows via a secondary index, so it defaults to doing a table-scan if it estimates that your index lookup will match a large number of rows. The threshold at which this happens is not official or documented, but I've observed it happens when your condition matches at least 20% of the rows in the table.
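
As a rough check of the numbers, the arithmetic below compares the row estimate from the first EXPLAIN with the table size. It assumes the 2,515,343 estimate from the second EXPLAIN applies to both tables, since they hold the same rows. The 20% figure is only the rule of thumb observed above, not a documented MySQL constant.

```python
table_rows = 2515343       # "rows" in the second EXPLAIN (full-scan estimate)
index_rows = 83260         # "rows" in the first EXPLAIN (range-scan estimate)
observed_threshold = 0.20  # rule of thumb from this answer, not a MySQL setting

fraction = index_rows / table_rows
ratio = table_rows / index_rows

print(f"{fraction:.1%} of the table, about {ratio:.0f}:1")  # 3.3% of the table, about 30:1
print("below rule-of-thumb threshold:", fraction < observed_threshold)
```

At about 3.3% of the table, the range scan on the first table is far below that threshold, which is consistent with the optimizer choosing the index there.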

In your second table, given that it also can only filter on the first column, we can reason that the condition on ipFROM alone will match a large subset of the rows in your table. You're searching for all IP addresses less than 3548978221, or 211.137.28.45, which is pretty high in the range of IPv4 addresses. It's not surprising that at least 20% of your rows have values less than that number.
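
The conversion can be checked with Python's standard `ipaddress` module (MySQL's own `INET_NTOA()` function does the same conversion). The fraction printed below is the share of the IPv4 address space at or below that value. It is only a hint about the table, because the actual distribution of `ipFROM` values in the rows is not known.

```python
import ipaddress

needle = 3548978221
address = ipaddress.IPv4Address(needle)  # integer form to dotted-quad form
fraction_below = needle / 2**32          # share of the 32-bit address space

print(address)                   # 211.137.28.45
print(round(fraction_below, 3))  # 0.826
```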

So the MySQL optimizer concludes that in the second query, using the index won't give enough benefit, and it decides to do a table-scan. It can't use the second column of the index without using the first column.
