Does it make sense to use an index that will have low cardinality?

Question

I'm mainly an Actionscript developer and by no means an expert in SQL, but from time to time I have to develop simple server side stuff. So, I thought I'd ask more experienced people about the question in the title.

My understanding is that you don't gain much by indexing a column that holds few distinct values. I have a column that holds a boolean value (actually it's a small int, but I'm using it as a flag), and this column is used in the WHERE clauses of most of my queries. In a theoretical "average" case, half of the records will have the value 1 and the other half 0. So in this scenario the database engine could avoid a full table scan, but it would still have to read a lot of rows (total rows / 2).
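The trade-off described here can be made concrete with a deliberately simplified cost model. This sketch is not how MySQL's optimizer computes costs. It assumes a full scan reads every page sequentially at cost 1 per page, and that an index lookup pays one random page read per matching row. The values for rows per page (100) and the random-read penalty (4.0) are made up for illustration.

```python
# Toy cost model with made-up constants; MySQL's real optimizer uses its own
# cost formulas and table statistics.


def full_scan_cost(total_rows: int, rows_per_page: int) -> float:
    """Read every page once, sequentially, at an assumed cost of 1.0 per page."""
    pages = -(-total_rows // rows_per_page)  # ceiling division
    return float(pages)


def index_lookup_cost(matching_rows: int, random_read_penalty: float) -> float:
    """Assume each matching row found via the index costs one random page read."""
    return matching_rows * random_read_penalty


def index_is_cheaper(total_rows: int, matching_fraction: float,
                     rows_per_page: int = 100,
                     random_read_penalty: float = 4.0) -> bool:
    """Compare both access paths for a WHERE clause matching `matching_fraction` of rows."""
    matching = int(total_rows * matching_fraction)
    return (index_lookup_cost(matching, random_read_penalty)
            < full_scan_cost(total_rows, rows_per_page))


if __name__ == "__main__":
    for fraction in (0.5, 0.1, 0.01, 0.001):
        print(fraction, index_is_cheaper(1_000_000, fraction))
```

With these assumed numbers the break-even point is 0.25% of the rows: 2,500 lookups at 4.0 each equal 10,000 page reads. A 50/50 flag therefore falls far on the full-scan side.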

So, should I index this column?

For the record, I'm using MySQL 5, but I'm more interested in the general rationale for why it does or does not make sense to index a column that I know will have low cardinality.

Thanks in advance.

Answer

An index can help even on low cardinality fields if:


  1. When one of the possible values is very infrequent compared to the others and you search for it.

For instance, there are very few color-blind women, so this query:

SELECT  *
FROM    color_blind_people
WHERE   gender = 'F'

would most probably benefit from an index on gender.
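Here is a toy model of the rare-value case. It assumes a hypothetical table of 10,000 people in which only 1 row in 200 has gender = 'F'. A Python dict stands in for the index; a real MySQL index is a B-tree, not a hash map. The only point is to count how many rows each access path has to touch.

```python
from collections import defaultdict

# Hypothetical data: 10,000 rows, only every 200th row is 'F'.
rows = [{"id": i, "gender": "F" if i % 200 == 0 else "M"} for i in range(10_000)]


def build_index(rows, column):
    """Map each value of `column` to the positions of the rows holding it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[column]].append(pos)
    return index


def full_scan(rows, column, value):
    """Examine every row. Returns (matches, rows_examined)."""
    matches = [row for row in rows if row[column] == value]
    return matches, len(rows)


def index_lookup(rows, index, value):
    """Touch only the rows the index points at. Returns (matches, rows_examined)."""
    positions = index.get(value, [])
    return [rows[p] for p in positions], len(positions)


gender_index = build_index(rows, "gender")
for value in ("F", "M"):
    _, scanned = full_scan(rows, "gender", value)
    _, looked_up = index_lookup(rows, gender_index, value)
    print(f"gender={value}: full scan examines {scanned}, index examines {looked_up}")
```

For 'F' the index touches 50 rows instead of 10,000. For 'M' it touches 9,950, which gains nothing over a scan once the cost of random row access is counted.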

  2. When the values tend to be grouped in the table order:

SELECT  *
FROM    records_from_2008
WHERE   year = 2010
LIMIT 1

Although there are only 3 distinct years here, records with earlier years were most probably added first. Without the index, a great many records would have to be scanned before the first 2010 record is returned.
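A small simulation shows why table order matters. It assumes a hypothetical table whose rows were inserted year by year, with 10,000 rows per year. A sorted list of (year, position) pairs plays the role of the index, and `bisect` stands in for a B-tree seek.

```python
import bisect

# Hypothetical table stored in insertion order: 10,000 rows each for 2008, 2009, 2010.
rows = [{"id": i, "year": 2008 + i // 10_000} for i in range(30_000)]

# Model of an index on `year`: sorted (year, row position) pairs.
year_index = sorted((row["year"], pos) for pos, row in enumerate(rows))


def first_by_scan(rows, year):
    """Scan in table order and stop at the first match (SELECT ... LIMIT 1)."""
    for examined, row in enumerate(rows, start=1):
        if row["year"] == year:
            return row, examined
    return None, len(rows)


def first_by_index(rows, index, year):
    """Seek straight to the first index entry for `year`, then fetch one row."""
    i = bisect.bisect_left(index, (year, -1))  # positions are >= 0
    if i < len(index) and index[i][0] == year:
        return rows[index[i][1]], 1
    return None, 0


print(first_by_scan(rows, 2010))                # → ({'id': 20000, 'year': 2010}, 20001)
print(first_by_index(rows, year_index, 2010))   # → ({'id': 20000, 'year': 2010}, 1)
```

Both paths return the same row. The scan has to walk past all 20,000 rows from 2008 and 2009 first.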

  3. When you need ORDER BY / LIMIT:

SELECT  *
FROM    people
ORDER BY
        gender, id
LIMIT 1

Without the index, a filesort would be required. Although it's somewhat optimized due to the LIMIT, it would still need a full table scan.
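The difference can be modeled by comparing a pass over every row with reading the first entry of a sorted index. The sketch assumes a hypothetical people table and models an index on (gender, id). The loop without an index mirrors the LIMIT optimization mentioned above by keeping only the best row so far, yet it still has to look at every row.

```python
# Hypothetical `people` table in primary-key order; every third row is 'F'.
rows = [{"id": i, "gender": "M" if i % 3 else "F"} for i in range(1, 10_001)]

# Model of an index on (gender, id): entries already sorted by the ORDER BY key.
gender_id_index = sorted((row["gender"], row["id"], pos) for pos, row in enumerate(rows))


def first_without_index(rows):
    """No usable index: every row must be compared to find the smallest (gender, id)."""
    best, examined = None, 0
    for row in rows:
        examined += 1
        if best is None or (row["gender"], row["id"]) < (best["gender"], best["id"]):
            best = row
    return best, examined


def first_with_index(rows, index):
    """The index is already in ORDER BY order, so LIMIT 1 reads just its first entry."""
    if not index:
        return None, 0
    _, _, pos = index[0]
    return rows[pos], 1


print(first_without_index(rows))                 # → ({'id': 3, 'gender': 'F'}, 10000)
print(first_with_index(rows, gender_id_index))   # → ({'id': 3, 'gender': 'F'}, 1)
```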

  4. When the index covers all fields used in the query:

CREATE INDEX ix_low_cardinality_value ON mytable (low_cardinality_record, value);

SELECT  SUM(value)
FROM    mytable
WHERE   low_cardinality_record = 3
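The index holds both columns the query uses, so the sum can be computed from index entries without reading any table rows. MySQL reports this case as "Using index" in EXPLAIN. The sketch below is a minimal model of that idea. It assumes a hypothetical mytable with integer keys and values, and represents the composite index as a sorted list of (low_cardinality_record, value) tuples.

```python
import bisect

# Hypothetical mytable rows.
rows = [{"low_cardinality_record": i % 5, "value": i} for i in range(1_000)]

# Model of the composite index (low_cardinality_record, value): sorted key tuples.
covering_index = sorted((row["low_cardinality_record"], row["value"]) for row in rows)


def sum_from_covering_index(index, key):
    """Answer SUM(value) ... WHERE low_cardinality_record = key from index entries alone.

    Assumes integer keys: (key,) sorts before every (key, value) tuple and
    (key + 1,) sorts after all of them, so the slice is exactly the matching range.
    """
    lo = bisect.bisect_left(index, (key,))
    hi = bisect.bisect_left(index, (key + 1,))
    return sum(value for _, value in index[lo:hi])


print(sum_from_covering_index(covering_index, 3))  # → 100100
```

The `rows` list is only needed to build the index. The query function never reads it, which is what makes the index "covering".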


  5. When you need DISTINCT:

    SELECT  DISTINCT color
    FROM    tshirts
    

    MySQL will use the index for group-by (a loose index scan, reported as "Using index for group-by" in EXPLAIN). If you have few colors, this query will return almost instantly even with millions of records.

    This is an example of a scenario where an index on a low-cardinality field is more efficient than one on a high-cardinality field.

Note that if DML performance is not much of an issue, it's safe to create the index.

If the optimizer thinks the index is inefficient, it simply will not use it.
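The DISTINCT case above works because a sorted index lets the engine skip from one value straight to the next instead of reading every entry. This is the idea behind MySQL's loose index scan. The model below assumes a hypothetical tshirts table with four colors, and uses a sorted Python list plus `bisect` in place of a B-tree.

```python
import bisect

# Hypothetical tshirts table: 100,000 rows but only four colors.
colors = ["black", "blue", "red", "white"]
color_index = sorted(colors[i % 4] for i in range(100_000))  # model of an index on color


def distinct_via_loose_scan(index):
    """Return the distinct keys of a sorted index plus the number of seeks made.

    Rather than reading every entry, jump past each run of equal keys with a
    binary search, so the number of seeks equals the number of distinct values.
    """
    result, seeks = [], 0
    i = 0
    while i < len(index):
        key = index[i]
        seeks += 1
        result.append(key)
        i = bisect.bisect_right(index, key, i)  # first entry greater than key
    return result, seeks


print(distinct_via_loose_scan(color_index))  # → (['black', 'blue', 'red', 'white'], 4)
```

Each seek costs O(log n), so the work grows with the number of distinct colors rather than the number of rows. This is why a low-cardinality column is the favorable case here.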
