Should low cardinality flags be indexed?

Question

Let's say you have a table where a specific subset of rows is much hotter for reads, for example a people table with a flag called is_alive. Or you implement soft/logical deletes and your search criteria always include is_deleted = 0.

Should these fields be included in the indexes for these tables? If so, should they sit further left or further right in the column list?

Let's say you have indexes like...

people [ last_name ]
people [ zip_code ]
people [ gender ]

widgets [ category_id ]
widgets [ seller_id ]

Would you make them look like the first pair below (flag last) or the second pair (flag first)?

people  [ last_name, is_alive   ]
widgets [ category_id, is_valid ]

people  [ is_alive, last_name   ]
widgets [ is_valid, category_id ]

The booleans themselves have low cardinality/significance unless they are paired with other search criteria.

It just feels fishy adding this field to EVERY index, even though it is used nearly every time. Maybe that in itself is a "problem"? Should the rows be shuttled to a different table that has the same schema, basically partitioning on the flag?

This question is vendor agnostic.

Recommended answer

One of the key ways that an index helps a query is by reducing the number of pages that need to be read compared with a full table scan. Remember, the database engine is managing pages, which in turn store records. Imagine we have a table of customers and it has an index on state. A query that filters down to a single state only has to read a small proportion of the data. Of course, the proportion could be something like 10% (for California) versus less than 1% for a small state. The question is: how many pages have to be read to get this data?

To answer this question we need two pieces of information: (1) how selective is the query? and (2) how many records fit on a page? So, if 100 records fit on a page and the matching rows are spread evenly through the table, then a query that selects 2% of the rows will have to read most of the pages anyway (about 87% of them). In this case, the index saves very little over a full table scan. The index in turn incurs overhead, so it probably should not be used.

On the other hand, if only 1 record fits on a page, then a query that selects 2% of the rows will only have to read 2% of the pages, a savings of 50x. Almost any overhead incurred by the index is worth it.
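
The page arithmetic above can be sketched in a few lines of Python. This is only a rough model: it assumes matching rows are scattered independently and evenly across pages, which real tables often violate, and the function name is made up for illustration.

```python
def expected_fraction_of_pages_read(selectivity: float, records_per_page: int) -> float:
    """Expected fraction of pages that hold at least one matching row.

    Assumption: each row matches independently with probability `selectivity`,
    so a page is skipped only if none of its rows match.
    """
    return 1.0 - (1.0 - selectivity) ** records_per_page


# A query selecting 2% of the rows, for different page densities.
for records_per_page in (1, 10, 100):
    fraction = expected_fraction_of_pages_read(0.02, records_per_page)
    print(f"{records_per_page:>3} records/page -> {fraction:.1%} of pages read")
```

Under this model the 2% query touches 2% of the pages at 1 record per page, roughly 18% at 10 records per page, and roughly 87% at 100 records per page. If the matching rows happen to be physically clustered, fewer pages are touched and the index looks better than this estimate suggests.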

Because indexes get used for multiple purposes, and because different database engines implement them differently, and because pages are implemented differently, and so on, there is no hard and fast rule. However, I can generally say that low cardinality flags are probably not good candidates for indexes.

As I think about it, I can think of one case where an index on a flag might prove efficient: wide rows and queries that can be answered entirely from the index (select flag, count(*) from table group by flag).
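
As a small illustration of the index-only case, here is a sketch using Python's built-in sqlite3 module. SQLite is just a convenient stand-in, since the question is vendor agnostic and other engines report their plans differently. The table layout, the wide bio column, and the index name are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical wide table: the bio column makes each row large.
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, last_name TEXT, bio TEXT, is_alive INTEGER)"
)
conn.execute("CREATE INDEX idx_people_is_alive ON people (is_alive)")
conn.executemany(
    "INSERT INTO people (last_name, bio, is_alive) VALUES (?, ?, ?)",
    [(f"name{i}", "x" * 500, int(i % 10 != 0)) for i in range(1000)],
)

query = "SELECT is_alive, COUNT(*) FROM people GROUP BY is_alive"

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan_text = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
counts = dict(conn.execute(query).fetchall())

print(plan_text)
print(counts)
```

In SQLite the plan should mention a covering index, meaning the query is answered from the narrow index without touching the wide rows. Whether another engine makes the same choice depends on its optimizer and statistics.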

On the other hand, if you have several such flags, a composite index may help query performance.
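
A sketch of the several-flags case, again using sqlite3 purely as a stand-in engine. The flags is_featured and is_in_stock and the index name are hypothetical, added only so that there is more than one flag to combine. Each flag alone matches a large share of the rows, but the combination is much more selective.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE widgets (id INTEGER PRIMARY KEY, name TEXT, "
    "is_valid INTEGER, is_featured INTEGER, is_in_stock INTEGER)"
)
# One composite index over all three (hypothetical) flags.
conn.execute(
    "CREATE INDEX idx_widgets_flags ON widgets (is_valid, is_featured, is_in_stock)"
)
conn.executemany(
    "INSERT INTO widgets (name, is_valid, is_featured, is_in_stock) VALUES (?, ?, ?, ?)",
    [(f"widget{i}", int(i % 2 == 0), int(i % 3 == 0), int(i % 5 == 0)) for i in range(3000)],
)

where = "WHERE is_valid = 1 AND is_featured = 1 AND is_in_stock = 1"

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
flag_plan = " ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN SELECT id, name FROM widgets " + where)
)
matches = conn.execute("SELECT COUNT(*) FROM widgets " + where).fetchone()[0]

print(flag_plan)
print(f"{matches} of 3000 rows match all three flags")
```

In this made-up data the individual flags match 50%, 33%, and 20% of the rows, while all three together match 100 of 3000 rows (about 3%), which is the kind of selectivity where an index starts to pay off.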
