What column should the clustered index be put on?

Problem description

Lately I have been reading about indexes of all types, and the main advice is to put the clustered index on the table's primary key. But what if the primary key is never actually used in a query (in a select or a join) and exists purely for relational purposes, so it is never queried against? For example, say I have a car_parts table with 3 columns: car_part_id, car_part_no, and car_part_title. car_part_id is the unique primary key identity column. car_part_no is unique as well, and most likely so is car_part_title. car_part_no is what gets queried most, so doesn't it make sense to put the clustered index on that column instead of car_part_id? The basic question is: which column should actually get the clustered index, since you are only allowed one?

Solution

An index, clustered or non-clustered, can be used for a seek by the query optimizer if and only if the leftmost key in the index is filtered on. So if you define an index on columns (A, B, C), a WHERE condition on B=@b, on C=@c, or on B=@b AND C=@c will not fully leverage the index (see note). This also applies to join conditions. Any WHERE filter that includes A can use the index: A=@a, A=@a AND B=@b, A=@a AND C=@c, or A=@a AND B=@b AND C=@c.
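The leftmost-key rule is easy to watch in action. Here is a minimal sketch that uses SQLite (through Python's standard sqlite3 module) purely as a stand-in, on the assumption that its planner follows the same rule for composite indexes; it is not SQL Server, and the table `t`, the index `idx_abc`, and the literal filter values are all made up for illustration. In SQLite's plan text, SEARCH means a seek and SCAN means a full pass.

```python
import sqlite3


def plan(conn, sql):
    """Return the planner's one-line descriptions for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the readable detail


conn = sqlite3.connect(":memory:")
# Hypothetical table and composite index mirroring the (A, B, C) example.
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER, payload TEXT)"
)
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")

filters = [
    "a = 1",                      # includes the leftmost key
    "a = 1 AND b = 2",
    "a = 1 AND c = 3",
    "a = 1 AND b = 2 AND c = 3",
    "b = 2",                      # leftmost key missing
    "c = 3",
    "b = 2 AND c = 3",
]

# payload is not in the index, so a scan of idx_abc would not pay off here.
plans = {f: plan(conn, f"SELECT payload FROM t WHERE {f}") for f in filters}
for f, p in plans.items():
    print(f"{f:<28} {p}")
```

The first four filters should come back as a SEARCH on idx_abc and the last three as a plain SCAN of the table. The exact wording of the plan lines varies between SQLite versions.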

So in your example, if you create the clustered index with part_no as the leftmost key, then a query looking for a specific part_id will not be able to seek on that index, and a separate non-clustered index must exist on part_id.
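To make that concrete, here is a rough sketch, again with SQLite standing in for SQL Server. A SQLite WITHOUT ROWID table stores its rows in primary key order, which is the closest analogue to a clustered index I can run in a few lines; treat that equivalence as an assumption. The index name `ix_car_parts_id` and the literal values are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Return the query plan as a single readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Rows are physically organized by car_part_no (stand-in for a clustered index).
conn.execute("""
    CREATE TABLE car_parts (
        car_part_id    INTEGER NOT NULL,
        car_part_no    TEXT    NOT NULL,
        car_part_title TEXT,
        PRIMARY KEY (car_part_no)
    ) WITHOUT ROWID
""")

q_by_no = "SELECT car_part_title FROM car_parts WHERE car_part_no = 'A-100'"
q_by_id = "SELECT car_part_title FROM car_parts WHERE car_part_id = 42"

by_no = plan(conn, q_by_no)            # seek on the clustering key
by_id_before = plan(conn, q_by_id)     # no index leads with car_part_id yet

# The separate secondary index the answer says must exist.
conn.execute("CREATE UNIQUE INDEX ix_car_parts_id ON car_parts (car_part_id)")
by_id_after = plan(conn, q_by_id)

print("by part_no:            ", by_no)
print("by part_id (no index): ", by_id_before)
print("by part_id (indexed):  ", by_id_after)
```

Before the secondary index exists, the lookup by car_part_id has to walk the whole table; afterwards it becomes a seek on ix_car_parts_id.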

Now to the question of which of the many indexes should be the clustered one. If you have several query patterns of about the same importance and frequency that contradict each other in terms of the keys needed (e.g. frequent queries by either part_no or part_id), then you take other factors into consideration:

  • width: the clustered index key is used as the lookup key by all other non-clustered indexes. So if you choose a wide key (say two uniqueidentifier columns), you make all the other indexes wider, which consumes more space, generates more IO, and slows everything down. So between keys that are equally good from a read point of view, choose the narrowest one as clustered and make the wider ones non-clustered.
  • contention: if you have specific patterns of insert and delete, try to separate them physically so they occur on different portions of the clustered index. E.g. if the table acts as a queue with all inserts at one logical end and all deletes at the other logical end, try to lay out the clustered index so that the physical order matches this logical order (e.g. enqueue order).
  • partitioning: if the table is very large and you plan to deploy partitioning, then the partitioning key must be part of the clustered index key. A typical example is historical data that is archived using a sliding window partitioning scheme. Even though the entities have a logical primary key like 'entity_id', the clustered index is built on a datetime column that is also used for the partitioning function.
  • stability: a key that changes often is a poor candidate for a clustered key, because each update changes the clustered key value and forces all non-clustered indexes to update the lookup key they store. Since an update of a clustered key will also likely relocate the record to a different page, it can cause fragmentation of the clustered index.
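To put a number on the width point, here is a back-of-the-envelope sketch. The column sizes are the SQL Server ones (uniqueidentifier is 16 bytes, int is 4 bytes); the row count and the number of non-clustered indexes are hypothetical, and the estimate counts only the stored lookup key per index row, ignoring row overhead, page fill, and non-leaf levels.

```python
UNIQUEIDENTIFIER_BYTES = 16  # SQL Server uniqueidentifier
INT_BYTES = 4                # SQL Server int


def lookup_key_bytes(clustered_key_bytes, row_count, nonclustered_index_count):
    """Bytes spent repeating the clustered key in every non-clustered index row."""
    return clustered_key_bytes * row_count * nonclustered_index_count


row_count = 10_000_000   # hypothetical table size
index_count = 4          # hypothetical number of non-clustered indexes

wide = lookup_key_bytes(2 * UNIQUEIDENTIFIER_BYTES, row_count, index_count)
narrow = lookup_key_bytes(INT_BYTES, row_count, index_count)

print(f"wide key:   {wide:,} bytes")
print(f"narrow key: {narrow:,} bytes")
print(f"extra:      {(wide - narrow) / 2**20:,.0f} MiB")
```

With these made-up figures the wide key costs roughly a gigabyte more across the non-clustered indexes, all of which has to be read, cached, and written.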

Note: I say "not fully leverage" because sometimes the engine will choose to scan a non-clustered index instead of the clustered index simply because it is narrower and thus has fewer pages to scan. In my example, if you have an index on (A, B, C), a WHERE filter on B=@b, and the query projects C, the index will likely be used, but as a scan rather than a seek, because that is still faster than a full clustered scan (fewer pages).
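The case in the note can be reproduced in the same hedged way. This is SQLite standing in for SQL Server, with a made-up table `t` and index `idx_abc`; the only claim is that SQLite's planner makes the same kind of choice when the narrower index covers the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# payload makes the table rows wider than the index rows.
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER, payload TEXT)"
)
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")

# Filter on B only, project C: everything the query needs lives in idx_abc.
rows = conn.execute("EXPLAIN QUERY PLAN SELECT c FROM t WHERE b = 2").fetchall()
note_plan = rows[0][3]
print(note_plan)
```

I would expect the plan to be a SCAN that names idx_abc rather than a SEARCH: the index is read end to end, but it is still cheaper than reading the full table.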
