集群与非集群 [英] Clustered vs Non-Clustered

查看:64
本文介绍了集群与非集群的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对 SQL (Server 2008) 的低级知识有限,现在正受到我们的 DBA 的挑战.让我解释一下(我已经提到了明显的陈述,希望我是对的,但如果你发现不对的地方,请告诉我)场景:

My lower level knowledge of SQL (Server 2008) is limited, and is now being challanged by our DBAs. Let me explain (I have mentioned obvious statements in the hope that I am right, but if you see something wrong, please tell me) the scenario:

我们有一张桌子,上面放着人们的法院命令".当我创建表时,(名称:CourtOrder),我创建了它:

We have a table which holds 'Court Orders' for people. When I created the table, (Name: CourtOrder), I created it like:

CREATE TABLE dbo.CourtOrder
(
  CourtOrderID INT NOT NULL IDENTITY(1,1), (Primary Key)
  PersonId INT NOT NULL,
  + around 20 other fields of different types.
)

然后我对主键应用了一个非聚集索引(为了效率).我的理由是它是一个唯一的字段(主键),应该被索引,主要是为了选择,因为我们经常Select from table where primary key = ...

I then applied a non-clustered index to the primary key (for efficiency). My reasons is that it is a unique field (primary key), and should be indexed, mainly for selection purposes, as we often Select from table where primary key = ...

然后我在 PersonId 上应用了一个 CLUSTERED 索引.原因是在物理上对特定人的订单进行分组,因为绝大多数工作都是为一个人获得订单.所以,select from mytable where personId = ...

I then applied a CLUSTERED index on PersonId. The reason was to group orders for a particular person physically, as the vast majority of work is getting orders for a person. So, select from mytable where personId = ...

我现在对此很感兴趣.有人告诉我,我们应该将聚集索引放在主键上,将普通索引放在 personId 上.这对我来说似乎很奇怪.首先,为什么要将聚集索引放在唯一列上?什么是聚类?这肯定是对聚集索引的浪费吗?我曾相信一个普通的索引会被用在一个唯一的列上.此外,对索引进行聚类意味着我们不能对不同的列进行聚类(每个表一个,对吗?).

I have been pulled up on this now. I have been told that we should put the clustered index on the primary key, and the normal index on the personId. That seems very strange to me. First off, why would you put a clustered index on a unique column? what is it clustering? Surely that's a waste of the clustered index? I'd have believed a normal index would be used on a unique column. Also, clustering the index would mean we can't cluster a different column (One per table, right?).

我被告知我犯了一个错误的原因是他们认为在 PersonId 上放置聚集索引会使插入变慢.对于选择速度提高 5% 的情况,插入和更新的速度将降低 95%.这是正确有效的吗?

The reasoning for me being told I have made a mistake is that they believe putting a clustered index on the PersonId would make inserts slow. For the 5% gain in speed of a select, we would be getting a 95% degradation in speed on inserts and updates. Is that correct and valid?

他们说,因为我们对 personId 进行了聚类,所以每当我们插入或更改 PersonId 时,SQL Server 都必须重新排列数据.

They say that because we cluster the personId, SQL Server has to rearrange data when ever we insert or make a change to the PersonId.

那么我问了,如果 SQL 这么慢,为什么会有集群索引的概念?有他们说的那么慢吗?我应该如何设置索引以实现最佳性能?我原以为 SELECT 的使用比 INSERT 多……但他们说我们在 INSERTS 上遇到了锁定问题……

So then I have asked, why would SQL have the concept of a CLUSTERED INDEX, if it's so slow? Is it as slow as they're saying? How should I have setup my indexes to achieve optimum performance? I'd have thought SELECT is used more than INSERT... but they say that we're having locking issues on INSERTS...

希望有人能帮助我.

推荐答案

聚集索引和非聚集索引的区别在于聚集索引决定了数据库中行的物理顺序.换句话说,将聚集索引应用到 PersonId 意味着这些行将在表中按 PersonId 物理排序,允许对此直接进行索引搜索到该行(而不是非聚集索引,它将引导您到行的位置,添加一个额外的步骤).

The distinction between a clustered vs. non-clustered index is that the clustered index determines the physical order of the rows in the database. In other words, applying the clustered index to PersonId means that the rows will be physically sorted by PersonId in the table, allowing an index search on this to go straight to the row (rather than a non-clustered index, which would direct you to the row's location, adding an extra step).

也就是说,主键不是聚集索引是不寻常,但并非闻所未闻.您的方案的问题实际上与您的假设相反:您希望聚集索引中的唯一值,而不是重复值.因为聚集索引决定了行的物理顺序,如果索引位于非唯一列上,那么服务器必须向具有重复键值的行添加背景值(在您的情况下,任何具有相同键值的行)PersonId) 以便组合值(键 + 背景值)是唯一的.

That said, it's unusual for the primary key not to be the clustered index, but not unheard of. The issue with your scenario is actually the opposite of what you're assuming: you want unique values in a clustered index, not duplicates. Because the clustered index determines the physical order of the row, if the index is on a non-unique column, then the server has to add a background value to rows who have a duplicate key value (in your case, any rows with the same PersonId) so that the combined value (key + background value) is unique.

我唯一建议的是不要使用代理键(您的CourtOrderId)列作为主键,而是使用的复合主键>PersonId 和其他一些唯一标识列或列集.但是,如果这不可能(或不切实际),则将聚集索引放在 CourtOrderId 上.

The only thing I would suggest is not using a surrogate key (your CourtOrderId) column as the primary key, but instead use a compound primary key of the PersonId and some other uniquely-identifying column or set of columns. If that's not possible (or not practical), though, then put the clustered index on CourtOrderId.

这篇关于集群与非集群的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆