群集与非群集 [英] Clustered vs Non-Clustered

查看:84
本文介绍了群集与非群集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对SQL(Server 2008)的较低级别知识是有限的,现在由我们的DBA挑战。让我解释一下(我已经提到了明显的陈述,希望我是对的,但如果你看错了,请告诉我)情景:

My lower level knowledge of SQL (Server 2008) is limited, and is now being challanged by our DBAs. Let me explain (I have mentioned obvious statements in the hope that I am right, but if you see something wrong, please tell me) the scenario:

我们有一张桌子它为人们持有法院命令。当我创建表(名称:CourtOrder)时,我创建了它:

We have a table which holds 'Court Orders' for people. When I created the table, (Name: CourtOrder), I created it like:

CREATE TABLE dbo.CourtOrder
(
  CourtOrderID INT NOT NULL IDENTITY(1,1), (Primary Key)
  PersonId INT NOT NULL,
  + around 20 other fields of different types.
)

然后我将非聚集索引应用于主键(为了提高效率)。我的理由是它是一个唯一的字段(主键),并且应该被编入索引,主要用于选择目的,因为我们经常从表中选择主键= ...

I then applied a non-clustered index to the primary key (for efficiency). My reasons is that it is a unique field (primary key), and should be indexed, mainly for selection purposes, as we often Select from table where primary key = ...

然后我在PersonId上应用了CLUSTERED索引。原因是在物理上对某个人进行分组,因为绝大多数工作都是为了获得一个人的订单。所以,从mytable中选择personId = ...

I then applied a CLUSTERED index on PersonId. The reason was to group orders for a particular person physically, as the vast majority of work is getting orders for a person. So, select from mytable where personId = ...

我现在已被提起。我被告知我们应该将聚簇索引放在主键上,将正常索引放在personId上。这对我来说似乎很奇怪。首先,为什么要将聚簇索引放在唯一的列上?什么是聚类?当然这是浪费聚集索引?我相信一个普通的索引会用在一个独特的列上。此外,聚类索引意味着我们不能聚集不同的列(每个表一个,对吗?)。

I have been pulled up on this now. I have been told that we should put the clustered index on the primary key, and the normal index on the personId. That seems very strange to me. First off, why would you put a clustered index on a unique column? what is it clustering? Surely that's a waste of the clustered index? I'd have believed a normal index would be used on a unique column. Also, clustering the index would mean we can't cluster a different column (One per table, right?).

我被告知我犯了错误的原因是因为他们认为在PersonId上放置聚簇索引会使插入变慢。对于选择速度增加5%,我们将在插入和更新时降低95%的速度。这是正确有效的吗?

The reasoning for me being told I have made a mistake is that they believe putting a clustered index on the PersonId would make inserts slow. For the 5% gain in speed of a select, we would be getting a 95% degradation in speed on inserts and updates. Is that correct and valid?

他们说因为我们聚集了personId,所以当我们插入或更改PersonId时,SQL Server必须重新排列数据。

They say that because we cluster the personId, SQL Server has to rearrange data when ever we insert or make a change to the PersonId.

那么我问过,为什么SQL会有一个CLUSTERED INDEX的概念,如果它太慢了?它和他们说的一样慢吗?我应该如何设置索引以获得最佳性能?我认为SELECT比INSERT更多......但他们说我们在INSERTS上遇到锁定问题......

So then I have asked, why would SQL have the concept of a CLUSTERED INDEX, if it's so slow? Is it as slow as they're saying? How should I have setup my indexes to achieve optimum performance? I'd have thought SELECT is used more than INSERT... but they say that we're having locking issues on INSERTS...

希望有人可以帮助我。

Hope someone can help me.

推荐答案

聚簇索引与非聚簇索引之间的区别在于聚簇索引确定了聚簇索引的物理顺序数据库中的行。换句话说,将聚簇索引应用于 PersonId 意味着行将按表中的 PersonId 进行物理排序,允许对此进行索引搜索直接进入行(而不是非聚集索引,这会将您引导到行的位置,添加额外的步骤)。

The distinction between a clustered vs. non-clustered index is that the clustered index determines the physical order of the rows in the database. In other words, applying the clustered index to PersonId means that the rows will be physically sorted by PersonId in the table, allowing an index search on this to go straight to the row (rather than a non-clustered index, which would direct you to the row's location, adding an extra step).

也就是说,主键不是聚集索引是异常,但并非闻所未闻。您的方案的问题实际上与您假设的相反:您希望聚簇索引中的唯一值,而不是重复。由于聚簇索引确定行的物理顺序,因此如果索引位于非唯一列上,则服务器必须向具有重复键值的行添加背景值(在您的情况下,任何具有相同键的行) PersonId ),以便组合值(键+背景值)是唯一的。

That said, it's unusual for the primary key not to be the clustered index, but not unheard of. The issue with your scenario is actually the opposite of what you're assuming: you want unique values in a clustered index, not duplicates. Because the clustered index determines the physical order of the row, if the index is on a non-unique column, then the server has to add a background value to rows who have a duplicate key value (in your case, any rows with the same PersonId) so that the combined value (key + background value) is unique.

我唯一建议的是使用代理键(你的 CourtOrderId )列作为主键,而是使用 PersonId 的复合主键以及其他一些唯一标识列或列集。但是,如果这不可能(或不实际),则将聚集索引放在 CourtOrderId 上。

The only thing I would suggest is not using a surrogate key (your CourtOrderId) column as the primary key, but instead use a compound primary key of the PersonId and some other uniquely-identifying column or set of columns. If that's not possible (or not practical), though, then put the clustered index on CourtOrderId.

这篇关于群集与非群集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆