Why Cluster a Primary Key?

Problem description

I'm probably going to get shot down with thousands of reasons for
this, but I've never really heard or read a convincing explanation, so
here goes ...

Clustered indexes are more efficient at returning large numbers of
records than non-clustered indexes. Agreed? (Assuming the NC index
doesn't cover the query, of course)

Since it's only possible to have one clustered index, why is this
almost always used for the primary key, when by definition a primary
key will always return 1 record?

Isn't it generally better to specify a non-clustered index for the
primary key, and reserve the clustered index for a column which will
most likely be used for queries that return multi-row data sets (e.g.
date columns)?
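To make the page-count argument concrete, here is a small Python simulation. It is a sketch under stated assumptions, not a model of SQL Server's storage engine: the page capacity (`ROWS_PER_PAGE`), the insert rate (`ROWS_PER_DAY`) and the `(random_key, order_day)` row layout are all hypothetical. It stores the same rows once ordered by date and once ordered by a random, GUID-like key, then counts the distinct data pages a 25-day range query has to touch. Only data pages are counted; a real non-covering non-clustered index would also read index pages and do one lookup per matching row, so the sketch understates that cost.

```python
import random

ROWS_PER_PAGE = 50   # assumed rows per data page; purely illustrative
ROWS_PER_DAY = 20    # assumed insert rate; purely illustrative


def build_rows(n, seed=1):
    """Return n hypothetical rows as (random_key, order_day) tuples.

    random_key stands in for a non-sequential primary key (e.g. a GUID);
    order_day is a day number that rises as rows are inserted.
    """
    rng = random.Random(seed)
    keys = rng.sample(range(n * 10), n)  # n distinct random keys
    return [(keys[i], i // ROWS_PER_DAY) for i in range(n)]


def data_pages_touched(rows, cluster_col, first_day, last_day):
    """Store rows in pages ordered by cluster_col, then count the distinct
    data pages holding rows whose order_day is in [first_day, last_day]."""
    ordered = sorted(rows, key=lambda r: r[cluster_col])
    pages = set()
    for position, (_, day) in enumerate(ordered):
        if first_day <= day <= last_day:
            pages.add(position // ROWS_PER_PAGE)
    return len(pages)


rows = build_rows(10_000)  # 200 pages of 50 rows
# Days 100..124 inclusive: 25 days x 20 rows/day = 500 matching rows.
clustered_on_date = data_pages_touched(rows, cluster_col=1, first_day=100, last_day=124)
clustered_on_pk = data_pages_touched(rows, cluster_col=0, first_day=100, last_day=124)
print(clustered_on_date, clustered_on_pk)
```

With the date-clustered layout the 500 matching rows sit on 10 adjacent pages; with the random key they are scattered across a large share of the table's 200 pages.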

Also, if you are using a sequential key, clustering this will cause an
insert hotspot on the last page of the table, which can cause
concurrency problems if you aren't using row-level locking. If you're
using a random clustered key then inserts will generally be improved,
assuming you're using a sensible fillfactor, but you still lose the
advantage of using the clustered index for multi-record retrieval.
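The hotspot argument can be sketched in the same spirit. The Python below uses a hypothetical page capacity and made-up key values, and it ignores page splits, fill factor and locking entirely; it only records which page each of 100 new rows would be written to in a table clustered on its key. With a sequential, IDENTITY-style key every insert targets the last page; with a random key the inserts spread out.

```python
import bisect
import random

ROWS_PER_PAGE = 50  # assumed page capacity; purely illustrative


def page_first_keys(existing_keys):
    """Split the sorted existing keys into pages; return each page's lowest key."""
    ordered = sorted(existing_keys)
    return [ordered[i] for i in range(0, len(ordered), ROWS_PER_PAGE)]


def insert_targets(first_keys, new_keys):
    """Count how many new keys land on each page of a table clustered on the key.

    A key belongs on the last page whose first key is <= the new key;
    keys below every page go to page 0. Page splits are not modelled.
    """
    targets = {}
    for key in new_keys:
        page = max(bisect.bisect_right(first_keys, key) - 1, 0)
        targets[page] = targets.get(page, 0) + 1
    return targets


rng = random.Random(7)

# Sequential key: existing rows 1..10000, then 100 new rows follow on.
seq_pages = page_first_keys(range(1, 10_001))
seq_hits = insert_targets(seq_pages, range(10_001, 10_101))

# Random (GUID-like) key: existing and new keys drawn from a large space.
keys = rng.sample(range(10**9), 10_100)
rand_pages = page_first_keys(keys[:10_000])
rand_hits = insert_targets(rand_pages, keys[10_000:])

print(len(seq_hits), max(seq_hits.values()))    # 1 100
print(len(rand_hits), max(rand_hits.values()))
```

Spreading the inserts is what relieves contention on the last page, and it is also why a random key loses locality for range queries, which is the trade-off described above. A sensible fill factor, which this sketch does not model, is what leaves free space on those scattered pages so the inserts do not immediately split them.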

I'd be very interested to hear other people's views on this.

Phil

Solution

The main reason I've found for clustering the primary key is that clustering
anything else will mess up front-end libraries including DAO and ADO, and
sometimes clustering the primary key seems to at least keep records together
that were entered close together in time, and those happen to be the ones
close together by date, which reduces the number of pages hit in date range
queries.
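This observation can be illustrated with a small Python simulation. All of the following are hypothetical assumptions: rows receive an ascending IDENTITY-style key as they are entered, each row records a date up to two days earlier than its entry, and a page holds 50 rows. Because the table is clustered on the identity key, rows dated within a 25-day window still end up on a short run of adjacent pages, even though nothing is clustered on the date itself.

```python
import random

ROWS_PER_PAGE = 50   # assumed rows per page; purely illustrative
ROWS_PER_DAY = 20    # assumed insert rate; purely illustrative


def build_rows(n, max_lag_days=2, seed=3):
    """Rows as (identity_pk, order_day): each row is entered up to
    max_lag_days after the date it records, so order_day only roughly
    follows insertion order (hypothetical workload)."""
    rng = random.Random(seed)
    rows = []
    for pk in range(n):
        day = max(pk // ROWS_PER_DAY - rng.randint(0, max_lag_days), 0)
        rows.append((pk, day))
    return rows


def pages_touched(rows, first_day, last_day):
    """Rows are stored in identity_pk order (clustered PK); count the pages
    holding at least one row dated within [first_day, last_day]."""
    ordered = sorted(rows)  # tuples sort by identity_pk first
    return len({pos // ROWS_PER_PAGE
                for pos, (_, day) in enumerate(ordered)
                if first_day <= day <= last_day})


rows = build_rows(10_000)
matching = sum(1 for _, day in rows if 100 <= day <= 124)
print(matching, pages_touched(rows, 100, 124))
```

For this workload the window's rows (between 460 and 540 of them, depending on the random lags) occupy only 10 or 11 consecutive pages. The locality weakens as the typical lag between a row's date and its entry grows.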

Personally, I almost always have something I'd rather cluster than the primary
key, but with DAO and ADO both assuming the clustered index is the primary key
even when something else actually is, it's just not workable. Either the
clustered index is unique and much larger than the PK leading to unnecessary
network traffic, or the clustered index is not unique, and the front-end
becomes confused that there seems to be more than one record with the same
key.
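The failure mode described here can be sketched as follows. This is not how DAO or ADO are implemented; `rows_by_assumed_key` and `ambiguous_keys` are hypothetical helpers that model a front end which treats the clustered-index columns as the row identifier, and the `orders` rows are made-up sample data. Keyed on a unique column, each assumed key maps to one row; keyed on a non-unique column such as a date, several rows collapse onto the same key.

```python
from collections import defaultdict


def rows_by_assumed_key(rows, key_cols):
    """Group rows the way a hypothetical front end would if it took the
    clustered-index columns as the row identifier."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in key_cols)].append(row)
    return groups


def ambiguous_keys(rows, key_cols):
    """Return the assumed keys that map to more than one row."""
    return {k: v for k, v in rows_by_assumed_key(rows, key_cols).items()
            if len(v) > 1}


orders = [  # hypothetical sample rows
    {"order_id": 1, "order_date": "2004-03-01", "customer": "A"},
    {"order_id": 2, "order_date": "2004-03-01", "customer": "B"},
    {"order_id": 3, "order_date": "2004-03-02", "customer": "A"},
]

# Clustered on the real primary key: every assumed key identifies one row.
print(ambiguous_keys(orders, ["order_id"]))    # {}
# Clustered on a non-unique date column: two rows share one "key".
print(ambiguous_keys(orders, ["order_date"]))
```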

On 5 Mar 2004 03:56:38 -0800, ph********@btopenworld.com (Philip Yale) wrote:




>> Since it's only possible to have one clustered index, why is this
almost always used for the primary key, when by definition a primary
key will always return 1 record [sic]? <<

Actually, you hit the nail on the head and did not know it. When SQL
was first implemented, the mental and physical models for data were
based on files (Rows are not records; fields are not columns; tables
are not files): files with sequential, contiguous storage and, in
particular, magnetic tape and punch cards (there is no sequential
access or ordering in an RDBMS, so "first", "next" and "last" are
totally meaningless).

A Master mag tape file is sorted on a key, usually at the front of the
records, just after the "deleted" flag. This is so that you can merge
the transaction tapes, also sorted on the same key, into the Master.
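For readers who never ran one, the classic sequential master-file update alluded to here can be sketched in Python. This is a generic textbook version, not code from any particular system; the record layout and the `add`/`change`/`delete` operation codes are assumptions. Both inputs are read strictly front to back, as with tape, which only works because both are sorted on the same key.

```python
def update_master(master, transactions):
    """Merge sorted transactions into a sorted master, producing a new master.

    master:       list of (key, data) sorted by key, keys unique.
    transactions: list of (key, op, data) sorted by key, where op is
                  'add', 'change' or 'delete' (hypothetical record layout).
    Returns (new_master, errors).
    """
    new_master, errors = [], []
    m = t = 0
    while m < len(master) or t < len(transactions):
        # The next key to process is the smaller of the two current keys.
        candidates = []
        if m < len(master):
            candidates.append(master[m][0])
        if t < len(transactions):
            candidates.append(transactions[t][0])
        key = min(candidates)

        # Current master record for this key, if there is one.
        record = None
        if m < len(master) and master[m][0] == key:
            record = master[m][1]
            m += 1

        # Apply every transaction for this key, in order.
        while t < len(transactions) and transactions[t][0] == key:
            _, op, data = transactions[t]
            t += 1
            if op == "add":
                if record is None:
                    record = data
                else:
                    errors.append((key, "add: key already on master"))
            elif op == "change":
                if record is None:
                    errors.append((key, "change: no master record"))
                else:
                    record = data
            elif op == "delete":
                if record is None:
                    errors.append((key, "delete: no master record"))
                else:
                    record = None
            else:
                errors.append((key, f"unknown op {op!r}"))

        # Write the surviving record to the new master "tape".
        if record is not None:
            new_master.append((key, record))
    return new_master, errors


master = [(10, "alpha"), (20, "beta"), (30, "gamma")]
transactions = [(15, "add", "new"), (20, "delete", None),
                (30, "change", "GAMMA"), (40, "change", "x")]
new_master, errors = update_master(master, transactions)
print(new_master)  # [(10, 'alpha'), (15, 'new'), (30, 'GAMMA')]
print(errors)      # [(40, 'change: no master record')]
```

The single forward pass is the whole point: a key that is out of order on either tape cannot be matched, which is why the sort key was physically privileged in file systems of that era.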

Dr. Codd also fell for this and began with the PRIMARY KEY in his first
papers on the relational model. A bit later, he caught the error and
realized that a relational key is a key is a key and none of them are
"more equal" than the others. Unfortunately, SQL was based on Codd's
first papers and carried the error forward.

Sybase simply used what was there in Unix and the existing file
systems to build SQL Server and Microsoft followed suit.

Are you familiar with the story of how the Roman Empire determined the
size of the Space Shuttle boosters and therefore most of the design of
the shuttle?


jo*******@northface.edu (--CELKO--) wrote in message news:<a2**************************@posting.google.com>...


Thanks for that, Celko. It's very interesting, although I must
confess that I'm not sure what it's got to do with my original
question? Whatever the background evolution of RDBMS systems, in the
real world today what people refer to as a "primary key" returns 1
row, and I feel that it's a bit of a waste putting a clustered index
on this.

BTW - I've heard the Roman theory many times, but this really is just
an urban myth. Railway tracks, for example, in the UK, have a gauge
of 4' 8.5" because this was what resulted from a standard axle width
of 5'. There are many other gauges throughout the world, and there's
a very good paper at
http://www.vwl.uni-muenchen.de/ls_komlos/northam.pdf which details
their evolution.

