What is an index, and can a non-clustered index be non-unique?

Question

A follow-up to my question [1]:

All the definitions of a (MS SQL Server) index that I could find are ambiguous, and all the explanations based on them describe things using undefined or ambiguously defined terms.

What is the definition of an index?

For example, the most common definition of an index, from the wiki (http://en.wikipedia.org/wiki/Index_(database)):


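  • 1) A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space. Indexes can be created using one or more columns of a database table...

  • 2) SQL Server creates a clustered index on a primary key by default[1]. The data is present in random order, but the logical ordering is specified by the index. The data rows may be randomly spread throughout the table. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the page and the row number in the data page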

Well, it is ambiguous. By "index" one can understand:


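  • 1) an ordered data structure, a tree, containing intermediate and leaf nodes;

  • 2) leaf node data containing the values of the indexed columns + a pointer to the page and the row number in the data page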

Can a non-clustered index be non-unique, considering 2)? Or even 1)?
It doesn't seem so to me ...

But does T-SQL imply the existence of non-unique non-clustered indexes?

If yes, then what is meant by a non-clustered index in "CREATE INDEX (Transact-SQL)" [2], and to what is the UNIQUE argument applied there?

Is it:

  • 3) leaf node data containing the values of the indexed columns, i.e. as in 2) but without the pointer + row number?

If it is 3), then question 1) arises again: why apply constraints to a copy of the real data in the "index" instead of to the real data in situ?

Update:
Isn't the bookmark (pointer + row number) to a real data row unique (i.e. doesn't it uniquely identify the row)?
Doesn't this bookmark constitute part of the index and thereby make the index unique?
Can you give me a definition of an index, instead of explaining how to use something that is left UNDEFINED? The latter part I already know (or can read myself).

[1] UNIQUE argument for INDEX creation - what is it for?

[2] CREATE INDEX (Transact-SQL)
http://msdn.microsoft.com/en-us/library/ms188783.aspx

Recommended answer

An index is a data structure designed to optimize querying large data sets. As such, the definition itself makes no claim about whether anything is unique.

You can definitely have non-unique non-clustered indices. How else could you index on (lastname, firstname)? That's never going to be unique (e.g. on Facebook).

You can define an index as being unique. This just adds an extra check that no duplicate values are allowed. If you made your index on (lastname, firstname) UNIQUE, then the second Brad Pitt to sign up on your site couldn't do so, since that unique index would reject his data.
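The difference between the two kinds of index can be tried out in a few lines. The sketch below is only an illustration: it uses SQLite through Python's standard `sqlite3` module as a stand-in (not SQL Server), and the `person` table, its columns, and the index name `ix_person_name` are hypothetical.

```python
import sqlite3


def try_insert_two_brads(unique: bool) -> int:
    """Insert two people named Brad Pitt; return how many rows were accepted."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; person_id plays the role of the row identifier.
    conn.execute(
        "CREATE TABLE person ("
        "person_id INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT)"
    )
    # The only difference between the two runs is the UNIQUE keyword.
    kind = "UNIQUE INDEX" if unique else "INDEX"
    conn.execute(f"CREATE {kind} ix_person_name ON person (lastname, firstname)")

    accepted = 0
    for person_id in (10176, 17665):
        try:
            conn.execute(
                "INSERT INTO person VALUES (?, ?, ?)", (person_id, "Pitt", "Brad")
            )
            accepted += 1
        except sqlite3.IntegrityError:
            # The unique index rejects the duplicate (lastname, firstname).
            pass
    conn.close()
    return accepted
```

Calling `try_insert_two_brads(False)` should accept both rows, while `try_insert_two_brads(True)` should accept only the first one.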

One exception is the primary key on any given table, which is always unique. The primary key is the logical identifier used to uniquely and precisely identify each single row in your table. As such, it must be unique over all rows and cannot contain any NULL values.

The clustered index in SQL Server is special in that it contains the actual data rows in its leaf nodes. Nothing so far requires uniqueness. However, the clustered index is also used to uniquely locate (physically locate) the data in your database, so the clustering key must be able to tell Brad Pitt #1 and Brad Pitt #2 apart. If you don't take care to give your clustered index a unique set of columns, SQL Server will add a "uniqueifier" (a 4-byte INT) to the rows that would otherwise be duplicates, e.g. you'd get BradPitt001 and BradPitt002 (or something like that).
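As a rough mental model of that step (not SQL Server's actual storage format; the function name and the counting scheme are assumptions made for illustration), a duplicate clustering key can be made unique by pairing it with a small counter:

```python
from collections import defaultdict


def add_uniqueifiers(keys):
    """Pair each clustering key with a counter so that every pair is unique.

    Toy model only: the first occurrence of a key gets 0, the next 1, and so on.
    """
    seen = defaultdict(int)  # how many times each key has appeared so far
    result = []
    for key in keys:
        result.append((key, seen[key]))
        seen[key] += 1
    return result


# Hypothetical clustering keys on (lastname, firstname).
rows = add_uniqueifiers([("Pitt", "Brad"), ("Jolie", "Angelina"), ("Pitt", "Brad")])
```

The two `("Pitt", "Brad")` entries end up with different counters, so the combined key identifies exactly one row even though the declared columns do not.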

The clustered key is used as the "pointer" to the actual data row in your SQL Server table, so it is included in every single non-clustered index, too. So your non-clustered, non-unique index on (lastname, firstname) would not only contain these two fields; in reality, it also contains the clustered key of that table. That's why it's important that the clustered key on a SQL Server table is small, stable, and unique, typically an INT.

So your non-clustered index on (lastname, firstname) will really hold (lastname, firstname, personID), with entries like (Pitt, Brad, 10176), (Pitt, Brad, 17665) and so forth. When you search for "Brad Pitt" in your non-clustered index, SQL Server will find these two entries, and for both it has the "physical pointer" to where the rest of the data for those two people lives. So if you ask for more than just the first and last name, SQL Server can go and fetch the whole row for each of the two Brad Pitt entries and provide the data the query requires.
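The lookup described here can be sketched in a few lines. This is a simplified model under stated assumptions, not SQL Server's implementation: a `dict` keyed by `person_id` stands in for the clustered index, a sorted list of tuples stands in for the non-clustered B-tree, and the third row and the "row data" strings are made up for illustration.

```python
import bisect

# Hypothetical table, keyed by the clustered key person_id.
clustered_index = {
    10176: ("Pitt", "Brad", "row data A"),
    12001: ("Jolie", "Angelina", "row data B"),
    17665: ("Pitt", "Brad", "row data C"),
}

# Non-clustered index entries: (lastname, firstname, person_id), kept sorted.
# The trailing person_id makes every entry distinct even when names repeat.
nonclustered_index = sorted(
    (last, first, pid) for pid, (last, first, _) in clustered_index.items()
)


def lookup(lastname, firstname):
    """Find all matching index entries, then fetch each full row by its key."""
    # A 2-tuple sorts before any 3-tuple sharing its prefix, so this lands
    # on the first entry for the requested name (if there is one).
    start = bisect.bisect_left(nonclustered_index, (lastname, firstname))
    rows = []
    for last, first, pid in nonclustered_index[start:]:
        if (last, first) != (lastname, firstname):
            break
        rows.append((pid, clustered_index[pid]))  # follow the "pointer"
    return rows
```

`lookup("Pitt", "Brad")` returns both rows, reached through their clustered keys 10176 and 17665: the indexed columns are not unique, but each index entry as a whole is.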
