Understanding how a primary key works & how to use it


Problem description

I am using SQL Server and creating a table (example is at the very bottom of this question). However I am having some issues understanding how primary keys actually work & how to use them properly.

So I know a primary key ensures all the rows in a table are unique and that a primary key can't be null. I also read this page (index basics - simple talk) on indices and how indices are organised in a b-tree structure.

So in my table, for a row to have a unique value I would have to use the first 3 columns (UploadDate, SecID & FundCode, of types datetime, varchar(12) & varchar(6)). Only select queries will be used on this table, and the where clause will use one or more of the three fields just mentioned.

So I know I can create a primary key over multiple columns, so in my case it would be the 3 above. How, though, does having a primary key on my table help to improve the performance of select queries? I take it the primary key creates an index of some sort with the value of your column (or in my case 3 columns). I don't see how this will help, as my value will be a datetime & two bits of text?

Someone mentioned that I should just create an integer column with incrementing numbers and make that the primary key - I can't see how that can help when running a select query as the new field won't have any meaning & wouldn't be used in any select query or where clause of the query?

type             column name
-------------    ------------
datetime         UploadDate
varchar(12)      SecID
varchar(6)       FundCode
varchar(100)     Name
float            Price
float            Nominal
int              SourceCode
datetime         PriceDate

Example of some rows

UploadDate   SecID    FundCode   Name   Price   Nominal   SourceCode   PriceDate
2015-08-20   A045     ABCVPL     Joe    1.3434  1000.33   3
2015-08-20   A563     ABCVPL     Bob    1.5961  10.33     3
2015-08-20   A045     DEFGHJ     Joe    1.3434  856.41    3
2015-08-20   XC45     PLMNOI     Pip    2.3654  25.52     3
2015-08-20   KMM5     ABCVPL     Nit    6.9565  1532      3
2015-08-21   A045     ABCVPL     Joe    4.3434  1112      3
2015-08-21   GH45     DEFGHJ     Joe    3.3434  16532     3
2015-08-21   PL34     DEFGHJ     Joe    7.3434  635       3
2015-08-21   ER33     ABCVPL     Joe    8.3434  6320      3

Solution

The question appears to confuse two different concepts. The first is a primary key, the second is a clustered index. The first is a logical concept, the latter is a physical concept and refers to how the data is actually stored. There are cases when it is useful to decouple the primary key and clustering key, but for the most part they are one and the same, and by default your primary key will be your clustering key. It is an important distinction nonetheless.
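As a minimal sketch of that default (the table name dbo.PriceDemo and the constraint name are made up for illustration; the columns come from the question), declaring a primary key without CLUSTERED or NONCLUSTERED should give you a clustered index, provided the table has no clustered index already. You can check what was created in sys.indexes:

```sql
-- Hypothetical demo table; column names and types are taken from the question.
CREATE TABLE dbo.PriceDemo
(
    UploadDate DATETIME     NOT NULL,
    SecID      VARCHAR(12)  NOT NULL,
    FundCode   VARCHAR(6)   NOT NULL,
    Name       VARCHAR(100) NULL,

    -- No CLUSTERED / NONCLUSTERED keyword here, so SQL Server applies its default.
    CONSTRAINT PK_PriceDemo PRIMARY KEY (UploadDate, SecID, FundCode)
);

-- List the indexes behind the table; the primary key appears as one of them.
SELECT  i.name, i.type_desc, i.is_primary_key, i.is_unique
FROM    sys.indexes AS i
WHERE   i.object_id = OBJECT_ID(N'dbo.PriceDemo');
```

The type_desc column is where the physical choice (clustered or nonclustered) shows up, while is_primary_key reflects the logical constraint.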

I think people can argue (and have argued) until the cows come home about whether to use a natural or surrogate primary key. I won't touch upon this too much, but the basics are these: what you are suggesting, using the 3 columns that define a unique row, is a natural key (i.e. it already exists in your data). Another approach is to use an identity column, which gives each row a unique value. This is a surrogate key, since it has no actual meaning other than to uniquely identify your row.
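A rough sketch of the two options side by side. The table names, the PriceID column and the constraint names are hypothetical, and keeping a unique constraint on the natural columns in the surrogate version is an assumption about what you would want, not something the question states:

```sql
-- Option 1: natural key. The three existing columns identify a row.
CREATE TABLE dbo.PricesNatural
(
    UploadDate DATETIME    NOT NULL,
    SecID      VARCHAR(12) NOT NULL,
    FundCode   VARCHAR(6)  NOT NULL,
    Price      FLOAT       NULL,

    CONSTRAINT PK_PricesNatural PRIMARY KEY (UploadDate, SecID, FundCode)
);

-- Option 2: surrogate key. PriceID has no meaning beyond identifying the row.
CREATE TABLE dbo.PricesSurrogate
(
    PriceID    INT IDENTITY(1, 1) NOT NULL,
    UploadDate DATETIME    NOT NULL,
    SecID      VARCHAR(12) NOT NULL,
    FundCode   VARCHAR(6)  NOT NULL,
    Price      FLOAT       NULL,

    CONSTRAINT PK_PricesSurrogate PRIMARY KEY (PriceID),
    -- Assumption: the three columns must still be unique as a business rule.
    CONSTRAINT UQ_PricesSurrogate__NaturalKey UNIQUE (UploadDate, SecID, FundCode)
);
```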

"So I know I can create a primary key over multiple columns, so in my case it would be the 3 above. How, though, does having a primary key on my table help to improve the performance of select queries?"

It doesn't; having an index might help, depending on your queries. Given the right index, the database engine is able to navigate directly to the required data.
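For instance, here is a sketch that assumes dbo.YourTable already exists with the columns from the question and that some of your queries filter on SecID alone. The index name is made up. An index whose leading column is UploadDate cannot be used to seek on SecID by itself, so a query like this would typically want an index led by SecID:

```sql
-- Assumes dbo.YourTable already exists with the columns from the question.
-- Hypothetical index: SecID leads, so the engine can seek on SecID alone.
CREATE NONCLUSTERED INDEX IX_YourTable__SecID
    ON dbo.YourTable (SecID, UploadDate);

-- Example query the index is meant to support.
SELECT  UploadDate, SecID
FROM    dbo.YourTable
WHERE   SecID = 'A045';
```

Whether the optimizer actually chooses the index depends on your data and statistics, so it is worth confirming in the execution plan.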

"Someone mentioned that I should just create an integer column with incrementing numbers and make that the primary key - I can't see how that can help when running a select query as the new field won't have any meaning & wouldn't be used in any select query or where clause of the query?"

This is a good candidate for the clustering key. According to the queen of indexing, Kimberly Tripp, a clustered index should be:

  • Unique
  • Narrow
  • Static
  • Ever increasing pattern

With your 3 columns you have already ticked the unique box. The key is not that narrow, but it is not wide by any means. The last two criteria I can't answer for you: if UploadDate is a default value that is entered at the time of creation then you could have an ever increasing pattern, and I have no idea whether your three columns are static or could change. If either of these last two criteria is not met, then you should be using a surrogate identity column to cluster on regardless.

I would personally probably have eliminated this as a candidate for a clustering key based on its width (26 bytes). With an identity column you have an extra 4 bytes per row in the clustered index, but you save 22 bytes per row in all subsequent indexes.
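The arithmetic behind these figures can be sketched as below. It uses the declared maximum column widths and ignores row and variable-length overhead, so it is an estimate rather than a measurement. The variable names are made up, and the row count is the 10,000,000 used as the example in this answer:

```sql
-- Back-of-the-envelope estimate only; real sizes include per-row overhead.
DECLARE @Rows              BIGINT = 10000000;
DECLARE @NaturalKeyBytes   INT    = 8 + 12 + 6;  -- datetime + varchar(12) + varchar(6)
DECLARE @SurrogateKeyBytes INT    = 4;           -- int identity

SELECT  @NaturalKeyBytes                      AS NaturalKeyBytes,        -- 26
        @NaturalKeyBytes - @SurrogateKeyBytes AS BytesSavedPerIndexRow,  -- 22
        -- Extra space the identity column adds to the clustered index.
        CAST(@Rows * @SurrogateKeyBytes / 1048576.0 AS DECIMAL(10, 1))
                                              AS ExtraClusteredMB,       -- 38.1
        -- Space saved in each nonclustered index by carrying the narrow key.
        CAST(@Rows * (@NaturalKeyBytes - @SurrogateKeyBytes) / 1048576.0 AS DECIMAL(10, 1))
                                              AS SavedPerNonclusteredMB; -- 209.8
```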

So in a table of 10,000,000 rows you add 38.1 MB because of the identity column, but you save 209.8 MB in each nonclustered index. Although disk space is cheap, that is not a reason to waste it unnecessarily. It is not just the indexes that carry these 22 extra bytes with the natural key; so do all referencing tables with foreign keys, which leads to my next point: convenience when writing queries. Do you really want to have to type out this join each time you refer to the key:

SELECT  *
FROM    Parent AS p
        INNER JOIN Child AS c
            ON c.UploadDate = p.UploadDate
            AND c.SecID = p.SecID
            AND c.FundCode = p.FundCode;

Or would you rather simply write:

SELECT  *
FROM    Parent AS p
        INNER JOIN Child AS c
            ON c.ParentID = p.ParentID;
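A minimal sketch of table definitions that would support the shorter join. Parent and Child are only placeholder names in this answer, so every column other than the three from the question, and every constraint name, is hypothetical:

```sql
-- Hypothetical parent table clustered on a narrow surrogate key.
CREATE TABLE dbo.Parent
(
    ParentID   INT IDENTITY(1, 1) NOT NULL,
    UploadDate DATETIME    NOT NULL,
    SecID      VARCHAR(12) NOT NULL,
    FundCode   VARCHAR(6)  NOT NULL,

    CONSTRAINT PK_Parent PRIMARY KEY CLUSTERED (ParentID)
);

-- The child repeats only the 4-byte ParentID, not the three natural key columns.
CREATE TABLE dbo.Child
(
    ChildID  INT IDENTITY(1, 1) NOT NULL,
    ParentID INT NOT NULL,

    CONSTRAINT PK_Child PRIMARY KEY CLUSTERED (ChildID),
    CONSTRAINT FK_Child__Parent FOREIGN KEY (ParentID)
        REFERENCES dbo.Parent (ParentID)
);
```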

For this reason, even when I have decided that what is logically the primary key is not a good candidate for a clustering key, I still tend to make the clustering key the primary key, for ease of reference in related tables. For example, I have an external API that sends me order details in XML:

<orders>
    <order ID="12B47EF2-B9F5-4CD7-811F-2E7EC1A67E59">
        <orderdetail>
            <product>Some Product</product>
            <quantity>1</quantity>
        </orderdetail>
        <orderdetail>
            <product>Some Other Product</product>
            <quantity>2</quantity>
        </orderdetail>
    </order>
    <order ID="3A819217-49CA-4B4C-8AD5-CAD297FCA3F3">
        <etc />
    </order>
</orders>

If I were setting up my tables to store this, although the ID from the XML would be the logical primary key for my Orders table, it would be a terrible clustering key, so I would add a surrogate identity field to avoid the fragmentation associated with clustering on a GUID:

CREATE TABLE dbo.Orders
(
    OrderID INT IDENTITY NOT NULL,
    SupplierOrderID UNIQUEIDENTIFIER NOT NULL,

    CONSTRAINT PK_Orders__SupplierOrderID PRIMARY KEY NONCLUSTERED (SupplierOrderID)
);
CREATE UNIQUE CLUSTERED INDEX UQ_Orders__OrderID ON dbo.Orders (OrderID);

The GUID is still the primary key, so my order detail table can refer to it, but I generally think that if I don't consider the key good enough to cluster on, why would I then put the same key into another table as a foreign key? I have already defined a narrower key in OrderID, so why not just use this as my foreign key in order details, and save myself 12 bytes per row (a UNIQUEIDENTIFIER is 16 bytes, an INT is 4). So I would end up with:

CREATE TABLE dbo.Orders
(
    OrderID INT IDENTITY NOT NULL,
    SupplierOrderID UNIQUEIDENTIFIER NOT NULL,

    CONSTRAINT PK_Orders__OrderID PRIMARY KEY CLUSTERED (OrderID)
);
CREATE UNIQUE NONCLUSTERED INDEX UQ_Orders__SupplierOrderID ON dbo.Orders (SupplierOrderID);
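To round this off, here is a sketch of an order detail table that references the narrow key. The table dbo.OrderDetails, its columns and its constraint names are assumptions based on the XML sample, not part of the original design:

```sql
-- Hypothetical detail table for the dbo.Orders definition directly above.
CREATE TABLE dbo.OrderDetails
(
    OrderDetailID INT IDENTITY(1, 1) NOT NULL,
    OrderID       INT NOT NULL,            -- 4-byte reference instead of the 16-byte GUID
    Product       NVARCHAR(100) NOT NULL,
    Quantity      INT NOT NULL,

    CONSTRAINT PK_OrderDetails__OrderDetailID PRIMARY KEY CLUSTERED (OrderDetailID),
    CONSTRAINT FK_OrderDetails__OrderID FOREIGN KEY (OrderID)
        REFERENCES dbo.Orders (OrderID)
);

-- Resolve a supplier GUID through the unique index, then join on the narrow key.
SELECT  od.Product, od.Quantity
FROM    dbo.Orders AS o
        INNER JOIN dbo.OrderDetails AS od
            ON od.OrderID = o.OrderID
WHERE   o.SupplierOrderID = '12B47EF2-B9F5-4CD7-811F-2E7EC1A67E59';
```

The GUID from the API is still looked up through UQ_Orders__SupplierOrderID, but it is stored only once, in dbo.Orders.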

As with everything, there are exceptions. There are cases where I would choose the 3 columns as a composite (clustered) primary key: this would be if I knew there would be no child tables, and that all my select queries would still require me to select UploadDate, SecID, and FundCode. If you had an index on Name, for example:

CREATE NONCLUSTERED INDEX IX_YourTable__Name ON dbo.YourTable (Name);

SELECT  UploadDate, SecID, FundCode, Name
FROM    dbo.YourTable
WHERE   Name = 'Bob';

If you have a surrogate key, then you will seek through the name index and find only that Bob is at row 2, then look up row 2 on your clustered index to get the corresponding values for UploadDate, SecID, and FundCode. If these three columns are your clustering key, then you remove the need for the lookup: every nonclustered index row carries the clustering key, so the data is already in the name index. The extra 209.8 MB on each index could be worth it to avoid these lookup operations.

In summary (as usual), it depends. It depends on your personal preference (I believe Aaron Bertrand and Joe Celko are still at loggerheads on the natural vs surrogate key debate, and if these two great minds can't agree, then the answer really has to be personal preference) and also on your exact situation. In some situations you will want a composite primary key, and in others a surrogate key; in some you will want your primary key and your clustering key to be the same thing, and in others you won't.
