INT vs Unique-Identifier for ID field in database


Question

I am creating a new database for a web site using SQL Server 2005 (possibly SQL Server 2008 in the near future). As an application developer, I've seen many databases that use an integer (or bigint, etc.) for the ID field of a table that will be used for relationships. But lately I've also seen databases that use a uniqueidentifier (GUID) for the ID field.

My question is whether one has an advantage over the other. Will integer fields be faster for querying, joining, etc.?

UPDATE: To make it clear, this is for a primary key in the tables.

Recommended answer

GUIDs are problematic as clustered keys because of their high randomness. Paul Randal addressed this issue in the last Technet Magazine Q&A column, under the question: "I'd like to use a GUID as the clustered index key, but the others are arguing that it can lead to performance issues with indexes. Is this true and, if so, can you explain why?"

Now bear in mind that the discussion is specifically about clustered indexes. You say you want to use the column as 'ID', but it is unclear whether you mean it as the clustered key or just the primary key. Typically the two overlap, so I'll assume you want to use it as the clustered index. The reasons why that is a poor choice are explained in the article I mentioned above.

For nonclustered indexes GUIDs still have some issues, but not nearly as big as when they are the leftmost clustered key of the table. Again, the randomness of GUIDs introduces page splits and fragmentation, albeit at the nonclustered index level only (a much smaller problem).
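To make the page-split argument concrete, here is a toy Python model, not SQL Server's actual storage engine. It inserts keys into sorted, fixed-capacity "pages" and counts how often a full page has to be split. The page capacity, the 50/50 split policy, and the function name `simulate_inserts` are assumptions made for illustration.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical number of keys that fit on one page


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert unique keys, in the given order, into sorted fixed-size pages.

    Returns (splits, page_count, average_fill) for this toy index level.
    """
    pages = [[]]  # pages in key order; each page is a sorted list of keys
    bounds = []   # bounds[i] is the lowest key stored on pages[i + 1]
    splits = 0
    for key in keys:
        idx = bisect.bisect_right(bounds, key)  # page whose range covers key
        page = pages[idx]
        if len(page) < capacity:
            bisect.insort(page, key)
            continue
        if idx == len(pages) - 1 and key > page[-1]:
            # Ever-increasing key: start a fresh page, the full one stays full.
            pages.append([key])
            bounds.append(key)
            continue
        # Insert into the middle of a full page: move its upper half away.
        half = capacity // 2
        new_page = page[half:]
        del page[half:]
        pages.insert(idx + 1, new_page)
        bounds.insert(idx, new_page[0])
        splits += 1
        bisect.insort(new_page if key >= new_page[0] else page, key)
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return splits, len(pages), fill


if __name__ == "__main__":
    sequential = list(range(10_000))   # stands in for an identity-style INT key
    shuffled = sequential[:]
    random.Random(42).shuffle(shuffled)  # stands in for random GUID order
    print("sequential:", simulate_inserts(sequential))
    print("random:    ", simulate_inserts(shuffled))
```

In this model, ever-increasing keys only ever append to the last page, so no splits occur and pages stay full, while keys arriving in random order keep landing in the middle of full pages and leave them partly empty after each split.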

There are many urban legends surrounding GUID usage that condemn GUIDs based on their size (16 bytes) compared to an int (4 bytes) and promise horrible performance doom if they are used. This is slightly exaggerated. A key of size 16 can still be a very performant key on a properly designed data model. While it is true that being 4 times as big as an int results in lower-density non-leaf pages in indexes, this is not a real concern for the vast majority of tables. The b-tree structure is a naturally well-balanced tree and the depth of tree traversal is seldom an issue, so seeking a value based on a GUID key as opposed to an INT key is similar in performance. A leaf-page traversal (i.e. a table scan) does not look at the non-leaf pages, and the impact of GUID size on the leaf pages is typically quite small, as the record itself is significantly larger than the extra 12 bytes introduced by the GUID. So I'd take the hearsay advice based on "it's 16 bytes vs. 4" with a rather large grain of salt. Analyze case by case and decide whether the size impact makes a real difference: how many other columns are in the table (i.e. how much impact the GUID size has on the leaf pages) and how many references use it (i.e. how many other tables will grow because they need to store a larger foreign key).
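The size argument can be checked with some back-of-the-envelope arithmetic. The Python sketch below estimates leaf pages, non-leaf fan-out, and the number of non-leaf levels for a given key size. It assumes about 8096 usable bytes per 8 KB page, a made-up per-entry overhead of 11 bytes in non-leaf pages, and fixed-width rows; the name `index_shape` and the 200-byte row payload are hypothetical, so treat the numbers as rough estimates rather than what SQL Server will report.

```python
import math

PAGE_BYTES = 8096     # assumed usable bytes per 8 KB page (after the page header)
ENTRY_OVERHEAD = 11   # hypothetical per-entry overhead in non-leaf pages


def index_shape(row_count, key_bytes, other_bytes):
    """Estimate (leaf_pages, fanout, non_leaf_levels) for a clustered index.

    key_bytes   -- size of the clustered key (4 for INT, 16 for a GUID)
    other_bytes -- combined size of the remaining columns in each row
    """
    rows_per_leaf = max(1, PAGE_BYTES // (key_bytes + other_bytes))
    leaf_pages = math.ceil(row_count / rows_per_leaf)
    fanout = PAGE_BYTES // (key_bytes + ENTRY_OVERHEAD)
    levels = 0
    pages = leaf_pages
    while pages > 1:  # add levels until a single root page remains
        pages = math.ceil(pages / fanout)
        levels += 1
    return leaf_pages, fanout, levels


if __name__ == "__main__":
    for name, key_bytes in (("INT", 4), ("GUID", 16)):
        print(name, index_shape(1_000_000, key_bytes, other_bytes=200))
```

Under these assumptions, a million 200-byte rows need roughly 5% more leaf pages with a GUID key, and although the non-leaf fan-out drops from 539 to 299, both trees end up with the same number of levels. Larger tables or narrower rows can shift that, which is why the case-by-case analysis matters.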

I'm calling out all these details in a sort of makeshift defense of GUIDs because they have been getting a lot of bad press lately, and some of it is undeserved. They have their merits and are indispensable in any distributed system (the moment you're talking about data movement, be it via replication or a sync framework or whatever). I've seen bad decisions made based on the bad reputation of GUIDs, when they were shunned without proper consideration. But it is true that if you have to use a GUID as the clustered key, you should make sure you address the randomness issue: use sequential GUIDs when possible.
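In SQL Server itself, sequential GUIDs are typically produced with `NEWSEQUENTIALID()` as a column default. When IDs are generated in the application instead, one common idea is to place a timestamp in the bytes that the index compares first. The Python sketch below illustrates that idea only: the helper names `comb_guid` and `sort_key` are hypothetical, and the trailing-six-bytes-first ordering is an assumption about how `uniqueidentifier` values are compared that should be verified against your server before relying on it.

```python
import os
import time
import uuid


def comb_guid(now_ms=None):
    """Return a 16-byte id: 10 random bytes followed by a 6-byte ms timestamp."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    stamp = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=os.urandom(10) + stamp)


def sort_key(guid):
    """Order by the trailing timestamp bytes first (assumed index comparison order)."""
    raw = guid.bytes
    return raw[10:] + raw[:10]


if __name__ == "__main__":
    ids = [comb_guid(1_000 + i) for i in range(5)]
    for value in sorted(ids, key=sort_key):
        print(value)
```

IDs generated in different milliseconds sort in creation order under this key, so new rows tend to land at the end of the index. IDs created within the same millisecond still fall in random order relative to each other.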

And finally, to answer your question: if you don't have a specific reason to use GUIDs, use INTs.
