Composite primary keys versus unique object ID field

Problem description

I inherited a database built on the idea that composite keys are far preferable to a unique object ID field, and that a single unique ID should never be used as a primary key. Because I was building a Rails front-end for this database, I ran into difficulties getting it to conform to Rails conventions (though it was possible using custom views and a few additional gems to handle composite keys).

The person who wrote the schema justified the design by saying that the database handles ID fields inefficiently and that, when it builds indexes, the tree sorts are flawed. The explanation lacked any depth, and I'm still trying to wrap my head around the concept (I'm familiar with using composite keys, but not 100% of the time).

Can anyone offer opinions or add any greater depth to this topic?

Solution

Most of the commonly used engines (MS SQL Server, Oracle, DB2, MySQL, etc.) would not experience noticeable issues using a surrogate key system. Some may even experience a performance boost from the use of a surrogate, but performance issues are highly platform-specific.

In general terms, the natural key (and by extension, composite key) versus surrogate key debate has a long history with no likely "right answer" in sight.

The arguments for natural keys (singular or composite) usually include some of the following:

1) They are already available in the data model. Most entities being modeled already include one or more attributes or combinations of attributes that meet the needs of a key for the purposes of creating relations. Adding an additional attribute to each table incorporates an unnecessary redundancy.

2) They eliminate the need for certain joins. For example, if you have customers with customer codes, and invoices with invoice numbers (both of which are "natural" keys), and you want to retrieve all the invoice numbers for a specific customer code, you can simply use "SELECT InvoiceNumber FROM Invoice WHERE CustomerCode = 'XYZ123'". In the classic surrogate key approach, the SQL would look something like this: "SELECT Invoice.InvoiceNumber FROM Invoice INNER JOIN Customer ON Invoice.CustomerID = Customer.CustomerID WHERE Customer.CustomerCode = 'XYZ123'".

3) They contribute to a more universally-applicable approach to data modeling. With natural keys, the same design can be used largely unchanged between different SQL engines. Many surrogate key approaches use specific SQL engine techniques for key generation, thus requiring more specialization of the data model to implement on different platforms.
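To make point 2 concrete, here is a minimal sketch that runs both queries from the answer against an in-memory SQLite database. The table names `CustomerN`/`InvoiceN` and `CustomerS`/`InvoiceS`, the column layout, and the sample rows are assumptions made up for illustration so that both designs can sit side by side; only the two SELECT shapes come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Natural-key design (hypothetical tables): the customer code is the
    -- key and is carried into the invoice table as the foreign key.
    CREATE TABLE CustomerN (CustomerCode TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE InvoiceN (
        InvoiceNumber TEXT PRIMARY KEY,
        CustomerCode  TEXT NOT NULL REFERENCES CustomerN (CustomerCode)
    );

    -- Surrogate-key design (hypothetical tables): an integer ID is the key;
    -- the customer code is stored only in the customer table.
    CREATE TABLE CustomerS (
        CustomerID   INTEGER PRIMARY KEY,
        CustomerCode TEXT NOT NULL UNIQUE,
        Name         TEXT
    );
    CREATE TABLE InvoiceS (
        InvoiceID     INTEGER PRIMARY KEY,
        InvoiceNumber TEXT NOT NULL UNIQUE,
        CustomerID    INTEGER NOT NULL REFERENCES CustomerS (CustomerID)
    );

    INSERT INTO CustomerN VALUES ('XYZ123', 'Acme'), ('ABC999', 'Other');
    INSERT INTO InvoiceN VALUES
        ('INV-1', 'XYZ123'), ('INV-2', 'XYZ123'), ('INV-3', 'ABC999');

    INSERT INTO CustomerS VALUES (1, 'XYZ123', 'Acme'), (2, 'ABC999', 'Other');
    INSERT INTO InvoiceS VALUES (1, 'INV-1', 1), (2, 'INV-2', 1), (3, 'INV-3', 2);
""")

# Natural keys: the invoice row already holds the customer code, so no join.
natural = conn.execute(
    "SELECT InvoiceNumber FROM InvoiceN "
    "WHERE CustomerCode = ? ORDER BY InvoiceNumber",
    ("XYZ123",),
).fetchall()

# Surrogate keys: the code lives only in the customer table, so a join is needed.
surrogate = conn.execute(
    "SELECT i.InvoiceNumber FROM InvoiceS AS i "
    "INNER JOIN CustomerS AS c ON i.CustomerID = c.CustomerID "
    "WHERE c.CustomerCode = ? ORDER BY i.InvoiceNumber",
    ("XYZ123",),
).fetchall()

print(natural)    # [('INV-1',), ('INV-2',)]
print(surrogate)  # [('INV-1',), ('INV-2',)]
```

Both queries return the same invoices; the difference is only in how much of the schema each one has to touch. Whether the extra join matters in practice depends on the engine and the data volume, which the sketch does not measure.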

Arguments for surrogate keys tend to revolve around issues that are SQL engine specific:

1) They enable easier changes to attributes when business requirements/rules change. This is because they allow the data attributes to be isolated to a single table. This is primarily an issue for SQL engines that do not efficiently implement standard SQL constructs such as DOMAINs. When an attribute is defined by a DOMAIN statement, changes to the attribute can be performed schema-wide using an ALTER DOMAIN statement. Different SQL engines have different performance characteristics for altering a domain, and some SQL engines do not implement DOMAINs at all, so data modelers compensate for these situations by adding surrogate keys to improve the ability to make changes to attributes.

2) They enable easier implementations of concurrency than natural keys. In the natural key case, if two users are concurrently working with the same information set, such as a customer row, and one of the users modifies the natural key value, then an update by the second user will fail because the customer code they are updating no longer exists in the database. In the surrogate key case, the update will process successfully because immutable ID values are used to identify the rows in the database, not mutable customer codes. However, it is not always desirable to allow the second update – if the customer code changed it is possible that the second user should not be allowed to proceed with their change because the actual "identity" of the row has changed – the second user may be updating the wrong row. Neither surrogate keys nor natural keys, by themselves, address this issue. Comprehensive concurrency solutions have to be addressed outside of the implementation of the key.

3) They perform better than natural keys. Performance is most directly affected by the SQL engine. The same database schema implemented on the same hardware using different SQL engines will often have dramatically different performance characteristics, due to the SQL engines' data storage and retrieval mechanisms. Some SQL engines closely approximate flat-file systems, where data is actually stored redundantly when the same attribute, such as a Customer Code, appears in multiple places in the database schema. This redundant storage by the SQL engine can cause performance issues when changes need to be made to the data or schema. Other SQL engines provide a better separation between the data model and the storage/retrieval system, allowing for quicker changes of data and schema.

4) Surrogate keys function better with certain data access libraries and GUI frameworks. Due to the homogeneous nature of most surrogate key designs (example: all relational keys are integers), data access libraries, ORMs, and GUI frameworks can work with the information without needing special knowledge of the data. Natural keys, due to their heterogeneous nature (different data types, size etc.), do not work as well with automated or semi-automated toolkits and libraries. For specialized scenarios, such as embedded SQL databases, designing the database with a specific toolkit in mind may be acceptable. In other scenarios, databases are enterprise information resources, accessed concurrently by multiple platforms, applications, report systems, and devices, and therefore do not function as well when designed with a focus on any particular library or framework. In addition, databases designed to work with specific toolkits become a liability when the next great toolkit is introduced.
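The concurrency scenario in point 2 can be sketched in a few lines. This is only an illustration under stated assumptions: the tables `CustomerN` and `CustomerS`, the `Phone` column, and the sample values are hypothetical, and the two "users" are simulated as sequential statements on one in-memory SQLite connection rather than as real concurrent sessions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical natural-key table: the customer code is the primary key.
    CREATE TABLE CustomerN (CustomerCode TEXT PRIMARY KEY, Phone TEXT);

    -- Hypothetical surrogate-key table: an integer ID is the primary key.
    CREATE TABLE CustomerS (
        CustomerID   INTEGER PRIMARY KEY,
        CustomerCode TEXT NOT NULL UNIQUE,
        Phone        TEXT
    );

    INSERT INTO CustomerN VALUES ('XYZ123', '555-0100');
    INSERT INTO CustomerS VALUES (1, 'XYZ123', '555-0100');
""")

# Both users have already read the row. User A now changes the customer code.
conn.execute(
    "UPDATE CustomerN SET CustomerCode = 'XYZ999' WHERE CustomerCode = 'XYZ123'"
)
conn.execute("UPDATE CustomerS SET CustomerCode = 'XYZ999' WHERE CustomerID = 1")

# User B saves a phone change, identifying the row by the key read earlier.
natural_rows = conn.execute(
    "UPDATE CustomerN SET Phone = '555-0199' WHERE CustomerCode = 'XYZ123'"
).rowcount
surrogate_rows = conn.execute(
    "UPDATE CustomerS SET Phone = '555-0199' WHERE CustomerID = 1"
).rowcount

final = conn.execute("SELECT CustomerCode, Phone FROM CustomerS").fetchone()

print(natural_rows)    # 0: the old code no longer identifies any row
print(surrogate_rows)  # 1: the immutable ID still identifies the row
print(final)           # ('XYZ999', '555-0199')
```

Note that in this sketch the natural-key update "fails" by matching zero rows, not by raising an error, so the application has to check the affected-row count. The surrogate-key update goes through even though user B never saw the new customer code, which is the situation the answer warns about: neither key style decides whether that second update should be allowed.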

I tend to fall on the side of natural keys (obviously), but I am not fanatical about it. Due to the environment I work in, where any given database I help design may be used by a variety of applications, I use natural keys for the majority of the data modeling, and rarely introduce surrogates. However, I don’t go out of my way to try to re-implement existing databases that use surrogates. Surrogate-key systems work just fine – no need to change something that is already functioning well.

There are some excellent resources discussing the merits of each approach:

http://www.google.com/search?q=natural+key+surrogate+key

http://www.agiledata.org/essays/keys.html

http://www.informationweek.com/news/software/bi/201806814
