Composite primary keys versus unique object ID field

Question

I inherited a database built on the idea that composite keys are far preferable to a unique object ID field, and that a single unique ID should never be used as a primary key. Because I was building a Rails front end for this database, I had difficulty getting it to conform to Rails conventions (though it was possible using custom views and a few additional gems to handle composite keys).

The reasoning behind this schema design, according to the person who wrote it, was that the database handles ID fields inefficiently and that tree sorts are flawed when it builds indexes. The explanation lacked any depth, and I'm still trying to wrap my head around the concept (I'm familiar with composite keys, but not with using them 100% of the time).

Can anyone offer opinions or add any greater depth to this topic?

Recommended answer

Most of the commonly used engines (MS SQL Server, Oracle, DB2, MySQL, etc.) would not experience noticeable issues using a surrogate key system. Some may even experience a performance boost from the use of a surrogate, but performance issues are highly platform-specific.

In general terms, the natural key (and by extension, composite key) versus surrogate key debate has a long history with no likely "right answer" in sight.

The arguments for natural keys (singular or composite) usually include some of the following:

1) They are already available in the data model. Most entities being modeled already include one or more attributes or combinations of attributes that meet the needs of a key for the purposes of creating relations. Adding an additional attribute to each table incorporates an unnecessary redundancy.

2) They eliminate the need for certain joins. For example, if you have customers with customer codes, and invoices with invoice numbers (both of which are "natural" keys), and you want to retrieve all the invoice numbers for a specific customer code, you can simply use "SELECT InvoiceNumber FROM Invoice WHERE CustomerCode = 'XYZ123'". In the classic surrogate key approach, the SQL would look something like this: "SELECT Invoice.InvoiceNumber FROM Invoice INNER JOIN Customer ON Invoice.CustomerID = Customer.CustomerID WHERE Customer.CustomerCode = 'XYZ123'".

3) They contribute to a more universally-applicable approach to data modeling. With natural keys, the same design can be used largely unchanged between different SQL engines. Many surrogate key approaches use specific SQL engine techniques for key generation, thus requiring more specialization of the data model to implement on different platforms.
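The join argument in point 2 can be tried out with a small sketch. It uses Python's built-in sqlite3 module only as a convenient way to run the two queries quoted above; the column layouts, the `Name` column, and the sample rows are assumptions made for illustration, not part of the original schema.

```python
import sqlite3

# Natural-key design: Invoice stores the customer code itself.
natural = sqlite3.connect(":memory:")
natural.executescript("""
    CREATE TABLE Customer (CustomerCode TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE Invoice (
        InvoiceNumber TEXT PRIMARY KEY,
        CustomerCode  TEXT NOT NULL REFERENCES Customer (CustomerCode)
    );
    INSERT INTO Customer VALUES ('XYZ123', 'Acme'), ('ABC999', 'Other');
    INSERT INTO Invoice VALUES
        ('INV-1', 'XYZ123'), ('INV-2', 'XYZ123'), ('INV-3', 'ABC999');
""")

# Surrogate-key design: Invoice points at an opaque integer CustomerID.
surrogate = sqlite3.connect(":memory:")
surrogate.executescript("""
    CREATE TABLE Customer (
        CustomerID   INTEGER PRIMARY KEY,
        CustomerCode TEXT NOT NULL UNIQUE,
        Name         TEXT
    );
    CREATE TABLE Invoice (
        InvoiceID     INTEGER PRIMARY KEY,
        InvoiceNumber TEXT NOT NULL UNIQUE,
        CustomerID    INTEGER NOT NULL REFERENCES Customer (CustomerID)
    );
    INSERT INTO Customer VALUES (1, 'XYZ123', 'Acme'), (2, 'ABC999', 'Other');
    INSERT INTO Invoice VALUES (1, 'INV-1', 1), (2, 'INV-2', 1), (3, 'INV-3', 2);
""")

# Natural key: the filter column is already on Invoice, so no join is needed.
natural_rows = natural.execute(
    "SELECT InvoiceNumber FROM Invoice "
    "WHERE CustomerCode = 'XYZ123' ORDER BY InvoiceNumber"
).fetchall()

# Surrogate key: the customer code lives only on Customer, so a join is required.
surrogate_rows = surrogate.execute(
    "SELECT Invoice.InvoiceNumber FROM Invoice "
    "INNER JOIN Customer ON Invoice.CustomerID = Customer.CustomerID "
    "WHERE Customer.CustomerCode = 'XYZ123' ORDER BY Invoice.InvoiceNumber"
).fetchall()

print(natural_rows)
print(surrogate_rows)
```

Both queries return the same invoice numbers; the difference is only in how much of the schema each query has to touch.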

Arguments for surrogate keys tend to revolve around issues that are specific to the SQL engine:

1) They enable easier changes to attributes when business requirements/rules change. This is because they allow the data attributes to be isolated to a single table. This is primarily an issue for SQL engines that do not efficiently implement standard SQL constructs such as DOMAINs. When an attribute is defined by a DOMAIN statement, changes to the attribute can be performed schema-wide using an ALTER DOMAIN statement. Different SQL engines have different performance characteristics for altering a domain, and some SQL engines do not implement DOMAINs at all, so data modelers compensate for these situations by adding surrogate keys to make attribute changes easier.

2) They enable easier implementations of concurrency than natural keys. In the natural key case, if two users are concurrently working with the same information set, such as a customer row, and one of the users modifies the natural key value, then an update by the second user will fail because the customer code they are updating no longer exists in the database. In the surrogate key case, the update will process successfully because immutable ID values are used to identify the rows in the database, not mutable customer codes. However, it is not always desirable to allow the second update – if the customer code changed it is possible that the second user should not be allowed to proceed with their change because the actual "identity" of the row has changed – the second user may be updating the wrong row. Neither surrogate keys nor natural keys, by themselves, address this issue. Comprehensive concurrency solutions have to be addressed outside of the implementation of the key.

3) They perform better than natural keys. Performance is most directly affected by the SQL engine. The same database schema implemented on the same hardware using different SQL engines will often have dramatically different performance characteristics, due to the SQL engines' data storage and retrieval mechanisms. Some SQL engines closely approximate flat-file systems, where data is actually stored redundantly when the same attribute, such as a Customer Code, appears in multiple places in the database schema. This redundant storage by the SQL engine can cause performance issues when changes need to be made to the data or schema. Other SQL engines provide a better separation between the data model and the storage/retrieval system, allowing for quicker changes of data and schema.

4) Surrogate keys function better with certain data access libraries and GUI frameworks. Due to the homogeneous nature of most surrogate key designs (example: all relational keys are integers), data access libraries, ORMs, and GUI frameworks can work with the information without needing special knowledge of the data. Natural keys, due to their heterogeneous nature (different data types, size etc.), do not work as well with automated or semi-automated toolkits and libraries. For specialized scenarios, such as embedded SQL databases, designing the database with a specific toolkit in mind may be acceptable. In other scenarios, databases are enterprise information resources, accessed concurrently by multiple platforms, applications, report systems, and devices, and therefore do not function as well when designed with a focus on any particular library or framework. In addition, databases designed to work with specific toolkits become a liability when the next great toolkit is introduced.
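The concurrency scenario in point 2 can be replayed in a few lines. This is a minimal sketch using Python's built-in sqlite3 module on a single connection, so the two "users" are simulated by ordering the statements; the `Phone` column and all sample values are hypothetical. Note that in plain SQL the "failed" update does not raise an error, it simply matches zero rows. The last step shows one possible guard that lives outside the key itself (re-checking the value the second user originally read); it is an assumption added for illustration, not something the answer prescribes.

```python
import sqlite3

def make_db():
    """Create a one-row Customer table (hypothetical layout for this sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Customer ("
        " CustomerID INTEGER PRIMARY KEY,"
        " CustomerCode TEXT NOT NULL UNIQUE,"
        " Phone TEXT)"
    )
    conn.execute("INSERT INTO Customer VALUES (1, 'XYZ123', '555-0100')")
    return conn

# Rows identified by the natural key (CustomerCode).
conn = make_db()
# User A renames the customer code while user B still holds 'XYZ123'.
conn.execute("UPDATE Customer SET CustomerCode = 'XYZ999' WHERE CustomerCode = 'XYZ123'")
# User B's update targets a code that no longer exists.
cur = conn.execute("UPDATE Customer SET Phone = '555-0199' WHERE CustomerCode = 'XYZ123'")
natural_rows_updated = cur.rowcount

# Rows identified by the surrogate key (CustomerID).
conn = make_db()
conn.execute("UPDATE Customer SET CustomerCode = 'XYZ999' WHERE CustomerID = 1")
# User B's update still finds the row, because the ID never changes.
cur = conn.execute("UPDATE Customer SET Phone = '555-0199' WHERE CustomerID = 1")
surrogate_rows_updated = cur.rowcount

# One possible guard outside the key: also require the code user B last saw.
conn = make_db()
conn.execute("UPDATE Customer SET CustomerCode = 'XYZ999' WHERE CustomerID = 1")
cur = conn.execute(
    "UPDATE Customer SET Phone = '555-0199' "
    "WHERE CustomerID = 1 AND CustomerCode = 'XYZ123'"
)
guarded_rows_updated = cur.rowcount

print(natural_rows_updated, surrogate_rows_updated, guarded_rows_updated)
```

Whether the second update should be allowed through is a business decision; the sketch only shows that the choice of key changes the default outcome.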

I tend to fall on the side of natural keys (obviously), but I am not fanatical about it. Due to the environment I work in, where any given database I help design may be used by a variety of applications, I use natural keys for the majority of the data modeling, and rarely introduce surrogates. However, I don’t go out of my way to try to re-implement existing databases that use surrogates. Surrogate-key systems work just fine – no need to change something that is already functioning well.

There are some excellent resources discussing the merits of each approach:

http://www.google.com/search?q=自然+键+代理+键

http://www.agiledata.org/essays/keys.html

http://www.informationweek.com/news/software/bi/201806814
