复合vs代理在6NF中的参考完整性键 [英] Composite vs Surrogate keys for Referential Integrity in 6NF

查看:142
本文介绍了复合vs代理在6NF中的参考完整性键的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

取三层信息:



第1层:信息



c $ c> UNIQUE 自然索引和可轻松转移的代理键。

 表Surnames: 

+ ----------------------------- + ----------- --- +
| ID(自动增量,PK)|姓|
+ ----------------------------- + -------------- +
| 1 |史密斯|
| 2 | Edwards |
| 3 |棕色|
+ ----------------------------- + + -------------- +

表FirstNames

+ ----------------------------- + - ------------- +
| ID(自动增量,PK)| FirstName |
+ ----------------------------- + -------------- +
| 1 | John |
| 2 | Bob |
| 3 |玛丽|
| 4 | Kate |
+ ----------------------------- + -------------- +

自然键



或者,上面的两个表可以没有 ID 并且使用Surname和FirstName作为自然主键,如Mike Sherrill所解释。在这种情况下,假设下面的层引用 varchar ,而不是 int



第2层:人



在此图层中,使用复合索引。该值可以是 UNIQUE PRIMARY ,取决于是否使用代理键作为主键。

  + ----------------- + ---------- ---- + 
| FirstName |姓氏
+ ----------------- + -------------- +
| 1 | 2 |
| 1 | 3 |
| 2 | 3 |
| 3 | 1 |
| 4 | 2 |
| ... | ... |
+ ----------------- + -------------- +



第3层:父级



在此图层中,人们之间的关系通过 ParentsOf 表。

  ParentsOf 

+ -------------- + ----------------- +
|人| PersonParent |
+ ----------------- + ----------------- +

OR

+ ----------------- + ----------------- + ----- ------------ + ----------------- +
| PersonFirstName | PersonSurname | ParentFirstName | ParentSurname |
+ ----------------- + ----------------- + --------- -------- + ----------------- +



问题



假设引用完整性对我来说非常重要,我会有 FOREIGN KEYS 这些索引,以便我保持数据库负责监控自己的完整性在这方面,如果我使用一个ORM,它将是一个像 Doctrine ,其本身支持复合主键...



请帮助我理解:





  • 使用复合键与代理键进行交换的列表




我不感兴趣听到哪个更好,因为我知道专业人士在这个问题上存在着重大的分歧,这将引发一场宗教战争。相反,我要求,非常简单和客观的人类可能,通过传递代理键到每个层与维护主键(自然/复合,或代理/复合)将采取什么折中。任何人都可以在SO和其他网站上找到 始终 使用代理键的人。



编辑:有人指出, surname示例是使用6NF的一个较差示例。为了保持问题的完整性,我要离开它。如果你想象这个的用例,一个更好的可能是杂货项目的列表。 AKA:

  + ----------------------- ------ + -------------- + 
| ID(自动增量,PK)|杂货|
+ ----------------------------- + -------------- +
| 1 |海绵|
| 2 |番茄汤|
| 3 |冰淇淋|
| 4 |柠檬|
| 5 |草莓|
| 6 |鞭打奶油|
+ ----------------------------- + -------------- +

+ ----------------------------- + ----------- --- +
| ID(自动增量,PK)|品牌|
+ ----------------------------- + + -------------- +
| 1 | Bright |
| 2 |本&杰瑞的|
| 3 |商店品牌|
| 4 | Campbell的|
| 5 |酷鞭|
+ ----------------------------- + -------------- +

自然复合键示例:

  + ----------------------------- + ----------- --- + 
|杂货|品牌|
+ ----------------------------- + -------------- +
|海绵| Bright |
|冰淇淋|本&杰瑞的|
|冰淇淋|商店品牌|
|番茄汤| Campbell的|
|番茄汤|商店品牌|
|柠檬|商店品牌|
|鞭打奶油|酷鞭|
+ ----------------------------- + -------------- +

建议配对

 code> + ----------------- + ----------------- + -------- --------- + ----------------- + 
| Grocery1 |品牌1 | Grocery2 | Brand2 |
+ ----------------- + ----------------- + --------- -------- + ----------------- +

重申一下,这也只是一个例子。这不是我建议继续进行,但它应该有助于说明我的问题。



这种方法有缺陷。我将重申,这个问题是要求走下面每个方法的优点和缺点,而不是突出一个比另一个更好。我相信大多数人都能够回顾这个具体例子的可疑性质来回答核心问题。这个编辑是为那些不能。



下面有一些非常好的答案,如果你对哪个方向感到好奇,请阅读。



END EDIT



谢谢!

解决方案>

以下是一些取舍:



单个代理(人工创建):




  • 所有子表外键仅需要一个列来引用主键。


  • 很容易更新表中的自然键而不需要使用外键更新每个子表。


  • 较小的主键/外键索引(即不是宽)这可以使数据库运行更快,例如,当在父表中删除记录时,需要搜索子表以确保这不会创建孤立。窄索引扫描速度更快(只是可见)。


  • 您将有更多的索引,因为您很可能还希望索引数据中存在的任何自然键。




自然复合键控表:




  • 数据库


  • 更少/更快的数据库中


  • 更新化合物中的一个键需要每个子表也更新

    >



然后还有另一个类别:人工复合主键



只有发现一个实例,这是有道理的。例如,假设您有一个数据库,其中存储了50,000个客户端的数据,而每个客户端都不是应该使用的。以查看其他客户端的数据 - 在Web应用程序开发中非常常见。



如果每个记录都标有 client_id 字段,那么您正在创建一个行级安全环境。大多数数据库具有在正确设置时执行行级安全性的工具。



首先要做的是设置主键和外键。通常,具有 id 字段作为主键的表。通过添加 client_id ,密钥现在是复合键。并且需要将 client_id 携带到所有子表。



复合键基于2个代理键,是一种防弹方式,可确保客户端之间和数据库内的数据完整性。



之后,您将创建视图(或者如果使用Oracle EE设置虚拟专用数据库)和其他各种结构,以允许数据库执行行级安全性所有它拥有)。



授予此数据结构不再归一化到第n度。每个pk / fk中的 client_id 字段将正常模型非规范化。该模型的优点是易于在数据库级别执行行级安全性(这是数据库应该做的)。每个选择,插入,更新,删除被限制为当前设置的 client_id 。数据库具有会话感知



摘要



代理键始终是安全的赌注。它们需要更多的工作来设置和需要更多的存储。



我认为最大的好处是:





  • 当数据被打乱时,它会在一些表中被更新。


  • 查询性能(查询性能)由于编程错误,代理键使得清理更加容易,并且在某些情况下只能执行,因为有代理键。因为数据库能够搜索属性以定位s.key,然后通过单个数字键连接所有子表。




Natural Keys特别是复合NKeys使编写代码变得很痛苦。当你需要连接4个表时,where子句将比使用单个SKey时更长(更容易混乱)。



代理键是安全路由。自然键在几个地方是有益的,我会说大约1%的表中的数据库。


Take three layers of information:

Layer 1: Information

This layer contains data with UNIQUE natural indexes and a surrogate key that is easily transferrable.

Table Surnames:

+-----------------------------+--------------+
|    ID (Auto Increment, PK)  |    Surname   |
+-----------------------------+--------------+
|               1             |     Smith    |
|               2             |    Edwards   |
|               3             |     Brown    |
+-----------------------------+--------------+

Table FirstNames

+-----------------------------+--------------+
|   ID (Auto Increment, PK)   |   FirstName  |
+-----------------------------+--------------+
|               1             |     John     |
|               2             |     Bob      |
|               3             |     Mary     |
|               4             |     Kate     |
+-----------------------------+--------------+

Natural Keys

Alternatively, the two tables above can be without ID and utilize Surname and FirstName as Natural Primary Keys, as explained by Mike Sherrill. In this instance, assume the layer below references varchar rather than int.

Layer 2: People

In this layer a composite index is used. This value can be UNIQUE or PRIMARY, depending on whether a surrogate key is utilized as the Primary Key.

+-----------------+--------------+
|    FirstName    |    LastName  |
+-----------------+--------------+
|        1        |       2      |
|        1        |       3      |
|        2        |       3      |
|        3        |       1      |
|        4        |       2      |
|       ...       |      ...     |
+-----------------+--------------+

Layer 3: Parents

In this layer, relationships between people are explored through a ParentsOf table.

ParentsOf

+-----------------+-----------------+
|      Person     |   PersonParent  |
+-----------------+-----------------+

 OR

+-----------------+-----------------+-----------------+-----------------+
| PersonFirstName |  PersonSurname  | ParentFirstName |  ParentSurname  |
+-----------------+-----------------+-----------------+-----------------+

The Question

Assuming that referential integrity is VERY important to me at its very core, and I will have FOREIGN KEYS on these indexes so that I keep the database responsible for monitoring its own integrity on this front, and that, if I were to use an ORM, it would be one like Doctrine which has native support for Compound Primary Keys...

Please help me to understand:

  • The list of trade-offs that take place with utilizing surrogate keys vs. natural keys on the 1st Layer.

  • The list of trade-offs that take place with utilizing compound keys vs. surrogate keys on the 2nd Layer which can be transferred over to the 3rd Layer.

I am not interested in hearing which is better, because I understand that there are significant disagreements among professionals on this topic and it would be sparking a religious war. Instead, I am asking, very simply and as objectively as is humanly possible, what trade-offs will you be taking by passing surrogate keys to each Layer vs maintaining Primary keys (natural/composite, or surrogate/composite). Anyone will be able to find someone saying NEVER or ALWAYS use surrogate keys on SO and other websites. Instead, a reasoned analyses of trade-offs is what I will most appreciate in your answers.

EDIT: It has been pointed out that a surname example is a poor example for a use of 6NF. For the sake of keeping the question intact, I am going to leave it be. If you are having trouble imagining the use case for this, a better one might be a list of "Grocery Items". AKA:

+-----------------------------+--------------+
|   ID (Auto Increment, PK)   |   Grocery    |
+-----------------------------+--------------+
|               1             | Sponges      |
|               2             | Tomato Soup  |
|               3             | Ice Cream    |
|               4             | Lemons       |
|               5             | Strawberries |
|               6             | Whipped Cream|
+-----------------------------+--------------+

+-----------------------------+--------------+
|   ID (Auto Increment, PK)   |   Brand      |
+-----------------------------+--------------+
|               1             | Bright       |
|               2             | Ben & Jerry's|
|               3             | Store Brand  |
|               4             | Campbell's   |
|               5             | Cool Whip    |
+-----------------------------+--------------+    

Natural Composite Key Example:

+-----------------------------+--------------+
|           Grocery           |   Brand      |
+-----------------------------+--------------+
|           Sponges           | Bright       |
|           Ice Cream         | Ben & Jerry's|
|           Ice Cream         | Store Brand  |
|           Tomato Soup       | Campbell's   |
|           Tomato Soup       | Store Brand  |
|           Lemons            | Store Brand  |
|           Whipped Cream     | Cool Whip    |
+-----------------------------+--------------+ 

Recommended Pairings

+-----------------+-----------------+-----------------+-----------------+
|     Grocery1     |  Brand1        | Grocery2        |  Brand2         |
+-----------------+-----------------+-----------------+-----------------+

To reiterate, this is also just an example. This is not how I would recommend proceeding, but it should help to illustrate my question.

There ARE shortfalls to this method. I'll reiterate that this question was to request walking through the benefits and drawbacks of each method below, not to highlight one as better than another. I believe most people were able to look past the questionable nature of this specific example to answer the core question. This edit is for those that cannot.

There are some very good answers below and if you are curious about which direction to go, please read them.

END EDIT

Thank you!

解决方案

Here's some trade-offs:

Single Surrogate (artificially created):

  • All child tables foreign keys only need a single column to reference the primary key.

  • Very easy to update the natural keys in table without needing to update every child table with foreign keys

  • Smaller primary/foreign key indexes (ie. not a wide) This can make the database run faster, for example when a record is deleted in a parent table, the child tables need to be searched to make sure this will not create orphans. Narrow indexes are faster to scan (just sightly).

  • you will have more indexes because you most likely will also want to index whatever natural keys exists in the data.

Natural composite keyed tables:

  • fewer indexes in the database

  • less columns in the database

  • easier/faster to insert a ton of records as you will not need to grab the sequence generator

  • updating one of the keys in the compound requires that every child table also be updated.

Then there is another category: artificial composite primary keys

I've only found one instance where this makes sense. When you need to tag every record in every table for row level security.

For example, suppose you had an database which stored data for 50,000 clients and each client was not supposed to see other client's data--very common in web application development.

If each record was tagged with a client_id field, you are creating a row level security environment. Most databases have the tools to enforce row level security when setup correctly.

First thing to do is setup primary and foreign keys. Normally a table with have an id field as the primary key. By adding client_id the key is now composite key. And it is necessary to carry client_id to all child table.

The composite key is based on 2 surrogate keys and is a bulletproof way to ensure data integrity among clients and within the database a whole.

After this you would create views (or if using Oracle EE setup Virtual Private Database) and other various structures to allow the database to enforce row level security (which is a topic all it own).

Granted that this data structure is no longer normalized to the nth degree. The client_id field in each pk/fk denormalizes an otherwise normal model. The benefit of the model is the ease of enforcing row level security at the database level (which is what databases should do). Every select, insert, update, delete is restricted to whatever client_id your session is currently set. The database has session awareness.

Summary

Surrogate keys are always the safe bet. They require a little more work to setup and require more storage.

The biggest benefit in my opinion is:

  • Being able to update the PK in one table and all other child tables are instantaneously changed without ever being touched.

  • When data gets messed up--and it will at some point due to a programming mistake, surrogate keys make the clean up much much easier and in some cases only possible to do because there are surrogate keys.

  • Query performance is improved as the db is able to search attributes to locate the s.key and then join all child table by a single numeric key.

Natural Keys especially composite NKeys make writing code a pain. When you need to join 4 tables the "where clause" will be much longer (and easier to mess up) than when single SKeys were used.

Surrogate keys are the "safe" route. Natural keys are beneficial in a few places, I'd say around 1% of the tables in a db.

这篇关于复合vs代理在6NF中的参考完整性键的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆