What is combining repeating sets of row information into new entities called when doing database normalization?

Question

I'm a bit confused about a certain piece of database normalization and thought I'd ask StackOverflow:

Imagine you have the following relations that relate products to colors. Notice that Product 1 and Product 2 both use the same set of colors (Blue and Green).

Product_Color                         Color
╔════════════╦═════════════╗     ╔════════════╦═════════════╗
║ Product*   ║ Color*      ║     ║ ColorId*   ║ Name        ║
╠════════════╬═════════════╣     ╠════════════╬═════════════╣
║ 1          ║ 1           ║     ║ 1          ║ Blue        ║
║ 1          ║ 2           ║     ║ 2          ║ Green       ║
║ 2          ║ 1           ║     ╚════════════╩═════════════╝
║ 2          ║ 2           ║
╚════════════╩═════════════╝






If I create two new relations, ColorSet and ColorSet_Color, I can display the same information by joining the 4 relations together.

Product_ColorSet:                 ColorSet_Color:             
╔════════════╦═════════════╗     ╔════════════╦═════════════╗ 
║ Product*   ║ ColorSetId* ║     ║ ColorSetId*║ ColorId*    ║ 
╠════════════╬═════════════╣     ╠════════════╬═════════════╣ 
║ 1          ║ 1           ║     ║ 1          ║ 1           ║ 
║ 2          ║ 1           ║     ║ 1          ║ 2           ║ 
╚════════════╩═════════════╝     ╚════════════╩═════════════╝ 

ColorSet:                         Color:
╔════════════╗                   ╔════════════╦═════════════╗
║ ColorSetId*║                   ║ ColorId*   ║ Name        ║
╠════════════╣                   ╠════════════╬═════════════╣
║ 1          ║                   ║ 1          ║ Blue        ║
║ 2          ║                   ║ 2          ║ Green       ║
╚════════════╝                   ╚════════════╩═════════════╝
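
As a minimal sketch of the join described above, the following uses SQLite through Python's `sqlite3` module. The column types, key declarations, and foreign-key clauses are assumptions added for illustration; only the table names, column names, and rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Color (ColorId INTEGER PRIMARY KEY, Name TEXT NOT NULL);

    -- First design: products related directly to colours.
    CREATE TABLE Product_Color (
        Product INTEGER NOT NULL,
        Color   INTEGER NOT NULL REFERENCES Color(ColorId),
        PRIMARY KEY (Product, Color)
    );

    -- Second design: products related to colour sets, sets related to colours.
    CREATE TABLE ColorSet (ColorSetId INTEGER PRIMARY KEY);
    CREATE TABLE ColorSet_Color (
        ColorSetId INTEGER NOT NULL REFERENCES ColorSet(ColorSetId),
        ColorId    INTEGER NOT NULL REFERENCES Color(ColorId),
        PRIMARY KEY (ColorSetId, ColorId)
    );
    CREATE TABLE Product_ColorSet (
        Product    INTEGER NOT NULL,
        ColorSetId INTEGER NOT NULL REFERENCES ColorSet(ColorSetId),
        PRIMARY KEY (Product, ColorSetId)
    );

    INSERT INTO Color VALUES (1, 'Blue'), (2, 'Green');
    INSERT INTO Product_Color VALUES (1, 1), (1, 2), (2, 1), (2, 2);
    INSERT INTO ColorSet VALUES (1), (2);
    INSERT INTO ColorSet_Color VALUES (1, 1), (1, 2);
    INSERT INTO Product_ColorSet VALUES (1, 1), (2, 1);
""")

# Product/colour pairs read from the first design (one join).
original = conn.execute("""
    SELECT pc.Product, c.Name
    FROM Product_Color AS pc
    JOIN Color AS c ON c.ColorId = pc.Color
    ORDER BY pc.Product, c.Name
""").fetchall()

# The same pairs rebuilt from the second design by joining all four relations.
rebuilt = conn.execute("""
    SELECT pcs.Product, c.Name
    FROM Product_ColorSet AS pcs
    JOIN ColorSet       AS cs  ON cs.ColorSetId  = pcs.ColorSetId
    JOIN ColorSet_Color AS csc ON csc.ColorSetId = cs.ColorSetId
    JOIN Color          AS c   ON c.ColorId      = csc.ColorId
    ORDER BY pcs.Product, c.Name
""").fetchall()

print(original == rebuilt)
```

Under these assumptions both queries return the same product/colour pairs, which is the sense in which the two designs carry the same information.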

At this point if I had a large Product_Color table, with a reasonable degree of shared groups of colors, I would stand to gain considerably from a space perspective.

What is the technical name for this operation in the context of database normalization? I'm clearly removing redundant information, even though the entity I've created doesn't actually exist; it's more a matter of chance that there is a lot of overlap. What specifically am I changing by doing this?

Furthermore, it seems like I could arbitrarily do this to most entities. What puzzles me is that Product_Color and Color were already in 6th normal form when we started the exercise (right?).

Recommended answer

You are introducing a "surrogate key" (or identifier) to name/identify the sets of colours that products come in. The alternative is usually considered to be a "natural key" (or identifier). (Different people use these terms differently in detail, though. E.g. some might only use "surrogate" when a name/identifier is assigned to a referent permanently and/or is its referent's only name/identifier. E.g. some would say that an externally visible, system-generated, arbitrary name/identifier like a Driver Identification Number is both a surrogate and natural.)

Surrogate keys are often called "meaningless (identifiers)". This reflects muddled thinking. All names not generated by an a priori naming scheme are "meaningless" & arbitrary. "Nicholas" did not "mean" you until it was chosen; having been chosen, it "means" you. This goes for any name/identifier. So "meaningless"/"meaningful" is not a helpful distinction. A surrogate name/identifier in a system is just one that got chosen after the system started. What gets called "meaningful" [sic] in a system would have been called "meaningless" [sic] when assigned in whatever system existed before (since assignment was after it started).

There is a "perspective" in which you are "removing redundant information", but it's not the kind of redundancy that normalization addresses. You are replacing a table by other tables, but it's not normalization decomposition. Introduction of surrogates is not part of normalization. Normalization does not introduce new column names. It just reuses an original table's names in the tables that replace it. (Are you able to clearly and exactly describe just what you mean by "redundant" here?)

Sometimes people think that if the same subtuple of values can appear more than once in a column set or table then those subrow values need to be replaced by ids that are FKs to a new table that maps id values to subrow values. (Maybe even for single-column subrows, ie when a single value appears more than once in a column or table.) They think that multiple subrow value appearances are "redundant" or that only ids can repeat without being "redundant". (The id design is seen as a kind of data compression of the original.) They may think that this is part of normalization. None of this is so.

This is not redundancy that you should bother to address via table design. If you know the implementation options your DBMS offers for your tables, you know the usage patterns of your application, and you know that the original is demonstrably and meaningfully worse than some option that happens to be "less redundant" (and why wouldn't a "more redundant" option be better?), then you should tell the DBMS which option you want for your design, without changing the schema if you can. (This is typically done via indexes and/or views.) E.g. indexing your original Product_Color on ColorId leads to essentially the same structure in the implementation as the one you created by hand in your second design, but automatically generated and managed. (You might introduce surrogates for other reasons, e.g. to replace multiple-column foreign keys by more concise, although more obscurely valued and constrained, ones.)
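
A hedged sketch of the index suggestion, again assuming SQLite via Python's `sqlite3`. The index name `idx_product_color_by_color` is made up for this example, and the colour column is called `Color` as in the question's table. The query plan is printed without being interpreted, because its wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product_Color (
        Product INTEGER NOT NULL,
        Color   INTEGER NOT NULL,
        PRIMARY KEY (Product, Color)
    );
    INSERT INTO Product_Color VALUES (1, 1), (1, 2), (2, 1), (2, 2);

    -- The schema stays as it was; the DBMS builds and maintains
    -- the colour-to-products lookup structure itself.
    CREATE INDEX idx_product_color_by_color ON Product_Color (Color);
""")

query = "SELECT Product FROM Product_Color WHERE Color = ? ORDER BY Product"

# Products available in colour 1 (Blue in the question's Color table).
products_in_blue = [product for (product,) in conn.execute(query, (1,))]

# All indexes SQLite now keeps on the table, including the automatic
# one that backs the primary key.
index_names = {
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'Product_Color'"
    )
}

# Show how SQLite intends to run the query.
for row in conn.execute("EXPLAIN QUERY PLAN " + query, (1,)):
    print(row)
```

The queries are written exactly as they would be without the index; only the physical access path is left to the DBMS.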

Re options: Your new design will use more operations (eg joins and projections) in query text and (for typical DBMS implementations) execution than the original (eg to query for the original table) but fewer elsewhere (eg in copying one product's colour set to another's). So again it is all about tradeoffs of multiple "perspectives".
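
To make the tradeoff concrete, here is a small sketch (SQLite via Python's `sqlite3`, with assumed column types) that gives a hypothetical product 3 the same colours as product 1 in both designs and then reads them back. The `ColorSet` and `Color` tables are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Original design: one row per (product, colour) pair.
    CREATE TABLE Product_Color (
        Product INTEGER NOT NULL,
        Color   INTEGER NOT NULL,
        PRIMARY KEY (Product, Color)
    );
    INSERT INTO Product_Color VALUES (1, 1), (1, 2), (2, 1), (2, 2);

    -- Surrogate design: products point at a colour set id.
    CREATE TABLE ColorSet_Color (
        ColorSetId INTEGER NOT NULL,
        ColorId    INTEGER NOT NULL,
        PRIMARY KEY (ColorSetId, ColorId)
    );
    CREATE TABLE Product_ColorSet (
        Product    INTEGER NOT NULL,
        ColorSetId INTEGER NOT NULL,
        PRIMARY KEY (Product, ColorSetId)
    );
    INSERT INTO ColorSet_Color VALUES (1, 1), (1, 2);
    INSERT INTO Product_ColorSet VALUES (1, 1), (2, 1);
""")

# Copy product 1's colours to product 3 in each design.
conn.execute(
    "INSERT INTO Product_Color "
    "SELECT 3, Color FROM Product_Color WHERE Product = 1"
)
conn.execute(
    "INSERT INTO Product_ColorSet "
    "SELECT 3, ColorSetId FROM Product_ColorSet WHERE Product = 1"
)

# Rows that had to be written for product 3 in each design.
rows_added_original = conn.execute(
    "SELECT COUNT(*) FROM Product_Color WHERE Product = 3"
).fetchone()[0]
rows_added_surrogate = conn.execute(
    "SELECT COUNT(*) FROM Product_ColorSet WHERE Product = 3"
).fetchone()[0]

# Reading the colours back: no join in the original, one join here
# (more once ColorSet and Color are brought in).
colours_original = [c for (c,) in conn.execute(
    "SELECT Color FROM Product_Color WHERE Product = 3 ORDER BY Color"
)]
colours_surrogate = [c for (c,) in conn.execute("""
    SELECT csc.ColorId
    FROM Product_ColorSet AS pcs
    JOIN ColorSet_Color AS csc ON csc.ColorSetId = pcs.ColorSetId
    WHERE pcs.Product = 3
    ORDER BY csc.ColorId
""")]

print(rows_added_original, rows_added_surrogate)
```

The copy writes one row per colour in the original design and a single row in the surrogate design, while reading the colours back needs a join only in the surrogate design.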

In fact you have in another sense introduced redundancy with the surrogates. There are additional columns holding a bunch of id values that are not in the original yet that record the same situations. You have also burdened the user with a design with more naming and indirections. The surrogate design certainly has a lot of "redundant information" in this "perspective" compared to the original.

Even your starting design has probably introduced surrogates, namely colour ids for colour names. (If colour ids added "information", i.e. "informed" you beyond just their associated names, then they would not be surrogates and would be necessary.) I.e. if colour ids are chosen arbitrarily then you could just have:

Product_Color
╔════════════╦═════════════╗
║ Product*   ║ ColorName*  ║
╠════════════╬═════════════╣
║ 1          ║ Blue        ║
║ 1          ║ Green       ║
║ 2          ║ Blue        ║
║ 2          ║ Green       ║
╚════════════╩═════════════╝

You should have a reason to introduce colour ids, and for that matter product ids, rather than natural keys already existing. Can you justify your multiple tables, names and indirections vs just one?
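
For comparison, a minimal sketch of the single-table, natural-key design shown above, assuming SQLite via Python's `sqlite3` and assumed column types. Listing a product's colours needs no join and no id lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product_Color (
        Product   INTEGER NOT NULL,
        ColorName TEXT    NOT NULL,
        PRIMARY KEY (Product, ColorName)
    );
    INSERT INTO Product_Color VALUES
        (1, 'Blue'), (1, 'Green'), (2, 'Blue'), (2, 'Green');
""")

# The colour names are stored directly, so one table answers the question.
colours_of_product_1 = [
    name
    for (name,) in conn.execute(
        "SELECT ColorName FROM Product_Color "
        "WHERE Product = ? ORDER BY ColorName",
        (1,),
    )
]

print(colours_of_product_1)
```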
