(De)Normalization of two relations


Problem description

People who have read C. J. Date's An Introduction to Database Systems, or books of a similar level, should have no problem with the definitions of normalization and denormalization.

However, memory is not what it used to be, and I often find myself looking at a design and saying that it is not normalized, even though I cannot find which of the normal forms it breaks.

An actual example that illustrates this:

If we have relations

r1 (A, B, C) and r2 (A, D)

with FDs: AB->C and A->D

and r1 represents detailed data, while r2 is a summary of that data (in other words, each value of D is a function of values in r1; in this example, let it be the subtotal of the C values from r1 for each A).

Example instance

r1 = 
A  B  C  
1  1  10
1  2  20
2  1  10
2  2  25

r2 =
A  D
1  30
2  35
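
As a minimal sketch of what "r2 is a summary of r1" means for this instance, the snippet below recomputes the subtotals from r1 and compares them with the stored r2. The tuple representation and the helper name `subtotal_by_a` are assumptions made for illustration only; they are not part of the original question.

```python
from collections import defaultdict

# r1(A, B, C): detail rows from the example instance
r1 = {(1, 1, 10), (1, 2, 20), (2, 1, 10), (2, 2, 25)}
# r2(A, D): stored summary rows from the example instance
r2 = {(1, 30), (2, 35)}


def subtotal_by_a(detail):
    """Hypothetical helper: derive (A, sum of C) pairs from r1-style rows."""
    totals = defaultdict(int)
    for a, _b, c in detail:
        totals[a] += c
    return set(totals.items())


# For this instance the stored summary matches the derived one.
print(subtotal_by_a(r1) == r2)
```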

So, even though I cannot say that it breaks, for example, 2NF or 3NF, I seem to be stuck on the idea that the design is still denormalized in the following sense (from Codd, E.F., "Further Normalization of the Data Base Relational Model", p. 34, commenting on the reasons to normalize beyond 1NF):

  1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
  2. To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs;
  3. To make the relational model more informative to users;
  4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

As far as I can tell, if we define D as the sum of all C values from r1 where A in r1 equals A in r2, then updating C in r1 without updating D in r2 leaves us with an undesirable update dependency, and the data ends up in an inconsistent state. I find this reason enough to call r1 and r2 denormalized. (In fact, the whole of r2 is a function of r1 and brings zero new facts into the model: r2 = f(r1).)
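
The update dependency described above can be sketched as follows. This is only an illustration under assumed representations: r1 is modelled as a dict keyed by (A, B), r2 as a dict keyed by A, and `is_consistent` is a hypothetical helper name.

```python
def is_consistent(detail, summary):
    """Hypothetical check: does every D equal the sum of C for its A?"""
    derived = {}
    for (a, _b), c in detail.items():
        derived[a] = derived.get(a, 0) + c
    return derived == summary


# r1 keyed by (A, B) -> C, r2 keyed by A -> D, as in the example instance
r1 = {(1, 1): 10, (1, 2): 20, (2, 1): 10, (2, 2): 25}
r2 = {1: 30, 2: 35}

before = is_consistent(r1, r2)

# Update C in r1 without touching D in r2.
r1[(1, 2)] = 25

after = is_consistent(r1, r2)
print(before, after)
```

Nothing in the two stated functional dependencies forbids the second state, so the rule that keeps D in step with C has to be enforced separately, for example as an extra constraint or by deriving r2 instead of storing it.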

So the questions are:

  1. Can we call r1 and r2 denormalized?
  2. If yes, why? If not, why not? (According to which rule, or according to which definition?)

NOTE:
To those who find the question(s) interesting enough to answer: I kindly ask you to provide either something quotable or to state it as specific assumptions and conclusions (in other words, if you are going to give your opinion, please follow it with some reasoning).

EDIT: I accepted dportas's answer. I'll try to add a bit to it here. C. J. Date makes a clear and strict distinction:

Much of design theory has to do with reducing redundancy; normalization reduces redundancy within relvars, orthogonality reduces it across relvars.

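quoted from Database in Depth: Relational Theory for Practitioners (http://books.google.com/books?id=FU7uuHc3oNcC&lpg=PA151&ots=Rr_7_3VEMt&dq=Update%20Anomaly%20c.j.date&pg=PA158)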

and on the next page

just as a failure to normalize all the way implies redundancy and can lead to certain anomalies, so too can a failure to adhere to orthogonality.

Solution

Assuming AB is a key in r1 and A is a key in r2 then it seems that the schema is in 6NF. The Relational Database Dictionary (Date) defines denormalization as:

Replacing a set of relvars R1, R2, . . ., Rn by their join R, such that for all i the projection of R on the attributes of Ri is guaranteed to be equal to Ri (i = 1, 2, . . ., n).

Fundamentally, normalization/denormalization is about composition and nonloss decomposition using the projection and join operators. In this example you have redundancy caused by a different operator: summation. I expect it would be quite possible in principle to form a theory of "normalization" for operators other than projection and join, perhaps even for non-relational functions like summation. That is not how normalization is conventionally defined, however, and in the absence of any sound basis for doing otherwise, I think we ought to apply the technical meaning of denormalization as defined by Date in the above quotation.
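
To make Date's definition concrete for this instance, here is a minimal sketch that builds the natural join R of r1 and r2 on A and checks that projecting R back onto (A, B, C) and (A, D) reproduces the original relations. The set-of-tuples representation and the variable names are assumptions for illustration.

```python
# Example instance from the question
r1 = {(1, 1, 10), (1, 2, 20), (2, 1, 10), (2, 2, 25)}  # (A, B, C)
r2 = {(1, 30), (2, 35)}                                # (A, D)

# Natural join on A, giving R(A, B, C, D)
joined = {(a, b, c, d) for (a, b, c) in r1 for (a2, d) in r2 if a == a2}

# Projections of R back onto the original headings
proj_abc = {(a, b, c) for (a, b, c, _d) in joined}
proj_ad = {(a, d) for (a, _b, _c, d) in joined}

print(proj_abc == r1, proj_ad == r2)
```

Replacing r1 and r2 by R is denormalization in Date's sense: R has key AB while D depends on A alone, so D is repeated for every B and R is not in 2NF. Keeping r1 and r2 separate avoids that projection-join redundancy; the summation redundancy between C and D is a separate matter that this check does not detect.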
