What is Normalisation (or Normalization)?


Question

Why do database guys go on about normalisation?

What is it? How does it help?

Does it apply to anything outside of databases?

Solution

Normalization basically means designing a database schema so that duplicate and redundant data is avoided. If some piece of data is duplicated in several places in the database, there is a risk that it is updated in one place but not the other, leading to data corruption.

There are a number of normalization levels, from first normal form (1NF) through fifth normal form (5NF). Each normal form describes how to get rid of some specific problem, usually related to redundancy.

Some typical normalization errors:

(1) Having more than one value in a cell. Example:

UserId | Car
---------------------
1      | Toyota
2      | Ford,Cadillac

Here the "Car" column (which is a string) has several values. That violates the first normal form, which says that each cell should have only one value. We can normalize this problem away by having a separate row per car:

UserId | Car
---------------------
1      | Toyota
2      | Ford
2      | Cadillac

The problem with having several values in one cell is that it is tricky to update, tricky to query against, and you cannot apply indexes, constraints and so on.
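
As a rough illustration of the querying problem, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names `cars_unnormalized` and `cars_normalized` and the index name `idx_car` are made up for this example; the rows are the sample data from the two tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table names; the data mirrors the example tables above.
conn.execute("CREATE TABLE cars_unnormalized (UserId INTEGER, Car TEXT)")
conn.executemany(
    "INSERT INTO cars_unnormalized VALUES (?, ?)",
    [(1, "Toyota"), (2, "Ford,Cadillac")],
)

conn.execute("CREATE TABLE cars_normalized (UserId INTEGER, Car TEXT)")
conn.executemany(
    "INSERT INTO cars_normalized VALUES (?, ?)",
    [(1, "Toyota"), (2, "Ford"), (2, "Cadillac")],
)

# With several values in one cell, an equality test finds nothing for 'Ford',
# so a real query would have to fall back to string matching.
exact_unnormalized = conn.execute(
    "SELECT UserId FROM cars_unnormalized WHERE Car = 'Ford'"
).fetchall()

# With one value per cell, a plain equality test works and can use an index.
conn.execute("CREATE INDEX idx_car ON cars_normalized (Car)")
exact_normalized = conn.execute(
    "SELECT UserId FROM cars_normalized WHERE Car = 'Ford'"
).fetchall()

# Counting cars per user is a simple aggregate in the normalized table.
cars_per_user = conn.execute(
    "SELECT UserId, COUNT(*) FROM cars_normalized GROUP BY UserId ORDER BY UserId"
).fetchall()

print(exact_unnormalized)
print(exact_normalized)
print(cars_per_user)
```

The first query comes back empty even though user 2 owns a Ford, while the normalized table answers the same question directly and also supports a per-user count without any string parsing.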

(2) Having redundant non-key data (i.e. data repeated unnecessarily in several rows). Example:

UserId | UserName | Car
-----------------------
1      | John     | Toyota
2      | Sue      | Ford
2      | Sue      | Cadillac

This design is a problem because the name is repeated in each row, even though the name is always determined by the UserId. This makes it possible to change Sue's name in one row and not the other, which is data corruption. The problem is solved by splitting the table in two and creating a primary key/foreign key relationship:

UserId(FK) | Car               UserId(PK) | UserName
---------------------          -----------------
1          | Toyota            1          | John
2          | Ford              2          | Sue
2          | Cadillac

Now it may seem like we still have redundant data because the UserIds are repeated. However, the PK/FK constraint ensures that every UserId in the car table refers to an existing user, and each name is stored in only one place, so the values cannot drift apart and integrity is safe.
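
A minimal sketch of the split design, again using Python's built-in `sqlite3` module. The table names `users` and `cars` are assumptions made for this example (the original only shows the columns), and the rename to "Susan" and the orphan row `(99, 'Volvo')` are invented test values. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical table names; columns and data follow the tables above.
conn.execute(
    "CREATE TABLE users (UserId INTEGER PRIMARY KEY, UserName TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE cars ("
    " UserId INTEGER NOT NULL REFERENCES users (UserId),"
    " Car TEXT NOT NULL)"
)
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "John"), (2, "Sue")])
conn.executemany(
    "INSERT INTO cars VALUES (?, ?)",
    [(1, "Toyota"), (2, "Ford"), (2, "Cadillac")],
)

# The name is stored once, so a rename touches exactly one row...
renamed = conn.execute(
    "UPDATE users SET UserName = 'Susan' WHERE UserId = 2"
).rowcount

# ...and every car row sees the new name through the join.
rows = conn.execute(
    "SELECT u.UserName, c.Car FROM cars c JOIN users u ON u.UserId = c.UserId"
    " WHERE c.UserId = 2 ORDER BY c.Car"
).fetchall()

# A car row pointing at a user that does not exist is rejected.
try:
    conn.execute("INSERT INTO cars VALUES (99, 'Volvo')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(renamed, rows, orphan_rejected)
```

Here the rename updates a single row, both of user 2's cars report the new name, and the insert for the non-existent user 99 raises `sqlite3.IntegrityError`, so the two tables cannot silently fall out of sync.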

Is it important? Yes, it is very important. A database with normalization errors is open to invalid or corrupt data getting in. Since data "lives forever", it is very hard to get rid of corrupt data once it has entered the database.

Don't be scared of normalization. The official technical definitions of the normalization levels are quite abstruse, which makes normalization sound like a complicated mathematical process. However, normalization is basically just common sense, and you will find that if you design a database schema using common sense it will typically be fully normalized.

There are a number of misconceptions around normalization:

  • Some believe that normalized databases are slower and that denormalization improves performance. However, this is only true in very special cases. Typically a normalized database is also the fastest.

  • Sometimes normalization is described as a gradual design process where you have to decide "when to stop". But actually the normalization levels just describe different specific problems. The problems solved by normal forms above 3NF are pretty rare in the first place, so chances are that your schema is already in 5NF.

Does it apply to anything outside of databases? Not directly, no. The principles of normalization are quite specific to relational databases. However, the general underlying theme - that you shouldn't have duplicate data if the different instances can get out of sync - can be applied broadly. This is basically the DRY principle.
