What is Normalisation (or Normalization)?


Question

Why do database guys go on about normalisation?

What is it? How does it help?

Does it apply to anything outside of databases?

Recommended answer

Normalization basically means designing a database schema so that duplicate and redundant data is avoided. If the same information is repeated in multiple places in the database, there is a risk that it is updated in one place but not the other, leading to data corruption.

There are a number of normalization levels, from first normal form (1NF) through fifth normal form (5NF). Each normal form describes how to get rid of a specific problem.

First normal form (1NF) is special because it is not about redundancy. 1NF disallows nested tables, or more specifically, columns that allow tables as values. Nested tables are not supported by SQL in the first place, so most normal relational databases will be in 1NF by default. So we can ignore 1NF for the rest of the discussion.

The normal forms 2NF to 5NF all concern scenarios where the same information is represented multiple times in the same table.

For example, consider a database of moons and planets:

Moon(PK) | Planet  | Planet kind
------------------------------
Phobos   | Mars    | Rock
Deimos   | Mars    | Rock
Io       | Jupiter | Gas
Europa   | Jupiter | Gas
Ganymede | Jupiter | Gas

The redundancy is obvious: the fact that Jupiter is a gas planet is repeated three times, once for each moon. This is a waste of space, but much more seriously, this schema makes inconsistent information possible:

Moon(PK) | Planet  | Planet kind
------------------------------
Phobos   | Mars    | Rock
Deimos   | Mars    | Rock
Io       | Jupiter | Gas
Europa   | Jupiter | Rock <-- Oh no!
Ganymede | Jupiter | Gas

A query can now give inconsistent results, which can have disastrous consequences.
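To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table name `moon_flat` and the column names are assumptions made up for this illustration; they are not part of the original answer. It loads the inconsistent table shown above and asks a simple question about Jupiter.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE moon_flat (moon TEXT PRIMARY KEY, planet TEXT, planet_kind TEXT)"
)
conn.executemany(
    "INSERT INTO moon_flat VALUES (?, ?, ?)",
    [
        ("Phobos", "Mars", "Rock"),
        ("Deimos", "Mars", "Rock"),
        ("Io", "Jupiter", "Gas"),
        ("Europa", "Jupiter", "Rock"),  # the inconsistent row
        ("Ganymede", "Jupiter", "Gas"),
    ],
)

# Ask a simple question: what kind of planet is Jupiter?
jupiter_kinds = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT planet_kind FROM moon_flat "
        "WHERE planet = 'Jupiter' ORDER BY planet_kind"
    )
]
print(jupiter_kinds)  # two contradictory answers: ['Gas', 'Rock']
```

Nothing in the schema stops the contradictory row from being stored, so every query that touches the planet kind has to cope with two answers.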

(Of course, a database cannot protect against wrong information being entered. But it can protect against inconsistent information, which is just as serious a problem.)

The normalized design would split the table into two tables:

Moon(PK) | Planet(FK)     Planet(PK) | Planet kind
---------------------     ------------------------
Phobos   | Mars           Mars       | Rock
Deimos   | Mars           Jupiter    | Gas
Io       | Jupiter 
Europa   | Jupiter 
Ganymede | Jupiter 

Now no fact is repeated multiple times, so there is no possibility of inconsistent data. (It may look like there is still some repetition since the planet names are repeated, but repeating primary key values as foreign keys does not violate normalization since it does not introduce a risk of inconsistent data.)
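The two-table design could be sketched as follows, again with Python's sqlite3 module. The table names `planet` and `moon`, the column names, and the extra 'Titan' and 'Gas giant' values are assumptions chosen for this example. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical table and column names mirroring the two tables above.
conn.execute("CREATE TABLE planet (name TEXT PRIMARY KEY, kind TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE moon ("
    "name TEXT PRIMARY KEY, "
    "planet TEXT NOT NULL REFERENCES planet(name))"
)
conn.executemany(
    "INSERT INTO planet VALUES (?, ?)",
    [("Mars", "Rock"), ("Jupiter", "Gas")],
)
conn.executemany(
    "INSERT INTO moon VALUES (?, ?)",
    [
        ("Phobos", "Mars"),
        ("Deimos", "Mars"),
        ("Io", "Jupiter"),
        ("Europa", "Jupiter"),
        ("Ganymede", "Jupiter"),
    ],
)

# A moon that refers to a planet missing from the planet table is rejected.
try:
    conn.execute("INSERT INTO moon VALUES ('Titan', 'Saturn')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Each planet's kind is stored in exactly one row, so a single UPDATE
# changes it for all of that planet's moons at once.
updated = conn.execute(
    "UPDATE planet SET kind = 'Gas giant' WHERE name = 'Jupiter'"
).rowcount
print(rejected, updated)
```

With this layout there is simply no place to record that Europa orbits a rock planet while Io orbits a gas planet.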

Rule of thumb: If the same information can be represented with fewer individual cell values, not counting foreign keys, then the table should be normalized by splitting it into more tables. For example, the first table has 15 individual values (5 rows of 3 columns), while the two tables have only 9 individual (non-FK) values: 5 moon names, 2 planet names, and 2 planet kinds. In particular, the planet kind is now stored 2 times instead of 5, so we eliminate 3 redundant values.

We know the same information is still there, since we can write a join query which returns the same data as the original un-normalized table.
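Such a join query might look like the following sketch, which uses Python's sqlite3 module. The table and column names are assumptions for illustration only. It rebuilds the two normalized tables and joins them back into the three-column shape of the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names for the two normalized tables.
conn.execute("CREATE TABLE planet (name TEXT PRIMARY KEY, kind TEXT)")
conn.execute(
    "CREATE TABLE moon (name TEXT PRIMARY KEY, planet TEXT REFERENCES planet(name))"
)
conn.executemany(
    "INSERT INTO planet VALUES (?, ?)",
    [("Mars", "Rock"), ("Jupiter", "Gas")],
)
conn.executemany(
    "INSERT INTO moon VALUES (?, ?)",
    [
        ("Phobos", "Mars"),
        ("Deimos", "Mars"),
        ("Io", "Jupiter"),
        ("Europa", "Jupiter"),
        ("Ganymede", "Jupiter"),
    ],
)

# Join the two tables back into the shape of the original single table.
rows = conn.execute(
    "SELECT moon.name, moon.planet, planet.kind "
    "FROM moon JOIN planet ON planet.name = moon.planet "
    "ORDER BY moon.rowid"  # keep the insertion order of the moons
).fetchall()

for row in rows:
    print(" | ".join(row))
```

The result has one row per moon with its planet and planet kind, which is exactly the content of the first table, but the kind of each planet is read from a single place.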

How do I avoid such problems? Normalization problems are easily avoided by giving a bit of thought to the conceptual model, for example by drawing an entity-relationship diagram. Planets and moons have a one-to-many relationship, which means they should be represented in two different tables with a foreign key association. Normalization issues happen when multiple entities with a one-to-many or many-to-many relationship are represented in the same table row.

Is normalization important? Yes, it is very important. By having a database with normalization errors, you open the risk of getting invalid or corrupt data into the database. Since data "lives forever", it is very hard to get rid of corrupt data once it has entered the database.

But I don't really think it is important to distinguish between the different normal forms from 2NF to 5NF. It is typically pretty obvious when a schema contains redundancies - whether it is 3NF or 5NF that is violated is less important as long as the problem is fixed.

(There are also some additional normal forms like DKNF and 6NF, which are only relevant for special-purpose systems like data warehouses.)

Don't be scared of normalization. The official technical definitions of the normalization levels are quite abstruse. They make it sound like normalization is a complicated mathematical process. However, normalization is basically just common sense, and you will find that if you design a database schema using common sense, it will typically be fully normalized.

There are a number of misconceptions around normalization:

  • Some believe that normalized databases are slower and that denormalization improves performance. However, this is only true in very special cases. Typically, a normalized database is also the fastest.

  • Sometimes normalization is described as a gradual design process in which you have to decide "when to stop". But actually the normalization levels just describe different specific problems. The problems solved by normal forms above 3NF are pretty rare in the first place, so chances are that your schema is already in 5NF.

Does it apply to anything outside of databases? Not directly, no. The principles of normalization are quite specific to relational databases. However, the general underlying theme - that you shouldn't have duplicate data if the different instances can get out of sync - can be applied broadly. This is basically the DRY (Don't Repeat Yourself) principle.
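As a small illustration of the same theme outside of databases, here is a hedged Python sketch. The two classes and the temperature example are invented for this purpose and do not come from the original answer. The first class stores the same fact twice, so the copies can drift apart; the second keeps a single source of truth and derives the other value on demand.

```python
from dataclasses import dataclass


@dataclass
class DuplicatedTemperature:
    # Two fields hold the same fact, so they can get out of sync.
    celsius: float
    fahrenheit: float


@dataclass
class Temperature:
    # Single source of truth; Fahrenheit is derived when it is needed.
    celsius: float

    @property
    def fahrenheit(self) -> float:
        return self.celsius * 9 / 5 + 32


dup = DuplicatedTemperature(celsius=100.0, fahrenheit=212.0)
dup.celsius = 0.0  # fahrenheit is now stale and still says 212.0

temp = Temperature(celsius=100.0)
temp.celsius = 0.0  # the derived fahrenheit follows automatically

print(dup.fahrenheit, temp.fahrenheit)
```

The stale `fahrenheit` field plays the same role as the row claiming that Jupiter is a rock planet: the duplicate was updated in one place but not the other.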
