数据仓库是否需要满足2NF或其他范式? [英] Does a data warehouse need to satisfy 2NF or another normal form?

查看:87
本文介绍了数据仓库是否需要满足2NF或其他范式?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在调查数据仓库.我有一个关于星型模式的问题.

I'm investigating data warehouses. And I have an issue about star schemas.


Oracle®OLAP应用程序开发人员指南
10g第1版(10.1)
3.2.1尺寸表:TIME_DIM
https://docs.oracle.com/cd/B13789_01/olap.101/b10333/global.htm#CHDCGABE

要表示层次结构MONTH-> QUARTER-> YEAR,我们需要一些键,例如:YEAR_ID,QUARTER_ID.但是有些事情我不明白:

To represent the hierarchy MONTH -> QUARTER -> YEAR, we need some keys such as: YEAR_ID, QUARTER_ID. But there are some things that I do not understand:

1)为什么我们需要字段YEAR_DSC&QUARTER_DSC?我认为我们可以从YEAR&季度表.而且它破坏了2NF.

1) Why do we need field YEAR_DSC & QUARTER_DSC? I think that we can look up these values from YEAR & QUARTER TABLE. And it breaks 2NF.

2)数据仓库中的模式需要满足的正常形式是什么?(1NF,2NF,3NF或其他任何一种.)

2) What is the normal form that a schema in data warehouse needs to satisfy? (1NF, 2NF, 3NF, or any.)

推荐答案

NF(正常形式)与数据仓库基表无关.

NFs (normal forms) don't matter for data warehouse base tables.

我们归一化以减少某些种类的冗余,这样,当我们更新数据库时,我们不必在多个地方都说相同的话,因此,我们不会意外地在需要的地方不说相同的话.被说在多个地方.在查询结果中这不是问题,因为我们没有更新它们.数据仓库的基表也是如此.(这也只是对其原始数据库基表的查询.)

We normalize to reduce certain kinds of redundancy so that when we update a database we don't have to say the same thing in multiple places and so that we can't accidentally erroneously not say the same thing where it would need to be said in multiple places. That is not a problem in query results because we are not updating them. The same is true for a data warehouse's base tables. (Which are also just queries on its original database's base tables.)

数据仓库通常针对读取速度进行了优化,与原始数据库相比,这通常意味着一定程度的非规范化,从而避免了重新计算,而浪费了空间.(请注意,尽管有时重新读取较大的内容可能比读取较小的部分并重新计算较大的内容要慢.)我们可能不希望在移动到数据仓库时 drop 归一化表,因为它们回答很简单查询,我们不想通过重新计算它们来减慢速度.除了那些权衡之外,没有理由 not 进行非规范化.某些特殊的仓库设计方法可能对哪些零件应规格化多少金额有其自己的规则.

Data warehouses are usually optimized for reading speed, and that usually means some denormalization compared to the original database to avoid recomputation at the expense of space. (Notice though that sometimes rereading something bigger can be slower than reading smaller parts and recomputing the big thing.) We probably don't want to drop normalized tables when moving to a data warehouse, because they answer simple queries and we don't want to slow down by recomputing them. Other than those tradeoffs, there's no reason not to denormalize. Some particular warehouse design methods might have their own rules about what parts should be denormalized what amounts.

(无论选择哪种原始数据库设计NF,我们都应始终首先将其标准化为5NF,然后有意识地进行非正规化.我们无需进行标准化或知道约束来更新或查询数据库.)

(Whatever our original database design NF is chosen to be, we should always first normalize to 5NF then consciously denormalize. We don't need to normalize or know constraints to update or query a database.)

阅读一些教科书基础知识,了解我们为什么要规范化&为什么我们使用数据仓库.

Read some textbook basics on why we normalize & why we use data warehouses.

这篇关于数据仓库是否需要满足2NF或其他范式?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆