Is a fact table in normalized or de-normalized form?
I did a bit of research on fact tables to find out whether they are normalized or de-normalized, and I came across some findings that confused me.
According to Kimball:
Dimensional models combine normalized and denormalized table structures. The dimension tables of descriptive information are highly denormalized with detailed and hierarchical roll-up attributes in the same table. Meanwhile, the fact tables with performance metrics are typically normalized. While we advise against a fully normalized with snowflaked dimension attributes in separate tables (creating blizzard-like conditions for the business user), a single denormalized big wide table containing both metrics and descriptions in the same table is also ill-advised.
The other finding, which I also think is OK, is by fazalhp at GeekInterview:
The main funda of DW is de-normalizing the data for faster access by the reporting tool...so if ur building a DW ..90% it has to be de-normalized and off course the fact table has to be de normalized...
So my question is: are fact tables normalized or de-normalized? Whichever it is, how and why?
From the point of view of relational database design theory, dimension tables are usually in 2NF, and fact tables are anywhere between 2NF and 6NF.
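As a rough illustration of the 2NF claim, the sketch below checks functional dependencies on a tiny, made-up product dimension. The rows, the column names and the `fd_holds` helper are all assumptions introduced for this example, and a dependency that holds in sample rows only suggests, rather than proves, a dependency in the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in `rows`, each combination of `lhs` values
    maps to exactly one combination of `rhs` values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # setdefault stores val on first sight, otherwise returns the stored one
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical denormalized dimension: the category/department hierarchy
# is flattened into the same table as the product itself.
dim_product = [
    {"product_key": 1, "name": "Espresso", "category": "Coffee", "department": "Beverages"},
    {"product_key": 2, "name": "Latte", "category": "Coffee", "department": "Beverages"},
    {"product_key": 3, "name": "Green Tea", "category": "Tea", "department": "Beverages"},
    {"product_key": 4, "name": "Bagel", "category": "Bakery", "department": "Food"},
]

# The single-column key determines every other attribute.
key_determines_all = fd_holds(
    dim_product, ["product_key"], ["name", "category", "department"]
)

# A non-key attribute determines another non-key attribute.
category_determines_department = fd_holds(dim_product, ["category"], ["department"])

# The reverse does not hold: one department contains several categories.
department_determines_category = fd_holds(dim_product, ["department"], ["category"])
```

With a single-column key there can be no partial dependency, so a table like this satisfies 2NF. The `category -> department` dependency is transitive through a non-key attribute, which is what keeps it out of 3NF.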
However, dimensional modelling is a methodology unto itself, tailored to:
- one use case, namely reporting
- mostly one basic type (pattern) of query
- one main user category -- business analysts or similar
- row-store RDBMSs such as Oracle, SQL Server, Postgres ...
- one independently controlled load/update process (ETL); all other clients are read-only
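To make the "one basic type of query" point concrete, here is a minimal sketch of a star schema and the typical join-filter-aggregate query run against it, using Python's built-in `sqlite3` module. The table names, columns and sample data are hypothetical; they only mirror the shape Kimball describes, with wide denormalized dimensions and a narrow fact table holding just keys and measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptive attributes and roll-up levels kept in one table.
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    month     TEXT,
    year      INTEGER
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT,
    category    TEXT,
    department  TEXT
);
-- Fact table: only dimension keys and numeric measures.
-- (SQLite does not enforce REFERENCES unless foreign keys are switched on.)
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL,
    PRIMARY KEY (date_key, product_key)
);
""")

conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?, ?)",
    [(20240101, "2024-01-01", "2024-01", 2024),
     (20250101, "2025-01-01", "2025-01", 2025)],
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?, ?)",
    [(1, "Espresso", "Coffee", "Beverages"),
     (2, "Green Tea", "Tea", "Beverages"),
     (3, "Bagel", "Bakery", "Food")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(20240101, 1, 2, 6.0),
     (20240101, 2, 1, 2.5),
     (20240101, 3, 3, 4.5),
     (20250101, 1, 1, 3.0)],
)

# The typical reporting pattern: join the fact table to its dimensions,
# group by dimension attributes, aggregate the measures.
rows = conn.execute("""
    SELECT d.year, p.department, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.year, p.department
    ORDER BY d.year, p.department
""").fetchall()

for row in rows:
    print(row)
```

In this sketch every non-key column of `fact_sales` depends on the whole composite key, while `dim_product` repeats the department for every product in it; that repetition is the denormalization the dimensional approach accepts in exchange for simpler queries.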
There are other DW design methodologies out there, such as:
- Inmon's -- data structure driven
- Data Vault -- data structure driven
- Anchor modelling -- schema evolution driven
The main thing is not to mix up database design theory with a specific design methodology. You may look at a certain methodology from the perspective of database design theory, but you have to study each methodology separately.