How deep to go when denormalising

Question

I am denormalising an OLTP database for use in a DWH. At the moment I am denormalising study groups.

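  • Each study group has a key pointing to one project.
  • Each project has a key pointing to one department.
  • Each department has a key pointing to one university.
  • Each university has a key pointing to one city.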

Now I know that you are supposed to denormalize the sh*t out of your OLTP, but in this DWH, department will be a dimension on its own. The same goes for university. Would it suffice to add a key from study group pointing at department, or is it wiser to denormalize as far as you can and add all attributes from department, and all attributes from its M:1 related tables, to the study group dimension? Even when department and university will be dimensions by themselves?

In other words: how far/deep do you go when denormalizing?
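To make the two options concrete, here is a minimal Python sketch of the key chain above. It assumes hypothetical in-memory tables (dicts keyed by id) and made-up column names, none of which come from the actual OLTP schema; project attributes are folded into the study group row in both variants purely for brevity. Option 1 keeps only a key to department; option 2 follows every M:1 key up to the city and copies the attributes into one wide row.

```python
# Hypothetical OLTP tables: every row holds the key of exactly one parent (M:1).
CITIES = {1: {"city_name": "Example City"}}
UNIVERSITIES = {10: {"university_name": "Example University", "city_id": 1}}
DEPARTMENTS = {100: {"department_name": "Computer Science", "university_id": 10}}
PROJECTS = {1000: {"project_name": "Data Warehousing", "department_id": 100}}
STUDY_GROUPS = {5: {"group_name": "DWH Group A", "project_id": 1000}}


def shallow_row(group_id: int) -> dict:
    """Option 1: the study group row keeps only a key to the department dimension."""
    group = STUDY_GROUPS[group_id]
    project = PROJECTS[group["project_id"]]
    return {
        "group_id": group_id,
        "group_name": group["group_name"],
        "project_name": project["project_name"],
        "department_id": project["department_id"],  # still needs a join at query time
    }


def flattened_row(group_id: int) -> dict:
    """Option 2: follow every M:1 key and copy the parent attributes inline."""
    group = STUDY_GROUPS[group_id]
    project = PROJECTS[group["project_id"]]
    department = DEPARTMENTS[project["department_id"]]
    university = UNIVERSITIES[department["university_id"]]
    city = CITIES[university["city_id"]]
    return {
        "group_id": group_id,
        "group_name": group["group_name"],
        "project_name": project["project_name"],
        "department_name": department["department_name"],
        "university_name": university["university_name"],
        "city_name": city["city_name"],
    }


print(shallow_row(5))
print(flattened_row(5))
```

The flattened row needs no further joins when queried, at the cost of repeating department, university and city values on every study group row.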

Recommended answer

The key concept behind a dimensional model is:

  • Keep fact tables in 3NF (third normal form);
  • De-normalize dimensions to 2NF (second normal form).

So ideally, the only joins you should have in your model are the joins between fact tables and relevant dimensions.
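A minimal sketch of that shape in Python, using hypothetical names (`DIM_ORG`, `DIM_DATE`, `FACT_ENROLLMENT`) and toy data: the fact rows hold only dimension keys and a numeric measure, each dimension is one flat (denormalized) table, and every aggregation is a single fact-to-dimension lookup.

```python
from collections import defaultdict

# Hypothetical flat dimension: the whole organisational hierarchy in one row per department.
DIM_ORG = {
    1: {"department": "Computer Science", "university": "Uni A", "city": "City X"},
    2: {"department": "Mathematics", "university": "Uni A", "city": "City X"},
    3: {"department": "History", "university": "Uni B", "city": "City Y"},
}
# Hypothetical date dimension.
DIM_DATE = {20240101: {"year": 2024}, 20250101: {"year": 2025}}

# Fact rows: foreign keys to dimensions plus a numeric measure, nothing else.
FACT_ENROLLMENT = [
    {"org_key": 1, "date_key": 20240101, "students": 30},
    {"org_key": 2, "date_key": 20240101, "students": 20},
    {"org_key": 3, "date_key": 20250101, "students": 15},
    {"org_key": 1, "date_key": 20250101, "students": 25},
]


def students_by(attribute: str, dim: dict, key_column: str) -> dict:
    """Sum the measure per dimension attribute; the only 'join' is fact -> dimension."""
    totals = defaultdict(int)
    for row in FACT_ENROLLMENT:
        totals[dim[row[key_column]][attribute]] += row["students"]
    return dict(totals)


print(students_by("university", DIM_ORG, "org_key"))  # {'Uni A': 75, 'Uni B': 15}
print(students_by("year", DIM_DATE, "date_key"))  # {2024: 50, 2025: 40}
```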

As part of this philosophy:

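  • Avoid "snowflake" designs, where dimensions contain keys to other dimensions. It is always possible to come up with a data model that does the same as a snowflake without violating the 3NF/2NF rules;
  • Never have any direct joins between two separate dimensions (e.g., department and study group). All relationships between dimensions must be resolved through fact tables;
  • Absolutely never have any direct joins between two separate fact tables. Any relationships between fact tables must be resolved through shared dimensions.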
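As a rough illustration of the last rule, the Python sketch below combines two hypothetical fact tables (enrollment and funding) without joining them to each other: each one is first rolled up to an attribute of a shared dimension, and the results are then merged on that attribute. Kimball's books call this "drilling across". All table and column names are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical shared (conformed) dimension used by both fact tables.
DIM_ORG = {
    1: {"department": "Computer Science", "university": "Uni A"},
    2: {"department": "History", "university": "Uni B"},
}

# Two independent fact tables; neither carries a key into the other.
FACT_ENROLLMENT = [
    {"org_key": 1, "students": 30},
    {"org_key": 2, "students": 15},
    {"org_key": 1, "students": 25},
]
FACT_FUNDING = [{"org_key": 1, "amount": 5000}, {"org_key": 2, "amount": 1200}]


def aggregate(fact: list, measure: str, attribute: str) -> dict:
    """Roll one fact table up to an attribute of the shared dimension."""
    totals = defaultdict(int)
    for row in fact:
        totals[DIM_ORG[row["org_key"]][attribute]] += row[measure]
    return dict(totals)


def drill_across(attribute: str) -> dict:
    """Combine the two fact tables only through the shared dimension attribute."""
    students = aggregate(FACT_ENROLLMENT, "students", attribute)
    funding = aggregate(FACT_FUNDING, "amount", attribute)
    keys = sorted(set(students) | set(funding))
    return {k: {"students": students.get(k, 0), "funding": funding.get(k, 0)} for k in keys}


print(drill_across("university"))
# {'Uni A': {'students': 55, 'funding': 5000}, 'Uni B': {'students': 15, 'funding': 1200}}
```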

Finally, consider that dimensional design, besides optimizing the data for querying, serves a second important purpose: it's a semantic model of the business (or whatever else it represents). So, when making decisions about combining data elements into dimensions and facts, consider their "logical affinity": they should make intuitive sense to the end users. If you have a hard time explaining the meaning of your dimension or fact table to a BI analyst, you have most likely made a modeling mistake.

For example, in your case you should consider the logical relations between universities, departments, study groups, etc. It's very likely that University/Department form a natural hierarchy. If so, they should belong to the same dimension. Study group, on the other hand, might not: let's assume it's possible to form study groups across multiple universities and/or multiple departments. Such many-to-many (M:M) relations are a clear indication that they should be resolved via fact tables. In addition, relations between universities and departments are stable (they rarely change), while study groups are formed and dissolved very often, so they should be modeled separately.
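A possible layout for this case, sketched in Python under the same assumption the answer makes (study groups can span departments and universities): University/Department flattened into one dimension, study group as its own dimension with no department key inside, and a hypothetical factless fact table `FACT_PARTICIPATION` with one row per group/department pair to resolve the M:M relation. All names and data are illustrative only.

```python
# Hypothetical dimension: the University/Department hierarchy flattened into one table.
DIM_DEPARTMENT = {
    1: {"department": "Computer Science", "university": "Uni A"},
    2: {"department": "Mathematics", "university": "Uni A"},
    3: {"department": "History", "university": "Uni B"},
}
# Hypothetical study group dimension, modeled separately (no department key inside).
DIM_STUDY_GROUP = {
    10: {"group_name": "Data Warehousing"},
    11: {"group_name": "Digital Humanities"},
}
# Hypothetical factless fact table: one row per (group, department) participation,
# which resolves the many-to-many relation between the two dimensions.
FACT_PARTICIPATION = [
    {"group_key": 10, "department_key": 1},
    {"group_key": 10, "department_key": 2},
    {"group_key": 11, "department_key": 1},
    {"group_key": 11, "department_key": 3},
]


def groups_per_university() -> dict:
    """Count distinct study groups per university, going through the fact table."""
    groups = {}
    for row in FACT_PARTICIPATION:
        university = DIM_DEPARTMENT[row["department_key"]]["university"]
        groups.setdefault(university, set()).add(row["group_key"])
    return {uni: len(groups[uni]) for uni in sorted(groups)}


def universities_of(group_name: str) -> list:
    """List the universities a study group spans, again only via the fact table."""
    keys = {k for k, g in DIM_STUDY_GROUP.items() if g["group_name"] == group_name}
    return sorted({DIM_DEPARTMENT[r["department_key"]]["university"]
                   for r in FACT_PARTICIPATION if r["group_key"] in keys})


print(groups_per_university())  # {'Uni A': 2, 'Uni B': 1}
print(universities_of("Digital Humanities"))  # ['Uni A', 'Uni B']
```

Collecting group keys into a set per university matters here: group 10 participates through two departments of Uni A and would otherwise be counted twice.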

In general, if you see 1:1 or 1:M relations between dimensional elements, it's often an indication that they should be de-normalized into the same table (again, only if their combination makes logical sense). If the relations are M:M, most likely they belong to different tables (you can force them into the same table, but often such tables look like Frankenstein creatures).
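One way to check the observed cardinality between two attributes before deciding where they go is sketched below in Python. The `cardinality` and `may_share_dimension` helpers are hypothetical and only reflect the sample data they are given, not the business rule: a relation that happens to look 1:M in today's data may still be M:M in reality, and, as noted above, the combination must also make logical sense.

```python
from collections import defaultdict


def cardinality(pairs: list) -> str:
    """Classify the observed relation between left and right values in (left, right) pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    many_rights = any(len(s) > 1 for s in rights_per_left.values())  # a left has several rights
    many_lefts = any(len(s) > 1 for s in lefts_per_right.values())   # a right has several lefts
    if many_rights and many_lefts:
        return "M:M"
    if many_rights:
        return "1:M"
    if many_lefts:
        return "M:1"
    return "1:1"


def may_share_dimension(pairs: list) -> bool:
    """Data-only hint: 1:1, 1:M and M:1 may be denormalized together, M:M should not."""
    return cardinality(pairs) != "M:M"


uni_dept = [("Uni A", "Computer Science"), ("Uni A", "Mathematics"), ("Uni B", "History")]
group_dept = [(10, "Computer Science"), (10, "Mathematics"), (11, "Computer Science"), (11, "History")]
print(cardinality(uni_dept), may_share_dimension(uni_dept))      # 1:M True
print(cardinality(group_dept), may_share_dimension(group_dept))  # M:M False
```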

You can get much better help by making your question more specific: draw your dimensional model, post it, and ask about the specific issues/challenges you have. For general concepts, books by Kimball and Inmon are your best friends.
