Design of a data warehouse with more than one fact table


Question

I'm new to data warehousing. First, I should point out that my copy of The Data Warehouse Toolkit is on its way to my mailbox (snail mail :P). But I'm already studying all this with what I can find on the net.

What I can't find on the net, however, is what to do when you seem to have more than one fact in a DW. In my case (insurance), I have refunds that occur on an irregular basis. One client can have none for 3 months and then ten in the same month. On the other hand, I have "subscription fees" (not sure what the correct English term is, but you get the point) that occur every month or every three months. That seems clearly like two distinct facts to me.

Those two are loosely coupled by some dimensions, like the client or the "insurance product". Are these two different warehouses, on which I have to produce two different reports and then connect the reports outside the DW? Or is there a way to design this to fit a single, decent DW? Or should I combine these two facts into one? I would probably lose granularity on refunds then.

Some blogs I read said a DW always has one fact table. Others mention the step of designing the fact tables (plural), but give no clear instruction on whether there is a link between them or whether they are just distinct components of the same DW project.

Does anyone know some references on that precise part of DW design?

Answer

Taking your questions backwards.

A data warehouse can have more than one fact table. However, you do want to minimize joins between fact tables. It's ok to duplicate fact information in different fact tables.
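To see why joins between fact tables are worth minimizing, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`dim_customer`, `fact_refund`, `fact_subscription_fee`, `customer_key`, `amount`) are hypothetical. Joining the two fact tables row-to-row on the shared customer key pairs every refund with every fee (a "fan-out") and inflates both totals; aggregating each fact table separately to the same grain and then joining the summaries on the shared dimension key (what Kimball calls "drilling across") gives the correct figures.

```python
import sqlite3

# In-memory database: one shared customer dimension, two fact tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_refund (customer_key INTEGER REFERENCES dim_customer, amount REAL);
CREATE TABLE fact_subscription_fee (customer_key INTEGER REFERENCES dim_customer, amount REAL);

INSERT INTO dim_customer VALUES (1, 'Alice');
-- Alice: three refunds and two monthly fees.
INSERT INTO fact_refund VALUES (1, 10.0), (1, 20.0), (1, 30.0);
INSERT INTO fact_subscription_fee VALUES (1, 50.0), (1, 50.0);
""")

# Naive approach: join fact rows directly. Each refund row pairs with each
# fee row (3 x 2 = 6 rows), so both sums are inflated.
naive = conn.execute("""
SELECT SUM(r.amount), SUM(f.amount)
FROM fact_refund r
JOIN fact_subscription_fee f ON r.customer_key = f.customer_key
""").fetchone()

# Drill-across: aggregate each fact table to one row per customer first,
# then join those summaries through the shared dimension.
drill_across = conn.execute("""
SELECT c.name, r.total_refunds, f.total_fees
FROM dim_customer c
LEFT JOIN (SELECT customer_key, SUM(amount) AS total_refunds
           FROM fact_refund GROUP BY customer_key) AS r
       ON r.customer_key = c.customer_key
LEFT JOIN (SELECT customer_key, SUM(amount) AS total_fees
           FROM fact_subscription_fee GROUP BY customer_key) AS f
       ON f.customer_key = c.customer_key
""").fetchall()

print(naive)         # (120.0, 300.0): refunds counted twice, fees three times
print(drill_across)  # [('Alice', 60.0, 100.0)]
```

This is also one answer to the "connect the reports outside of the DW" worry: the two facts stay in separate tables, and the shared dimension is what lets their summaries line up in a single query.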

Of the objects you mentioned:

Refund is a fact. Its timestamp is a dimension of the refund fact.

Subscription fee is a fact. Its timestamp is a dimension of the subscription fee fact.
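Both facts can share one date dimension even though they arrive at different rhythms. A minimal sketch with `sqlite3`, assuming a hypothetical `dim_date` table keyed by an integer `date_key` in `YYYYMMDD` form (a common convention, not something this answer prescribes): refunds are stored as one row per refund event, fees as one row per billing date, and both roll up to months through the same dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
-- Event grain: zero, one or many rows per customer per month.
CREATE TABLE fact_refund (date_key INTEGER REFERENCES dim_date,
                          customer_key INTEGER, amount REAL);
-- Periodic grain: one row per customer per billing date.
CREATE TABLE fact_subscription_fee (date_key INTEGER REFERENCES dim_date,
                                    customer_key INTEGER, amount REAL);

INSERT INTO dim_date VALUES
  (20240105, '2024-01-05', '2024-01'), (20240201, '2024-02-01', '2024-02'),
  (20240210, '2024-02-10', '2024-02'), (20240301, '2024-03-01', '2024-03');
-- No refunds in January or March, two in February.
INSERT INTO fact_refund VALUES (20240201, 1, 15.0), (20240210, 1, 25.0);
-- One fee per month.
INSERT INTO fact_subscription_fee VALUES
  (20240105, 1, 40.0), (20240201, 1, 40.0), (20240301, 1, 40.0);
""")

# Table names cannot be bound as SQL parameters, so only allow known ones.
FACT_TABLES = {"fact_refund", "fact_subscription_fee"}

def monthly_totals(fact_table):
    """Row count and amount per month for one fact table, via dim_date."""
    if fact_table not in FACT_TABLES:
        raise ValueError(f"unknown fact table: {fact_table}")
    sql = f"""
    SELECT d.month, COUNT(*), SUM(f.amount)
    FROM {fact_table} f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.month ORDER BY d.month
    """
    return conn.execute(sql).fetchall()

refunds_by_month = monthly_totals("fact_refund")
fees_by_month = monthly_totals("fact_subscription_fee")
print(refunds_by_month)  # [('2024-02', 2, 40.0)]
print(fees_by_month)     # [('2024-01', 1, 40.0), ('2024-02', 1, 40.0), ('2024-03', 1, 40.0)]
```

January and March simply have no refund rows, which is what an event-grain fact table looks like; a report that needs explicit zeros would start from `dim_date` and left-join the fact table.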

A refund can happen more than once. I'm guessing that each customer has one subscription fee. So it appears we have two fact tables so far: customer and customer refund.

If you knew that there could be at most 3 refunds (as an example), then you would eliminate the customer refund fact table and put 3 refund columns in the customer table.
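The "at most 3 refunds" shortcut amounts to pivoting refund rows into fixed columns. A minimal sketch in plain Python, assuming refunds arrive as hypothetical `(customer_id, amount)` pairs in chronological order; the function name `pivot_refunds` and the `refund_1` to `refund_3` column names are illustrative only. It refuses to proceed when a customer exceeds the limit, because the fixed columns would otherwise silently drop data.

```python
from collections import defaultdict

MAX_REFUNDS = 3  # assumed business rule: never more than 3 refunds per customer

def pivot_refunds(refunds, max_refunds=MAX_REFUNDS):
    """Turn (customer_id, amount) rows into one row per customer with
    refund_1..refund_N columns, padding unused slots with None."""
    per_customer = defaultdict(list)
    for customer_id, amount in refunds:
        per_customer[customer_id].append(amount)

    rows = {}
    for customer_id, amounts in per_customer.items():
        if len(amounts) > max_refunds:
            # The fixed-column design breaks here; a refund fact table is needed.
            raise ValueError(f"customer {customer_id} has {len(amounts)} refunds, "
                             f"more than the {max_refunds} columns allow")
        padded = amounts + [None] * (max_refunds - len(amounts))
        rows[customer_id] = {f"refund_{i + 1}": a for i, a in enumerate(padded)}
    return rows

customer_columns = pivot_refunds([(1, 10.0), (1, 20.0), (2, 5.0)])
print(customer_columns[1])  # {'refund_1': 10.0, 'refund_2': 20.0, 'refund_3': None}
```

The `ValueError` branch is why this only works when the limit is a hard rule; if it can ever be exceeded, the separate refund fact table is the safer design.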

You also mention insurance. A customer can have more than one policy. So we have a third fact table.

A data warehouse is usually designed using a star schema. The star schema is basically one fact table connected to one or more dimension tables. You'll probably have more than one star in a data warehouse, since we already defined 3 fact tables.
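A minimal sketch of what "more than one star" can look like, using `sqlite3`. The layout is an assumption for illustration: the policy, subscription fee and refund facts each get their own table, and the customer and the insurance product are modeled as hypothetical dimensions (`dim_customer`, `dim_product`) that all three stars share. The first query is a typical star join: one fact table joined to its dimensions and grouped by their descriptive attributes. The second counts policies per customer from the policy star.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions shared by every star.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT);

-- Three fact tables, i.e. three stars around the same dimensions.
CREATE TABLE fact_policy (customer_key INTEGER REFERENCES dim_customer,
                          product_key  INTEGER REFERENCES dim_product,
                          policy_number TEXT);
CREATE TABLE fact_subscription_fee (customer_key INTEGER REFERENCES dim_customer,
                                    product_key  INTEGER REFERENCES dim_product,
                                    amount REAL);
CREATE TABLE fact_refund (customer_key INTEGER REFERENCES dim_customer,
                          product_key  INTEGER REFERENCES dim_product,
                          amount REAL);

INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO dim_product  VALUES (10, 'Health'), (20, 'Car');
-- Alice holds two policies, Bob one.
INSERT INTO fact_policy VALUES (1, 10, 'P-001'), (1, 20, 'P-002'), (2, 20, 'P-003');
INSERT INTO fact_subscription_fee VALUES (1, 10, 30.0), (1, 20, 45.0), (2, 20, 45.0);
INSERT INTO fact_refund VALUES (1, 10, 12.5), (2, 20, 8.0), (2, 20, 4.0);
""")

# Star join on the refund star: fact rows joined to both dimensions,
# grouped by descriptive attributes taken from those dimensions.
refunds_by_customer_product = conn.execute("""
SELECT c.name, p.product_name, COUNT(*) AS n_refunds, SUM(r.amount) AS total
FROM fact_refund r
JOIN dim_customer c ON r.customer_key = c.customer_key
JOIN dim_product  p ON r.product_key  = p.product_key
GROUP BY c.name, p.product_name
ORDER BY c.name, p.product_name
""").fetchall()

# The policy star: a customer can have more than one policy.
policies_per_customer = conn.execute("""
SELECT c.name, COUNT(*) AS n_policies
FROM fact_policy pol
JOIN dim_customer c ON pol.customer_key = c.customer_key
GROUP BY c.name
ORDER BY c.name
""").fetchall()

print(refunds_by_customer_product)  # [('Alice', 'Health', 1, 12.5), ('Bob', 'Car', 2, 12.0)]
print(policies_per_customer)        # [('Alice', 2), ('Bob', 1)]
```

Because every star uses the same `dim_customer` and `dim_product` rows, a customer or product label means the same thing in each fact table's reports.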
