事实表,其中包含可在源系统中定期更新的信息 [英] Fact table with information that is regularly updatable in source system

查看:109
本文介绍了事实表,其中包含可在源系统中定期更新的信息的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在建立一个维度数据仓库,并学习如何从我的仓库中的源系统中为各种业务流程建模。

I'm building a dimensional data warehouse and learning how to model my various business processes from my source system in my warehouse.

我目前正在为出价(工作出价)作为事实表从数据系统中的源系统中获取,其中包含以下信息:

I'm currently modelling a "Bid" (bid for work) from our source system in our data warehouse as a fact table which contains information such as:


  • 出价金额

  • 预计收入

  • 销售员工

  • 出价状态(有效,待处理,已拒绝等)

  • 等。

  • Bid amount
  • Projected revenue
  • Sales employee
  • Bid status (active, pending, rejected, etc)
  • etc.

问题是出价(或我尝试执行的大多数其他过程)模型)可以经历各种状态,并可以在源系统中的任何给定时刻更新其信息。根据Ralph Kimball的说法,只有在事实表被视为累积快照的情况下,才应更新事实表,而且我敢肯定,以下定义将并非所有这些过程都被视为累积快照。

The problem is that the bid (or most any other process I'm trying to model) can go through various states and have its information updated at any given moment in the source system. According to Ralph Kimball, fact tables should only be updated if they are considered "accumulating snapshot" and I'm sure that not all of these processes would be considered an "accumulating snapshot" by the definition below.

如何根据Kimball组的建议在数据仓库中对这些类型的流程进行建模?更进一步,哪种类型的事实表适用于出价(考虑到我上面概述的事实)?

How should these type of processes be modeled in the data warehouse according to the Kimball group's recommendations? Further more, what type of fact table would work for a bid (given the facts I've outlined above)?

http:/ /www.kimballgroup.com/2008/11/fact-tables/


交易金额对应于在单个
瞬间。杂货店的哔哔声是一笔交易。所测量的
事实仅在该瞬间和该事件中有效。下一次
测量事件可能在一毫秒后或下个月发生,或者永远不会发生。因此,交易粒度事实表是稀疏的或密集的。我们不能保证所有可能的外键都将代表
。交易谷物事实表可能非常庞大,其中最大的
包含数十亿条记录。

The transaction grain corresponds to a measurement taken at a single instant. The grocery store beep is a transaction grain. The measured facts are valid only for that instant and for that event. The next measurement event could happen one millisecond later or next month or never. Thus, transaction grain fact tables are unpredictably sparse or dense. We have no guarantee that all the possible foreign keys will be represented. Transaction grain fact tables can be enormous, with the largest containing many billions of records.

定期快照对应于预定义的时间跨度,
通常是财务报告期。图1说明了每月
帐户的定期快照。测得的事实总结了时间跨度或时间跨度内的活动
。定期快照颗粒
强有力地保证了所有报告实体(例如
如图1中的银行帐户)将出现在每个快照中,即使
没有活动。定期快照是可以预见的,并且
应用程序可以依赖始终存在的密钥组合。
定期快照事实表也可能很大。一家拥有20个
百万帐户且具有10年历史的银行,在月度帐户定期快照中将有24亿条记录

The periodic snapshot grain corresponds to a predefined span of time, often a financial reporting period. Figure 1 illustrates a monthly account periodic snapshot. The measured facts summarize activity during or at the end of the time span. The periodic snapshot grain carries a powerful guarantee that all of the reporting entities (such as the bank account in Figure 1) will appear in each snapshot, even if there is no activity. The periodic snapshot is predictably dense, and applications can rely on combinations of keys always being present. Periodic snapshot fact tables can also get large. A bank with 20 million accounts and a 10-year history would have 2.4 billion records in the monthly account periodic snapshot!

累积快照事实表对应于可预测的
流程,该流程具有明确的开始和结束。订单处理,
索赔处理,服务电话解决和大学入学是
的典型候选人。例如,用于订单
处理的累积快照的粒度通常是订单上的订单项。请注意,图1中的
有多个日期代表订单经历的标准
情况。随着过程从头到尾逐步执行
的过程,将重新访问并覆盖累积的快照记录
。由于这种覆盖
策略,累积快照事实表通常比其他两种类型的快照要小得多。$

The accumulating snapshot fact table corresponds to a predictable process that has a well-defined beginning and end. Order processing, claims processing, service call resolution and college admissions are typical candidates. The grain of an accumulating snapshot for order processing, for example, is usually the line item on the order. Notice in Figure 1 that there are multiple dates representing the standard scenario that an order undergoes. Accumulating snapshot records are revisited and overwritten as the process progresses through its steps from beginning to end. Accumulating snapshot fact tables generally are much smaller than the other two types because of this overwriting strategy.


推荐答案

就像评论中提到的一个一样,Change Data Capture是一个相当通用的术语,表示我如何随着时间的推移处理对数据实体的更改,并且整本书都在上面(还有数不胜数的书帖子和文章)。

Like one of the comments mention, Change Data Capture is a fairly generic term for "how do I handle changes to data entities over time", and there are entire books on it (and a gazillion posts and articles).

无论似乎有明确的黑白或总是这样做的陈述,都是真实的答案通常,取决于情况-在您的情况下,取决于特定事实表所需的粒度。

Regardless of any statements that seem to suggest a clear black-and-white or always-do-it-like-this answer, the real answer, as usual, is "it depends" - in your case, on what grain you need for your particular fact table.

如果数据以不可预测的方式或经常变化,实现Kimball的累积快照版本(说明您可能最终需要多少里程碑日期列等)可能变得具有挑战性。

If your data changes in unpredictable ways or very often, it can become challenging to implement Kimball's version of an accumulated snapshot (picture how many "milestone" date columns, etc. you might end up needing).

因此,如果您愿意,您可以决定将事实表设为交易事实表,而不是快照,即事实键将是(出价键,时间戳),然后在您的 application 层(无论是视图,mview,实际应用还是其他)中,您可以确保给定的查询仅获取最新的每个出价的版本(请注意,可以将其视为虚拟的累积快照)。如果您发现不需要以前的版本(每个出价的历史记录),则可以使用一个例程来修剪它们(即,将它们删除或移动到其他位置)。

So, if you prefer, you can decide to make your fact table be an transactional fact table rather than a snapshot, where the fact key would be (Bid Key, Timestamp), and then in your application layer (whether a view, mview, actual app, or whatever), you can ensure that a given query only gets the latest version of each Bid (note that this can be thought of as kind of a virtual accumulated snapshot). If you find that you don't need the previous versions (the history of each Bid), you can have a routine that prunes them (i.e. deletes or moves them somewhere else).

或者,您只能在事实(bid)处于最终状态时才允许添加事实(bid),但是在新的(可更新的)出价无法实现的情况下,您可能会滞后很长时间。

Alternatively, you can only allow the fact (Bid) to be added when it is in it's final state, but then you will likely have a significant lag where a new (updateable) Bid doesn't make it to the fact table for some time.

无论哪种方式,都有几种可靠的行之有效的处理方法-您只需清楚地确定业务需求和

Either way, there are several solid and proven techniques for handling this - you just have to clearly identify the business requirements and design accordingly.

祝你好运!

这篇关于事实表,其中包含可在源系统中定期更新的信息的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆