Design a dimension with multiple data sources

Views: 103 · Published: 2020/10/18 3:26:34 · Tags: etl, data-warehouse, scd


Question

I am designing a few dimensions with multiple data sources and wonder what other people have done to align the multiple business keys per data source.

My example: I have 2 data sources, the Ordering System and the Execution System. The Ordering System has details about payment and what should happen; the Execution System has details on what actually happened (how long it took, who executed the order, etc.). Data from both systems is needed to create a single fact.

In both the Ordering and Execution systems there is a Location table. The business keys from both systems are mapped via an ESB. There are attributes in both systems that make up the complete picture of a single location: billing information is in the Ordering System, latitude and longitude are in the Execution System, and Location Name exists in both systems.

How do you design an SCD to accommodate changes from both systems to the dimension?

FYI, we follow a fairly strict Kimball methodology, but I am open to looking at everyone's solutions.

Answer

Not necessarily an answer but here are my thoughts:

You've already covered the real options in your comment. Either:

A. Merge it beforehand

You need some merge functionality in staging which matches the two (or more) records, creates a new common merge key and uses that in the dimension. This requires some form of lookup or cross-reference table to be stored in addition to the normal DW data.

B. Merge it in the dimension

Put both records in the dimension and allow the reporting tool to 'merge' them by, for example, grouping by location name. This means you don't need prior merging logic; you just dump both records in the dimension.
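To make option A a bit more concrete, here is a minimal sketch of the staging step that builds the cross-reference between source keys and a new merge key. Everything named here (`ESB_MAP`, `KeyMapRow`, `build_key_map`, the key formats) is made up for illustration; in a real load the key map would be persisted and existing merge keys reused between runs rather than regenerated.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Hypothetical ESB mapping: ordering-system key -> execution-system key.
ESB_MAP = {"ORD-100": "EXE-9", "ORD-101": "EXE-4"}


@dataclass(frozen=True)
class KeyMapRow:
    merge_key: int                 # new common key used by the dimension
    ordering_key: Optional[str]    # None if the location is unknown to Ordering
    execution_key: Optional[str]   # None if the location is unknown to Execution


def build_key_map(ordering_keys, execution_keys, esb_map):
    """Assign one merge key per location, pairing source keys via the ESB map."""
    next_key = count(1)
    rows = []
    matched_exec = set()
    for okey in sorted(ordering_keys):
        ekey = esb_map.get(okey)
        if ekey in execution_keys:
            matched_exec.add(ekey)
        else:
            ekey = None            # no usable counterpart in Execution yet
        rows.append(KeyMapRow(next(next_key), okey, ekey))
    # Locations that only exist in the Execution system still get a merge key.
    for ekey in sorted(set(execution_keys) - matched_exec):
        rows.append(KeyMapRow(next(next_key), None, ekey))
    return rows


key_map = build_key_map(
    {"ORD-100", "ORD-101", "ORD-102"}, {"EXE-9", "EXE-7"}, ESB_MAP
)
```

The dimension load then looks up `merge_key` instead of either source key, so both systems feed the same dimension row.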

However, you have two constraints that I feel make the choice between A and B clearer.

Firstly, you need an SCD (Type 2, I assume). This means option B could get very complicated: when there is a change in one source record, you have to go and find the other record and change it as well, which is very unpleasant. You would still need some kind of pre-stored key to link the two records, which means option B is no longer simple.
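For contrast, once the two sources are merged onto one key, a Type 2 change is a single "expire and insert" on one row, no matter which system the change came from. A rough sketch, assuming an in-memory list of dict rows and a hypothetical `apply_scd2` helper (the column names and the 9999-12-31 end date are my assumptions, not anything from the question):

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for the current (open-ended) row


def apply_scd2(dimension, merge_key, attributes, effective):
    """Write a new Type 2 version for merge_key if any attribute changed.

    `dimension` is a list of dict rows. Returns True when a new version
    was appended, False when the incoming attributes match the current row.
    """
    current = next(
        (r for r in dimension
         if r["merge_key"] == merge_key and r["valid_to"] == HIGH_DATE),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes:
            return False                 # nothing changed in either source
        current["valid_to"] = effective  # close the old version
    dimension.append({
        "surrogate_key": len(dimension) + 1,
        "merge_key": merge_key,
        "attributes": dict(attributes),
        "valid_from": effective,
        "valid_to": HIGH_DATE,
    })
    return True


dim = []
apply_scd2(dim, 1, {"name": "Depot A", "billing": "Net 30", "lat": 51.5}, date(2020, 1, 1))
# A latitude change from the Execution system versions the same merged row.
apply_scd2(dim, 1, {"name": "Depot A", "billing": "Net 30", "lat": 51.6}, date(2020, 6, 1))
```

With option B you would have to repeat this for every sibling row that represents the same location, which is where the linking key comes back in.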

Secondly, given that you have two sources for one attribute (Location Name), you need some kind of staging logic to pick a single name when the two don't match.
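What that logic looks like depends entirely on which system the business trusts for the name. As one possible rule, purely an assumption on my part, you could prefer one system and fall back to the other when the preferred value is blank (`pick_location_name` is a hypothetical helper):

```python
def pick_location_name(ordering_name, execution_name, preferred="ordering"):
    """Return a single Location Name for the merged dimension row.

    Assumed rule: take the preferred system's name unless it is missing
    or blank, otherwise use the other system's name. Returns None when
    neither system supplies a usable name.
    """
    names = {"ordering": ordering_name, "execution": execution_name}
    fallback = "execution" if preferred == "ordering" else "ordering"
    for source in (preferred, fallback):
        value = (names[source] or "").strip()
        if value:
            return value
    return None


name = pick_location_name("Depot A", "DEPOT-A")
```

Whatever rule you pick, it has to run before the Type 2 comparison, otherwise a name that differs only between sources would look like a change.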

So given these two circumstances, I suggest that option A would be best - build some pre-merging logic, as the complexity of your requirements warrants it.

You'd think this would be a common issue but I've never found a good online reference explaining how someone solved this before.
