How to implement SCD type 2 in Azure SQL Data Warehouse

Question

Dear All, I have to pull data from sources into Data Lake Store using ADF, and then finally into Azure SQL DW using PolyBase. My understanding is that Azure SQL Data Warehouse doesn't generate proper surrogate keys, even with identity columns. There is also no MERGE command support in Azure SQL Data Warehouse. How can I implement SCD type 2? I see a lot of blogs and videos about UPSERT, but nothing about SCD type 2. Any kind of help would be highly appreciated.

Recommended answer

Hi Sanju,

Please refer to the following Stack Overflow discussion: Data warehouse type 2 scd Employee dimension and HR Facts (Kimball's)

In an SCD (type 2 or type 3), you want to think in terms of two kinds of key: natural keys and pseudo (surrogate) keys. The natural key is the identifier that the "real world" would understand; in the example of an Employee dimension, this would probably be some kind of Employee Id. Each time you add an entry to this table, you get a new pseudo key, and I like to think of this as the "as-was" key. It represents the state of that dimension member "as it was" when the record was added.
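The answer does not show any load code. What follows is a minimal sketch of one common way to get a new pseudo key per version without MERGE, the expire-then-insert pattern. One UPDATE closes the current row of every natural key whose tracked attribute changed. One INSERT ... SELECT then adds a fresh row with a new pseudo key. The table and column names (`dim_employee`, `stg_employee`, `employee_sk`, `valid_from`, `valid_to`, `is_current`) are hypothetical, and only `department` is treated as a type 2 attribute. Pseudo keys are derived as `MAX(key) + ROW_NUMBER()` rather than from an IDENTITY column. The sketch runs on SQLite through Python's `sqlite3` module (window functions need SQLite 3.25 or later), so it is a stand-in: Azure SQL Data Warehouse T-SQL will differ in places.

```python
import sqlite3

# Hypothetical tables: dim_employee keeps one row per version of an employee.
# employee_id is the natural key; employee_sk is the pseudo ("as-was") key.
# Only `department` is tracked as a type 2 attribute in this sketch.
DDL = """
CREATE TABLE dim_employee (
    employee_sk INTEGER NOT NULL,
    employee_id TEXT    NOT NULL,
    department  TEXT    NOT NULL,
    valid_from  TEXT    NOT NULL,
    valid_to    TEXT,              -- NULL while the version is current
    is_current  INTEGER NOT NULL
);
CREATE TABLE stg_employee (        -- staged extract, one row per employee_id
    employee_id TEXT NOT NULL,
    department  TEXT NOT NULL
);
"""


def load_scd2(conn: sqlite3.Connection, load_date: str) -> None:
    """Apply staged rows as SCD type 2 changes using UPDATE + INSERT, no MERGE."""
    cur = conn.cursor()

    # Step 1: close the current version of every natural key whose tracked
    # attribute differs from the staged value.
    cur.execute(
        """
        UPDATE dim_employee
        SET valid_to = ?, is_current = 0
        WHERE is_current = 1
          AND EXISTS (SELECT 1 FROM stg_employee s
                      WHERE s.employee_id = dim_employee.employee_id
                        AND s.department <> dim_employee.department)
        """,
        (load_date,),
    )

    # Step 2: staged keys with no current version (brand-new employees plus
    # the ones expired above) get a new row. New pseudo keys continue from the
    # current maximum via ROW_NUMBER() instead of relying on an IDENTITY column.
    (max_sk,) = cur.execute(
        "SELECT COALESCE(MAX(employee_sk), 0) FROM dim_employee"
    ).fetchone()
    cur.execute(
        """
        INSERT INTO dim_employee
            (employee_sk, employee_id, department, valid_from, valid_to, is_current)
        SELECT ? + ROW_NUMBER() OVER (ORDER BY s.employee_id),
               s.employee_id, s.department, ?, NULL, 1
        FROM stg_employee s
        WHERE NOT EXISTS (SELECT 1 FROM dim_employee d
                          WHERE d.employee_id = s.employee_id
                            AND d.is_current = 1)
        """,
        (max_sk, load_date),
    )
    conn.commit()
```

Suppose a first load stages E1 in Sales and E2 in HR, and a second load moves E1 to Finance. After both loads, E1 has two rows (pseudo keys 1 and 3) and E2 still has only one row. Deriving keys from the current maximum assumes that a single load process writes to the dimension at a time.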

Over time in an SCD, you will have many, many records per natural key, each with its own "as-was" key. For the most recent entry, its "as-was" key is also the "as-is" key, because it represents the current state.
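The "as-was"/"as-is" distinction can be sketched in plain Python, assuming a hypothetical in-memory list of dimension rows. Each row carries its own pseudo key, and several rows share one natural key. The newest row for a natural key (by `valid_from`, an assumed column name) supplies the as-is key. This illustrates the idea only and is not a warehouse implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class EmployeeVersion:
    """One row of a hypothetical Employee type 2 dimension."""
    employee_sk: int   # pseudo key: unique per row, i.e. per version
    employee_id: str   # natural key: repeated across every version
    department: str
    valid_from: date   # when this version became effective


def versions_of(dim: List[EmployeeVersion], employee_id: str) -> List[EmployeeVersion]:
    """All versions (as-was keys) of one natural key, oldest first."""
    return sorted(
        (row for row in dim if row.employee_id == employee_id),
        key=lambda row: row.valid_from,
    )


def as_is_key(dim: List[EmployeeVersion], employee_id: str) -> Optional[int]:
    """The newest version's as-was key doubles as the as-is (current) key."""
    rows = versions_of(dim, employee_id)
    return rows[-1].employee_sk if rows else None
```

Take E1 with versions keyed 1 (Sales, from 2020-01-01) and 3 (Finance, from 2020-06-01). `versions_of` returns both keys, while `as_is_key` returns 3.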

In a fact table, you should ALWAYS expect to find the "as-was" key. If you assume the fact table will always hold the "as-is" key (the most recent key), then you are committing to go back and update historical records in your fact table simply because an attribute of the dimension changed. That is a waste of resources for starters, and it is actually counter-productive: one of the major benefits of an SCD is the ability to do "as-was vs. as-is" analysis, and to do that you need to preserve the "as-was" state.
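This minimal sketch, built on hypothetical in-memory data, shows why the fact table should store the "as-was" key. A fact row is resolved to the dimension version that was valid on the event date, and it is never rewritten afterwards. Both views then fall out of the same facts. "As-was" reports each amount under the department at the time of the event. "As-is" first follows the pseudo key back to the natural key and then reports under the current department. Treating `valid_to` as an exclusive end date, with `None` meaning current, is an assumption of this sketch.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class EmployeeVersion:
    employee_sk: int            # pseudo ("as-was") key
    employee_id: str            # natural key
    department: str
    valid_from: date
    valid_to: Optional[date]    # exclusive end; None while the row is current


def as_was_key(dim: List[EmployeeVersion], employee_id: str, on: date) -> int:
    """Pseudo key of the version that was valid on the given date."""
    for r in dim:
        if r.employee_id == employee_id and r.valid_from <= on and (
            r.valid_to is None or on < r.valid_to
        ):
            return r.employee_sk
    raise KeyError(f"no version of {employee_id} valid on {on}")


def load_facts(dim, events: List[Tuple[str, date, int]]) -> List[Tuple[int, int]]:
    """Store (as-was key, amount); facts are never updated when the dimension changes."""
    return [(as_was_key(dim, eid, d), amount) for eid, d, amount in events]


def totals_as_was(dim, facts) -> Dict[str, int]:
    """Totals by the department each fact belonged to at the time."""
    by_sk = {r.employee_sk: r for r in dim}
    out: Counter = Counter()
    for sk, amount in facts:
        out[by_sk[sk].department] += amount
    return dict(out)


def totals_as_is(dim, facts) -> Dict[str, int]:
    """Totals restated under each employee's current department."""
    by_sk = {r.employee_sk: r for r in dim}
    current = {r.employee_id: r.department for r in dim if r.valid_to is None}
    out: Counter = Counter()
    for sk, amount in facts:
        out[current[by_sk[sk].employee_id]] += amount
    return dict(out)
```

Suppose E1 moved from Sales (key 1) to Finance (key 3) on 2020-06-01. An amount of 100 booked in March then stays under Sales in the as-was totals and moves to Finance in the as-is totals. No fact row has to be updated for either view.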
