在Azure数据仓库中进行哪些区域的暂存和清理? [英] Which areas to conduct Staging and Cleansing in Azure Data Warehouse?

查看:82
本文介绍了在Azure数据仓库中进行哪些区域的暂存和清理?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想从服务器A复制数据:Azure SQL( OLTP)插入服务器B:Azure数据仓库.

Azure SQL已规范化关系OLTP表.

Azure数据仓库将具有已标准化的Kimball表(已加入表).

我们在公司内部典型的传统ETL流程将复制到这样的基于行的存储.

1.因此服务器A:OLTP

2.进入 服务器B:暂存区1  :  复制 服务器A的所有归一化数据(原始,尚无转换)

3. 插入并 另一组登台表,它将进行非正规化,进行转化,清洗等.

4.进入服务器B:数据仓库最终表:最后,我们对数据仓库进行干净的插入或更新.

问题是:

选项1:暂存区1在下面的Azure流中的哪个位置?会在Azure Blob存储中吗

暂存区2(已归一化,转换后)是否会在Azure数据仓库中作为Round-Robin Tables?

选项2:还是两个临时区域都在Azure数据仓库中?这将防止跨服务器转换.我在想这个.只是,我们仍然需要先将其复制到Azure Blob中 也.

解决方案

CarApple52,

通常,当您将数据导入SQL数据仓库时,会将数据从外部源(Blob,表,数据库存储)导入到作为SQL数据仓库一部分的外部表中.这是一个临时区域.然后,您可以通过以下方式完成导入: 数据到dbo模式(内部)表中:

Azure SQL数据仓库加载模式和策略

迁移数据

如果需要设置管道来加载循环数据集(需要更多数据清理),则Azure Data Factory更适用.

Azure数据工厂简介

简单的情况是将数据上传到Azure存储帐户,然后使用Polybase将数据导入到数据仓库外部表中.


I want to copy our data from Server A: Azure SQL (OLTP) into Server B: Azure Datawarehouse. 

Azure SQL has normalized relational OLTP Tables.

Azure Data Warehouse will have Kimball, denormalized, already joined tables.

Our typical legacy ETL Process at company on premise, would copy to row-based storage like this.

1. So Server A: OLTP

2. Into Server B: Staging Area1   : Copy all the normalized data (raw, no transformation yet) from Server A 

3. Into Server B: Staging Area 2: Then Copy into another set of staging tables, which will conduct Denormalized, have the transformations, cleansing etc.

4. Into Server B: Data Warehouse Final Tables: Finally, we do a clean insert or update into the data warehouse.

 Server A: OLTP----->   Server B Staging Area 1 (normalized, no transform raw tables)   ----> Server B Staging Area 2 (transformed, denormalized tables) ----> Server B Data Warehouse

Question is:

Option 1: Where would Staging Area 1: be on this Azure Flow below? Would it be in Azure Blob Storage

Would Staging Area 2 (Denormalized, transformed) be in Azure Data Warehouse as Round - Robin Tables? 

Option 2: Or would both staging areas be in Azure Data Warehouse? This would prevent cross server transformations. I am thinking this one. Just fyi, we would still need to copy into Azure Blob first also.

解决方案

Hi CarApple52,

Typically, when you import data into SQL Data Warehouse, you import data from an external source (Blob, Table, DB storage) in to an External Table that is part of SQL Data Warehouse. This is a staging area. You then complete the import by committing the data to the dbo schema (internal) tables:

Azure SQL Data Warehouse loading patterns and strategies

Migrate Your Data

If you need to set up a pipeline to load recurrent sets of data, where more data scrubbing is required, then Azure Data Factory is more applicable. 

Introduction to Azure Data Factory

The simple scenario is uploading your data to an Azure Storage account and then use Polybase to import that data into your Data Warehouse external tables. 


这篇关于在Azure数据仓库中进行哪些区域的暂存和清理?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆