在设计数据仓库时使用分段数据库的好处 [英] Benefits of using Staging Database while designing Data Warehouse

查看:337
本文介绍了在设计数据仓库时使用分段数据库的好处的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在设计数据仓库架构。
在探索从生产中提取数据并进入数据仓库的各种选项时,我发现了许多文章,主要提出以下两种方法 -



  1. 生产DB ---->数据仓库(星型模式)----> OLAP Cube

  2. 生产DB ----> 分段数据库 ---->数据仓库(星型模式)----> OLAP Cube


我仍然不确定哪一个是在性能方面更好的方法,并减少生产数据库的处理负载。



您发现更好的方法设计数据仓库?

解决方案

以下内容取自 DWBI Organization的文章



可能需要分段区域如果您有以下任何一种情况:


  1. Delta Loading :您的数据是从源和您需要一个中间存储,其中可以临时存储数据的增量集,以进行转换目的

  2. 转换需要:您需要执行数据清理,验证等消耗仓库中的数据

  3. 取消耦合:您的处理需要很多时间,您不想保持连接到源系统(大概是源系统在实际的业务用户中被不断使用),因此,更愿意从源系统一次性读取数据,从源头断开连接,然后继续处理您自己的数据侧面

  4. 调试目的:您不必一直回到您的来源,您可以单独从分期区域解决问题(如果有的话)

  5. 故障恢复:源系统可能是暂时性的,数据的状态可能正在更改。如果您遇到任何上游故障,您可能无法根据当时源更改而重新提取数据。拥有本地副本有助于

性能和简化处理可能不仅仅是考虑因素。添加分期可能有时会增加延迟时间(即发生业务发生与报告之间的时间延迟)。但是我希望以上几点能帮助你做出更好的判断。


I am in process of designing a Data Warehouse Architecture. While exploring various options to Extract data from Production and putting into Data Warehouse, I came across many articles which mainly suggested following two approaches -

  1. Production DB ----> Data Warehouse (Star Schema) ----> OLAP Cube
  2. Production DB ----> Staging Database ----> Data Warehouse (Star Schema) ----> OLAP Cube

I am still not sure which one is the better approach in terms of Performance and reducing processing load on Production database.

Which approach you find better while designing Data Warehouse ?

解决方案

Below points are taken from, DWBI Organization's article

Staging area may be required if you have any of the following scenarios:

  1. Delta Loading: Your data is read incrementally from the source and you need an intermediate storage where incremental set of your data can be stored temporarily for transformation purpose
  2. Transformation need: You need to perform data cleansing, validation etc. before consuming the data in the warehouse
  3. De-coupling: Your processing takes lot of time and you do not want to remain connected to your source system (presumably the source system is being constantly used by the actual business users) during the entire time of your processing and, hence, prefer to just read the data from source system in one-go, disconnect from the source and then continue processing the data at your "own side"
  4. Debugging purpose: You need not go back to your source all the time and you can troubleshoot issues (if any) from staging area alone
  5. Failure Recovery: Source system may be transitory and the state of the data may be changing. If you encounter any upstream failure, you may not be in a position to re-extract your data as source has changed by that time. Having a local copy helps

Performance and reduced processing may not be only considerations. Adding a staging may sometimes increase latency (i.e. time delay between occurrence of a business incidence and it's reporting). But I hope above points will help you to make a better judgement.

这篇关于在设计数据仓库时使用分段数据库的好处的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆