Best Practice to populate Fact and Dimension Tables from Transactional Flat DB

Question

I want to populate a star schema / cube in SSIS / SSAS.

I prepared all my dimension tables and my fact table, primary keys etc.

The source is a single 'flat' (item-level) table, and my problem now is how to split it up and load the data from that one table into the respective dimension and fact tables.

I did a fair bit of googling but couldn't find a satisfying solution to the problem. One would imagine that this is a rather common problem/situation in BI development?!

Thanks, alexl

Answer

For a start, it depends on whether you want to do a simple initial data transfer or something more sophisticated (e.g. incremental). I'm going to assume you're doing an initial data transfer.
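To make the initial-vs-incremental distinction concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database (an assumption; the question itself is about SSIS/SSAS). It reuses the item_table/dim_cat1 names from this answer; the UNIQUE constraint, the ORDER BY, the NOT EXISTS filter and the sample rows are illustrative choices, not part of the original answer. An initial load inserts every distinct value into an empty dimension, while an incremental load only adds values it has not seen yet, so existing surrogate keys stay stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_table (id INTEGER PRIMARY KEY, cat1 TEXT);
    CREATE TABLE dim_cat1 (id INTEGER PRIMARY KEY, cat_name TEXT UNIQUE);
""")

def initial_load(conn):
    # Initial transfer: the dimension is assumed to be empty,
    # so every distinct category value is inserted.
    conn.execute(
        "INSERT INTO dim_cat1 (cat_name) "
        "SELECT DISTINCT cat1 FROM item_table ORDER BY cat1"
    )

def incremental_load(conn):
    # Incremental transfer: insert only category names that are not yet in
    # the dimension, so keys already referenced by fact rows do not change.
    conn.execute("""
        INSERT INTO dim_cat1 (cat_name)
        SELECT DISTINCT it.cat1 FROM item_table it
        WHERE it.cat1 IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM dim_cat1 d WHERE d.cat_name = it.cat1)
    """)

# First batch of (made-up) source rows, loaded once.
conn.executemany("INSERT INTO item_table (id, cat1) VALUES (?, ?)",
                 [(1, "Books"), (2, "Toys")])
initial_load(conn)

# A later batch: "Books" already exists, only "Games" is new.
conn.executemany("INSERT INTO item_table (id, cat1) VALUES (?, ?)",
                 [(3, "Books"), (4, "Games")])
incremental_load(conn)

rows = conn.execute("SELECT id, cat_name FROM dim_cat1 ORDER BY id").fetchall()
print(rows)
```

Calling `initial_load` a second time would fail on the UNIQUE constraint, which is exactly why a recurring load needs the incremental form.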

Say your item table has the columns id, cat1, cat2, cat3, cat4, ... Assuming the dimension tables for categories 1-4 each have the columns id and cat_name, you can load dim_cat1 (the dimension table for item category 1) as follows:

insert into dim_cat1 (cat_name)
  select distinct cat1 from item_table;
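A minimal, runnable sketch of that statement, using Python's sqlite3 module as a stand-in database (an assumption; any SQL database behaves similarly here). In SQLite an INTEGER PRIMARY KEY column is filled in automatically on insert, which plays the role of the auto-generated dimension IDs the answer assumes; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat source table; only the column needed here is modelled.
conn.execute("CREATE TABLE item_table (id INTEGER PRIMARY KEY, cat1 TEXT)")
# INTEGER PRIMARY KEY is assigned automatically: the surrogate key.
conn.execute("CREATE TABLE dim_cat1 (id INTEGER PRIMARY KEY, cat_name TEXT)")

conn.executemany(
    "INSERT INTO item_table (id, cat1) VALUES (?, ?)",
    [(1, "Books"), (2, "Books"), (3, "Toys")],
)

# The statement from the answer: one dimension row per distinct category value.
conn.execute("INSERT INTO dim_cat1 (cat_name) SELECT DISTINCT cat1 FROM item_table")

dim_rows = conn.execute("SELECT id, cat_name FROM dim_cat1 ORDER BY cat_name").fetchall()
print(dim_rows)
```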

You can do the same for all of the other categories/dimension tables. I'm assuming your dimension tables have automatically generated IDs. Now, to load the fact table:

insert into fact_table (id, cat1_id, cat2_id, cat3_id, cat4_id, ...)
  -- look up the generated key of each category value; one alias per dimension
  select it.id, dc1.id, dc2.id, dc3.id, dc4.id, ...
    from item_table it
      join dim_cat1 dc1 on dc1.cat_name = it.cat1
      join dim_cat2 dc2 on dc2.cat_name = it.cat2
      join dim_cat3 dc3 on dc3.cat_name = it.cat3
      join dim_cat4 dc4 on dc4.cat_name = it.cat4
      ...
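The same pattern for all four category columns, sketched with Python's sqlite3 as a stand-in database. The loop creates and loads one dim_catN table per column (the "same for all of the other categories" step), then builds the fact insert with a distinct alias per join and every surrogate key in the SELECT list. The sample data and the `CATEGORY_COLUMNS` list are hypothetical.

```python
import sqlite3

CATEGORY_COLUMNS = ["cat1", "cat2", "cat3", "cat4"]  # hypothetical column list

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_table (id INTEGER PRIMARY KEY, "
    "cat1 TEXT, cat2 TEXT, cat3 TEXT, cat4 TEXT)"
)
conn.executemany(
    "INSERT INTO item_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Books", "Fiction", "EN", "Paperback"),
        (2, "Books", "Science", "DE", "Hardcover"),
        (3, "Toys", "Puzzle", "EN", "Boxed"),
    ],
)

# One dimension table per category column, each with an auto-generated key.
for col in CATEGORY_COLUMNS:
    conn.execute(f"CREATE TABLE dim_{col} (id INTEGER PRIMARY KEY, cat_name TEXT)")
    conn.execute(f"INSERT INTO dim_{col} (cat_name) SELECT DISTINCT {col} FROM item_table")

conn.execute(
    "CREATE TABLE fact_table (id INTEGER PRIMARY KEY, "
    + ", ".join(f"{col}_id INTEGER" for col in CATEGORY_COLUMNS)
    + ")"
)

# Build the fact insert: one join per dimension with its own alias, and
# one surrogate key in the SELECT for every column in the INSERT list.
key_cols = ", ".join(f"{col}_id" for col in CATEGORY_COLUMNS)
select_cols = ", ".join(f"d_{col}.id" for col in CATEGORY_COLUMNS)
joins = "\n".join(
    f"JOIN dim_{col} d_{col} ON d_{col}.cat_name = it.{col}" for col in CATEGORY_COLUMNS
)
conn.execute(
    f"INSERT INTO fact_table (id, {key_cols})\n"
    f"SELECT it.id, {select_cols}\nFROM item_table it\n{joins}"
)

# Resolve one key back to its dimension value to check the load.
facts = conn.execute(
    "SELECT f.id, d.cat_name FROM fact_table f "
    "JOIN dim_cat1 d ON d.id = f.cat1_id ORDER BY f.id"
).fetchall()
print(facts)
```

Because these are inner joins, an item whose category value is NULL (or otherwise missing from its dimension) is silently left out of the fact table; switching to LEFT JOIN or adding an "unknown" member to each dimension are common ways to keep such rows.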

If you have a substantial amount of data, it might make sense to create indexes on the category names in the item_table and maybe the dimension tables.
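A minimal sketch of those indexes in SQLite (again via Python's sqlite3 as a stand-in). The index names are hypothetical, and making the index on dim_cat1.cat_name UNIQUE is an extra assumption that also guards against duplicate dimension members. EXPLAIN QUERY PLAN shows whether the join used by the fact load can use an index; the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_table (id INTEGER PRIMARY KEY, cat1 TEXT);
    CREATE TABLE dim_cat1 (id INTEGER PRIMARY KEY, cat_name TEXT);
    -- Index the join columns used by the fact-table load (hypothetical names).
    CREATE INDEX ix_item_table_cat1 ON item_table (cat1);
    CREATE UNIQUE INDEX ux_dim_cat1_cat_name ON dim_cat1 (cat_name);
""")

# Ask SQLite how it would execute the dimension lookup join.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT it.id, dc1.id
    FROM item_table it
    JOIN dim_cat1 dc1 ON dc1.cat_name = it.cat1
""").fetchall()

# The last column of each plan row is the human-readable detail text.
for row in plan:
    print(row[-1])
```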

Btw, this is a database-independent answer; I don't work with SSIS/SSAS. You might have tools available that streamline parts of this process for you, but it's really not that difficult or time-consuming to write in plain SQL.