SQL Datawarehousing, need help populating my DIMENSION using TSQL SELECT or a better alternative?

Problem description

I have a table in my SQL Server where I "stage" my datawarehouse extract from our ERP system.

From this staging table (table name: DBO.DWUSD_LIVE), I build my dimensions and load my fact data.

An example DIMENSION table is called "SHIPTO"; this dimension has the following columns:

"shipto_id"
"shipto"
"salpha"
"ssalpha"
"shipto address"
"shipto name"
"shipto city"

Right now I have an SSIS package that does a SELECT DISTINCT across the above columns to retrieve the "unique" data, and then the SSIS package assigns the "shipto_id" surrogate key to each row.

An example of my current TSQL Query is:

SELECT DISTINCT
"shipto", "salpha", "ssalpha", "shipto address", "shipto name", "shipto city"
FROM DBO.DWUSD_LIVE

This works, but it is not "speedy": some dimensions have 10 columns, and doing a DISTINCT select on all of them is not ideal.

In this dimension, my "Business Key" columns are "SHIPTO", "SALPHA", and "SSALPHA".

So, if I do this:

SELECT DISTINCT
"shipto", "salpha", "ssalpha"
FROM DBO.DWUSD_LIVE

It yields the same results as:

SELECT DISTINCT
"shipto", "salpha", "ssalpha", "shipto address", "shipto name", "shipto city"
FROM DBO.DWUSD_LIVE

Is there a better way to do this TSQL QUERY? I need all the columns, but only DISTINCT on the business key columns.

Your help is appreciated.

Below is an image of how my project is set up in SSIS; the dimension is a Type 1 SCD.

Recommended answer

I would start by splitting this into two operations: generating the surrogate key and populating the dimension table. The first step will then be a DISTINCT on only 3 columns, and the second step will become a JOIN. Indexing the columns used in both operations might then give you some improvement.
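The first step needs a table that holds the surrogate keys. The queries in this answer refer to dbo.KeyMappingTable without defining it, so here is a minimal sketch of one possible definition. It assumes shipto_id is generated by IDENTITY, the data types are guesses, and the constraint and index names are hypothetical. The unique constraint indexes the business key for the NOT EXISTS lookup and the JOIN, and a matching index on the staging table covers the other side of both operations. The key columns are declared NOT NULL because the `=` comparisons in those queries never match NULLs, so rows with NULL keys would be mapped again on every run.

```sql
-- One row per business key; shipto_id is the surrogate key, generated here by IDENTITY.
-- Data types are assumptions; match them to the staging table.
CREATE TABLE dbo.KeyMappingTable (
    shipto_id int IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_KeyMappingTable PRIMARY KEY,
    shipto    varchar(20) NOT NULL,
    salpha    varchar(20) NOT NULL,
    ssalpha   varchar(20) NOT NULL,
    -- One surrogate key per business key; also indexes the lookup columns.
    CONSTRAINT UQ_KeyMappingTable_BusinessKey UNIQUE (shipto, salpha, ssalpha)
);

-- Index the same three columns on the staging table. INCLUDE carries the
-- attribute columns so the dimension load can be served from this index.
CREATE INDEX IX_DWUSD_LIVE_BusinessKey
    ON dbo.DWUSD_LIVE (shipto, salpha, ssalpha)
    INCLUDE ([shipto address], [shipto name], [shipto city]);
```

If the staging table is truncated and reloaded on every run, it may be cheaper to create this index after the load than to maintain it during the load.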

You can combine the DISTINCT with NOT EXISTS to avoid processing rows that have already been mapped, something like this (dbo.Source stands for your staging table, dbo.DWUSD_LIVE):

-- Step 1: add a mapping row for every business key not seen before
insert into dbo.KeyMappingTable (shipto, salpha, ssalpha)
select distinct s.shipto, s.salpha, s.ssalpha
from dbo.Source s
where not exists (
    select *
    from dbo.KeyMappingTable k
    where k.shipto = s.shipto and k.salpha = s.salpha and k.ssalpha = s.ssalpha
)

Then you have the mapping, so you can do this:

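-- Step 2: load dimension rows for surrogate keys not yet in the dimension
insert into dbo.DimShipTo (shipto_id, shipto /*, etc. */)
select distinct -- staging repeats each key; assumes the attributes agree per key
    m.shipto_id,
    s.shipto -- etc.
from
    dbo.KeyMappingTable m
    join dbo.Source s
    on m.shipto = s.shipto and m.salpha = s.salpha and m.ssalpha = s.ssalpha
where
    not exists (
        select *
        from dbo.DimShipTo d
        where d.shipto_id = m.shipto_id
    )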
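The question's wide DISTINCT compares every selected column to remove repeated staging rows. One common alternative, which is not part of the original answer, is to partition by the three business-key columns with ROW_NUMBER() and keep the first row of each partition, so uniqueness is decided by the key alone. Below is a minimal, self-contained sketch. The temp table #DWUSD_LIVE, its data types and the sample rows are made up to stand in for dbo.DWUSD_LIVE, and the multi-row VALUES syntax needs SQL Server 2008 or later. The query still has to sort or hash on the key, so whether it beats DISTINCT depends on your data and indexes; compare the execution plans.

```sql
-- Hypothetical stand-in for dbo.DWUSD_LIVE; data types are assumptions.
CREATE TABLE #DWUSD_LIVE (
    shipto           varchar(20),
    salpha           varchar(20),
    ssalpha          varchar(20),
    [shipto address] varchar(100),
    [shipto name]    varchar(100),
    [shipto city]    varchar(50)
);

-- Made-up sample rows: the first business key appears twice.
INSERT INTO #DWUSD_LIVE VALUES
    ('S1', 'A', 'X', '1 Main St', 'Acme',   'Springfield'),
    ('S1', 'A', 'X', '1 Main St', 'Acme',   'Springfield'),
    ('S2', 'B', 'Y', '9 Elm St',  'Globex', 'Shelbyville');

-- Number the rows within each business key and keep only the first one,
-- so duplicates are detected on the three key columns alone.
WITH ranked AS (
    SELECT
        shipto, salpha, ssalpha,
        [shipto address], [shipto name], [shipto city],
        ROW_NUMBER() OVER (
            PARTITION BY shipto, salpha, ssalpha  -- the business key
            ORDER BY (SELECT NULL)                -- arbitrary row within each key
        ) AS rn
    FROM #DWUSD_LIVE
)
SELECT shipto, salpha, ssalpha, [shipto address], [shipto name], [shipto city]
INTO #ShipToDedup
FROM ranked
WHERE rn = 1;

SELECT * FROM #ShipToDedup;  -- one row per (shipto, salpha, ssalpha)
```

If rows for the same key can carry different attribute values, ORDER BY (SELECT NULL) keeps an arbitrary one; ordering by a column that identifies the row you want, such as a load timestamp (hypothetical here), makes the choice deterministic.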

You should also look at MERGE, which is convenient if you're using a Type 1 dimension and just want to update addresses or other attributes when they change (and it's a useful command in general). But it's only available from SQL Server 2008; you didn't mention what version or edition of SQL Server you're using.
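For a Type 1 dimension, a minimal MERGE sketch that inserts new keys and overwrites changed attributes in place might look like the following. It assumes SQL Server 2008 or later, that dbo.KeyMappingTable has already been filled by the first INSERT above, and that dbo.DimShipTo (this answer's name for the SHIPTO dimension) has the columns listed in the question. It reads directly from dbo.DWUSD_LIVE, the staging table that dbo.Source stands for in the queries above. MERGE raises an error when one target row matches more than one source row, so the source is collapsed to one row per shipto_id first; that only works if the attributes agree for each business key.

```sql
-- Type 1 load: insert new surrogate keys, overwrite changed attributes in place.
WITH src AS (
    -- MERGE fails if one target row matches several source rows,
    -- so collapse the staging data to one row per shipto_id first.
    SELECT DISTINCT
        m.shipto_id, s.shipto, s.salpha, s.ssalpha,
        s.[shipto address], s.[shipto name], s.[shipto city]
    FROM dbo.KeyMappingTable AS m
    JOIN dbo.DWUSD_LIVE AS s
      ON m.shipto = s.shipto AND m.salpha = s.salpha AND m.ssalpha = s.ssalpha
)
MERGE dbo.DimShipTo AS d
USING src
   ON d.shipto_id = src.shipto_id
-- Only touch rows whose attributes changed. Note that <> does not detect
-- changes to or from NULL, so nullable columns need extra handling.
WHEN MATCHED AND (   d.[shipto address] <> src.[shipto address]
                  OR d.[shipto name]    <> src.[shipto name]
                  OR d.[shipto city]    <> src.[shipto city]) THEN
    UPDATE SET [shipto address] = src.[shipto address],
               [shipto name]    = src.[shipto name],
               [shipto city]    = src.[shipto city]
WHEN NOT MATCHED BY TARGET THEN
    INSERT (shipto_id, shipto, salpha, ssalpha,
            [shipto address], [shipto name], [shipto city])
    VALUES (src.shipto_id, src.shipto, src.salpha, src.ssalpha,
            src.[shipto address], src.[shipto name], src.[shipto city]);
```

With this in place, the MERGE replaces the second INSERT above: it adds new keys as before and also applies Type 1 overwrites to existing rows.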
