在Oracle中实施Type 2 SCD [英] Implementing Type 2 SCD in Oracle

查看:71
本文介绍了在Oracle中实施Type 2 SCD的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

首先,我想说我是stackoverflow社区的新手,而SQL本身是新手,所以请原谅我,如果我对问题的格式不正确或没有清楚说明我的要求.

First I would like to say that I am new to the stackoverflow community and relatively new to SQL itself and so please pardon me If I didn't format my question right or didn't state my requirements clearly.

我正在尝试在Oracle中实现2型SCD.源表( customer_records )的结构如下所示.

I am trying to implement a type 2 SCD in Oracle. The structure of the source table (customer_records) is given below.

CREATE TABLE customer_records(
    day date,
    snapshot_day number,
    vendor_id number,
    customer_id number,
    rank number
);

INSERT INTO customer_records 
(day,snapshot_day,vendor_id,customer_id,rank)
VALUES
(9/24/2014,6266,71047795,476095,3103),
(10/1/2014,6273,71047795,476095,3103),
(10/8/2014,6280,71047795,476095,3103),
(10/15/2014,6287,71047795,476095,3103),
(10/22/2014,6291,71047795,476095,3102),
(10/29/2014,6330,71047795,476095,3102),
(11/05/2015,6351,71047795,476095,3102),
(11/12/2015,6440,71047795,476095,3103);

上表每周更新一次,我已经提取了由 vendor_id customer_id 代表的特定客户的记录.这样,每个客户都有一个唯一的 vendor_id customer_id .我正在尝试跟踪客户层( rank )中的更改.可能会发生这样的情况,即客户的等级可能会保持几个星期不变,而我们仅愿意跟踪客户等级发生变化的时间.

The above table is updated weekly and I have pulled records for a particular customer represented by vendor_id and customer_id. Such that each customer will have a unique vendor_id and customer_id. I am trying to track the changes in the tier (rank) of a customer. It may so happen that the customer's tier may remain same for several weeks and we are only willing to track when there is a change in the tier of the customer.

所需的输出(尺寸表)如下所示:

The desired output (dimension table) would look something like this:

SK  Version   Date_From    Date_To    Vendor_id   Customer_Id  Rank_Id

1     1       9/24/2014    10/22/2014    71047795            476095       3103
2     2       10/22/2014   11/05/2015    71047795            476095       3102
3     3       11/05/2015   12/31/2199    71047795            476095       3103

这样,每当客户层进行更改时,我们都会在新表中进行跟踪.另外,要为最新层添加 current_flag ='Y'.

Such that whenever customer's tier hit a change we track that in a new table. Also, wanting to include the current_flag = 'Y' for the most current tier.

我希望能够使用合并做到这一点.

I want to be able to do it using merge.

推荐答案

此处是一种在检测到更改时对具有相同层的连续记录进行分组的方法.

Here is an approach to group consecutive records having the same tier, while detecting changes.

这个想法是自联接表,并将每个记录与具有不同层的下一个记录相关联.这是通过使用 NOT EXISTS 条件和相关子查询来完成的.

The idea is to self-join the table, and to relate each record to the next record that has a different tier. This is done using a NOT EXISTS condition with a correlated subquery.

LEFT JOIN ,以避免过滤出没有下一个记录的最后一条记录(拥有当前层):对于此记录,我们使用 COALESCE()设置默认的结束日期.

LEFT JOIN is needed, to avoid filtering out the last record (that owns the current tier), which does not have a next record yet : for this record, we use COALESCE() to set up a default end date.

SELECT 
    c1.day day_from,
    COALESCE(c2.day, TO_DATE('2199-12-31', 'yyyy-mm-dd')) day_to,
    c1.Vendor_ID,
    c1.Customer_ID, 
    c1.rank
FROM customer_records c1
LEFT JOIN customer_records c2 
    ON  c2.Vendor_ID = c1.Vendor_ID
    AND c2.Customer_ID         = c1.Customer_ID
    AND c2.rank <> c1.rank
    AND c2.DAY                 > c1.DAY
    AND NOT EXISTS (
        SELECT 1
        FROM customer_records c3
        WHERE
                c3.Vendor_ID = c1.Vendor_ID
            AND c3.Customer_ID         = c1.Customer_ID
            AND c3.rank <> c1.rank
            AND c3.DAY                 > c1.DAY
            AND c3.DAY                 < c2.DAY
    )

这将返回:

 DAY_FROM  | DAY_TO    | Vendor_ID | Customer_ID | rank
 :-------- | :-------- | ------------------: | ----------: | -----------------:
 24-SEP-14 | 22-OCT-14 |            71047795 |      476095 |               3103
 01-OCT-14 | 22-OCT-14 |            71047795 |      476095 |               3103
 08-OCT-14 | 22-OCT-14 |            71047795 |      476095 |               3103
 15-OCT-14 | 22-OCT-14 |            71047795 |      476095 |               3103
 22-OCT-14 | 12-NOV-15 |            71047795 |      476095 |               3102
 29-OCT-14 | 12-NOV-15 |            71047795 |      476095 |               3102
 05-NOV-15 | 12-NOV-15 |            71047795 |      476095 |               3102
 12-NOV-15 | 31-DEC-99 |            71047795 |      476095 |               3103

现在,我们可以按记录层和结束日期对记录集进行分组,以生成预期结果. ROW_NUMBER()可以为您提供版本号.如上所述,也很容易检查哪个记录是当前记录.

Now we can group the record set by tier and end date to generate the expected results. ROW_NUMBER() can give you the version number. It is also easy to check which record is the current one, as explained above.

SELECT 
    ROW_NUMBER() OVER(ORDER BY c2.day) version,
    DECODE(c2.day, NULL, 'Y') current_flag,
    MIN(c1.day) day_from,
    COALESCE(c2.day, TO_DATE('2199-12-31', 'yyyy-mm-dd')) day_to,
    c1.Vendor_ID,
    c1.Customer_ID, 
    c1.rank
FROM customer_records c1
LEFT JOIN customer_records c2 
    ON  c2.Vendor_ID = c1.Vendor_ID
    AND c2.Customer_ID         = c1.Customer_ID
    AND c2.rank <> c1.rank
    AND c2.DAY                 > c1.DAY
    AND NOT EXISTS (
        SELECT 1
        FROM customer_records c3
        WHERE
                c3.Vendor_Id = c1.Vendor_Id
            AND c3.Customer_ID         = c1.Customer_ID
            AND c3.rank <> c1.rank
            AND c3.DAY                 > c1.DAY
            AND c3.DAY                 < c2.DAY
    )
GROUP BY
    c1.Vendor_Id, 
    c1.Customer_ID, 
    c1.rank, 
    c2.day
ORDER BY
    day_from

结果:


VERSION | CURRENT_FLAG | DAY_FROM  | DAY_TO    | Vendor_ID | Customer_ID | rank
------: | :----------- | :-------- | :-------- | ------------------: | ----------: | -----------------:
      1 | N            | 24-SEP-14 | 22-OCT-14 |            71047795 |      476095 |               3103
      2 | N            | 22-OCT-14 | 12-NOV-15 |            71047795 |      476095 |               3102
      3 | Y            | 12-NOV-15 | 31-DEC-99 |            71047795 |      476095 |               3103


在Oracle中,您可以使用


In Oracle you can turn any select into a merge query using the MERGE syntax. You can match on all columns expected current_flag and day_to, and update these if a record already exists ; else, just insert a new one.

MERGE INTO dimensions dim
USING (
   -- above query goes here --
) cust 
    ON  dim.DAY_FROM            = cust.DAY_FROM
    AND dim.vendor_id = cust.vendor_id
    AND dim.Customer_ID         = cust.Customer_ID
    AND dim.rank  = cust.rank
WHEN MATCHED THEN UPDATE SET 
    dim.DAY_TO = cust.DAY_TO,
    dim.CURRENT_FLAG = cust.CURRENT_FLAG
WHEN NOT MATCHED THEN 
    INSERT (
        dim.DAY_FROM, 
        dim.VERSION, 
        dim.CURRENT_FLAG, 
        dim.DAY_FROM, 
        dim.DAY_TO, 
        dim.vendor_id, 
        dim.customer_id, 
        dim.rank
    ) VALUES (
        cust.DAY_FROM, 
        cust.VERSION, 
        cust.CURRENT_FLAG, 
        cust.DAY_FROM, 
        cust.DAY_TO, 
        cust.vendor_id, 
        cust.Customer_ID, 
        cust.rank
    )

这篇关于在Oracle中实施Type 2 SCD的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆