21 Mar 2016

How to load Slowly Changing Dimension Type 2 using Oracle Merge Statement

By using Oracle Merge statement, we are able to perform an insert and update statements (sometime referred to as “upsert”) in one query. The merge statement primarily useful in data warehousing situations, especially when maintaining Type 2 Slowly changing dimension (SCD2), where large amounts of data are commonly inserted and updated

To explore how the merge statement works, let’s use the following set of tables

Product dimension table: dim_product.

CREATE TABLE DIM_PRODUCT
(
PRODUCT_SID NUMBER NOT NULL,
PRODUCT_ID NUMBER NOT NULL,
PRODUCT_CATEGORY_ID NUMBER,
PRODUCT_NAME VARCHAR2(50 CHAR),
PRODUCT_DESCRIPTION VARCHAR2(200 CHAR),
PRICE NUMBER(8,2),
PRICE_EFFECTIVE_DATE DATE,
LAST_MODIFIED_DATE DATE,
EFFECTIVE_START_DATE DATE,
EFFECTIVE_END_DATE DATE,
IS_CURRENT_ROW VARCHAR2(1 CHAR),
CREATED_DATE DATE,
UPDATED_DATE DATE
);

Source stage table: stg_product.
This is a stage table use to store the newly modified product records from the product table.

CREATE TABLE STG_PRODUCT
(
PRODUCT_ID NUMBER NOT NULL,
PRODUCT_CATEGORY_ID NUMBER,
PRODUCT_NAME VARCHAR2(50 CHAR),
PRODUCT_DESCRIPTION VARCHAR2(200 CHAR),
PRICE NUMBER(8,2),
PRICE_EFFECTIVE_DATE DATE,
LAST_MODIFIED_DATE DATE,
CREATED_DATE DATE
);



Let’s also created a sequence to use for the dim_product dimension table ‘s surrogate keys.


CREATE SEQUENCE s_dim_product
START WITH 1
MAXVALUE 9999999999999999999999999999
MINVALUE 1
NOCYCLE
CACHE 20
NOORDER;

2. SCD Logics and the Merge Query

A slowly changing dimension type 2 (SCD2) is defined to keep track changes of attributes in the underlying source table. While there are many attributes in a table, the business rules decide what changes to track. Here, for demonstration purpose, let’s choose to keep track of changes of each attribute in the underlying product source table, which is to keep history of production name, product description, and product category and price changes.

The merger statement provides the abilities to perform insert and update with only one select statement based upon a join condition. Rows already in the target table that match the join condition are updated; otherwise, a row is inserted using values from a separate subquery.

An SCD2 normally has three additional columns. The data effective start and end date of the record, and a flag to indicate the current record. To maintain histories, a new record is created to store the changing attributes while we keep the exiting records as histories. To help achieve that, a small utility table with only tow records in it is created. In this utility table, one record indicates the insert flag and the other indicates the update flag. Here the table structure:

CREATE TABLE SCD_ROW_TYPE
(
SCD_ROW_TYPE_ID NUMBER NOT NULL,
SCD_ROW_TYPE_DESCRIPTION VARCHAR2(20 CHAR)
);

After creating the table, insert the following records into the table:

Insert into scd_row_type(scd_row_type_id,SCD_ROW_TYPE_DESCRIPTION) values(1,'For Insert');
Insert into scd_row_type(scd_row_type_id,SCD_ROW_TYPE_DESCRIPTION) values(2,'For Update');
commit;

To make it easier to understand how the merge statement work when we actually perform some tests with data. First, let’s issue the following insert statements to create two records in the stage table and dim_product table.

 insert into stg_product ( PRODUCT_ID, PRODUCT_CATEGORY_ID,  PRODUCT_NAME,PRODUCT_DESCRIPTION,PRICE,PRICE_EFFECTIVE_DATE,  LAST_MODIFIED_DATE,  CREATED_DATE)
        values (1,8,'Apple iPad 36 GB','Apple iPad 36 GB with case', 800,sysdate,sysdate,sysdate);

insert into stg_product ( PRODUCT_ID, PRODUCT_CATEGORY_ID,  PRODUCT_NAME,PRODUCT_DESCRIPTION,PRICE,PRICE_EFFECTIVE_DATE,  LAST_MODIFIED_DATE,CREATED_DATE)
         values (2,7,'Canon 12 MPIX Digital Camera','Canon 12 MPIX Digital Camera, Zoon Len, Case', 150,sysdate,sysdate,sysdate);

commit;

Insert into DIM_PRODUCT (PRODUCT_SID,PRODUCT_ID,PRODUCT_CATEGORY_ID,PRODUCT_NAME,PRODUCT_DESCRIPTION,PRICE,PRICE_EFFECTIVE_DATE,LAST_MODIFIED_DATE,EFFECTIVE_START_DATE,EFFECTIVE_END_DATE,IS_CURRENT_ROW,CREATED_DATE,UPDATED_DATE) values (19,2,7,'Canon 12 MPIX Digital Camera','Canon 12 MPIX Digital Camera, Zoon Len, Case',150,to_date('21-MAR-16','DD-MON-RR'),to_date('21-MAR-16','DD-MON-RR'),to_date('21-MAR-16','DD-MON-RR'),to_date('31-DEC-99','DD-MON-RR'),'Y',to_date('21-MAR-16','DD-MON-RR'),to_date('21-MAR-16','DD-MON-RR'));

Insert into DIM_PRODUCT (PRODUCT_SID,PRODUCT_ID,PRODUCT_CATEGORY_ID,PRODUCT_NAME,PRODUCT_DESCRIPTION,PRICE,PRICE_EFFECTIVE_DATE,LAST_MODIFIED_DATE,EFFECTIVE_START_DATE,EFFECTIVE_END_DATE,IS_CURRENT_ROW,CREATED_DATE,UPDATED_DATE) values (20,1,8,'Apple iPad 36 GB','Apple iPad 36 GB with case',800,to_date('21-MAR-16','DD-MON-RR'),to_date('21-MAR-16','DD-MON-RR'),to_date('21-MAR-16','DD-MON-RR'),to_date('31-DEC-99','DD-MON-RR'),'Y',to_date('21-MAR-16','DD-MON-RR'),to_date('21-MAR-16','DD-MON-RR'));

commit;

Now the below statements updated a record where product_id equal to 1 to change it price to 700, and inserted one more new record, product_id 3, into the stg_product table. To help understand what the records set looks like from the using clause of the merge statement, let’s copy the query and execute it to get the record set.
---
update stg_product set price = 700 , price_effective_date = sysdate, last_modified_date = sysdate where product_id = 1;

insert into stg_product (
PRODUCT_ID,
PRODUCT_CATEGORY_ID,
PRODUCT_NAME,
PRODUCT_DESCRIPTION,
PRICE,
PRICE_EFFECTIVE_DATE,
LAST_MODIFIED_DATE,
CREATED_DATE)
values (3,6,'EPSon Color Printer',
'EPSon color print with high definition photo prints', 200,sysdate,sysdate,sysdate);
 commit;

 Now we are ready to construct the merge query

MERGE INTO dim_product p
USING ( SELECT DECODE(s.scd_row_type_id,1,-6789,m.product_sid) as product_sid,
                   PRODUCT_ID,
                   PRODUCT_CATEGORY_ID,
                   PRODUCT_NAME,
                   PRODUCT_DESCRIPTION,
                   PRICE,
                   PRICE_EFFECTIVE_DATE,
                   LAST_MODIFIED_DATE,
                   CREATED_DATE,
                   m.scd_row_type_id
              FROM    (SELECT dp.product_sid,
                              sp.PRODUCT_ID,
                              sp.PRODUCT_CATEGORY_ID,
                              sp.PRODUCT_NAME,
                              sp.PRODUCT_DESCRIPTION,
                              sp.PRICE,
                              sp.PRICE_EFFECTIVE_DATE,
                              sp.LAST_MODIFIED_DATE,
                              sp.CREATED_DATE,
                              CASE
                                 WHEN dp.product_id IS NULL
                                 THEN
                                    1
                                 WHEN (dp.product_category_id !=
                                          sp.product_category_id
                                       OR dp.product_name != sp.product_name
                                       OR DP.PRODUCT_DESCRIPTION !=
                                             sp.product_description
                                       OR dp.price != sp.price
                                       OR dp.price_effective_date !=
                                             sp.price_effective_date)
                                 THEN
                                    2
                                 ELSE
                                    0
                              END
                                 AS scd_row_type_id
                         FROM    stg_product sp
                              LEFT JOIN
                                 Dim_product dp
                              ON (sp.product_id = dp.product_id and  dp.is_current_row = 'Y')
                     ) m
                   JOIN  scd_row_type s
                   ON (s.scd_row_type_id <= m.scd_row_type_id)
                   ) mp
        ON  (p.product_sid = mp.product_sid)
when matched then
   update set P.EFFECTIVE_END_DATE = mp.LAST_MODIFIED_DATE, is_current_row = 'N', updated_date = sysdate
when NOT matched then
insert (P.PRODUCT_SID,P.PRODUCT_ID,P.PRODUCT_CATEGORY_ID, P.PRODUCT_NAME, P.PRODUCT_DESCRIPTION, P.PRICE, p.PRICE_EFFECTIVE_DATE,P.LAST_MODIFIED_DATE,
 p.effective_start_date,P.EFFECTIVE_END_DATE,is_current_row, created_date,
 updated_date )
 values (s_dim_product.nextval,mp.PRODUCT_ID,mp.PRODUCT_CATEGORY_ID,mp.PRODUCT_NAME,
 mp.PRODUCT_DESCRIPTION, mp.PRICE,mp.price_effective_date,
mp.LAST_MODIFIED_DATE,mp.last_modified_date,
 to_date('2099-12-31 00:00:00','YYYY-MM-DD HH24:MI:SS'),'Y', sysdate,sysdate);
  commit;

No comments:

Post a Comment