SCD
Type 2, step by step
Type 2 (historical attribute): when we need to maintain the history
of records, whenever some particular column value changes.
By
considering Type2, I assume that the employee gets promotion, Job Title changes.
In such case we need to maintain the history of the employee, that with which
designation or job Title he joined, and when his designation or Job Title changed.
For making such scenario, I have created two tables;
one is tmpPerson, which is created as follows by using Adventure Works database’s
tables:
---Start-TmpPerson
table ---
create table tmpPerson (EmpId int, Title varchar(10) Null,FirstName varchar(20), MiddleName varchar(20),LastName varchar(20),
JobTitle varchar(100),BirthDate Date,MaritalStatus char(1),Gender char(1),NationalIDNumber varchar(20)
,CurrentFlag
int , StartDate
date,EndDate date default ('2050-12-31'))
insert into tmpPerson (EmpId
, Title ,FirstName
, MiddleName ,LastName,
JobTitle ,BirthDate ,MaritalStatus
,Gender ,NationalIDNumber
,CurrentFlag, StartDate )
select e.BusinessEntityID,per.Title,per.FirstName,per.MiddleName,per.LastName,
e.JobTitle,e.BirthDate,e.MaritalStatus,e.Gender,e.NationalIDNumber,
e.CurrentFlag,e.ModifiedDate
from
HumanResources.Employee e inner join Person.Person per on e.BusinessEntityID=per.BusinessEntityID
----End
–TmpPerson table----
I have
also created one table tmpPersonStage similar to tmpperson, which stores the
records which are changed by changing any of the column value. For Type 2, I
have made some changes in Job Title, as below:
---Start--tmpPersonStage
table----
create table tmpPersonstage (EmpId int, Title varchar(10) Null,FirstName varchar(20), MiddleName varchar(20),LastName varchar(20),
JobTitle varchar(100),BirthDate Date,MaritalStatus char(1),Gender char(1),NationalIDNumber varchar(20)
,CurrentFlag
int , StartDate
date,EndDate date default ('2050-12-31'))
insert into tmpPersonstage (EmpId
, Title ,FirstName
, MiddleName ,LastName,
JobTitle ,BirthDate ,MaritalStatus
,Gender ,NationalIDNumber
,CurrentFlag, StartDate,EndDate )
select EmpId , Title ,FirstName , MiddleName ,LastName,
JobTitle ,BirthDate ,MaritalStatus
,Gender ,NationalIDNumber
,CurrentFlag, StartDate,EndDate
from dbo.tmpperson where
empid in (3,6,8,58,49)
----changes for
type 2
update
tmpPersonstage set JobTitle ='Design Engineer' where EmpId=3
update
tmpPersonstage set JobTitle ='Production Technician -
WC10' where EmpId=6
update
tmpPersonstage set JobTitle ='Production Technician -
WC50' where EmpId=8
update
tmpPersonstage set JobTitle ='Research and Development
Engineer' where EmpId=49
update
tmpPersonstage set JobTitle ='Engineering Manager'
, LastName='Keill' where EmpId=58
---End---tmpPersonstage
table -------
In the
above script, I have updated the JobTitle of some employees, for which initially
the job titles in tmpPerson table were as below
After
preparing the initial steps for the scenario, let’s move to SSIS package to
apply SCD transformation. The below steps are the implementation of SCD Type 2
Step 1:
Drag and drop one Data Flow Task in the
Control flow tabs.
And
name this DFT as ‘scd type 2’
Step2:
Go to
Data Flow tab, Drag and drop OleDB Source and Slowly Changing Dimension
transformation from Data flow transformations.
Step 3:
Configure OleDB source, here the source will be the
table which has changed or new records, and for which history will be maintained,
or inserted in the main table. In our scenario the source table is
‘tmpPersonStage’, which keeps some updated and new records for updated records
history will be maintained and new records will be inserted into the main table
‘tmpPerson’
Step 4:
Configure the SCD
transformation;
double click on SCD transformation, one wizard will open as below:
Click
on next and create either new connection or already created connection to
AdventureWorks2008, and while mapping the input columns to dimension column, at least one column need to be mentioned
as ‘Business Key’.
Business key is the column on the basis of
which the updation or insertion into ‘tmpPerson’ will be made. Here we have
mentioned EMPID as BusinessKey, so if EmpId of input table ‘tmpPersonStage’ is found in ‘tmpPerson’ table then the respective
row in ‘tmpPerson’ will be updated with
the changes from ‘tmpPersonStage’ input table
Click
on next. In the below window, change type need to be mentioned on certain
column. Change Type can be ‘Fixed attribute’, ‘Changing attribute’ and
‘Historical Attribute’
As in this scenario is based on Historical
attribute we will select some columns for which history will be maintained if
found modified and new record will be inserted.
Go to
Next and select one of the below option,
As in ‘tmpPerson’,
we have both the option available one through ‘CurrentFlag’ column and another
through ‘Startdate’ and ‘EndDate’.
We can
opt for first option and use column ‘CurrentFlag’ as the indicator of the
current record. By this the updated record will be inserted with ‘CurrentFlag’
value as ‘1’ and the records which became old for the respective ‘EmpID’ the ‘CurrentFlag’
will be set to ‘0’.
And if
we opt for second option then we need to update the old record’s EndDate with ‘GetDate
()’ and need to insert the updated record with ‘StartDate’ as ‘GetDate ()’. For
new records, we need to use StartDate as ‘GetDate ()’ .
Through
this post, I am selecting first option, and I will be using ‘CurrentFlag’ Column to show the current record.
Go to
next and don't select ‘Enable inferred member support’, as we don’t require it for
this scenario.
Click next,
which displays New Records, other Outputs(Historical Attribute Output), which
means there will be two arrows from SCD, one will take the new records, another
will take the records for which history needs to be maintained.
Click next
and Finish the wizard. As we finish the wizard we see some transformation like
‘OleDb command’ and ‘Insert Destination’ are automatically created.
Oledb command will update the ‘CurrentFlag’
column of old records and will set it to ‘0’ for the ‘EmpID’ for which records
are updated. These all will be union all with all new records along with the
new updated records having ‘CurrentFlag’ 1 and will be inserted into the main
table ‘tmpPerson’
Step 5:
Step 5.a:
Double
click on Derived Column, and we see that one column with ‘Replace Currentflag’
has been derived from current flag and the value is set to ‘0’.
Step 5.b:
Double
click on Oledb command, and it will be configured automatically as below. The
connection will be automatically configured to table ‘tmpPerson’ of Adventure
works 2008, in which the records will be updated.
Also
when clicked on Component Properties tab then the update command is also
configure, which we can see on string value editor.
The
update command updates Current Flag to 0 on the basis of EmpId , and Current
Flag(value 1)
In
column mapping tab the parameter are mapped with the columns mentioned in
query. These Parameters contain the Replace Current Flag column from derived
transformation and EmpID from SCD
Step
5.c:
When
clicked on next Derived Column transformation, the same Current Flag column is
derived column of Replace Current flag, and set it to 1, for new and updated
records.
Step
5.d:
Insert
Destination will also be configured automatically, to insert the new records
coming from SCD to tmpPerson table.
Step 7:
Execute the DFT
After
execution, we can see that, we updated 5 records in the tmppersonstage table,
which are inserted into tmpperson table, along with the old records current
flag updated as 0. As there was no new records, so no row is inserted into
tmpperson.