Wednesday, May 15, 2013

SSIS: SCD-Slowly Changing Dimension

SCD-Slowly Changing Dimension

In this post I will try to include everything about SCD.

What is SCD?

SCD is Slowly Changing Dimension. As the name suggests, a dimension which changes slowly. For Example, say there is a table Employee, which stores information regarding employee as below:

BusinessEntityID, NationalIDNumber, First_Name, last_Name LoginID, OrganizationNode
OrganizationLevel, JobTitle, BirthDate, MaritalStatus, Gender, HireDate, SalariedFlag
CurrentFlag, ModifiedDate

In this Employee table, the data for an employee doesn't change very often, but yes we can’t say that the changes won’t be there. The changes, which may happen, are

·         Mistakenly spelling of First_Name is stored incorrect.
·         The employee gets married and marital status changes.
·         Last_Name changes.
·         The employee gets promotion and job designation changes and organization level changes.
·         The columns which doesn't change except if we assume that no mistake happens while data entry are HireDate, Gender, NationalIDNumber

The changes discussed don’t happen frequently, but may happen after certain time.

SCD supports four types of changes
changing attribute, historical attribute, fixed attribute, and inferred member.

Type 1 (changing attribute): When the changes in any attribute or column overwrites the existing records.

For example; as discussed first name of employee is misspelled and wrong spelling is stored in first name of that employee. For making the first name correct, we don’t need to add one more record for the same employee, so we can overwrite the first name. SCD which does this kind of changes comes into type 1 category. This SCD transformation directs these rows to an output named Changing Attributes Updates Output.

Emp ID
First Name
Last Name
1
Rajan
Gupta
1
Ranjan
Gupta

This SCD transformation directs these rows to an output named Changing Attributes Updates Output.

Type 2 (historical attribute): when we need to maintain the history of records, whenever some particular column value changes.

For example the employee gets promotion, designation changes and organization level changes. In such case we need to maintain the history of the employee, that with which designation he joined, and when his designation and organizational level changes.

For these kinds of changes, there will be multiple records for the same employee with different designation. Then to indentify the current records, we can either add a column as current flag, which will be ‘y’ for the current or latest records, Or else we can add two column as start date and end date (expiry date), through which we can maintain history of employees records. This SCD directs these rows to two outputs: Historical Attribute Inserts Output and New Output.

EmpID
FirstName
DEsignation
StartDate
EndDate
Current
1
Ranjan
Graduate Engineer
20-01-2010
25-01-2011
N
1
Ranjan
Analyst Programmer
25-01-2011
25-01-2012
N
1
Ranjan
Business Analyst
25-01-2012
1-01-2099
Y


Fixed attribute: when the attribute must not change.

For example HireDate, Gender, NationalIDNumber should never change. So whenever changes will occur in these columns value then either it should throw error or the changes can be saved in some other destination. But changes should not be applied in the columns.

This SCD transformation detects changes and directs the rows with changes to an output named Fixed Attribute Output.

Inferred member:  are those records of the dimension, which are found missing during fact load. 

For example, say there is a fact table which contains employee and department information. While generating the fact table from employee_stg table and department_stg table , sometimes happenes that employee_stg contains some departments name which has no records in department table , and during fact table generation those records are found missing from department table, these kind of member of dimension department are called inferred member. It’s like ‘Fact arriving earlier than dimensions’.

 This SCD transformation directs these rows to an output named Inferred Member Updates. When data for the inferred member is loaded, you can update the existing record rather than create a new one.


Please provide your feedback for the post, if you find this post useful.

16 comments:

  1. it is very useful.Now i got clear understand about SCD.Thank you so much.Really it was very excellent.

    ReplyDelete
  2. Anurag Khare, Aug 1st, 2013 @ 10:05 AM

    Its really very clear and useful information about the SCD. I need some option for incremental load. I have implemented this in my project and its working fine...
    How can I validate about Inferred Member Updates. Pls reply if possible...

    Thanks in Anticipation...

    ReplyDelete
  3. It is very clear to understand.Thank you so much.

    ReplyDelete
  4. Excellent article..very nice. Thanks for posting.

    ReplyDelete
  5. This is the article i am looking for. Thank you.

    ReplyDelete
  6. Brilliant Article ....Thanks for Posting..

    ReplyDelete
  7. Nicely articulated, it can be very well understood by a novice. Thanks for your efforts.

    ReplyDelete
  8. Hey Ragini, many thanks for the post. extremely helpful.

    ReplyDelete
  9. Its is very useful with simple and clear information to understand.. Thank you.

    ReplyDelete
  10. Which SCD types is not supported by SSIS, type 3 or type 4.

    ReplyDelete
  11. what are the other alternatives for scd and which one is used in real time... please let me know....thanks

    ReplyDelete
  12. its really clear please also write basics, like behind the scene also if possible

    ReplyDelete