Surrogate key
- What is surrogate key?
- Where do we use a surrogate key?
- How it is different from the natural key?
- How do we implement surrogate key?
- What is the advantage of using a surrogate key?
- What is surrogate key?
As the name suggests Surrogate means ‘substitute’.
The surrogate key is just another substitute key, that will be used to uniquely identify the
rows in a dimension table. The surrogate key will have no business meaning.
It’s
internally generated by the system and is invisible to the user or application.
Thus its value cannot be modified by user or application.
- Where do we use a surrogate key?
If the table is not having any attribute (or natural key), that uniquely identify a
row.
If the
attribute used as unique identifier uses much storage than required. For
example, if the attribute is alphanumeric.
When
there is doubt of changes in the values of the attribute (used for unique
identification of the row)
If the
table participates in more than one relationship from the logical data model.
Then data needs to be fetched from multiple related tables using joins.
If we
want to improve the performance of data retrieval from multiple tables.
- Difference between Surrogate key and natural key
Surrogate key
|
Natural key
|
Don’t have
business meaning
|
Have business
meaning, for e.g. SSN number
|
|
These can be
changed.
|
Small in size,
generally of integer type. Can be created as identity column
|
Bigger in size,
generally alphanumeric or string,
|
Comparison
takes less time , as it compares to integer, joins are fast
|
Comparison takes much time, as it converts string or
alphanumeric to Ascii , and then comparison takes place, joins are slow
|
No locking
contentions because of unique constraint, as the surrogates get generated by the
DB and are cached – very scalable.
|
Locking
contention may occur, as two sessions may try to insert the same unique
business key,
|
Not searchable
from application
|
searchable
|
An additional index is needed.
|
No additional
index required
|
There are often generated by the system .
|
Records cannot be inserted untill the key is not known, as it
has got business meaning
|
Let's take a simple, classical example:
On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's
what would be in your Employee Dimension). This employee has a turnover
allocated to him on the Business Unit 'BU1' But on the 2nd of June the Employee
'E1' is moved from Business Unit 'BU1' to Business Unit 'BU2.' The entire new
turnovers have to belong to the new Business Unit 'BU2' but the old one should
belong to the Business Unit 'BU1.'
If
you used the natural business key 'E1' for your employee within your
datawarehouse everything would be allocated to Business Unit 'BU2' even what
actually belongs to 'BU1.'
If
you use surrogate keys, you could create on the 2nd of June a new record for
the Employee 'E1' in your Employee Dimension with a new surrogate key.
This
way, in your fact table, you have your old data (before 2nd of June) with the
SID of the Employee 'E1' + 'BU1.' All new data (after 2nd of June) would take
the SID of the employee 'E1' + 'BU2.'
You
could consider Slowly Changing Dimension as an enlargement of your natural key:
natural key of the Employee was Employee Code 'E1' but for you it becomes
Employee
Code + Business Unit - 'E1' + 'BU1' or 'E1' + 'BU2.' But the difference with
the natural key enlargement process, is that you might not have all part of
your new key within your fact table, so you might not be able to do the join on
the new enlarge key -> so you need another id.