Wednesday, July 31, 2013

Lookup vs Join

Lookup transformation in ssis vs. join in T-sql

Lookup transformation in SSIS and joins in sql server, both are used to reference some table or referece set. Still both have adavantages over each other.

T-sql join

If the reference dataset is small, then use lookup file. As the whole file is loaded into main memory at the time of processing. So it can easily search and get back the matched record quickly than getting from the local disks.

If the reference datasets are big enough to cause trouble, use a join. This can involve I/O if the data is big enough, but the I/O is all highly optimized and sequential. Join processing is very fast and never involves paging or other I/O.

 SSIS can well handle the source dataset coming from a heterogeneous database like text file, DB2 etc. and can reference the tables via lookup.

Its complex to implement through join in sql server

TSQL can cope with more complicated business logic especially in the case of dependency between two datasets.

SSIS buffer cannot hold all dimension data (>1M rows), and can the whole package can stuck for some time at lookup.

It can handle as many data as required.

Blocking(or say “asynchronous”) components used like “Sort”, “aggregation

Error out No-Match rows – we can use this facility of lookup transformation, and use the rows which have no matching key in the reference table, we can redirect the no match rows to different destination for further analysis


via lookup in SSIS solution, we can easily configure the DB connections to solve this problem, which makes the design more loose-coupled and flexible to the changes like DB name changing or even DB migrate to the other server.

If using JOIN, what you have to do is to hard code the DB name or say to use corss-DB join.

Use if you have got SSIS developerJ

Use join when customer requests to wrap the business logic outside of SSIS and put more logics close to Database layer as much as possible.