Wednesday, July 31, 2013

Lookup vs Join

Lookup transformation in ssis vs. join in T-sql


Lookup transformation in SSIS and joins in sql server, both are used to reference some table or referece set. Still both have adavantages over each other.


Lookup
T-sql join

If the reference dataset is small, then use lookup file. As the whole file is loaded into main memory at the time of processing. So it can easily search and get back the matched record quickly than getting from the local disks.


If the reference datasets are big enough to cause trouble, use a join. This can involve I/O if the data is big enough, but the I/O is all highly optimized and sequential. Join processing is very fast and never involves paging or other I/O.

 SSIS can well handle the source dataset coming from a heterogeneous database like text file, DB2 etc. and can reference the tables via lookup.


Its complex to implement through join in sql server


TSQL can cope with more complicated business logic especially in the case of dependency between two datasets.


SSIS buffer cannot hold all dimension data (>1M rows), and can the whole package can stuck for some time at lookup.


It can handle as many data as required.

Blocking(or say “asynchronous”) components used like “Sort”, “aggregation



Error out No-Match rows – we can use this facility of lookup transformation, and use the rows which have no matching key in the reference table, we can redirect the no match rows to different destination for further analysis


N/A

via lookup in SSIS solution, we can easily configure the DB connections to solve this problem, which makes the design more loose-coupled and flexible to the changes like DB name changing or even DB migrate to the other server.


If using JOIN, what you have to do is to hard code the DB name or say to use corss-DB join.

Use if you have got SSIS developerJ

Use join when customer requests to wrap the business logic outside of SSIS and put more logics close to Database layer as much as possible. 

Thursday, June 27, 2013

SSIS: Logging in SSIS

What is Logging?

SSIS includes log provider, which can be used to implement logging in Package, container and tasks. With logging, one can capture run-time information about a package, helping audit and troubleshoot a package every time it is run. For example, a log can capture the name of the operator who ran the package and the time the package began and finished.

Also when SSIS log provider for SQL Server is enabled, then one table ‘sysssislog’ is created in the database. This table can be customized by adding some more columns like number of records transferred.

When you add the log to a package, you choose the log provider and the location of the log. 

Integration Services includes the following log providers:

  • The Text File log provider, which writes log entries to ASCII text files in a comma-separated value (CSV) format. The default file name extension for this provider is .log.

  • The SQL Server Profiler log provider, which writes traces that you can view using SQL Server Profiler. The default file name extension for this provider is .trc.

  • The SQL Server log provider, which writes log entries to the sysssislog table in a SQL Server database.

  • The Windows Event log provider, which writes entries to the Application log in the Windows Event log on the local computer.

  • The XML File log provider, which writes log files to an XML file. The default file name extension for this provider is .xml.

How to implement Logging in Package 

This post will show how to configure logging through SQL Server log provider.

Step1: 
Create one SSIS package create some DFTs. Or we use any already available SSIS package. My package includes two DFTs SCD Type1 and SCD Type 2.



Step 2:
 Right click, and select Logging



Step 3:
a.    Select provider Type ‘SSIS log provider for SQL Server’ from dropdown, and click on ‘Add’ Button


b.    Check the checkbox and select ‘New Connection’ from configuration dropdown


c.    Create a new OleDB connection to the database in which you want ‘sysssislog’ table to be created.

d.    Then go details tab and select the events which you want to be logged during package execution. One can select events for package, container or some tasks also.

In the below snapshot, I selected events for package and two DFT’s SCD type 1 and SCD type 2. One can select various events like ’On error’,’On Post Validate’, ‘On Pre Validate’ etc.



e.    If we click on ‘Advanced’ button, then one can select the events along with the information which values will be recorded in sysssislog table.



After everything is done, click on Ok button. But this will not create sysssislog table in the database (for this post I have used adventure works) also the table will not be created if we only execute some task like DFT etc. The table sysssislog will only get created, if we execute the whole package.

 So when you execute the package you can check the sysssislog table in your database, which you configured.


We will get to see the below information in sysssislog table


Please provide your feedback for the post, if you find this post useful. Also Post your query or scenario, i will be happy to help.