Tuesday, April 30, 2013

SSIS: Data Profiling task


Data Profiling task


Data profiling task, as names suggests Profile of Data. Here using this task we can get information on data in any table.

If this task is performed for say Employee table, then profile of data may contain

  •          The number of rows in the table.
  •          The number of distinct values in the State column.
  •          The number of null or missing values in the Zip column.
  •          The distribution of values in the City column.
  •          The strength of the functional dependency of the State column

As quality of data is very important when one tries to do analysis on data for business purpose. Then a valid data will always improve the business decision making.

By using these information, quality issues can be minimized that might occur from using the source data.

We can use the output of this task in various scenarios for making decisions on the data; some of them are as below:

  •        Checking data quality before an incremental load. Use the Data Profiling task to compute the Column Null Ratio Profile of new data intended for the Customer Name column in a Customers table. If the percentage of null values is greater than 20%, send an e-mail message that contains the profile output to the operator and end the package. Otherwise, continue the incremental load.

  •          Automating cleanup when the specified conditions are met. Use the Data Profiling task to compute the Value Inclusion Profile of the State column against a lookup table of states, and of the ZIP Code/Postal Code column against a lookup table of zip codes. If the inclusion strength of the state values is less than 80%, but the inclusion strength of the ZIP Code/Postal Code values is greater than 99%, this indicates two things. First, the state data is bad. Second, the ZIP Code/Postal Code data is good. Launch a Data Flow task that cleans up the state data by performing a lookup of the correct state value from the current Zip Code/Postal Code value.



Below are the steps to be followed for Data Profiling task
Step 1: Drag and drop a Data Profiling Task from Control flow item.



Step 2: Double click on the task and click on destination, and create a new file connection, or use the existing one


Step 3: Click on ‘Quick Profile’

Create a new ADO.Net Connection or, use the existing one to one database, and select the table or view, for which data profiling will be done.
and  select all the Compute


Step 4: Execute the Task
Step 5:
Then go to AllPrograms->SqlServer 2008->IntegrationServices ->Data Profile Viewer

And open Data Profile Viewer, and open the text file, which you have used to store the data profile of table.


It looks like below


Now you can go through various properties of the table’s columns, data etc

Please provide your feedback for the post, if you find this post useful. Also Post your query or scenario, i will be happy to help.

SSIS: New connection


New connection

Apart from the connection manager like oledb, flat file, file connection, ADO.Net, connection from data source. There are other connections also available, which can be created by select New Connection in Connection manager, which will pop up a window as below. Now any kind of connection mentioned can be created.

Enjoy going through all the connection.  Especially the MultiFile and MultiFlatFile connection, through which, we don’t need to create multiple connection to flat file or files. It’s Awsome! J


For all other connection click on below links,
OleDB and ADO.Net Connection
Flat File Connection
File Connection
Connection from Data source

Please provide your feedback for the post, if you find this post useful. Also Post your query or scenario, i will be happy to help.

SSIS: New Connection from Data source


New Connection from Data source


For making connection to Data source, we need to create one data source.

Data source is nothing but a connection reference that is created outside a package. It can be created in the same manner as we did for OleDb connection.

The advantage of using data source connection is that it can be shared between many packages in a project. Therefore the synchronization becomes easier.

For example, if we have to change the server name, then we don't need to change it in all the connection in the packages that reference the same data source, instead we just need to change the server name in the data source, and it will reflect in all other connections in packages, which reference to the same data source.

 For this, go to the solution explorer and right click on Data source and select New Data Source. It will pop up a wizard.


In the Data Source Wizard, Click on new, and you will get the same window, which was used during OleDb connection.
  


After creating the Data Source, Go to the connection manager and select ‘New Connection from data source’, which will pop up one window as below with available data sources. You just need to select the data source, and your connection is created.


Now if we change the database or server name or any other property in Data Source, then it will also reflect in all the connection referencing the same data source. J

For all other connection manager click on below links
OleDb and ADO.Net connection
Flat File Connection
File Connection
New Connection

Please provide your feedback for the post, if you find this post useful. Also Post your query or scenario, i will be happy to help.

SSIS: File Connection

File Connection

File connection is used to make connection to any type of file, be it excel, flat, image, any type. Even we can use it to make connection to a folder instead of file. If the file or folder doesn’t exist then new folder or file can also be created.

File can also be created dynamically by providing expression in the properties of file connection manager.

Use in real time scenario:
Sometime it happens, we need to run our package on a scheduled basis say daily. Also, we need to save the result of the package in some folder to do analysis on the results generated.

The solution is to create the file dynamically, that is the file name will be generated on the basis of some system or user variable. It will be different each day, if we also use date in the file name. This can be done by setting the expression in file connection properties as shown below.

For creating the connection, go to the Connection Manager and select File Connection.

Go to the properties of File Connection Manager, and then go to expression. It will open a Property Expression Editor, where we can set property like connection string etc.

SSIS : Quick view on Flat File Connection

Quick view on Flat File Connection

Go to the Connection Manager, and select Flat File Connection. Then follow the steps as mentioned below:
  • Provide File path in file name.
  • Select format ‘delimited’, if the rows in the file are delimited by comma, space, tab etc.Fixed width:  if the Columns have a fixed width.Ragged right: Ragged right files are files in which every column has a fixed width, except for the last column. It is delimited by the row delimiter. 
  • Delimiter is nothing but separator character, which separated, rows, columns.
  • Select the Header row delimiter from the drop-down  say if the header in my file is delimited by comma, then select comma.
  • In Columns tab, Column and row delimiter can also be set. Here row delimiter is set to CR [LF], which is ‘enter’ and column delimiter is comma.
  • In Advanced tab, we can see the data type length etc assigned to the column, also we can change the data type and length of the column to prevent truncation of data or excess column width.
  • By default data type is string and minimum length 50.
  •  By clicking on ‘Suggest Column Types, it will assign column width to the columns, which will be maximum length of the text in that column, Also we can change the data type here only as per the destination column requirement. It will reduce overhead of Data conversion thing, if we do it here only.

















For all other connection click on below links,
OleDB and ADO.Net Connection
File Connection
Connection from Data source

New Connection

Please provide your feedback for the post, if you find this post useful. Also Post your query or scenario, i will be happy to help.

SSIS: OleDb Connection and ADO.NET Connection



Quick View on OleDb Connection and ADO.NET Connection

Both Oledb and ADO.NET connection can be created in the same manner, as shown below:
  
                             1                                                        2
                                                                  
     
3                                                            4

In 4th snapshot, one can also provide the connection string instead, if the connection string is in some file.

Saturday, April 27, 2013

SSIS : Using table variable in OLEdb source in data flow task doesn’t return rows to destination



Using table variable in OLEdb source in data flow task doesn’t return rows to destination

There was a scenario, where I need to transfer some data as the result of the complex query to a destination table. As the query was complex, so I was using table variable instead of temporary table like #tmptable1 etc. Although unlike table variable, we can’t create #temp table in the oledb source of SSIS. If we use temporary table like #tmptable, it shows invalid object #tmptable.

Well coming to the point, I created the below query using table variable @table1. I was using this query in the OleDb source of SSIS package as shown below .


 




I also created one destination table as shown below, which I will use as the destination for oledb source query.




When executed the package, it executed successfully. But in the data flow path it was not showing the number of rows returned from source. When checked in the destination table EmpDptInfo, It was empty.







Then I tried to use ‘SET NOCOUNT ON’ before all the queries in oledb source, which prevents it to return the number of rows.  When SET NOCOUNT is OFF, the count is returned.

I again executed the package and it was executed successfully along with result returning from source to destination.




Please provide your feedback for the post, if you find this post useful. Also Post your query or scenario, i will be happy to help.