Data Profiling task
Data
profiling task, as names suggests Profile of Data. Here using this task we can
get information on data in any table.
If
this task is performed for say Employee table, then profile of data may contain
- The number of rows in the table.
- The number of distinct values in the State column.
- The number of null or missing values in the Zip column.
- The distribution of values in the City column.
- The strength of the functional dependency of the State column
As quality of data is very important when one
tries to do analysis on data for business purpose. Then a valid data will
always improve the business decision making.
By using these information, quality issues can
be minimized that might occur from using the source data.
We can use the output of this task in various
scenarios for making decisions on the data; some of them are as below:
- Checking data quality before an incremental load. Use the Data Profiling task to compute the Column Null Ratio Profile of new data intended for the Customer Name column in a Customers table. If the percentage of null values is greater than 20%, send an e-mail message that contains the profile output to the operator and end the package. Otherwise, continue the incremental load.
- Automating cleanup when the specified conditions are met. Use the Data Profiling task to compute the Value Inclusion Profile of the State column against a lookup table of states, and of the ZIP Code/Postal Code column against a lookup table of zip codes. If the inclusion strength of the state values is less than 80%, but the inclusion strength of the ZIP Code/Postal Code values is greater than 99%, this indicates two things. First, the state data is bad. Second, the ZIP Code/Postal Code data is good. Launch a Data Flow task that cleans up the state data by performing a lookup of the correct state value from the current Zip Code/Postal Code value.
Below are the steps to be
followed for Data Profiling task
Step 1: Drag and drop a Data Profiling
Task from Control flow item.
Step 2: Double click on the task and
click on destination, and create a new file connection, or use the existing one
Step
3: Click on ‘Quick Profile’
Create a new ADO.Net Connection or, use the existing one to one database, and select the table or view, for which data profiling will be done.
and select all the Compute
Step 4:
Execute the Task
Step 5:
Then
go to AllPrograms->SqlServer 2008->IntegrationServices ->Data Profile
Viewer
And
open Data Profile Viewer, and open the text file, which you have used to store
the data profile of table.
It
looks like below
Now
you can go through various properties of the table’s columns, data etc
Please provide your feedback for the post, if you find this post useful. Also Post your query or scenario, i will be happy to help.
Thank you so much for providing information about SSIS and other similar aspects.
ReplyDeleteSSIS postgresql read