Showing posts with label sqlserver 2008. Show all posts
Showing posts with label sqlserver 2008. Show all posts

Monday, November 11, 2013

Surrogate key

Surrogate key

  • What is surrogate key?
  • Where do we use a surrogate key?
  • How it is different from the natural key?
  • How do we implement surrogate key?
  • What is the advantage of using a surrogate key?


  • What is surrogate key?

As the name suggests Surrogate means ‘substitute’.
The surrogate key is just another substitute key, that will be used to uniquely identify the rows in a dimension table. The surrogate key will have no business meaning.
It’s internally generated by the system and is invisible to the user or application. Thus its value cannot be modified by user or application.

  • Where do we use a surrogate key?

If the table is not having any attribute (or natural key), that uniquely identify a row.
If the attribute used as unique identifier uses much storage than required. For example, if the attribute is alphanumeric.
When there is doubt of changes in the values of the attribute (used for unique identification of the row)
If the table participates in more than one relationship from the logical data model. Then data needs to be fetched from multiple related tables using joins.
If we want to improve the performance of data retrieval from multiple tables.


  • Difference between Surrogate key and natural key

Surrogate key
Natural key
Don’t have business meaning
Have business meaning, for e.g. SSN number

These can be changed.
Small in size, generally of integer type. Can be created as identity column
Bigger in size, generally alphanumeric or string,
Comparison takes less time , as it compares to integer, joins are fast
Comparison  takes much time, as it converts string or alphanumeric to Ascii , and then comparison takes place, joins are slow
No locking contentions because of unique constraint, as the surrogates get generated by the DB and are cached – very scalable.

Locking contention may occur, as two sessions may try to insert the same unique business key,
Not searchable from application
searchable
An additional index is needed.  
No additional index required
There are often generated by the system .
Records cannot be inserted untill the key is not known, as it has got business meaning



Let's take a simple, classical example:
On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be in your Employee Dimension). This employee has a turnover allocated to him on the Business Unit 'BU1' But on the 2nd of June the Employee 'E1' is moved from Business Unit 'BU1' to Business Unit 'BU2.' The entire new turnovers have to belong to the new Business Unit 'BU2' but the old one should belong to the Business Unit 'BU1.'
If you used the natural business key 'E1' for your employee within your datawarehouse everything would be allocated to Business Unit 'BU2' even what actually belongs to 'BU1.'
If you use surrogate keys, you could create on the 2nd of June a new record for the Employee 'E1' in your Employee Dimension with a new surrogate key.
This way, in your fact table, you have your old data (before 2nd of June) with the SID of the Employee 'E1' + 'BU1.' All new data (after 2nd of June) would take the SID of the employee 'E1' + 'BU2.'
You could consider Slowly Changing Dimension as an enlargement of your natural key: natural key of the Employee was Employee Code 'E1' but for you it becomes
Employee Code + Business Unit - 'E1' + 'BU1' or 'E1' + 'BU2.' But the difference with the natural key enlargement process, is that you might not have all part of your new key within your fact table, so you might not be able to do the join on the new enlarge key -> so you need another id.






Thursday, October 17, 2013

How to stop any user from dropping any database



How to stop any user from dropping any database


you can prevent casual database deletions with a trigger:

CREATE TRIGGER STOP_DB_DROP
ON ALL SERVER
FOR DROP_DATABASE
AS
PRINT 'You are not allowed to DROP databases'
ROLLBACK;

Monday, August 26, 2013

Sql Server: index part 4

a.    What is unique index?
b.    What are the advantages and disadvantages of index?


Unique index

  •            By default, Primary key constraint creates a unique clustered index on a table, whereas unique constraint creates unique non clustered index.
  •       When the index is dropped, primary key constraint is also dropped.
  •       Unique is a property of an index (for both clustered and non clustered index.
  •      There is no difference between unique constraint and unique index. When unique constraint is added, then index also gets created behind the scene.
  •       If a UNIQUE constraint is added to a column that has duplicated values, the Database Engine returns an error and does not add the constraint.

Advantages of indexes
  • Indexes provide faster data access during specific data fetch and retrieval form huge data tables. Below are the scenarios in which it works well.
  • Update and delete commands also work well when specific records or range of records are searched for modification or deletion. For example, if an index is created on salary column of employee table, and one has to modify salary for the employees who are having salary 10000, so it will be easier and faster to look for particular records and apply delete and update operations.
  • In case of sorting also, when we ask for a sorted dataset, the database will try to find an index and avoid sorting the results during execution of the query. 
  • In case of grouping also, say one has to count the number of employee as per annual compensation (salary column), and the index is created on salary. Since matching salary appears in consecutive index entries, then the database will be able to count the number of employee at each salary quickly. While if index was not there on salary then it will first sort the results.


Disadvantage
  •  Additional disk space is required in case of non-clustered index.
  • Insert, update and delete can become slow. As it has to locate the records then it takes time in case of huge data and too many indexes are there, because on each DML operation indexes will also be updated.
  • If we update a record and change the value of an indexed column in a clustered index, the database might need to move the entire row into a new position to keep the rows in sorted order. This behavior essentially turns an update query into a DELETE followed by an INSERT, with an obvious decrease in performance. 
for more questions and answers on indexes
Previous      Next

Sql Server: index part 3

a.    What is non clustered index?

b.    How many non clustered indexes can be created on one table?

c.    Can you tell me some example on non clustered index that relates to real life experience?

d.    How non-clustered is different from clustered indexes?



Non-Clustered Index


 non-clustered index can be created as shown below:
create nonclustered index indexname on [TableName](ColumnName)

Some points regarding non-clustered indexex 
  • It’s similar to index in textbook, index at one place and data at other place. These indexes will have the pointers to the storage location of the actual data.

  • A table can have more than 1 non-clustered index, just like in a book, one index in starting of the book, another at the end of the book.

  • A table can only have one Clustered index and up to 249 Non-Clustered Indexes.  If a table does not have a clustered index it is referred to as a Heap.


Difference between Clustered and non clustered index
  • Max one clustered index and max 249 non clustered can be created on each table.

  • Clustered indexing is faster than non-clustered index. As clustered index will always cover  a query ,since it contains all of the data in the leaf node itself.

  •  Non-clustered indexes requires extra disk storage space.

for more questions and answers on indexes
Previous      Next

Sql Server: index part 2

a.    What is a clustered index?
b.    How many clustered indexes can be there for one table?
c.    Can you tell some example on clustered index that relates to real life experience?
d.    When you declare primary key constraint, then which index is created on that column?


Clustered Index

Clustered index is created on Primary key automatically, if there is no clustered index already exists on table. This index determines physical order of data in a table. As there can be only one way, in which data can be arranged on disk or memory, therefore there can be only one clustered index on each table.

One can create clustered index explicitly as shown below
create clustered index indexname on [TableName](ColumnName)

 Some points on clustered indexes
  •  Even when the data is inserted in random order, it will be arranged as per the clustered index created on a particular column. For example, a clustered index is created on empid of employee table, then even though the order of records(empid) inserted is for e.g. 1,4,5,3,2, But when the data will be fetched from the table through a select query, you will see the order of records as 1,2,3,4,5. Data is automatically arranged on sequential ascending order after each insert.

  •  It’s similar to a telephone directory in which data is arranged by the last name.

  • Clustered an index that includes more than one column are called Composite Clustered Index. 

  • Only one clustered index can be created on a table, before creating another, one has to drop the existing clustered index. 

  • A clustered index can be created on a maximum of 16 columns.

  • Clustered the index will always cover a query, since it contains all of the data in the leaf the node itself, or should say table itself

for more questions and answers on indexes
Previous      Next

Sql Server: index part 1

a.    What is index? Why do we need indexes?
b.    What are the various types of indexes in sql server?
c.    How will you check if indexes already exist on a table
d.   
What will happen in case of no indexes?


Indexes are created to provide faster data access during specific data fetch and retrieval form huge data tables.

There are various types of indexes in sql server
a.    Clustered Index
b.    Non clustered index
c.    Unique
d.    Filtered
e.    Spatial
f.     Xml
g.    Full text
h.    Index with computed column
i.     Index with non computed column
  
    Using below procedure we can identify, if indexes exist on a table
Execute sp_helpindex table_name
For example:


In case of No index
If no index exists on a table, a table scan must be performed for each table referenced in a database query. The larger the table, the longer a table scan takes because a table scan requires each table row to be accessed sequentially. Although a table scan might be more efficient for a complex query that requires most of the rows in a table, for a query that returns only some table rows an index scan can access table rows more efficiently.

Next for more questions and answers on indexes
Understanding on Indexes


Friday, June 14, 2013

SqlServer: return range of records

Problem:
How to get top 10 records then records from 11-20 then 21-30
The scenario is like, if the index from front end is given, then it should return records accordingly,
Index   rows
1              1-10
2              11-20
3              21-30
Solution:
declare @index int
set @index=2

;with cte
as
(select *, ROW_NUMBER() over(order by BusinessEntityId )as row_num from HumanResources.Employee
)

select * from cte where row_num>=(@index*10-9) and row_num<=(@index*10)

In the above solution, I have applied this on table ‘HumanResources.Employee’ of AdventureWorks2008 database. It returns records for index 2 thus returns records from 11-20
We can make this solution generic,
select *, ROW_NUMBER() over(order by [Column_NameForOrderBy] )as row_num from [Table_Name]

Please provide your feedback for the post, if you find this post useful.


Monday, May 20, 2013

Sql Server: Local temporary table behavior in nested stored procedure with example


Local temporary table behavior in nested stored procedure with example


local temporary table scope remains only inside the stored procedure, in which it is created. Any inner stored procedure can access the local temporary table created in outer procedure. But any outer stored procedure can not access the local temporary table created in inner procedure.

Error you may getMsg 208, Level 16, State 0, Procedure procedure1, Line 6
Invalid object name '#temp1'.

Scenario 1: when one procedure calls another procedure and temp table is created in outer procedure procedure1 and accessed by inner procedure procedure2
Scenario 2: when one procedure calls another procedure and temp table is created in inner procedure procedure2 and accessed by outer procedure procedure1
create proc procedure1
as
begin
create table #temp1(name varchar(30),age int)
insert into #temp1
select 'ram',20
union all
select 'shyam',23
union all
select 'anjum',28
exec procedure2
end

GO;

create proc procedure2
as
begin
select * from #temp1
end

GO;
create proc procedure2
as
begin
create table #temp1(name varchar(30),age int)
insert into #temp1
select 'ram',20
union all
select 'shyam',23
union all
select 'anjum',28
end

GO;

create proc procedure1
as
begin
exec procedure2
select * from #temp1
end

GO;


As local temporary table scope is inside the stored procedure, in which it is created.

Procedure2 is nested inside procedure1 , thus procedure2 has access to temp table #temp1.
As local temporary table scope is only inside the stored procedure in which it is created.

Temp table #temp1 is created in procedure2, which is called by procedure1, thus any procedure inside procedure2 can access that table, but when procedure 1 tries to access the table #temp1 , which is an outer procedure, doesn’t find the table #temp1


Please provide your feedback for the post, if you find this post useful.

Friday, May 17, 2013

Sql Server: Difference between Row_Number, Rank, Dense_Rank


Difference between Row_Number, Rank, Dense_Rank

Syntax and use:

Row_Number 
Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.

ROW_NUMBER ( )     OVER ([<partition_by_clause>] <order_by_clause>)

Rank 
Returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of ranks that come before the row in question.

RANK ( )    OVER ([< partition_by_clause >] < order_by_clause >)

Dense_Rank
 Returns the rank of rows within the partition of a result set, without any gaps in the ranking. The rank of a row is one plus the number of distinct ranks that come before the row in question.

DENSE_RANK ( )    OVER ([<partition_by_clause> ] < order_by_clause > )

NTILE 
Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.
NTILE (integer_expression) OVER ([<partition_by_clause>] < order_by_clause >)

Where
<partition_by_clause>
Divides the result set produced by the From clause into partitions to which the Row_Number/ Rank/ Dense_Rank/ Ntile function is applied.
<order_by_clause>
Determines the order in which the Row_Number/ Rank/ Dense_Rank/ Ntile values are applied to the rows in a partition. 

We will apply these function on the below customer product table CustProd.

name
Product
cust1
decoder
cust2
cable
cust1
cable
cust2
package
cust3
decoder
cust3
cable

Please see the below snapshot for understanding of these function through example



With partition by product and order by name,

When we use partition by product, then it divides the result on the basis of product, as there are three distinct products then there will be 3 partitions.

After partition, order by name is used, that means, in the partitions Row Number, Rank or Dense Rank will be assigned as per the order of name. Here in the below result we see that rank ,row number and dense rank, all are having same value, It’s because in each partition there are distinct name given, if name would have been repeated for the same product then those records will have same rank and dense rank, but row number would have been same as shown below.


When used order by product instead of name , then we see in the below result that, the Rank and dense Rank were 1, Because we did partition of result by product , that means there will be common product in each partition , and rank and dense rank will also be same for same product.

Please provide your feedback for the post, if you find this post useful.