SQL Query Optimization Techniques

In this article we will learn SQL query optimization techniques. There are some useful practices to reduce the cost of the query so that it can perform better. To achieve this, you need to write the query, check query performance using IO statistics and using execution plan, and then optimize it. Let’s discuss few of the points below.

Indexing:

An indexing is one of the best ways to optimize your query and it enhance data retrieval speed too. In indexing there are two operations scan or a seek. “Index seek” is used for filtering rows on a matching filter. “Index scan” is used for scanning the entire index for matching criteria. Overall, there are many types of “Indexes” but most common are:

  • Cluster Index: (It can be only one and applied on “Primary key” column of the table)
  • Non-Clustered Index: (There can be multiple Non-Clustered Index and generally applied on where clause columns)

Important points which one must be taken care while creating indexes:

  • We should create Indexes on keys that we use in WHERE clause and in our join statements. It can be either one cluster index or multiple non-cluster index.
  • Multiple non-cluster index should be avoided because it can reduce the performance rather than boosting. Specially your insert/update/ delete operation will slow down because of the logically reordering of the data.
  • If required, multiple non-cluster Indexes should NOT be made on columns that are frequently modified i.e., columns in which INSERT/UPDATE is occurring frequently.
  • Non-cluster Indexes should be made on ordering key values (like ASC or DESC order). This will enhance your query performance.
  •  Indexes should be made on WHERE clause and other columns which we use using AND operator within the query to filter data.

Avoid using SELECT with * Asterisks:

In next part of SQL Query Optimization techniques, we should not use * Asterisk in SQL select Query. It is big overhead because it first scans whole table i.e., it’s all columns and all rows then after this it will start fetching the data from table.  It better to pass the column names as per the requirement. If we’ll pass the columns names explicitly then column scanning cost of the table can be avoided.

See below simple select query is showing the”7 rows affected” means first it scan the table then then it fetches the data of table (which can be seen in results tab).

To avoid the scan overhead we can use “set no count on” and using this we get rid of table scan and there is no message of ”7 rows affected” in below screen shot.

SQL Query Optimization Techniques

Avoid using SELECT with DISTINCT:

Select DISTINCT command in SQL query is used for fetching unique results and remove duplicate rows from the returned result set. Internally DISTINCT get all rows and then groups together related rows and then removes duplicate form it. Additionally, GROUP BY operation can also be used to get unique data but is also a costly operation. We should avoid using DISTINCT in our query considering its high execution cost. You can observe execution time cost with and without DISTINCT in below screen.

SQL Query Optimization Techniques

Without Distinct time is 122 ms and with DISTINCT it is 175 ms. So, with this example we observe that DISTINCT provides the extra overhead to the SQL Query and we should avoid this as much as possible.

SQL Query Optimization Techniques

Inner joins vs WHERE clause:

We should use inner join for merging two or more tables rather than using the WHERE clause. WHERE clause creates the CROSS join/ CARTESIAN product for merging tables. CARTESIAN product of two tables takes a lot of time. SQL Joins performance is far better than WHERE clause and CARTESIAN product of tables.

IN versus EXISTS:

IN operator is more costly than EXISTS in terms of scans cost. Moreover, cost increases when subquery returns a large dataset, so we should try with EXISTS rather than using IN for fetching results with a subquery.

--query with IN
SET STATISTICS TIME ON
Select  * from [dbo].[Orders]
 where orderId in(
			      Select  orderId from [dbo].[Order Details]
				 )  
--Query with EXISTS
 
SET STATISTICS TIME ON
Select  * from [dbo].[Orders]
 where  exists(
			      Select  orderId from [dbo].[Order Details]

)  

Output of above query will give same result set but EXISTS is more efficient and faster way of getting result.

Loops versus Bulk insert:

The loops must be avoided because it requires running the same query many times. Instead, we should opt for bulk inserts for inserting the large data into SQL tables.

--Example of inserting DATA using SQL loops:
SET STATISTICS TIME ON
DECLARE @Counter INT 
SET @Counter=1
WHILE ( @Counter <= 20)
BEGIN
    PRINT 'The counter value is = ' + CONVERT(VARCHAR,@Counter)
INSERT INTO [dbo].[Customers]
           ([CompanyName]
           ,[ContactName]
		   ,[Country])
     VALUES
           ('ABC Tech. Pvt. Ltd.'
		   ,'Rakesh kumar'
           ,'INDIA')
    SET @Counter  = @Counter  + 1

END

--Example of Bulk insert:
BULK INSERT Sales
FROM 'C:\Rakesh\Sales.csv'
WITH (FIRSTROW = 2,
    FIELDTERMINATOR = ',',  -- each column data will be separated with comma 
    ROWTERMINATOR='\n',     -- each row data will ends with \n
    BATCHSIZE=25000,        -- this is number of records inserted in one go
    MAXERRORS=2);           -- Max error supported before complete

Note: You can create sales table with required columns, each row’s data will end with “\n” and each column data will be separated with “,” comma. Datatype and number of columns should match in CSV and table to work it correctly. 

I hope this article has given you enough understanding of optimizations of your SQL queries. You can also refer HERE for more information.

For more knowledge of MS SQL, you can check this section HERE

Leave a Reply

Your email address will not be published. Required fields are marked *