Thursday, September 3, 2015

Database Indexing Basics

Database Indexing allows us to cut down the number of rows/records that need to be examined when a select query with a where clause is executed.
Let's try to understand what happens when a select query is executed on a database table without an index.
Suppose we have an Employee table, with fields 'Employee_Id' and 'Employee_Name'. We want to execute the following select query on this table:


When this query is executed, what goes on behind the scenes? Every single row is checked to see if the employee_name matches with 'abc'. Which effectively means that the entire table will have to be scanned. This is known as full table scan.

An index helps us avoid a full table scan. It is a data structure, that stores the values for a certain specific column of a table. Let's look at the common data structures that are used for database indexing.

Data Structures for Indexing


1. B-trees
B-trees are the most commonly used data structures for indexes. As they are time-efficient for lookups, deletions and insertions. All these operations can be done in logarithmic time. Also, the data being stored inside B-tree can be sorted.

2. Hash Tables
The indexes that use hash tables are generally referred to as hash index. Hash tables are extremely efficient for looking up values. So queries that look for an exact match can be executed very fast. The way a hash index works is that the key is the column value, and the value in the hash table is a pointer to the row data in the table. However, hash tables are not sorted data structures. And hence, for other type of queries, they may not be efficient.

3. R-tree
This is commonly used with spatial databases. It helps with queries like "find all the coffee shops within 2 kilometers of my location".

4. Bitmap Index
Works for columns that have many instances of those values, that is columns with low selectivity. For example, a column with boolean values.

So, what is the cost of having a database index?
The first thing is the index takes additional space, and the larger your table, the larger is your index. Every time you perform an add, delete or update operation, the same operation needs to be performed on the index as well.

Creating an index
The following snippet shows how to create an index on a single column and multiple columns.


As a general rule, an index should only be created on a table if the data in the indexed column will be queried frequently.

References:
http://www.programmerinterview.com/index.php/database-sql/what-is-an-index/