The index is actually an orderly data structure that speeds up the query. The implementation of the index relies on the tree.
0x00 Disk I/Os
The data of the database is stored in the directory. That is, the data is the file.
Ones who have studied the principles of computer composition would know that the MySQL searches files by reading the file data.
The storage location of the data on the disk is not continuous.
When data needs to be read from the disk, the system transmits the logical address to the disk. The control circuit of the disk translates the logical address into a physical address according to the addressing logic. That is, to determine which track and sector the data to be read is on. In order to read the data in this sector, the magnetic head needs to be placed above this sector. To achieve this, the magnetic head needs to move to align with the corresponding track. This process is called seek, and the time it takes is called seek time. Then the disk rotates to rotate the target sector under the magnetic head. The time this process takes is called rotation time. Coupled with the process of transferring data in the disk, it is the entire process required for the disk to read the data once.
The disk reads data by pre-reading according to pages. When a piece of data is used, the nearby data is usually used immediately.
A relational database is similar to a two-dimensional table structure. If there is no index, then the disk needs to traverse the entire data table to realize a conditional search.
For example, 100 pieces of data can be considered as 100 disk I/Os, so the efficiency is too low, which needs to be improved!
0x01 Why not use a binary tree?
The characteristic of the binary tree is that the keys of the nodes presenting on the left sub-tree are smaller than the keys of their parent node, and the right sub-tree has key values greater than the parent node.
If the index is based on a binary tree, then there will be the following problems:
If the index is a continuous and ordered sequence, then the height of the binary tree is exactly equal to the number of sequences, and it grows unilaterally.
0x02 Red/Black Tree
A red-black tree is a kind of self-balancing binary search tree that regulates the construction of the tree through its own rotation.
The red-black tree can avoid the "unilateral growth" problem in an ordinary binary tree, but it does not limit the height of the tree. For example, if there are 20 million pieces of data, then the height of this red-black tree is about 25 layers. That means 25 I/Os. So is there a better way to improve?
0x03 B/B+ Tree
According to the leftmost prefix principle, we generally put the column with the highest sorting frequency on the left, and so on.