The optimization of IS NOT NULL
An SQL database in a client's system, after data masking:
In the SQL database, the TEST11 table is associated with the DUAL table. The OWNER field in the WHERE condition has an index in which SQL processes the execution plan of the range scan. The SQL Server Engine performs 2117 logical reads during one scan,with an execution frequency as high as tens of thousands of times per minute. The execution plan is as follows:
- Initial optimization
WHERE clause consists of two conditions, which are [OWNER='OUTLN'] and [OBJECT_TYPE IS NOT NULL]. Since the field outputted from the query is CREATED, a three-column OWNER+OBJECT_TYPE+CREATED joint index, which can eliminate the cost of returning to the table, would be a better option. After creating the index, logical reads reduce from 2117 to 82. The execution plan is as follows:
- The exploration of extreme optimization-indexing principle
By analyzing this SQL, it can be found that SQL merely need the minimum value of the time column CREATED and other values are not necessary. So, is there a way to take out only the minimum value and ignore other data? If it can be done, then logical read will be further reduced.
The index is composed of a root block, a branch block, and a leaf block. The data of the index is arranged in order in the leaf block, which means the data with the minimum value will be stored at the minimum end of the index block. Theoretically, you can take a block from one of the leaf blocks, and you can get the minimum value of a specific index.
- Simplified version of index optimization to retrieve min/max
For a better understanding, we simplify the problem to retrieve the minimum (or maximum) value of CREATED from the table.
The maximum/minimum value of CREATED in TEST11 table needs to be retrieved:
Assuming that there is an index of the CREATED field, then it is entirely possible to take only the outermost block of the leaf block to get the required value.
Let's have a test and create a test table:
Create a CREATED index, and then run the simplified SQL. According to the index principle, it can be told that 3-4 logical reads should be required: start from the root node -> find the rightmost Branch (may be 0-2 reads, according to the level of the index) -> find the rightmost Leaf Block.
The execution is shown as follow, and the result is the same as we imagined before. The execution plan is INDEX FULL SCAN (MIN/MAX).
Imagine a slightly more complicated scenario: What if what you need to get is the maximum CREATED value that meets the specified conditions?
If we need to take the largest CREATED value that meets OWNER ='OUTLN'. The SQL is as follows:
If the value needs to be retrieved is the maximum that meets the condition of OWNER = 'OUTLN’, the SQL clause is shown as follow:
If there is a (OWNER, CREATED) composite index, the database can use a similar method to take only one of the leaf nodes. The execution plan is INDEX RANGE SCAN (MIN/MAX), and it performs 3 logical reads:
Then if it is SELECT MAX(CREATED) FROM TEST11 WHERE OWNER= ‘OUTLN’ AND OBJECT_TYPE = ‘TABLE’, a new index (OWNER, OBJECT_TYPE, CREATED) is needed to complete the same action. The execution plan is INDEX RANGE SCAN (MIN/MAX), which requires 3 logical reads:
In other words, if the column in the condition matches the leading column of the index exactly and the MAX/MIN value of the last column in the index is taken, then SQL will apply the optimal INDEX RANGE SCAN (MIN/MAX).
- Convert the condition of IS NOT NULL to the index
Back to original SQL:
In addition to OWNER='OUTLN' in the original SQL, there is also an OBJECT_TYPE IS NOT NULL. Directly creating a (OWNER, OBJECT_TYPE, CREATED) joint index consumes 82 logical reads, which cannot achieve the optimal effect. This is because OBJECT_TYPE IS NOT NULL cannot correspond to a specific value of the index. The execution plan is as follows:
We need to determine the OBJECT_TYPE IS NOT NULL condition as a specific value so that it can be matched with a specific value in the index. SQL needs to be rewritten:
1). Remove DUAL, which will not affect the result set:
2). Turn OBJECT_TYPE IS NOT NULL into a specific value, where CASE WHEN is used.
The SQL becomes:
3). Create a functional index. That is to say, the record of OBJECT_TYPE IS NOT NULL is stored as 1 in the index.
At this time, the logical reads drop to 3:
- Optimization Result
After creating a CASE WHEN index for production SQL and rewriting it, the logical reads drop to 75, which do not follow the optimal INDEX RANGE SCAN (MIN/MAX) execution plan. This is because MAX (T1. CREATED) is placed at the outermost layer in this way, that is to say, the value of MAX is based on the associated result set instead of the TEST11 table.
In this SQL, the TEST11 table is associated with the DUAL table. The value of NVL (MAX (T1. CREATED), SYSDATE) is retrieved. Even if there is no data, the value of SYSDATE will be returned, which means the DUAL table will not be affected the result. After removing the DUAL table, the SQL logic reads will drop to 3:
Finally, an extreme optimization of high-frequency SQL from 2117 to 3 has been completed, which can be improved by hundreds of times.
Optimization for IS NULL
Another SQL was found in the system. After data masking, the SQL is shown as follow:
SQL executes the execution plan of the full table scan. The TEST table has 11 million data with 0 data updated each time. The slowest step of the execution plan is the full table scan step of the TEST table with ID=3. In the WHERE clause, TCODE IS NULL can filter out all data. But we know that NULL value does not exist in the index. Under normal circumstances, IS NULL can only scan the entire table. Yet SQL performance will be poor if there is a lot of data in the entire table. Like IS NOT NULL, can this part of the NULL values with very good filterability be stored in the index?
In other words, is it possible to use a function to convert the NULL value to a fixed value that does not exist in the field in the table (assuming 0)? By converting the non-NULL value to NULL, the original NULL value is stored in the index, which is 0 after the function conversion. Consequently, it can ensure that the index is minimized because the original IS NOT NULL data does not exist in it.
The conversion is as follows:
CODE IS NULL CASE WHEN TCODE IS NULL THEN 0 END
Using CASE WHEN instead of NVL (TCODE, 0) function can minimize the index, because existing value of the original TCODE does not need to be stored in the index, and there is no need to consider the situation that the NVL may be the same as the original value.
Create a functional index and modify the original SQL, the logical read is reduced from the original value of more than 800,000 to 1:
After optimization, the SQL performance has improved tens of thousands of times.