How MySQL works - Chapter 7

Posted by transparencia on Tue, 01 Feb 2022 10:29:12 +0100

This chapter mainly talks about some application rules of index.

7.1

CREATE TABLE single_table (
	id INT NOT NULL AUTO_INCREMENT,
	key1 VARCHAR(100),
	key2 INT,
	key3 VARCHAR(100),
	key_part1 VARCHAR(100),
	key_part2 VARCHAR(100),
	key_part3 VARCHAR(100),
	common_field VARCHAR(100),
	PRIMARY KEY (id),
	KEY idx_key1 (key1),
	UNIQUE KEY uk_key2 (key2),
	KEY idx_key3 (key3),
	KEY idx_key_part(key_part1, key_part2, key_part3)
) Engine=InnoDB CHARSET=utf8;

This table will also be used extensively in subsequent chapters.

7.2

Although the index can facilitate us to search quickly, maintaining the index also has a price.
Spatially, we definitely need a separate space to maintain the index.
In terms of time, when we modify the record, the index will also change.

In addition, you need to generate a query plan before executing a query. In each query plan, only one secondary index is usually used.
If we build a large number of indexes, it will slow down the process of analyzing the query plan overhead.

7.3

Scanning interval

select * from single_table where id >= 2 and id <= 100;

The scan interval of this sql is [2, 100], and the boundary conditions are ID > = 2 and ID < = 100

select * from single_table where key2 in (1438, 6328) or (key2 >= 38 and key2 <= 79);

Among them, 1438 6328 points themselves are also considered as the scanning interval, which is called the single point scanning interval. The boundary condition is [1438, 1438].

Not all search conditions can be boundary conditions. For example, where key1 < 'a'. The scan interval is an open interval from negative infinity to 'a'.
At this time, there are no boundary conditions.

To sum up, when using an index to execute a query, the key problem is to find the appropriate scanning interval through the search conditions, and then scan the records of the index column values in these scanning intervals in the corresponding B + tree. First find out all available indexes and scanning intervals, and then select them comprehensively.

All search conditions can generate appropriate scanning intervals

select * from single_table where key2 > 100 and key2 > 200;
select * from single_table where key2 > 100 or key2 > 200;

In both sql, search criteria can be used to help reduce the search scope.

Some search criteria can not generate appropriate scanning interval

select * from single_table where key2 > 100 and common_field = 'abc';

In this sql, common_ The field field does not help reduce any scanning range because the index on key2 does not involve common_field
Equivalent to the intersection of [100, INF] and (- INF, INF).
In this case, we can directly regard the second search condition as true.

Find out the scanning interval from the complex search conditions

select * from single_table where 
(key1 > 'xyz' and key2 = 748) or
(key1 < 'abc' and key1 > 'lmn') or
(key1 like '%suf' and key1 > 'zzz' and (key2 < 8000 or common_field = 'abc'));

This sql can be said to be very complex. However, we may also use the index of key1 or the index of key2.
Suppose we use the index of key1, it can be reduced to

(key1 > 'xyz' and true) or (key1 < 'abc' and key1 > 'lmn') or (true and key1 > 'zzz' and (true or true))
(key1 > 'xyz') or (key1 < 'abc' and key1 > 'lmn') or (key1 > 'zzz')
(key1 > 'xyz') or (key1 > 'zzz')
key1 > 'xyz'

This can be continuously simplified.

Similarly, for key2 analysis, the process is as follows

(true and key2 = 748) or (true and true) or (true and true and (key2 < 8000 or true))
true

The corresponding scan interval when executing a query using a joint index

select * from single_table where key_part1 = 'a';

We can locate the first record of = a and scan it one by one.

select * from single_table where key_part1 = 'a' and key_part2 = 'b';

The reason is the same as above. Locate the first record that meets the conditions and scan it later.

select * from single_table where key_part1 = 'a' and key_part2 = 'b' and key_part3 = 'c';

ditto

select * from single_table where key_part1 < 'a';

For this search condition, we need to find the first record less than 'a', and then scan backward until a record does not meet the condition.

select * from single_table where key_part1 = 'a' and key_part2 > 'a' and key_part2 < 'd';

We still locate the first qualified record, and then query backward until it does not meet the conditions

select * from single_table where key_part2 = 'a';

It doesn't match the leftmost prefix. We have to scan it all.

select * from single_table where key_part1 = 'a' and key_part3 = 'c';

Here is also the leftmost prefix that does not fully match. First locate the first record that meets part1='a ', and then look for it one by one until it does not match part1='a '

select * from single_table where key_part1 < 'b' and key_part2 = 'a';

Scan back from the first record satisfying Part1 < 'b'. Until a record does not meet Part1 < 'b'.

select * from single_table where key_part1 <= 'b' and key_part2 = 'a';

When a line is scanned, part1 = 'b' and part2! = ' The recording of a 'can be stopped. Because part123 is ordered.

Index for sorting

In MySQL, the method of sorting in memory / disk is called file sorting. However, because the index itself is ordered, we may omit the sorting step when searching on the index. For example, searching a primary key or a joint index is a case where the index is overwritten.

Note that when using the joint index for sorting, if you want the index to help us omit the sorting steps, the order of the columns in order by should be consistent with that in the index.

Several situations in which indexes cannot be used for sorting

1. Mixed use of ASC desc. This is supported in mysql 8.0, and the descending index is introduced, which was not supported before.
The 2-row sequence contains columns with different indexes

select * from single_table order by key1, key2 limit 10;

In this example, index sorting cannot be used.
Three row sequence is the index column of a joint index, but the order of these row sequences is different from that in the joint index.
4 the index column used to form the scanning interval is different from the sorting column

select * from single_table where key1 = 'a' order by key2 limit 10;

Form the scanning interval with key1, and then sort with key2.

The 5-Row sequence does not appear in the order by clause as a separate column name.

select * from single_table order by upper(key1) limit 10;

Index for grouping

select key_part1, key_part2, key_part3, count(*) from single_table group by key_part1, key_part2, key_part3;

If there is no idx_key_part index, you have to create a temporary table for statistics, and fill the intermediate results of statistics into this temporary table when scanning the records of cluster index.
But now there is a joint index, so there is no need to create a temporary table.

7.4

Although the normal scanning of the whole table does not accelerate the search, the disk io is sequential and relatively fast.
You can return to the table every time. Because the primary key value is irregular, the disk io is random when returning to the table.
Therefore, when scanning a large number of columns, the overhead of returning to the table may be greater than sequential scanning of the whole table.
The query optimizer will do this work to see whether to use the secondary index + back to the table or directly scan on the clustered index.

7.5

This section lists some tips that use indexes

Create indexes only for columns used for searching, sorting, joining, and grouping
Consider the number of non duplicate values in the index column
Here's an explanation. For example, there are only two values for gender. If we build an index for gender, the index will hardly help us speed up the search, and the search method of secondary index + back table will be particularly deficient. If the potential is relatively high, the secondary index + back table will not be so deficient
The type of index column should be as small as possible
Especially the primary key. If the primary key is small, the space occupied by the secondary index will laugh, and the B + tree consumes relatively few pages
Index column prefixes

alter table single_table drop index idx_key1;
alter table single_table add index idx_key1 (key1(10));

For example, if the varchar column is too long, we can index only the first 10 letters. This saves space, but it can't be used to simplify sorting

Overlay index
The index contains all the columns to query. We can build a joint index of commonly used combinations to reduce the overhead of returning to the table
Let the index appear separately in the search criteria in the form of column names

select * from single_table where key2 * 2 < 4;
select * from single_table where key2 < 4/2;

The first one won't go to the index

The newly inserted value should preferably be outside the existing interval
For example, insert it into the leftmost or rightmost end, which can avoid large-scale internal division and affect the lower edge at most
Redundant and duplicate indexes
For columns that have been covered by the joint index or primary key index, there is usually no need to create a separate index

Topics: Database MySQL

Programmer Think