3, Index optimization analysis

Posted by markszy on Mon, 07 Mar 2022 22:53:31 +0100

1. Reasons for performance degradation and slow SQL query

① Low performance of query statements;

② Index failure single value / composite index;

③ Too many associated queries and too many JOIN associated databases will also lead to too slow queries (design defects or business requirements)

④ Server tuning and parameter setting (buffer / thread pool, etc.);

2. Common Join queries

2.1 SQL execution sequence

① Incoming SQL statement

SELECT DISTINCT
<select_list>
FROM
<left_table>
<join_type> JOIN <right_table> ON <join_condition>
WHERE
<where_condition>
GROUP BY
<group_by_list>
HAVING
<having_conditon>
ORDER BY
<order_by_condition>
LIMIT <limit_number>

② Optimized SQL statement

With the iterative upgrade of MySQL version, its optimizer is constantly upgrading. The optimizer will analyze the performance consumption caused by different execution sequences and dynamically adjust the execution sequence. The following is the frequently occurring query sequence:

FROM
<left_table>
<join_type> JOIN <right_table> ON <join_condition>
WHERE
<where_condition>
GROUP BY
<group_by_list>
HAVING
<having_conditon>
SELECT DISTINCT
<select_list>
ORDER BY
<order_by_condition>
LIMIT <limit_number>

SQL parsing flow chart

When executing SQL statements, MySQL server starts FROM the FROM;

2.2 SQL JOIN

Database creation Table 1.2:

CREATE TABLE `t_dept` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`deptName` VARCHAR(30) DEFAULT NULL,
`address` VARCHAR(40) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `t_emp` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(20) DEFAULT NULL,
`age` INT(3) DEFAULT NULL,
`deptId` INT(11) DEFAULT NULL,
empno int not null,
PRIMARY KEY (`id`),
KEY `idx_dept_id` (`deptId`)
#CONSTRAINT `fk_dept_id` FOREIGN KEY (`deptId`) REFERENCES `t_dept` (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
INSERT INTO t_dept(deptName,address) VALUES('Huashan Mountain','Huashan Mountain');
INSERT INTO t_dept(deptName,address) VALUES('a group of beggars','Luoyang');
INSERT INTO t_dept(deptName,address) VALUES('Mount Emei','Mount Emei');
INSERT INTO t_dept(deptName,address) VALUES('Wudang','Wudang Mountain');
INSERT INTO t_dept(deptName,address) VALUES('Ming religion','Bright Summit');
INSERT INTO t_dept(deptName,address) VALUES('Shaolin Temple','Shaolin Temple');
INSERT INTO t_emp(NAME,age,deptId,empno) VALUES('Breezy',90,1,100001);
INSERT INTO t_emp(NAME,age,deptId,empno) VALUES('yue buqun',50,1,100002);
INSERT INTO t_emp(NAME,age,deptId,empno) VALUES('linghu chong',24,1,100003);
INSERT INTO t_emp(NAME,age,deptId,empno) VALUES('master hongqi',70,2,100004);
INSERT INTO t_emp(NAME,age,deptId,empno) VALUES('Qiao Feng',35,2,100005);
INSERT INTO t_emp(NAME,age,deptId,empno) VALUES('Abbess extinction',70,3,100006);
INSERT INTO t_emp(NAME,age,deptId,empno) VALUES('Zhou Zhiruo',20,3,100007);
INSERT INTO t_emp(NAME,age,deptId,empno) VALUES('Zhang Sanfeng',100,4,100008);
INSERT INTO t_emp(NAME,age,deptId,empno) VALUES('zhang wuji',25,5,100009);
INSERT INTO t_emp(NAME,age,deptId,empno) VALUES('Wei Xiaobao',18,null,100010);

Table structure after creation:

mysql> show tables;
+---------------------+
| Tables_in_base_crud |
+---------------------+
| tbl_dept            |
| tbl_emp             |
+---------------------+
2 rows in set (0.00 sec)

mysql> select * from tbl_emp;
+----+--------------+------+--------+--------+
| id | name         | age  | deptId | empno  |
+----+--------------+------+--------+--------+
|  1 | Breezy       |   90 |      1 | 100001 |
|  2 | yue buqun       |   50 |      1 | 100002 |
|  3 | linghu chong       |   24 |      1 | 100003 |
|  4 | master hongqi       |   70 |      2 | 100004 |
|  5 | Qiao Feng         |   35 |      2 | 100005 |
|  6 | Abbess extinction     |   70 |      3 | 100006 |
|  7 | Zhou Zhiruo       |   20 |      3 | 100007 |
|  8 | Zhang Sanfeng       |  100 |      4 | 100008 |
|  9 | zhang wuji       |   25 |      5 | 100009 |
| 10 | Wei Xiaobao       |   18 |   NULL | 100010 |
+----+--------------+------+--------+--------+
10 rows in set (0.00 sec)

mysql> select * from tbl_dept;
+----+----------+-----------+
| id | deptName | address   |
+----+----------+-----------+
|  1 | Huashan Mountain     | Huashan Mountain      |
|  2 | a group of beggars     | Luoyang      |
|  3 | Mount Emei     | Mount Emei    |
|  4 | Wudang     | Wudang Mountain    |
|  5 | Ming religion     | Bright Summit    |
|  6 | Shaolin Temple     | Shaolin Temple    |
+----+----------+-----------+
6 rows in set (0.00 sec)

2.2.2 internal connection:

INNER JOIN keywordreturns a row when there is at least one match in the table. If the rows in the "TableA" table do not match in "TableB", they are not listed, that is, the common part of TableA and TableB.

SELECT
<select_list>
FROM
TableA A
INNER JOIN TableB B ON A.key = B.key

Example:

mysql> select * from tbl_emp a inner join tbl_dept b on a.deptId = b.id;
+----+--------------+------+--------+--------+----+----------+-----------+
| id | name         | age  | deptId | empno  | id | deptName | address   |
+----+--------------+------+--------+--------+----+----------+-----------+
|  1 | Breezy       |   90 |      1 | 100001 |  1 | Huashan Mountain     | Huashan Mountain      |
|  2 | yue buqun       |   50 |      1 | 100002 |  1 | Huashan Mountain     | Huashan Mountain      |
|  3 | linghu chong       |   24 |      1 | 100003 |  1 | Huashan Mountain     | Huashan Mountain      |
|  4 | master hongqi       |   70 |      2 | 100004 |  2 | a group of beggars     | Luoyang      |
|  5 | Qiao Feng         |   35 |      2 | 100005 |  2 | a group of beggars     | Luoyang      |
|  6 | Abbess extinction     |   70 |      3 | 100006 |  3 | Mount Emei     | Mount Emei    |
|  7 | Zhou Zhiruo       |   20 |      3 | 100007 |  3 | Mount Emei     | Mount Emei    |
|  8 | Zhang Sanfeng       |  100 |      4 | 100008 |  4 | Wudang     | Wudang Mountain    |
|  9 | zhang wuji       |   25 |      5 | 100009 |  5 | Ming religion     | Bright Summit    |
+----+--------------+------+--------+--------+----+----------+-----------+
9 rows in set (0.00 sec)

2.2.3 left connection:

LEFT JOIN keyword returns all rows from the left table (TableA), even if there is no match in the right table (TableB). If there is no match in the right table, the result is NULL

SELECT
<select_list>
FROM
TableA A
LEFT JOIN TableB B ON A.key = B.key

Include only the left table:

Unique to TableA but not in TableB

SELECT
<select_list>
FROM
TableA A
INNER JOIN TableB B ON A.key = B.key
WHERE
B.key IS NULL

Example:

# Left connection
mysql> select * from tbl_emp a left join tbl_dept b on a.deptId = b.id;
+----+--------------+------+--------+--------+------+----------+-----------+
| id | name         | age  | deptId | empno  | id   | deptName | address   |
+----+--------------+------+--------+--------+------+----------+-----------+
|  1 | Breezy       |   90 |      1 | 100001 |    1 | Huashan Mountain     | Huashan Mountain      |
|  2 | yue buqun       |   50 |      1 | 100002 |    1 | Huashan Mountain     | Huashan Mountain      |
|  3 | linghu chong       |   24 |      1 | 100003 |    1 | Huashan Mountain     | Huashan Mountain      |
|  4 | master hongqi       |   70 |      2 | 100004 |    2 | a group of beggars     | Luoyang      |
|  5 | Qiao Feng         |   35 |      2 | 100005 |    2 | a group of beggars     | Luoyang      |
|  6 | Abbess extinction     |   70 |      3 | 100006 |    3 | Mount Emei     | Mount Emei    |
|  7 | Zhou Zhiruo       |   20 |      3 | 100007 |    3 | Mount Emei     | Mount Emei    |
|  8 | Zhang Sanfeng       |  100 |      4 | 100008 |    4 | Wudang     | Wudang Mountain    |
|  9 | zhang wuji       |   25 |      5 | 100009 |    5 | Ming religion     | Bright Summit    |
| 10 | Wei Xiaobao       |   18 |   NULL | 100010 | NULL | NULL     | NULL      |
+----+--------------+------+--------+--------+------+----------+-----------+
10 rows in set (0.00 sec)
# Include left side only
mysql> select * from tbl_emp a left join tbl_dept b on a.deptId = b.id where b.id is null;
+----+-----------+------+--------+--------+------+----------+---------+
| id | name      | age  | deptId | empno  | id   | deptName | address |
+----+-----------+------+--------+--------+------+----------+---------+
| 10 | Wei Xiaobao    |   18 |   NULL | 100010 | NULL | NULL     | NULL    |
+----+-----------+------+--------+--------+------+----------+---------+
1 row in set (0.01 sec)

2.2.4 right connection:

RIGHT JOIN keyword returns all rows from the right table (table b), even if there is no match in the left table (table a). If there is no match in the left table, the result is NULL.

SELECT
<select_list>
FROM
TableA A
RIGHT JOIN TableB B ON A.key = B.key

Include only the right table:

TableB is unique to those not in TableA

SELECT
<select_list>
FROM
TableA A
INNER JOIN TableB B ON A.key = B.key
WHERE
A.key IS NULL

Example:

# Right connection
mysql> select * from tbl_emp a right join tbl_dept b on a.deptId = b.id;
+------+--------------+------+--------+--------+----+----------+-----------+
| id   | name         | age  | deptId | empno  | id | deptName | address   |
+------+--------------+------+--------+--------+----+----------+-----------+
|    1 | Breezy       |   90 |      1 | 100001 |  1 | Huashan Mountain     | Huashan Mountain      |
|    2 | yue buqun       |   50 |      1 | 100002 |  1 | Huashan Mountain     | Huashan Mountain      |
|    3 | linghu chong       |   24 |      1 | 100003 |  1 | Huashan Mountain     | Huashan Mountain      |
|    4 | master hongqi       |   70 |      2 | 100004 |  2 | a group of beggars     | Luoyang      |
|    5 | Qiao Feng         |   35 |      2 | 100005 |  2 | a group of beggars     | Luoyang      |
|    6 | Abbess extinction     |   70 |      3 | 100006 |  3 | Mount Emei     | Mount Emei    |
|    7 | Zhou Zhiruo       |   20 |      3 | 100007 |  3 | Mount Emei     | Mount Emei    |
|    8 | Zhang Sanfeng       |  100 |      4 | 100008 |  4 | Wudang     | Wudang Mountain    |
|    9 | zhang wuji       |   25 |      5 | 100009 |  5 | Ming religion     | Bright Summit    |
| NULL | NULL         | NULL |   NULL |   NULL |  6 | Shaolin Temple     | Shaolin Temple    |
+------+--------------+------+--------+--------+----+----------+-----------+
10 rows in set (0.00 sec)
# Include right table only
mysql> select * from tbl_emp a right join tbl_dept b on a.deptId = b.id where a.deptId is null;
+------+------+------+--------+-------+----+----------+-----------+
| id   | name | age  | deptId | empno | id | deptName | address   |
+------+------+------+--------+-------+----+----------+-----------+
| NULL | NULL | NULL |   NULL |  NULL |  6 | Shaolin Temple     | Shaolin Temple    |
+------+------+------+--------+-------+----+----------+-----------+
1 row in set (0.00 sec)

2.2.5 full connection:

FULL OUTER JOIN keyword returns rows as long as there is a match in one of the left table (table1) and the right table (table2). The FULL OUTER JOIN keyword combines the results of LEFT JOIN and RIGHT JOIN.

SELECT
<select_list>
FROM
TableA A
FULL OUTER JOIN TableB B ON A.key = B.key

Removal of public parts:

SELECT
<select_list>
FROM
TableA A
FULL OUTER JOIN TableB B ON A.key = B.key
WHERE
A.key IS NULL
OR
B.key IS NULL

Example:

# Full connection
mysql> select * from tbl_emp a left join tbl_dept b on a.deptId = b.id
    -> union
    -> select * from tbl_emp a right join tbl_dept b on a.deptId = b.id;
+------+--------------+------+--------+--------+------+----------+-----------+
| id   | name         | age  | deptId | empno  | id   | deptName | address   |
+------+--------------+------+--------+--------+------+----------+-----------+
|    1 | Breezy       |   90 |      1 | 100001 |    1 | Huashan Mountain     | Huashan Mountain      |
|    2 | yue buqun       |   50 |      1 | 100002 |    1 | Huashan Mountain     | Huashan Mountain      |
|    3 | linghu chong       |   24 |      1 | 100003 |    1 | Huashan Mountain     | Huashan Mountain      |
|    4 | master hongqi       |   70 |      2 | 100004 |    2 | a group of beggars     | Luoyang      |
|    5 | Qiao Feng         |   35 |      2 | 100005 |    2 | a group of beggars     | Luoyang      |
|    6 | Abbess extinction     |   70 |      3 | 100006 |    3 | Mount Emei     | Mount Emei    |
|    7 | Zhou Zhiruo       |   20 |      3 | 100007 |    3 | Mount Emei     | Mount Emei    |
|    8 | Zhang Sanfeng       |  100 |      4 | 100008 |    4 | Wudang     | Wudang Mountain    |
|    9 | zhang wuji       |   25 |      5 | 100009 |    5 | Ming religion     | Bright Summit    |
|   10 | Wei Xiaobao       |   18 |   NULL | 100010 | NULL | NULL     | NULL      |
| NULL | NULL         | NULL |   NULL |   NULL |    6 | Shaolin Temple     | Shaolin Temple    |
+------+--------------+------+--------+--------+------+----------+-----------+
11 rows in set (0.00 sec)
# Remove public parts
mysql> select * from tbl_emp a left join tbl_dept b on a.deptId = b.id where b.id is null
    -> union
    -> select * from tbl_emp a right join tbl_dept b on a.deptId = b.id where a.deptId is null;
+------+-----------+------+--------+--------+------+----------+-----------+
| id   | name      | age  | deptId | empno  | id   | deptName | address   |
+------+-----------+------+--------+--------+------+----------+-----------+
|   10 | Wei Xiaobao    |   18 |   NULL | 100010 | NULL | NULL     | NULL      |
| NULL | NULL      | NULL |   NULL |   NULL |    6 | Shaolin Temple     | Shaolin Temple    |
+------+-----------+------+--------+--------+------+----------+-----------+
2 rows in set (0.00 sec)

FULL OUTER JOIN is not supported in MySQL, so UNION is used to associate left and right connections to find all operations.

3. Introduction to index

3.1 general

MySQL's official definition of Index: Index is a data structure that helps MySQL obtain data efficiently. The essence of Index is data structure, which can be understood as a well ordered and fast search data structure.

The index will affect the search after WHERE and the sorting after ORDER BY

In addition to the data itself, the database also maintains a data structure that meets the specific search algorithm. These data structures point to the data in some way, so that the advanced search method can be realized on the basis of these data structures. This data structure is the index, that is, BTREE index.

Generally speaking, the index itself is also large, and it is impossible to store it all in memory. Therefore, the index file is usually stored on disk in the form of index file, that is, myi file.

It is better not to delete the data in the database. Set the mark position for logical deletion. The deleted data may also be used for data analysis. More importantly, frequent addition and deletion operations will cause index failure;

The index we usually talk about, if not specified, refers to the index organized by BTREE (multi-channel search tree, not necessarily binary tree) structure. The clustered index, secondary index, composite index, prefix index and unique index are all B + tree indexes by default, collectively referred to as indexes. In addition to the B + tree index structure, there are also indexes such as hash index.

Advantages of indexing:

  • Improve the data retrieval efficiency and reduce the IO copy of the database;
  • Sorting the data through the index column reduces the cost of data sorting and reduces the consumption of CPU;

Disadvantages of index:

  • In fact, the index is also a table, which saves the primary key and index fields and points to the records of the entity table, so the index column also takes up space;
  • Although the index greatly improves the query speed, it will reduce the UPDATE efficiency of the table, such as INSERT, UPDATE and DELETE operations on the table. Because when updating tables, MySQL needs to save not only data, but also index files. Every time you UPDATE a field with an index column added, you will adjust the index information after the key value changes caused by the UPDATE.
  • Indexing is only a factor to improve efficiency. If MySQL has a large amount of tables, establish excellent indexes or optimize query statements.

Index classification

3.2.1 single value index

An index contains only a single column, and a table can have multiple single column indexes.

Create command: CREATE INDEX idx_ Table name_ Field name: ON table name (field name);

3.2.2 unique index

Index column values must be unique, but null values are allowed. Such as: bank card number, ID number, etc.

Create command: CREATE UNIQUE INDEX idx_ Table name_ Field name: ON table name (field name);

3.3.3 composite index

An index contains multiple columns

Create command: CREATE INDEX idx_ Table name_ Field 1_ Field 2_ Field n ON table name (field 1, field 2,..., field n);

① Creating a composite index is better than a single value index, and establishing a single value index for frequently used fields;

② The index of a table shall not exceed 5 at most;

Only one index will be used at the same time;

3.3.4 primary key index

When set as the primary key, the database will automatically create an index, and InnoDB is a clustered index.

Create a separate primary key index: ALTER TABLE name ADD PRIMARY KEY table name (field name);

Delete primary key index: ALTER TABLE table name DROP PRIMARY KEY;

Modify primary key index: first delete the primary key index and then create a new primary key index.

3.3.5 basic syntax of index

3.3.5.1 create index

CEREATE [UNIQUE] INDEX indexName ON tableName(columnName(length))

ALTER tableName ADD [UNIQUE] INDEX [indexName] ON (columnName(length))

If the field is of char or varchar type, the length can be less than the actual length; If it is blob or text type, length must be specified

indexName is generally named idx_tableName_columnName

3.3.5.2 delete index

DROP INDEX [indexName] ON tableName

3.3.5.3 view index

SHOW INDEX FROM tableName

3.3.5.4 using the ALTER command

Adding a primary key means that the index value is unique and cannot be NULL

ALTER TABLE table_name ADD PRIMARY KEY (column_list)

To create a unique index, the index value must be unique, but it can be NULL or multiple times

ALTER TABLE table_name ADD UNIQUE index_name (column_list)

Add a normal index. The index value may appear multiple times

ALTER TABLE table_name INDEX index_name (column_list)

Specifies that the index is FULLTEXT, which is used for full-text indexing

ALTER TABLE table_name ADD FULLTEXT index_name (column_list)

3.3 index structure

3.3.1 B Tree index

Illustration:

  • Light blue -- disk block; Dark blue -- data item; Yellow -- pointer;
  • For example, disk block 1 includes data items 17 and 35 and pointers P1, P2 and P3;
  • P1 represents a disk block less than 17; P2 indicates a disk block greater than 17 and less than 35; P3 represents a disk block greater than 35;
  • Non leaf nodes do not store real data, but only store data items that guide the search direction. For example, data items 17 and 35 do not exist in the data table
  • The real data is stored in the leaf node, that is, between disks 5-11;

Searching process: such as searching data 60

① First, load disk 1 into the memory to generate an IO operation. Use binary search in the memory to determine that 60 is greater than 35 (the comparison time in the memory can be ignored soon), and lock the P3 pointer;

② Find disk 4 through P3 pointer, load disk 4 into memory, perform the second disk IO, use binary search in memory to determine that 60 is less than 65, and lock P1 pointer;

③ Find disk block 9 through P1 pointer and perform the third disk IO. At this time, the query ends.

In the actual MySQL query, you usually face millions of data. If you traverse in turn, you will carry out a large number of disk IO, which will seriously affect the performance. However, you can find the corresponding data items through three disk IO, which is a very significant improvement in performance.

3.3.2 B+ Tree index

Difference between B+ Tree and B- Tree:

① The keywords and records of B-tree are put together, and the leaf node can be regarded as an external node without any information; In the non leaf node of B+ Tree, there are only keywords and indexes pointing to the next node, and records are only placed in the leaf node.

② In B-tree, the closer to the root node, the faster the search time of the record. As long as the keyword is found, the existence of the record can be determined; The search time of each record in B+Tree is basically the same. You need to go from the root node to the leaf node, and you need to compare keywords in the leaf node. From this point of view, the performance of B-tree seems to be better than that of B+Tree, but in practical application, the performance of B+Tree is better. Because the non leaf nodes of B+Tree do not store actual data, so each node can hold more elements than B-tree and the tree height is smaller than B-tree. The advantage of this is to reduce the number of disk accesses. Although B+Tree takes more time to find a record than B-tree, the time of disk IO is equivalent to the time of hundreds of memory comparisons. Therefore, in practice, the performance of B+Tree may be better, and the leaf nodes of B+Tree are connected together with pointers to facilitate traversal (for example, viewing all files in a directory, all records in a table, etc.), It is also the reason why many databases and file systems use B+Tree.

The reason why B+ Tree is more suitable for file index and database index in operating system than B-tree:

① The disk cost of B+ Tree is less

The internal node of B+ Tree does not have a pointer to the specific information of the keyword. Therefore, its internal nodes are smaller than B-tree. If all the keywords of the same internal node are stored in the same disk block, the more keywords the disk block can hold, the more keywords to be searched to read into memory at one time, and the relative disk IO times will be reduced.

② The query efficiency of B+ Tree is more stable

Because the non endpoint is not the node that finally points to the file content, but the index of keywords in the leaf node. Therefore, the search of any keyword must go from the root node to the leaf node. The path length of all keyword queries is the same, resulting in the same query efficiency of each data.

3.3.3 clustered and non clustered indexes

Clustered index is not a single index type, but a way of data storage. "Clustered" data rows are stored together with adjacent key value clusters. As shown in the following figure, the index on the left is the cluster index, because the arrangement of data rows on the disk is consistent with the index sorting.

Benefits of clustered indexing:

According to the order of cluster index, when querying and displaying a certain range of data, because the data are closely connected, the database does not need to extract data from multiple data blocks, saving a lot of IO operations.

Restrictions on clustered indexes:

At present, only InnoDB data storage engine supports clustered index in MySQL database, while Myisam does not support clustered index.

Because there can only be one arrangement of data physical storage, each MySQL table can only have one cluster index, which is generally the primary key of the table.

In order to make full use of the characteristics of cluster index, the primary key column of InnoDB table should choose sequential i instead of unordered id, such as uuid.

Other index types:

  • Hash index
  • Full text index
  • R-Tree index

3.4 timing of indexing

3.4.1 need to establish index

① The primary key automatically creates a unique index;

② Fields frequently used as query criteria should be indexed;

③ For the sorted fields in the query, if the sorted fields are accessed through the index, the sorting speed will be greatly improved;

④ Statistics or grouped fields in query;

Create a single column index or a composite index? Composite indexes tend to be created in high concurrency scenarios

3.4.2 no index is required

① Too few table records (MySQL has enough performance to support about 3 million data)

② Do not create an index for fields that are not used in the where condition;

③ Frequently added and deleted tables or frequently updated fields are not suitable for index creation;

④ Table fields with repeated data and evenly distributed;

When updating the field, not only the record but also the index will be updated, which will increase the IO burden.

Note: if a data column contains many duplicate contents, creating an index for this purpose will not have much practical effect.

Index selectivity: the ratio of the number of different values in the index column to the number of records in the table. For example, if there are 2000 records in a table and the index of the table has 1980 different values, the selectivity of the index is 1980 / 2000 = 0.99. The closer the selectivity of the index is to 1, the higher the efficiency of the index.

Topics: MySQL