Step by Step! The Evolution of MySQL Performance Optimization, as Taught by a Tencent T2 Engineer

Posted by mezise on Sat, 25 Dec 2021 03:13:55 +0100

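Row overflow in the COMPACT row format: InnoDB keeps the first 768 bytes of an overflowing variable-length column in the row itself and moves the remainder to overflow pages, referenced by a 20-byte pointer.

Row overflow in the DYNAMIC row format (the default in MySQL 5.7 and later): an overflowing variable-length column is stored entirely in overflow pages, and the row keeps only a 20-byte pointer.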

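When does a row overflow?
Rule of thumb: with the default 16 KB page, a row overflows once its total size exceeds roughly 8 KB (about half a page), so that each page can still hold at least two rows.

Only variable-length column types can overflow: BLOB, TEXT and VARCHAR.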
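The 8 KB rule can be checked roughly before creating a table. The Go sketch below is a minimal, illustrative estimate, not an InnoDB API: it assumes utf8mb4 (up to 4 bytes per character), worst-case fully filled VARCHAR values, and a simplified threshold of exactly half a 16 KB page. InnoDB's real limit is slightly lower because of page and row headers, and the `Column` type is hypothetical.

```go
package main

import "fmt"

// Column is a hypothetical, simplified column definition.
type Column struct {
	Name     string
	Kind     string // "int", "datetime" or "varchar"
	MaxChars int    // declared length, used for varchar only
}

// bytesPerChar assumes utf8mb4, where one character can take up to 4 bytes.
const bytesPerChar = 4

// halfPage approximates the overflow threshold: half of a 16 KB page.
const halfPage = 16 * 1024 / 2

// maxColumnBytes returns the worst-case stored size of one column.
func maxColumnBytes(c Column) int {
	switch c.Kind {
	case "int":
		return 4
	case "datetime":
		return 5 // DATETIME without fractional seconds
	case "varchar":
		n := c.MaxChars * bytesPerChar
		if n > 255 {
			return n + 2 // 2-byte length prefix when values can exceed 255 bytes
		}
		return n + 1 // 1-byte length prefix otherwise
	default:
		panic("unsupported column kind: " + c.Kind)
	}
}

// mayOverflow returns the worst-case row size and whether it exceeds ~8 KB.
func mayOverflow(cols []Column) (int, bool) {
	total := 0
	for _, c := range cols {
		total += maxColumnBytes(c)
	}
	return total, total > halfPage
}

func main() {
	cols := []Column{
		{"id", "int", 0},
		{"title", "varchar", 255},
		{"summary", "varchar", 2000},
	}
	size, overflow := mayOverflow(cols)
	fmt.Println(size, overflow) // 9028 true
}
```

Because it sums declared maximums, the estimate is pessimistic: a row only actually overflows when the stored values are that long.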

Index optimization

The main purposes of an index are lookup and sorting. Index optimization is the most basic optimization technique and basic knowledge every programmer must master. It is the focus of this article.

Index classification

Database version: MySQL 8.0.17
```sql
CREATE TABLE `indexs` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key ',
  `unique_index` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '' COMMENT 'unique index',
  `normal_index` int(11) NOT NULL DEFAULT '0' COMMENT 'General index',
  `union_index_a` int(11) NOT NULL DEFAULT '0' COMMENT 'Joint index a',
  `union_index_b` int(11) NOT NULL DEFAULT '0' COMMENT 'Joint index b',
  `union_index_c` int(11) NOT NULL DEFAULT '0' COMMENT 'Joint index c',
  `prefix_index` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '' COMMENT 'Prefix index',
  `fulltext_index` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '' COMMENT 'Full text index',
  `func_index` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Functional index',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_index` (`unique_index`),
  KEY `normal_index` (`normal_index`),
  KEY `prefix_index` (`prefix_index`(10)),
  KEY `union_index_abc_desc` (`union_index_a`,`union_index_b`,`union_index_c` DESC),
  KEY `func_index` ((date(`func_index`))),
  FULLTEXT KEY `fulltext_index` (`fulltext_index`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
```

Spatial index

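```sql
CREATE TABLE nodes (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key ',
    `geom` geometry NOT NULL,
    -- virtual column: 6-character geohash of the point, indexed below
    `geohash` VARCHAR(10) AS (st_geohash(geom, 6)) VIRTUAL,
    PRIMARY KEY (`id`),
    -- note: the MySQL 8.0 optimizer only uses a SPATIAL index on a column that has an SRID attribute
    SPATIAL KEY idx_nodes_geom (geom),
    key idx_nodes_geohash(geohash)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```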

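Generating test data (Go):

```go
// Build a random test point near Beijing (longitude 116.x, latitude 39.x)
n := time.Now().Nanosecond()
longitude := fmt.Sprintf("116.%d", n)
latitude := fmt.Sprintf("39.%d", n)
sql := fmt.Sprintf("INSERT INTO nodes (geom) VALUES (ST_GeometryFromText('POINT(%s %s)'))", longitude, latitude)
```

Querying: filter on the target point's geohash cell and its eight neighbouring cells (nine precision-6 cells) through the geohash index, then sort the candidates by exact spherical distance:

```sql
SELECT id, geohash, ST_Distance_Sphere(Point(116.4909484, 39.4909484), geom) AS distance, ST_AsText(geom) AS geom
FROM nodes
WHERE geohash IN ('wtmknk','wtmkn6','wtmkne','wtmkn5','wtmknh','wtmkns','wtmknd','wtmkn4','wtmkn7')
ORDER BY distance
LIMIT 10;
```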

Most LBS (location-based service) applications are built on MongoDB: its built-in geospatial indexes are very easy to use, and its sharding makes it well suited to LBS workloads. For simple business needs, MySQL can also be used (todo).

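Functional index

Since MySQL 8.0.13, an index can be built on an expression, such as date(func_index) in the table above. A query can use it only when its WHERE clause uses the same expression.

Descending index

Unsurprisingly, in most business scenarios time fields are sorted in descending order (newest first). MySQL 8.0 supports real descending indexes, such as union_index_c DESC in the table above.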

Leftmost matching principle of a composite index:

Question: why does this happen? (The figure below contains some errors that were never corrected.)

```
a: 234455555567
b: 644566667766
c: 919647890619
```

Prerequisite for using an index: the data must be ordered.
Where a is ordered, a can be used; where b is ordered, b can be used; where c is ordered, c can be used. ORDER BY and GROUP BY can likewise use an index only when the data is ordered.
a is always ordered, so a can always be used.
a = 5: b = 666677, which is ordered, so b can be used.
a = 5 and b = 6: c = 4789, which is ordered, so c can be used.
a > 4: b = 66667766, which is not ordered, so b cannot be used.

From the point of view of the index search: when a is not fixed, a search for b = 6 has no single path. In the figure above, which branch should node 568 follow? There are 6s in both the left and the right subtree.

When should you create an index?

1. The primary key automatically gets a primary key index
2. Fields frequently used as query conditions
3. Fields used to join with other tables
4. Fields used for sorting (sorting on an indexed field is much faster)
5. Fields used for counting or grouping (grouping is essentially sorting)

When should you not create an index?

1. The table has too few rows (for example, fewer than 1,000)
2. Frequently updated fields: an index speeds up queries but slows down updates, because every update must modify the index as well as the row
3. Fields not used in WHERE conditions
4. Fields with many repeated, evenly distributed values (low selectivity), such as gender

Common causes of index failure

1. Calculations, functions and type conversions (including implicit conversions) on an indexed column prevent the index from being used
   Analysis: if a string column is compared with an unquoted number, MySQL converts the column value row by row (implicit type conversion). Typical functions: left(), right(), substring(), substring_index(), mid(), substr()
2. A composite index used without following the leftmost matching principle is only partially used (see the leftmost matching principle above)
3. A LIKE pattern that starts with % cannot use the index
4. Conditions with !=, <> or OR may prevent the index from being used
   Analysis: for where a != 1, the optimizer expects most of the table to match, and reading through the index and then going back to the table would cost more than scanning the whole table directly. With OR, unless every branch can use an index (index merge), MySQL typically falls back to a full scan.
5. IS NULL or IS NOT NULL on an indexed column may prevent the index from being used
   Analysis: InnoDB does store NULL values in secondary indexes, so IS NULL can usually use the index; IS NOT NULL often matches most rows, so the optimizer may prefer a full table scan.
6. If the optimizer estimates that a full table scan is faster than using an index, the index is not used
   Analysis: MySQL uses the best available index unless the optimizer considers a full table scan cheaper. A commonly quoted rule of thumb is that this happens when the best index would return more than about 30% of the table; in practice the decision is cost-based rather than a fixed threshold.

Enable the slow query log to record slow SQL

What counts as slow? In most business scenarios range queries cannot be avoided, so an index optimized down to the range level is a reasonable balance point. Based on the index query times above, 10 ms is a reasonable slow-query threshold:

```ini
# vi /etc/my.cnf.d/mysql-server.cnf
# add the following under the [mysqld] section
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/lib/mysql/mysql_slow.log
long_query_time = 0.01
log_queries_not_using_indexes = 1
```

Then restart the server: systemctl restart mysqld

List the 10 statements with the longest total execution time: mysqldumpslow -s t -t 10 /var/lib/mysql/mysql_slow.log
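For a feel of what mysqldumpslow reads, here is a simplified Go sketch that pairs each `# Query_time:` header in the slow log with the statement that follows and lists the slowest entries. Unlike mysqldumpslow, it does not abstract literals or aggregate identical queries, and the sample log text is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// SlowQuery is one entry parsed from a MySQL slow query log.
type SlowQuery struct {
	QueryTime float64 // seconds, from the "# Query_time:" header line
	SQL       string
}

// parseSlowLog pairs each "# Query_time:" header with the statement after it.
// Other "#" lines, blank lines, "SET timestamp=" and "use" lines are skipped.
func parseSlowLog(text string) []SlowQuery {
	var out []SlowQuery
	cur := -1 // index of the entry currently collecting SQL lines
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "# Query_time:"):
			// Fields: "#", "Query_time:", "<seconds>", "Lock_time:", ...
			fields := strings.Fields(line)
			if len(fields) < 3 {
				continue
			}
			t, err := strconv.ParseFloat(fields[2], 64)
			if err != nil {
				continue
			}
			out = append(out, SlowQuery{QueryTime: t})
			cur = len(out) - 1
		case line == "", strings.HasPrefix(line, "#"),
			strings.HasPrefix(line, "SET timestamp="),
			strings.HasPrefix(strings.ToLower(line), "use "):
			// header or bookkeeping line: nothing to collect
		default:
			if cur >= 0 {
				out[cur].SQL = strings.TrimSpace(out[cur].SQL + " " + line)
			}
		}
	}
	return out
}

// slowest returns the n entries with the largest Query_time.
func slowest(qs []SlowQuery, n int) []SlowQuery {
	sorted := append([]SlowQuery(nil), qs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].QueryTime > sorted[j].QueryTime })
	if len(sorted) > n {
		sorted = sorted[:n]
	}
	return sorted
}

// sampleLog is a made-up excerpt in the slow log format.
const sampleLog = `# Time: 2021-12-25T03:13:55.000000Z
# User@Host: app[app] @ localhost []  Id:     8
# Query_time: 0.012000  Lock_time: 0.000100 Rows_sent: 10  Rows_examined: 10010
SET timestamp=1640402035;
SELECT * FROM articles ORDER BY created_at DESC LIMIT 10000,10;
# Time: 2021-12-25T03:14:02.000000Z
# User@Host: app[app] @ localhost []  Id:     8
# Query_time: 1.500000  Lock_time: 0.000200 Rows_sent: 1  Rows_examined: 10000000
SET timestamp=1640402042;
SELECT COUNT(*) FROM articles;
`

func main() {
	for _, q := range slowest(parseSlowLog(sampleLog), 10) {
		fmt.Printf("%.3fs  %s\n", q.QueryTime, q.SQL)
	}
}
```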

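How to optimize slow SQL?
EXPLAIN: analyze how the statement uses indexes. See EXPLAIN for details.
SHOW PROFILES: run SET profiling = 1; and then analyze each stage of the statement's execution (rarely used in practice; SHOW PROFILE and SHOW PROFILES are deprecated in favor of the Performance Schema).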

COUNT(*) query

COUNT(constant) and COUNT(*) count all rows that match the conditions, including rows that contain NULL.
COUNT(column) counts only the matching rows whose value in that column is not NULL.

COUNT(*) optimization: InnoDB counts using the smallest secondary index; if the table has no secondary index, it uses the clustered index.
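The difference between COUNT(*) and COUNT(column) comes down to how NULL is treated. In this small Go sketch, a nil pointer stands in for SQL NULL (an assumption of the model, not how MySQL stores rows); note that an empty string is a value, not NULL, so it is still counted.

```go
package main

import "fmt"

// strPtr returns a pointer to a copy of v; nil pointers represent SQL NULL.
func strPtr(v string) *string { return &v }

// countStar mimics COUNT(*) and COUNT(1): every row counts, NULL or not.
func countStar(rows []*string) int { return len(rows) }

// countColumn mimics COUNT(col): only rows whose value is not NULL count.
func countColumn(rows []*string) int {
	n := 0
	for _, v := range rows {
		if v != nil {
			n++
		}
	}
	return n
}

func main() {
	emails := []*string{strPtr("a@example.com"), nil, strPtr(""), nil}
	fmt.Println(countStar(emails), countColumn(emails)) // 4 2
}
```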

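Optimize LIMIT paging

The performance problem with LIMIT M,N: MySQL reads M+N rows from the result set, returns the last N and discards the first M, so deep pages approach a full table scan.
1. Fast paging via a secondary index
   ids = select id from articles order by created_at desc limit 10000,10
   select * from articles where id in (ids)
   Assuming an index on created_at, the first query reads only that index (which also contains id), so the skipped rows cost no table lookups; only the final 10 rows are read from the table.
   Drawback: the LIMIT M,N problem remains and is only slowed down; with 10 million rows a deep page still takes 1-2 seconds.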
2. Use the first and last records of the current page, turning a full index scan into a range scan

```
id  title       created_at
15  Article 21  2020-01-01 19:00:00    page 1
12  Article 20  2020-01-01 18:00:00
11  Article 18  2020-01-01 16:00:00

9   Article 17  2020-01-01 16:00:00    page 2
8   Article 15  2020-01-01 15:00:00
6   Article 14  2020-01-01 14:00:00

4   Article 13  2020-01-01 14:00:00    page 3
3   Article 12  2020-01-01 13:00:00
1   Article 11  2020-01-01 00:00:00
```

Note: rows 11 and 9 have the same created_at, as do rows 6 and 4, so id is needed as a tie-breaker.

First page: select * from articles order by created_at desc, id desc limit 3
Page 2 (after the last row of page 1, id 11): select * from articles where created_at <= '2020-01-01 16:00:00' and (created_at < '2020-01-01 16:00:00' or id < 11) order by created_at desc, id desc limit 3

Navigating from the second page (first row id 9, last row id 6):
Previous page: select * from articles where created_at >= '2020-01-01 16:00:00' and (created_at > '2020-01-01 16:00:00' or id > 9) order by created_at asc, id asc limit 3, then reverse the rows in the application. (With desc ordering the query would always return the newest rows, which is correct here only because the previous page happens to be page 1.)
Current (second) page: select * from articles where created_at <= '2020-01-01 16:00:00' and (created_at < '2020-01-01 16:00:00' or id < 11) order by created_at desc, id desc limit 3
Next page: select * from articles where created_at <= '2020-01-01 14:00:00' and (created_at < '2020-01-01 14:00:00' or id < 6) order by created_at desc, id desc limit 3

Drawback: only previous/next navigation is possible; you cannot jump to a page number.

What is index selectivity?

Index selectivity is the ratio of the number of distinct index values (the cardinality) to the total number of rows in the table (#T). It measures how well an index filters rows and ranges from 1/#T to 1: the closer it is to 1, the more rows each lookup rules out and the more useful the index. A unique index has selectivity 1.
Index Selectivity = Cardinality / Total number of rows (#T)
SQL: SELECT COUNT(DISTINCT field) / COUNT(*) AS Selectivity FROM table_name;
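The formula is easy to reproduce outside the database. This Go sketch computes COUNT(DISTINCT col) / COUNT(*) over an in-memory column; the gender and email samples are made up to contrast a low-selectivity column (a poor index candidate, as noted earlier) with a nearly unique one.

```go
package main

import "fmt"

// selectivity returns cardinality / total rows, i.e. COUNT(DISTINCT col) / COUNT(*).
func selectivity(values []string) float64 {
	if len(values) == 0 {
		return 0
	}
	distinct := make(map[string]struct{}, len(values))
	for _, v := range values {
		distinct[v] = struct{}{}
	}
	return float64(len(distinct)) / float64(len(values))
}

func main() {
	gender := []string{"M", "F", "M", "F", "M", "F", "M", "F"}
	email := []string{"a@x.com", "b@x.com", "c@x.com", "d@x.com", "e@x.com", "f@x.com", "g@x.com", "a@x.com"}
	fmt.Println(selectivity(gender)) // 0.25
	fmt.Println(selectivity(email))  // 0.875
}
```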

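Index selectivity and prefix index

For a prefix index, compare the selectivity of several prefix lengths and choose the shortest prefix whose selectivity is close to that of the full column:

select count(distinct left(prefix_index, 1))/count(*) as sel1, count(distinct left(prefix_index, 2))/count(*) as sel2, count(distinct left(prefix_index, 3))/count(*) as sel3, count(distinct left(prefix_index, 4))/count(*) as sel4 from indexs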
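The sel1 to sel4 comparison can be automated. This Go sketch mirrors COUNT(DISTINCT LEFT(col, n)) / COUNT(*) (cutting on characters, as LEFT() does) and picks the shortest prefix that reaches a chosen fraction of the full column's selectivity. The `ratio` knob and the city sample data are assumptions for illustration.

```go
package main

import "fmt"

// ratioDistinct returns COUNT(DISTINCT key) / COUNT(*) for the given keys.
func ratioDistinct(keys []string) float64 {
	if len(keys) == 0 {
		return 0
	}
	seen := make(map[string]struct{}, len(keys))
	for _, k := range keys {
		seen[k] = struct{}{}
	}
	return float64(len(seen)) / float64(len(keys))
}

// prefixSelectivity mirrors COUNT(DISTINCT LEFT(col, n)) / COUNT(*).
// LEFT() counts characters, so the prefix is cut on runes, not bytes.
func prefixSelectivity(values []string, n int) float64 {
	keys := make([]string, len(values))
	for i, v := range values {
		r := []rune(v)
		if len(r) > n {
			r = r[:n]
		}
		keys[i] = string(r)
	}
	return ratioDistinct(keys)
}

// shortestPrefix returns the smallest prefix length (up to maxLen) whose
// selectivity reaches ratio times the full-column selectivity.
func shortestPrefix(values []string, maxLen int, ratio float64) int {
	full := ratioDistinct(values)
	for n := 1; n <= maxLen; n++ {
		if prefixSelectivity(values, n) >= ratio*full {
			return n
		}
	}
	return maxLen
}

func main() {
	cities := []string{"beijing", "beihai", "shanghai", "shenzhen", "shantou", "guangzhou", "guilin", "hangzhou"}
	for n := 1; n <= 5; n++ {
		fmt.Printf("sel%d = %.3f\n", n, prefixSelectivity(cities, n))
	}
	fmt.Println("shortest prefix at 100%:", shortestPrefix(cities, 10, 1.0)) // 5
}
```

A shorter prefix keeps the index smaller, so accepting a ratio slightly below 1 is a common trade-off.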

Single-column index vs composite index (under high concurrency, prefer composite indexes)
For a given table, MySQL usually uses only one index per query (index merge is the exception). If there are three single-column indexes, MySQL tries to pick the most restrictive one, but even that one filters far fewer rows than a composite index on all three columns.

Index condition pushdown (ICP):

Query without index pushdown:
The storage engine finds an entry matching union_index_a=4366964 in the index, goes back to the table to read the full row, and returns it to the server layer, which then checks the WHERE condition union_index_c=1562544 and filters out rows that do not match.
Query with index pushdown:
The storage engine finds an entry matching union_index_a=4366964 in the index and checks union_index_c=1562544 directly on the index entry. Entries that fail are discarded before going back to the table; only the remaining rows are read in full and returned to the server layer for the rest of the WHERE conditions. EXPLAIN shows "Using index condition" in the Extra column when this happens.
Benefit of index pushdown: fewer table lookups, and less data transferred between the storage engine layer and the server layer.

Code optimization

SQL precompilation

1. Immediate SQL
   From the moment the database receives a SQL statement until it returns the result, the general process is:
   (1) Lexical and semantic analysis
   (2) Optimizing the statement and building an execution plan
   (3) Executing it and returning the result
   Such a statement goes straight through this process: it is compiled once and run once. Ordinary statements like this are called Immediate Statements (immediate SQL).
2. Prepared SQL
   In most cases, however, a statement is executed repeatedly, and only individual values differ each time (for example, the values in a SELECT's WHERE clause, an UPDATE's SET clause, or an INSERT's VALUES). Going through analysis, optimization and planning every time is wasteful, and the optimizer is the most time-consuming part of executing a statement.
   Precompiling replaces the values in such a statement with placeholders, turning it into a template or parameterized statement; these are generally called Prepared Statements.

The advantage of prepared statements: compile once and run many times, skipping repeated parsing and (depending on the database) part of the optimization work.

In addition, prepared statements prevent SQL injection (think about why).

Most programming languages and database drivers support prepared statements, which can improve database execution efficiency.
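Both advantages can be seen in a toy model. The Go sketch below is not a real driver: `server` just records each distinct statement text it must parse. Splicing values into the text makes every value a new statement, and a quote in the input changes the WHERE clause itself; a prepared template is parsed once and values travel separately as data, so they can never be parsed as SQL. (With a real driver in Go you would use database/sql's `DB.Prepare` with the driver's placeholder syntax, such as ? for MySQL.) The `users` table is hypothetical.

```go
package main

import "fmt"

// server is a toy stand-in for a database: it remembers every distinct
// statement text it had to parse and optimize.
type server struct{ parsed map[string]bool }

func newServer() *server { return &server{parsed: map[string]bool{}} }

func (s *server) parse(sqlText string) { s.parsed[sqlText] = true }

// immediate splices the value into the SQL text: each new value is a new
// statement to parse, and quotes in the value become SQL syntax.
func immediate(s *server, name string) string {
	q := "SELECT * FROM users WHERE name = '" + name + "'"
	s.parse(q)
	return q
}

// stmt is a prepared template; values are sent separately as data.
type stmt struct{ template string }

// prepare parses the template once, with ? as the placeholder.
func prepare(s *server, template string) stmt {
	s.parse(template)
	return stmt{template: template}
}

// exec binds args without touching the already-parsed template.
func (st stmt) exec(args ...string) (string, []string) { return st.template, args }

func main() {
	names := []string{"alice", "bob", "x' OR '1'='1"}

	s1 := newServer()
	for _, n := range names {
		fmt.Println(immediate(s1, n)) // the last one becomes ... name = 'x' OR '1'='1'
	}
	fmt.Println("immediate: statements parsed =", len(s1.parsed)) // 3

	s2 := newServer()
	st := prepare(s2, "SELECT * FROM users WHERE name = ?")
	for _, n := range names {
		text, args := st.exec(n)
		fmt.Println(text, args)
	}
	fmt.Println("prepared: statements parsed =", len(s2.parsed)) // 1
}
```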

Use an ORM framework to avoid join queries and to standardize API interfaces.

Table splitting optimization

A common rule of thumb is that MySQL performs best with up to about 10 million rows in a single table; beyond that, the table should be split.

Horizontal table splitting

Keeping the data in each table at a bounded size helps performance.
The split tables share the same structure, so the application needs few changes: only routing rules have to be added.

Question: how many tables should you create? 10, 50, 100?
Analysis: it mainly depends on business growth. For example, a user order table receiving 100,000 orders a day reaches 10 million rows in 100 days, so 4 tables can support one year, 8 tables two years, and so on (keep the number of tables a multiple of 4; the reasons are explained below).
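The sizing arithmetic and a routing rule can be sketched together in Go. Assumptions: the ~10 million rows per table rule of thumb above, rounding up to a multiple of 4 as the text suggests, and a hypothetical `orders_N` naming scheme routed by user id modulo the table count, which keeps each user's orders in a single table.

```go
package main

import "fmt"

// rowsPerTable is the ~10 million rows-per-table rule of thumb from the text.
const rowsPerTable = 10_000_000

// tablesNeeded estimates how many sub-tables keep each one under rowsPerTable
// after the given number of years, rounded up to a multiple of 4.
func tablesNeeded(ordersPerDay, years int) int {
	total := ordersPerDay * 365 * years
	n := (total + rowsPerTable - 1) / rowsPerTable // ceiling division
	return (n + 3) / 4 * 4                         // round up to a multiple of 4
}

// tableFor is a hypothetical routing rule: user id modulo the table count.
func tableFor(userID int64, tables int) string {
	return fmt.Sprintf("orders_%d", userID%int64(tables))
}

func main() {
	for years := 1; years <= 3; years++ {
		fmt.Printf("%d year(s): %d tables\n", years, tablesNeeded(100000, years))
	}
	fmt.Println(tableFor(123456789, 8)) // orders_5
}
```

With 100,000 orders a day this reproduces the text's numbers: 4 tables for one year and 8 for two.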

Hot and cold data separation
Split out the hot data to keep the hot table small and guarantee its read/write performance. Cold data is accessed relatively rarely and can be handled with business-level optimizations. As shown in the figure below, JD splits its historical order data by time.

Vertical splitting
When a table has many fields, its rows can overflow a page. Splitting a large table into smaller ones makes development and maintenance easier and avoids rows spanning pages: MySQL stores data in pages at the lowest level, and a record that takes up too much space spills across pages, which causes extra overhead.

Splitting principles:
1. Put short, frequently accessed attributes in one table, called the main table
2. Put long, rarely accessed attributes in another table, called the extension table
   If both 1 and 2 are satisfied, a third point can also be considered:
3. Attributes that are frequently accessed together can be placed in the same table

Host optimization

This mainly means tuning the my.cnf configuration.

It is mostly the DBA's job, and professional work is best left to professionals. Only the more important and easier-to-understand parameters are listed.


### Last words

In the Internet age, I think the most important thing for most of us is waking up to the need to learn; that is the turning point where learning starts. When you see material useful for your own direction, save it. If you think this article is OK, like and bookmark it first, so that when you really sit down to study you won't be distracted by hunting for material. Take your time, start by liking and bookmarking, and keep going!

> **Well, due to the length limitation of the article, I won't show the detailed answers to the interview questions here,[If you need this full version of interview question answer detailed information, click here to get it for free](https://gitee.com/vip204888/java-p7)**

In addition, I have put together a set of interview study materials:

![image](https://img-blog.csdnimg.cn/img_convert/dfed946b0ad01ac159b0c42e9b4d96d7.png)

![image](https://img-blog.csdnimg.cn/img_convert/7146069726333cd0031603988742a83f.png)

That is all for this article. I hope it helps with your interviews, and I wish you a quick promotion and raise, a "Bai Fumei" (a rich and beautiful partner), and the peak of your life!

Topics: Java Back-end Interview Programmer