MySQL optimization: never fear the interview again

Posted by triponline on Mon, 27 Dec 2021 09:00:03 +0100

Catalogue

1. System configuration optimization

1. Increase MySQL memory

2. Reduce the number of disk writes

Extracurricular knowledge

1. How MySQL stores data

2. What are the redo log and undo log?

3. Data preheating

3.1. Data preheating script

3.2. Generate the preheating statements

3.3. Run the preheating statements when needed

2. Table structure design optimization

1. Design intermediate tables

2. Design redundant fields

3. Split large tables

4. Field design

3. Index optimization: practice on a 10-million-row table

1. Use EXPLAIN to inspect the execution plan of problem SQL, focusing on index usage

2. The IN clause should not contain too many values

3. SELECT statements must specify column names

4. When only one row is needed, use LIMIT 1

5. Index the sort field

6. Index the columns in the WHERE condition

7. When OR is used, the columns on both sides of it must be indexed

8. Prefer UNION ALL to UNION

9. Do not use ORDER BY RAND()

10. Distinguish IN from EXISTS, and NOT IN from NOT EXISTS

11. Use a reasonable paging strategy to improve paging efficiency

12. Segmented queries

13. Avoid fuzzy queries with a leading %

14. Avoid expressions on columns in the WHERE clause

15. Composite indexes must follow the leftmost-prefix rule

16. If necessary, use FORCE INDEX to make the query use a specific index

17. Be careful with range queries

18. Prefer INNER JOIN and avoid LEFT JOIN

4. Complex SQL optimization practice

5. Alibaba index conventions

6. Alibaba SQL statement conventions

Hello everyone, I'm Magic Smile, back again with more knowledge sharing. Below is my summary of SQL optimization; if you are preparing for an interview or simply want to level up, take a look. I created a table with 10,000,000 rows to illustrate each point, because with that much data the effect of SQL optimization is very obvious. If you find it helpful, please like, favorite, and share.

1. System configuration optimization

1. Increase MySQL memory

  • MySQL keeps hot data in memory and, via an LRU algorithm, leaves infrequently accessed data in the disk files. Increase innodb_buffer_pool_size in my.cnf; in theory it can be raised to 3/4 or 4/5 of total system memory.

2. Reduce the number of disk writes

2.1. Increase the redo log size to reduce the number of disk flushes

  • In my.cnf, set innodb_log_file_size to about 0.25 * innodb_buffer_pool_size (the buffer pool memory).

2.2. Redo log flush strategy

  • Set innodb_flush_log_at_trx_commit to 0 or 2.
  • The innodb_flush_log_at_trx_commit parameter controls how the log is flushed; the default value is 1.

Before transaction data reaches the redo log file, it is first written to the log buffer; the log buffer is then flushed to the disk log file according to the policy set below, so enlarging the log buffer saves disk I/O.

0: write the log file and flush to disk once per second (write: log buffer -> OS cache; flush: OS cache -> disk file); up to one second of data may be lost.
1: on transaction commit, write the log file and flush to disk immediately; no data is lost, but I/O is frequent.
2: on transaction commit, write the log file immediately, but flush to disk only once per second.
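
To see or change this at runtime, the server variables can be inspected and set directly (a minimal sketch; innodb_flush_log_at_trx_commit is dynamic, while innodb_log_file_size only takes effect from my.cnf after a restart):

-- Inspect the current flush strategy and log buffer size
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW VARIABLES LIKE 'innodb_log_buffer_size';
-- Trade durability for fewer flushes: up to ~1s of transactions may be lost on a crash
SET GLOBAL innodb_flush_log_at_trx_commit = 2;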

Extracurricular knowledge

1. How MySQL stores data

  • When a transaction starts, data lives in the Buffer Pool before being written to the database files on disk; data is managed in units of pages. Whenever a new page is read into the Buffer Pool, InnoDB checks whether enough free pages are available. If so, a free page is taken off the free list and moved to the LRU list; if not, pages at the tail of the LRU list are evicted according to the LRU algorithm, and the freed memory is allocated to the new page. Before dirty pages in the Buffer Pool are flushed to disk, the corresponding log records (redo/undo) are written to the log files, passing through the log buffer first. If the database crashes before the dirty pages are flushed, the data can be recovered from the redo log after restart, so the dirty-page writes are not lost. If the database crashes after the dirty pages of an uncommitted transaction were flushed, the undo log is used to roll those changes back.

2. What are redo log and undo log?

  • Redo log: a backup of any data modified inside a transaction, stored in the redo log files. Once the dirty pages in the Buffer Pool have been flushed to disk, the corresponding redo log records are no longer needed and can be overwritten.
  • Undo log: before a transaction modifies a record, the original version is stored in the undo log. On rollback or crash, the undo log is used to undo the effect of uncommitted transactions on the database.

3. Data preheating

  • By default, a row is cached in the innodb_buffer_pool only after it has been read once. So right after the database starts, it pays to preheat the data by loading it from disk into memory. Data preheating improves read speed.

3.1. Data preheating script

SELECT DISTINCT
 CONCAT('SELECT ',ndxcollist,' FROM ',db,'.',tb,
  ' ORDER BY ',ndxcollist,';') SelectQueryToLoadCache
  FROM
 (
    SELECT
     engine,table_schema db,table_name tb,
     index_name,GROUP_CONCAT(column_name ORDER BY seq_in_index)
ndxcollist
    FROM
   (
      SELECT
       B.engine,A.table_schema,A.table_name,
       A.index_name,A.column_name,A.seq_in_index
      FROM
       information_schema.statistics A INNER JOIN
       (
          SELECT engine,table_schema,table_name
          FROM information_schema.tables WHERE
          engine='InnoDB'
       ) B USING (table_schema,table_name)
      WHERE B.table_schema NOT IN ('information_schema','mysql')
      ORDER BY table_schema,table_name,index_name,seq_in_index
   ) A
    GROUP BY table_schema,table_name,index_name
 ) AA
ORDER BY db,tb;

3.2. Generate the preheating statements:

Save the script above as, say, /root/makecache.sql (the name is illustrative), then run it; the output must go to a different file than the input:

mysql -uroot -proot -AN < /root/makecache.sql > /root/loadtomem.sql

3.3. Run the preheating statements when needed

mysql -uroot < /root/loadtomem.sql > /dev/null 2>&1

2. Table structure design optimization

1. Design intermediate tables

When we need statistics over some tables, we can create a separate table that stores the data to be displayed, refresh it with a scheduled query every night, and thus spare users the wait for an expensive aggregation.
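
A minimal sketch of the idea, with a hypothetical orders table and illustrative column names:

-- Summary table that the page reads instead of aggregating the big table
CREATE TABLE daily_order_stats (
  stat_date DATE PRIMARY KEY,
  order_count INT NOT NULL,
  total_amount DECIMAL(12,2) NOT NULL
);

-- Refreshed by a nightly job: aggregate yesterday's rows once, read them cheaply all day
INSERT INTO daily_order_stats (stat_date, order_count, total_amount)
SELECT DATE(created_at), COUNT(*), IFNULL(SUM(amount), 0)
FROM orders
WHERE created_at >= CURDATE() - INTERVAL 1 DAY
  AND created_at <  CURDATE()
GROUP BY DATE(created_at);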

2. Design redundant fields

To avoid join queries, we can add reasonably redundant fields.

3. Split large tables

If a large table has too many columns, consider splitting it: put the frequently used columns in one table, the rarely used ones in another, and large text columns in a separate table, all linked by the primary key.
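
As a sketch on the demo table (the profile table and its bio column are hypothetical):

-- A rarely used large text column moves to a side table, linked by the primary key
CREATE TABLE tbiguser_profile (
  user_id INT PRIMARY KEY,   -- same value as tbiguser.id
  bio TEXT
);

-- Joined back by primary key only when the large column is actually needed
SELECT u.id, u.nickname, p.bio
FROM tbiguser u
INNER JOIN tbiguser_profile p ON p.user_id = u.id
WHERE u.id = 42;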

4. Field design

  • Prefer NOT NULL for columns, so the database does not have to handle NULL comparisons when executing queries later.
  • Prefer numeric or ENUM column types where possible, because numeric data is processed much faster than text; ENUM values are treated as numbers internally.
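
A minimal sketch of both rules (the table and column names are illustrative):

CREATE TABLE demo_user (
  id     INT PRIMARY KEY AUTO_INCREMENT,
  name   VARCHAR(50) NOT NULL DEFAULT '',   -- NOT NULL with a default instead of allowing NULL
  status TINYINT NOT NULL DEFAULT 0,        -- numeric flag instead of a text column
  sex    ENUM('M','F') NOT NULL DEFAULT 'M' -- ENUM is stored internally as a number
);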

3. Index optimization: practice on a 10-million-row table

Design a table; the following examples all use it:

create table tbiguser(
 id int primary key auto_increment,
 nickname varchar(255),
 loginname varchar(255),
 age int ,
 sex char(1),
 status int,
 address varchar(255)
);

Insert 10,000,000 rows into the table so that query times and the effect of optimization are obvious. Here is the stored procedure:

DELIMITER $$
CREATE PROCEDURE test_insert()
BEGIN
  DECLARE i INT DEFAULT 1;
  WHILE i <= 10000000 DO
    INSERT INTO tbiguser
    VALUES(null, concat('zy', i), concat('zhaoyun', i), 23, '1', 1, 'beijing');
    SET i = i + 1;
  END WHILE;
  COMMIT;
END $$
DELIMITER ;

-- Run it (this takes a long time):
CALL test_insert();

1. Use EXPLAIN to inspect the execution plan of problem SQL, focusing on index usage

mysql> explain select * from tbiguser where loginname='zhaoyun1' and
    -> nickname='zy1' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbiguser
         type: ref
possible_keys: loginname1
          key: loginname1
      key_len: 768
          ref: const
         rows: 2
        Extra: Using where
1 row in set (0.01 sec)
  • type column: the access type. A good SQL statement should reach at least the range level; eliminate the ALL and index levels.
  • key column: the name of the index actually used; NULL means no index was chosen. A forced index can be applied if necessary.
  • key_len column: the length of the index used.
  • rows column: the number of rows scanned; this value is an estimate.
  • Extra column: additional details. Watch out for the common unfriendly values Using filesort and Using temporary.
  • Common places indexes help: WHERE columns, composite indexes (leftmost prefix), index condition pushdown (filtering on index columns before reading full rows), covering indexes (no lookup back to the table), columns on both sides of a join condition, and columns used for sorting and grouping.

2. The IN clause should not contain too many values

MySQL puts the values of the IN list into an array and sorts it; too many values make this overly expensive.

3. SELECT statements must specify column names

  • SELECT * adds a lot of unnecessary overhead (CPU, I/O, memory, network bandwidth);
  • it also reduces the chance of using a covering index.
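
A quick sketch on the demo table: selecting only the indexed column lets the covering index answer the query, while SELECT * must fetch the whole row:

-- Satisfied from the loginname1 index alone (Extra shows: Using index)
SELECT loginname FROM tbiguser WHERE loginname = 'zhaoyun1';
-- Forces a lookup of the full row, reading every other column as well
SELECT * FROM tbiguser WHERE loginname = 'zhaoyun1';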

4. When only one row is needed, use LIMIT 1

  • LIMIT 1 lets the engine stop scanning as soon as a match is found instead of reading the whole table.

As shown below, even on this table of over ten million rows the result comes back instantly:

mysql> select * from tbiguser limit 1 \G;
*************************** 1. row ***************************
       id: 1
 nickname: zy1
loginname: zhaoyun1
      age: 23
      sex: 1
   status: 1
  address: beijing
1 row in set (0.00 sec)

5. Index the sort field

Without an index:

  • Extra is Using filesort: the results need an extra sort, done in memory for small amounts of data and on disk for large ones.
  • type is ALL: a full table scan.
mysql> explain select loginname from tbiguser order by loginname \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbiguser
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10953837
        Extra: Using filesort
1 row in set (0.00 sec)

Now add an index:

  • Using index: the query can be answered from the index alone (a covering index).
  • type index: a full scan of the index; the index is read instead of the table data.
-- Create the index
mysql> create index loginname1 on tbiguser(loginname);

mysql> explain select loginname from tbiguser order by loginname \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbiguser
         type: index
possible_keys: NULL
          key: loginname1
      key_len: 768
          ref: NULL
         rows: 11162611
        Extra: Using index
1 row in set (0.00 sec)

The time difference between the two queries is also very large: without the index my machine actually ran out of memory, while with the index it took about 12 seconds. Try it yourself.

6. Index the columns in the WHERE condition

Unindexed query:

mysql> select nickname from tbiguser where nickname='zy1000000' \G;
*************************** 1. row ***************************
nickname: zy1000000
*************************** 2. row ***************************
nickname: zy1000000
2 rows in set (9.20 sec)

Without an index the query takes 9.20 seconds, quite a long time.

Indexed query:

mysql> create index nickname1 on  tbiguser(nickname);
Query OK, 0 rows affected (57.28 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select nickname from tbiguser where nickname='zy1000000' \G;
*************************** 1. row ***************************
nickname: zy1000000
*************************** 2. row ***************************
nickname: zy1000000
2 rows in set (0.00 sec)

The data now comes back instantly.

Look at the EXPLAIN plan below: the type is ref, meaning a single-value lookup on a non-unique index. This is an index at its most effective.

mysql> explain select nickname from tbiguser where nickname='zy8888888' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbiguser
         type: ref
possible_keys: nickname1
          key: nickname1
      key_len: 768
          ref: const
         rows: 1
        Extra: Using where; Using index
1 row in set (0.00 sec)

7. When OR is used, the columns on both sides of it must be indexed

If the column on either side of the OR is not indexed, the query will not use an index at all.

Below, nickname and loginname are each indexed, and we query with OR:

mysql> select * from tbiguser where nickname='zy8888888' or loginname='zy7777777';
+----------+-----------+----------------+------+------+--------+---------+
| id       | nickname  | loginname      | age  | sex  | status | address |
+----------+-----------+----------------+------+------+--------+---------+
| 10888888 | zy8888888 | zhaoyun8888888 |   23 | 1    |      1 | beijing |
+----------+-----------+----------------+------+------+--------+---------+
1 row in set (0.01 sec)

The result comes back almost instantly.

Now let's drop the index on nickname and see the effect:

mysql> select * from tbiguser where nickname='zy8888888' or loginname='zy7777777';
+----------+-----------+----------------+------+------+--------+---------+
| id       | nickname  | loginname      | age  | sex  | status | address |
+----------+-----------+----------------+------+------+--------+---------+
| 10888888 | zy8888888 | zhaoyun8888888 |   23 | 1    |      1 | beijing |
+----------+-----------+----------------+------+------+--------+---------+
1 row in set (10.64 sec)

The index no longer applies, and the query takes 10.64 seconds, quite a long time.
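
When both columns can be indexed separately, a common rewrite is to split the OR with UNION ALL so that each branch uses its own index. A sketch on the demo table (use UNION instead if a row could match both conditions):

SELECT * FROM tbiguser WHERE nickname = 'zy8888888'
UNION ALL
SELECT * FROM tbiguser WHERE loginname = 'zy7777777';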

8. Prefer UNION ALL to UNION

  • UNION must deduplicate the merged result sets, which involves sorting, adds a lot of CPU work, and increases resource consumption and latency.
  • The precondition for UNION ALL is that the two result sets contain no duplicate rows.

Add a new table:

CREATE TABLE USER (
  id INT PRIMARY KEY AUTO_INCREMENT,
  NAME VARCHAR(10),
  age TINYINT UNSIGNED
);
-- I measured this on Navicat: it took about 4 minutes
SELECT loginname FROM tbiguser UNION SELECT NAME FROM USER;
-- This one took about 14 seconds
SELECT loginname FROM tbiguser UNION ALL SELECT NAME FROM USER;

9. Do not use ORDER BY RAND()

  • ORDER BY RAND() cannot use an index.

For example:

mysql> select * from tbiguser order by rand() limit 5;
+----------+-----------+----------------+------+------+--------+---------+
| id       | nickname  | loginname      | age  | sex  | status | address |
+----------+-----------+----------------+------+------+--------+---------+
|  7867520 | zy5867520 | zhaoyun5867520 |   23 | 1    |      1 | beijing |
|  5388248 | zy961139  | zhaoyun961139  |   23 | 1    |      1 | beijing |
| 11173221 | zy9173221 | zhaoyun9173221 |   23 | 1    |      1 | beijing |
| 10938428 | zy8938428 | zhaoyun8938428 |   23 | 1    |      1 | beijing |
|  1849958 | zy1849958 | zhaoyun1849958 |   23 | 1    |      1 | beijing |
+----------+-----------+----------------+------+------+--------+---------+
5 rows in set (51.51 sec)

It took 51 seconds.

We can change it to the following:

mysql> select * from tbiguser t1 join (select rand()*(select max(id) from
    -> tbiguser) nid ) t2 on t1.id>t2.nid  limit 5;
+----------+-----------+----------------+------+------+--------+---------+--------------------+
| id       | nickname  | loginname      | age  | sex  | status | address | nid                |
+----------+-----------+----------------+------+------+--------+---------+--------------------+
| 10834406 | zy8834406 | zhaoyun8834406 |   23 | 1    |      1 | beijing | 10834405.771302437 |
| 10834407 | zy8834407 | zhaoyun8834407 |   23 | 1    |      1 | beijing | 10834405.771302437 |
| 10834408 | zy8834408 | zhaoyun8834408 |   23 | 1    |      1 | beijing | 10834405.771302437 |
| 10834409 | zy8834409 | zhaoyun8834409 |   23 | 1    |      1 | beijing | 10834405.771302437 |
| 10834410 | zy8834410 | zhaoyun8834410 |   23 | 1    |      1 | beijing | 10834405.771302437 |
+----------+-----------+----------------+------+------+--------+---------+--------------------+
5 rows in set (0.00 sec)

It returns instantly.

10. Distinguish IN from EXISTS, and NOT IN from NOT EXISTS

  • The main difference between IN and EXISTS is which table drives the query, and that is the key to the performance difference.
  • EXISTS uses the outer table as the driving table, accessing it first, so EXISTS suits the case where the outer table is small and the inner table is large.

EXISTS: runs a subquery that tests whether rows exist. The outer table is traversed row by row, and for each row the inner table is checked for a match; matching rows go into the result set.

  • IN executes the subquery first, so IN suits the case where the outer table is large and the inner table is small.

IN: tests whether a given value matches any value in a subquery or list. The subquery table is evaluated first, then matched against the outer table and filtered by the conditions, so IN is faster when the inner table is relatively small.

To test this, I inserted a row into the USER table whose NAME value also exists as a nickname in tbiguser:

INSERT INTO USER VALUES(9,'zy8888888',88);

Here tbiguser is the big table and USER the small one, and neither nickname nor NAME is indexed (the nickname index was dropped in the previous section). First, IN with the big table inside the subquery:

mysql> SELECT NAME FROM USER WHERE NAME IN (SELECT nickname FROM tbiguser);
+-----------+
| NAME      |
+-----------+
| zy8888888 |
+-----------+
1 row in set (17.37 sec)

It took 17 seconds.

Now IN with the small table inside the subquery:

mysql> SELECT nickname FROM tbiguser WHERE nickname IN (SELECT NAME FROM USER);
+-----------+
| nickname  |
+-----------+
| zy8888888 |
+-----------+
1 row in set (32.28 sec)

It took 32 seconds. The result is somewhat inconsistent with the theory; my guess is that it is related to my database version. If anyone knows the reason, please share.

Possibly relevant: my version is MySQL 5.5. Comparing the two queries, the first appears to evaluate the outer table's results first and then compare them against the subquery results.
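
For comparison, the same check written with EXISTS (a sketch; the outer USER table drives the query, and an index on tbiguser.nickname would make each probe cheap):

SELECT NAME FROM USER u
WHERE EXISTS (SELECT 1 FROM tbiguser t WHERE t.nickname = u.NAME);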

11. Use a reasonable paging strategy to improve paging efficiency

  • When paging with LIMIT m, n, keep the offset m as small as possible.

Example:

mysql> select * from tbiguser limit 9999998, 2;
+----------+-----------+----------------+------+------+--------+---------+
| id       | nickname  | loginname      | age  | sex  | status | address |
+----------+-----------+----------------+------+------+--------+---------+
|  9999999 | zy7999999 | zhaoyun7999999 |   23 | 1    |      1 | beijing |
| 10000000 | zy8000000 | zhaoyun8000000 |   23 | 1    |      1 | beijing |
+----------+-----------+----------------+------+------+--------+---------+
2 rows in set (4.19 sec)
mysql> explain select * from tbiguser limit 9999998, 2 \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbiguser
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 12019153
        Extra:
1 row in set (0.00 sec)

This took 4.19 seconds. Looking at the EXPLAIN plan: the type is ALL, no index is used, and rows is 12019153, so the whole table was scanned. We can use the id (primary key) index to optimize this SQL.

Example:

mysql> select * from tbiguser where id>9999998 limit 2;
+----------+-----------+----------------+------+------+--------+---------+
| id       | nickname  | loginname      | age  | sex  | status | address |
+----------+-----------+----------------+------+------+--------+---------+
|  9999999 | zy7999999 | zhaoyun7999999 |   23 | 1    |      1 | beijing |
| 10000000 | zy8000000 | zhaoyun8000000 |   23 | 1    |      1 | beijing |
+----------+-----------+----------------+------+------+--------+---------+
2 rows in set (0.00 sec)
mysql> explain select * from tbiguser where id>9999998 limit 2 \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbiguser
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 3973262
        Extra: Using where
1 row in set (0.00 sec)

The data now comes back instantly. The type here is range, the primary key index is used, and rows drops to 3973262.

12. Segmented queries

  • On some search pages, users may select so large a range that the query becomes slow, mainly because too many rows are scanned. In that case, the program can query segment by segment in a loop and merge the results for display, as sketched below.
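
A minimal sketch, stepping through the demo table in fixed id windows; the application merges the partial results:

-- Each statement scans only one id segment via the primary key
SELECT * FROM tbiguser WHERE id >      0 AND id <= 100000;
SELECT * FROM tbiguser WHERE id > 100000 AND id <= 200000;
-- ...advance the window until the requested range is covered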

13. Avoid fuzzy queries with a leading %

  • LIKE '%name' or LIKE '%name%' defeats the index and causes a full table scan; LIKE 'name%', however, can use the index.

Example: the loginname column is indexed:

mysql> select * from tbiguser where loginname like '%zhaoyun7999999%';
+---------+-----------+----------------+------+------+--------+---------+
| id      | nickname  | loginname      | age  | sex  | status | address |
+---------+-----------+----------------+------+------+--------+---------+
| 9999999 | zy7999999 | zhaoyun7999999 |   23 | 1    |      1 | beijing |
+---------+-----------+----------------+------+------+--------+---------+
1 row in set (6.67 sec)

The query takes 6.67 seconds. Let's check with EXPLAIN:

mysql> explain select loginname from tbiguser where loginname like '%zhaoyun7999999%' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbiguser
         type: index
possible_keys: NULL
          key: loginname1
      key_len: 768
          ref: NULL
         rows: 12019153
        Extra: Using where; Using index
1 row in set (0.00 sec)

The type is index (a full scan of the index: the whole index is read), and 12019153 rows are scanned.

Now remove the leading % and see the effect:

mysql> select loginname from tbiguser where loginname like 'zhaoyun7999999%';
+----------------+
| loginname      |
+----------------+
| zhaoyun7999999 |
+----------------+
1 row in set (0.00 sec)

Instant. Now look at the EXPLAIN plan:

mysql> explain select loginname from tbiguser where loginname like 'zhaoyun7999999%' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbiguser
         type: range
possible_keys: loginname1
          key: loginname1
      key_len: 768
          ref: NULL
         rows: 1
        Extra: Using where; Using index
1 row in set (0.00 sec)

The type is range, meaning an index range scan, and only 1 row is examined. So avoid the leading %, which defeats the index.

14. Avoid expressions on columns in the WHERE clause

  • Arithmetic on a column in the WHERE condition makes the engine give up using the index on that column.

Example: id is the primary key index

mysql> select * from tbiguser where id*2=4;
+----+----------+-----------+------+------+--------+---------+
| id | nickname | loginname | age  | sex  | status | address |
+----+----------+-----------+------+------+--------+---------+
|  2 | zy2      | zhaoyun2  |   23 | 1    |      1 | beijing |
+----+----------+-----------+------+------+--------+---------+
1 row in set (5.11 sec)
mysql> explain select * from tbiguser where id*2=4 \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbiguser
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 12019153
        Extra: Using where
1 row in set (0.00 sec)

As you can see, the type is ALL (full table scan, the worst case), the query takes 5.11 seconds, and no index is used. We can rewrite the statement as follows:

mysql> select * from tbiguser where id=4/2;
+----+----------+-----------+------+------+--------+---------+
| id | nickname | loginname | age  | sex  | status | address |
+----+----------+-----------+------+------+--------+---------+
|  2 | zy2      | zhaoyun2  |   23 | 1    |      1 | beijing |
+----+----------+-----------+------+------+--------+---------+
1 row in set (0.00 sec)
mysql> explain select * from tbiguser where id=4/2 \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbiguser
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra:
1 row in set (0.00 sec)

The type is const (an equality lookup on the primary key or a unique index, treated as a constant): the query is instant and the index is used. So never apply expressions to columns in the WHERE clause.

15. Composite indexes must follow the leftmost-prefix rule

  • Suppose an index covers the columns id, loginname and age. Queries can use id alone, id plus loginname, or all three in order; but loginname alone or age alone cannot use this index. So when creating a composite index, pay attention to the order of the columns and put the most frequently queried ones first.
  • I won't demonstrate this one; it's simple enough to try yourself.

16. If necessary, use FORCE INDEX to make the query use a specific index

  • Sometimes the MySQL optimizer retrieves the data with the index it deems appropriate, which may not be the one we want. In that case, FORCE INDEX can make the optimizer use the index we specify, as sketched below.
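
A sketch on the demo table, forcing the loginname1 index created earlier:

SELECT * FROM tbiguser FORCE INDEX (loginname1)
WHERE loginname = 'zhaoyun7999999';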

17. Be careful with range queries

  • In a composite index, a range condition (BETWEEN, >, <, and so on) on one column prevents the index columns after it from being used.

Example: on the USER table, create a composite index age_name over the two columns age and name:

mysql> create index age_name on user(age,name);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select age,name from user where age>50 and name='java' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: user
         type: index
possible_keys: age_name
          key: age_name
      key_len: 38
          ref: NULL
         rows: 5
        Extra: Using where; Using index
1 row in set (0.00 sec)

The type is index (a full scan of the index), so the range condition kept the index from filtering effectively.

18. Prefer INNER JOIN and avoid LEFT JOIN

  • A join involves at least two tables, usually of different sizes. With INNER JOIN and no other filter conditions, MySQL automatically picks the small table as the driving table; LEFT JOIN, however, always drives from the left, i.e. the table on the left of LEFT JOIN is the driving table.
  • Make sure the join column of the driven table (the column in the ON condition) is indexed.
  • Let the small table drive the big one, to reduce the number of iterations of the nested loop.

For example, with a small table of 5 rows and a big table of 10,000 rows: if the small table drives, the big table is probed 5 times; if the big table drives, the small table is probed 10,000 times. A sketch follows.
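
A sketch of pinning the driving order on the demo tables: STRAIGHT_JOIN forces the left table to be the driving table, so the small USER table drives the probes into the big table (this assumes tbiguser.nickname is indexed; otherwise each probe is a scan):

SELECT u.NAME, t.loginname
FROM USER u
STRAIGHT_JOIN tbiguser t ON t.nickname = u.NAME;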

4. Complex SQL optimization practice

1. Create tables tuser1 and tuser2 alongside tbiguser, the table on which we already created 10 million rows. For this exercise, I changed the address of the first 100 rows to 'tianjin' and of rows 100-150 to 'shanghai'.

CREATE TABLE `tuser1` (
  `id` varchar(255) COLLATE utf8_bin NOT NULL DEFAULT '',
  `name` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `address` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `address1` (`address`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

insert  into `tuser1`(`id`,`name`,`address`) values ('1','zhangfei','tianjin');
insert  into `tuser1`(`id`,`name`,`address`) values ('2','zhaoyun','tianjin');
insert  into `tuser1`(`id`,`name`,`address`) values ('3','diaochan','guangzhou');
insert  into `tuser1`(`id`,`name`,`address`) values ('4','diaochan','xianggang');
insert  into `tuser1`(`id`,`name`,`address`) values ('5','diaochan','hebei');
insert  into `tuser1`(`id`,`name`,`address`) values ('6','diaochan','dongbei');
insert  into `tuser1`(`id`,`name`,`address`) values ('7','diaochan','dongbei');
insert  into `tuser1`(`id`,`name`,`address`) values ('8','diaochan','dongbei');
insert  into `tuser1`(`id`,`name`,`address`) values ('9','diaochan','dongbei');
insert  into `tuser1`(`id`,`name`,`address`) values ('10','diaochan','dongbei');
insert  into `tuser1`(`id`,`name`,`address`) values ('11','1','1');
insert  into `tuser1`(`id`,`name`,`address`) values ('12','1','1');

CREATE TABLE `tuser2` (
  `id` varchar(255) COLLATE utf8_bin NOT NULL DEFAULT '',
  `name` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `address` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `address1` (`address`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;


insert  into `tuser2`(`id`,`name`,`address`) values ('1','zhangfei','shanghai');
insert  into `tuser2`(`id`,`name`,`address`) values ('2','zhaoyun','shanghai');
insert  into `tuser2`(`id`,`name`,`address`) values ('3','diaochan','guangzhou');
insert  into `tuser2`(`id`,`name`,`address`) values ('4','diaochan','xianggang');
insert  into `tuser2`(`id`,`name`,`address`) values ('5','diaochan','heibei');
insert  into `tuser2`(`id`,`name`,`address`) values ('6','diaochan','dongbei');
insert  into `tuser2`(`id`,`name`,`address`) values ('7','diaochan','dongbei');
insert  into `tuser2`(`id`,`name`,`address`) values ('8','diaochan','dongbei');
insert  into `tuser2`(`id`,`name`,`address`) values ('9','diaochan','dongbei');
insert  into `tuser2`(`id`,`name`,`address`) values ('10','diaochan','dongbei');
insert  into `tuser2`(`id`,`name`,`address`) values ('11','1','1');
insert  into `tuser2`(`id`,`name`,`address`) values ('12','1','1');
insert  into `tuser2`(`id`,`name`,`address`) values ('13','1','1');

Requirement: group the tbiguser table by region and count the rows, restricted to the regions that appear in tuser1 or tuser2.

At first, none of the three tables has any index other than the primary key.

Step 1: write a first version of the SQL. I did not actually run this one; it would take far too long.

explain select  count(id) num , address from tbiguser  
where address in (select distinct address from tuser1)  group by address 
union  
select count(id) num , address from tbiguser  
where address in (select distinct address from tuser2)  group by address ;

Step 2: since that is too slow, add indexes:

CREATE INDEX address1 ON tbiguser(address);
CREATE INDEX address1 ON tuser1(address);
CREATE INDEX address1 ON tuser2(address);

Now the address columns of all three tables are indexed, but the query above still takes too long to run.

Step 3: rewrite the SQL as follows. It still takes a long time: 26.57 seconds.

mysql> select  count(id) num , address from tbiguser  where address in (select
    -> distinct address from tuser1) or address in (select distinct address from
    -> tuser2) group by address order by address;
+-----+----------+
| num | address  |
+-----+----------+
|  50 | shanghai |
| 100 | tianjin  |
+-----+----------+
2 rows in set (26.57 sec)

Let's look at the EXPLAIN plan:

mysql> EXPLAIN SELECT  COUNT(id) num , address FROM tbiguser  WHERE address IN (SELECT DISTINCT
    -> address FROM tuser1) OR address IN (SELECT DISTINCT address FROM tuser2) GROUP
    -> BY address ORDER BY address \G;
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: tbiguser
         type: index
possible_keys: NULL
          key: address1
      key_len: 768
          ref: NULL
         rows: 11992968
        Extra: Using where; Using index
*************************** 2. row ***************************

Although the index is used, the whole data set is still scanned.

Step 4: use the index to locate the matching rows first, and only then aggregate.

First, this query uses the index to find the matching rows, and it is particularly efficient:

SELECT t.id,t.address FROM tbiguser t , tuser1 t1 WHERE t.address=t1.address GROUP BY t.id

Looking at its execution plan, the amount of scanned data is much smaller:

mysql> explain SELECT t.id,t.address FROM tbiguser t , tuser1 t1 WHERE t.address=t1.address GROUP BY t.id \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: address1
          key: address1
      key_len: 768
          ref: NULL
         rows: 12
        Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: ref
possible_keys: address1
          key: address1
      key_len: 768
          ref: mytest.t1.address
         rows: 672095
        Extra: Using where; Using index
2 rows in set (0.01 sec)

Building on that, the following SQL returns the result instantly:

mysql> SELECT COUNT(*) num,x.address FROM
    -> (SELECT t.id,t.address FROM tbiguser t , tuser1 t1 WHERE t.address=t1.address GROUP BY t.id
    -> UNION ALL  SELECT t.id,t.address FROM tbiguser t , tuser2 t2 WHERE t.address=t2.address GROUP BY t.id) X
    ->  GROUP BY x.address;
+-----+----------+
| num | address  |
+-----+----------+
|  50 | shanghai |
| 100 | tianjin  |
+-----+----------+
2 rows in set (0.01 sec)

The data comes back instantly; the efficiency is excellent.

5. Alibaba index conventions


1. [Mandatory] A unique index must be built on any field, or combination of fields, that is unique in business terms.
Note: do not assume that a unique index hurts insert speed; that loss is negligible, while the gain in lookup speed is obvious. Besides, even with perfect validation in the application layer, as long as there is no unique index, Murphy's law says dirty data will eventually appear.


2. [Mandatory] Joins over more than three tables are forbidden. The data types of the joined fields must be exactly the same, and every field used in a multi-table join must be indexed.
Note: pay attention to indexes and SQL performance even for a two-table join.


3. [Mandatory] When creating an index on a varchar field, the index length must be specified; indexing the whole column is unnecessary. Decide the length from the actual selectivity of the text.
Note: index length and selectivity are a trade-off. For string data, an index of length 20 usually achieves a selectivity above 90%, which can be checked with count(distinct left(column, length)) / count(*).
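
A sketch of that check, run against the demo table from the earlier sections:

-- Selectivity of a 20-character prefix of loginname
SELECT COUNT(DISTINCT LEFT(loginname, 20)) / COUNT(*) AS selectivity
FROM tbiguser;
-- If selectivity is high enough (say above 0.9), a prefix index suffices
CREATE INDEX idx_loginname_prefix ON tbiguser (loginname(20));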


4. [Mandatory] Left-fuzzy or full-fuzzy searches on pages are strictly forbidden; if needed, use a search engine instead.
Note: index files have the leftmost-prefix matching property of the B-tree; if the leading value is undetermined, the index cannot be used.


5. [Recommended] If there is an ORDER BY scenario, mind the orderliness of the index: put the ORDER BY field at the end of the composite index, to avoid filesort hurting query performance.
Positive example: where a=? and b=? order by c; index: a_b_c.
Counter example: if the index contains a range lookup, its order cannot be exploited, e.g. where a>10 order by b; the index a_b cannot help the sort.


6. [Recommended] Use covering indexes in queries to avoid lookups back to the table.
Note: if you need to know the title of chapter 11 of a book, do you turn to the page where chapter 11 begins? No, browsing the table of contents is enough; the table of contents acts as a covering index.
Positive example: indexes come in three kinds: primary key, unique, and ordinary. A covering index is not a fourth kind but an effect of a query: with EXPLAIN, the Extra column shows 'Using index'.


7. [Recommended] Use delayed association or subqueries to optimize deep paging.
Note: MySQL does not skip offset rows; it fetches offset+N rows, discards the first offset of them, and returns N rows. When the offset is very large this is extremely inefficient. Either cap the number of pages, or rewrite the SQL once the offset exceeds a threshold.
Positive example: first locate the required id range quickly, then join:
SELECT a.* FROM table1 a, (SELECT id FROM table1 WHERE condition LIMIT 100000, 20) b WHERE a.id = b.id


8. [Recommended] The goal of SQL performance optimization: reach at least the range level; ref level is required; consts is best.
Explanation:
1) consts: at most one matching row in a single table (primary key or unique index); the data can be read during the optimization phase.
2) ref: an ordinary index is used.
3) range: a range scan of the index.
Counter example: in an EXPLAIN result, type=index means the index file is fully scanned, which is very slow; this index level is even lower than range and barely better than a full table scan.


9. [Recommended] When building a composite index, put the column with the highest selectivity on the left.
Positive example: for where a=? and b=?, if the values of column a are almost unique, the single-column index idx_a is enough.
Note: when equality and non-equality conditions are mixed, put the equality column first when building the index. For example, for where c>? and d=?, even if c has higher selectivity, d must come first: index idx_d_c.


10. [Recommended] Prevent implicit conversion caused by mismatched field types, which defeats the index.
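
A sketch with a hypothetical member table whose phone column is VARCHAR and indexed:

SELECT * FROM member WHERE phone = 13800000000;   -- number vs string: implicit cast, index unusable
SELECT * FROM member WHERE phone = '13800000000'; -- string vs string: the index can be used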


11. [Reference] Avoid these extreme misconceptions when creating indexes:
1) Better too many than too few: believing every query needs its own index.
2) Better too few than too many: believing indexes waste space and seriously slow down updates and inserts.
3) Resisting unique indexes: believing uniqueness should always be enforced in the application layer by "check first, then insert".

6. Alibaba SQL statement conventions


1. [Mandatory] Do not use count(column) or count(constant) in place of count(*). count(*) is the standard row-counting syntax defined by SQL-92, independent of the database and of NULL versus non-NULL values.
Note: count(*) counts rows containing NULL values, while count(column) does not count rows whose value in that column is NULL.


2. [Mandatory] count(distinct col) counts the distinct non-NULL values of the column. Note that with count(distinct col1, col2), if one of the columns is entirely NULL, the result is 0 even if the other column has distinct values.


3. [Mandatory] When all values of a column are NULL, count(col) returns 0 but sum(col) returns NULL, so beware of NPE (null pointer) problems when using sum().
Positive example: avoid the NPE problem of sum like this: SELECT IF(ISNULL(SUM(g)), 0, SUM(g)) FROM table;


4. [Mandatory] Use ISNULL() to test whether a value is NULL.
Note: a direct comparison between NULL and any value yields NULL.
1) NULL <> NULL returns NULL, not false.
2) NULL = NULL returns NULL, not true.
3) NULL <> 1 returns NULL, not true.


5. [Mandatory] When writing paging logic in code, return immediately if count is 0, to avoid executing the subsequent paging statement.


6. [Mandatory] Foreign keys and cascades are forbidden; every foreign-key concept must be handled in the application layer.
Note: take students and grades as an example: student_id is the primary key of the student table and a foreign key in the grade table. If updating student_id in the student table also triggers an update of student_id in the grade table, that is a cascading update. Foreign keys and cascading updates suit single-machine, low-concurrency systems, not distributed, high-concurrency clusters; cascading updates are strongly blocking and carry the risk of database update storms; foreign keys also reduce the database's insert speed.


7. [Mandatory] Stored procedures are forbidden; they are hard to debug and extend, and they are not portable.


8. [Mandatory] When changing data (especially deleting or updating records), SELECT first to avoid accidental deletion; execute the UPDATE only after confirming.

Topics: Java MySQL