Execution order of SQL statement:
FROM <left_table> ON <join_condition> <join_type> JOIN <right_table> WHERE <where_condition> GROUP BY <group_by_list> HAVING <having_condition> SELECT DISTINCT <select_list> ORDER BY <order_by_condition> LIMIT <limit_number>
1. LIMIT statement
Paging query is one of the most commonly used scenarios, but it is also the most prone to problems. For example, for the following simple statements, the general way DBA s think of is to add a composite index to the fields of type, name, create "time. In this way, conditional sorting can effectively utilize index and improve performance rapidly.
SELECT * FROM operation WHERE type = 'SQLStats' AND name = 'SlowLog' ORDER BY create_time LIMIT 1000, 10;
Well, maybe more than 90% of DBA s solve this problem. But when the LIMIT clause becomes "LIMIT 1000000,10", programmers will still complain: why am I still slow to take only 10 records?
To know that the database does not know where the 1000000 record starts, even if there is an index, it needs to be calculated from the beginning. When this performance problem occurs, programmers are lazy in most cases.
In the scenario of front-end data browsing and page turning, or batch export of big data, the maximum value of the previous page can be regarded as a parameter as a query condition. SQL is redesigned as follows:
SELECT * FROM operation WHERE type = 'SQLStats' AND name = 'SlowLog' AND create_time > '2017-03-16 14:00:00' ORDER BY create_time limit 10;
Under the new design, the query time is basically fixed and will not change with the growth of data volume.
2. Implicit conversion
Another common error in SQL statements is the mismatch between query variables and field definition types. For example, the following statement:
mysql> explain extended SELECT * > FROM my_balance b > WHERE b.bpn = 14000000123 > AND b.isverified IS NULL ; mysql> show warnings; | Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field 'bpn'
The field bpn is defined as varchar(20), and MySQL's strategy is to convert strings to numbers and then compare them. Function acts on a table field and the index fails.
This might be a parameter that the application framework automatically fills in, not what the programmer meant. At present, the application framework is very complex, and it is convenient to use, but also be careful that it may dig holes for itself.
3. Association update and deletion
Although MySQL 5.6 introduces the materialization feature, we need to pay special attention to the optimization of query statement. For update or deletion, it needs to be manually rewritten as a JOIN.
For example, in the following UPDATE statement, MySQL actually executes a circular / nested subquery, whose execution time can be imagined.
UPDATE operation o SET status = 'applying' WHERE o.id IN (SELECT id FROM (SELECT o.id, o.status FROM operation o WHERE o.group = 123 AND o.status NOT IN ( 'done' ) ORDER BY o.parent, o.id LIMIT 1) t);
Execution plan:
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+ | 1 | PRIMARY | o | index | | PRIMARY | 8 | | 24 | Using where; Using temporary | | 2 | DEPENDENT SUBQUERY | | | | | | | | Impossible WHERE noticed after reading const tables | | 3 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; Using filesort | +----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
After rewriting to JOIN, the sub query selection mode changes from DEPENDENT SUBQUERY to DERIVED, which greatly speeds up the execution speed, from 7 seconds to 2 milliseconds.
UPDATE operation o JOIN (SELECT o.id, o.status FROM operation o WHERE o.group = 123 AND o.status NOT IN ( 'done' ) ORDER BY o.parent, o.id LIMIT 1) t ON o.id = t.id SET status = 'applying'
The implementation plan is simplified to:
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+ | 1 | PRIMARY | | | | | | | | Impossible WHERE noticed after reading const tables | | 2 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; Using filesort | +----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
4. Mixed sorting
MySQL cannot use indexes for mixed sorting. But in some scenarios, there are still opportunities to use special methods to improve performance.
SELECT * FROM my_order o INNER JOIN my_appraise a ON a.orderid = o.id ORDER BY a.is_reply ASC, a.appraise_time DESC LIMIT 0, 20
The execution plan is displayed as a full table scan:
+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra +----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+ | 1 | SIMPLE | a | ALL | idx_orderid | NULL | NULL | NULL | 1967647 | Using filesort | | 1 | SIMPLE | o | eq_ref | PRIMARY | PRIMARY | 122 | a.orderid | 1 | NULL | +----+-------------+-------+--------+---------+---------+---------+-----------------+---------+-+
Since is reply has only two states, i.e. 0 and 1, after rewriting according to the following method, the execution time is reduced from 1.58 seconds to 2 milliseconds.
SELECT * FROM ((SELECT * FROM my_order o INNER JOIN my_appraise a ON a.orderid = o.id AND is_reply = 0 ORDER BY appraise_time DESC LIMIT 0, 20) UNION ALL (SELECT * FROM my_order o INNER JOIN my_appraise a ON a.orderid = o.id AND is_reply = 1 ORDER BY appraise_time DESC LIMIT 0, 20)) t ORDER BY is_reply ASC, appraisetime DESC LIMIT 20;
5. EXISTS statement
When MySQL treats the EXISTS clause, it still uses nested subqueries. Such as the following SQL statement:
SELECT * FROM my_neighbor n LEFT JOIN my_neighbor_apply sra ON n.id = sra.neighbor_id AND sra.user_id = 'xxx' WHERE n.topic_status < 4 AND EXISTS(SELECT 1 FROM message_info m WHERE n.id = m.neighbor_id AND m.inuser = 'xxx') AND n.topic_type <> 5
The execution plan is:
+----+--------------------+-------+------+-----+------------------------------------------+---------+-------+---------+ -----+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+ | 1 | PRIMARY | n | ALL | | NULL | NULL | NULL | 1086041 | Using where | | 1 | PRIMARY | sra | ref | | idx_user_id | 123 | const | 1 | Using where | | 2 | DEPENDENT SUBQUERY | m | ref | | idx_message_info | 122 | const | 1 | Using index condition; Using where | +----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
Removing exists and changing to join can avoid nested subqueries and reduce the execution time from 1.93 seconds to 1 millisecond.
SELECT * FROM my_neighbor n INNER JOIN message_info m ON n.id = m.neighbor_id AND m.inuser = 'xxx' LEFT JOIN my_neighbor_apply sra ON n.id = sra.neighbor_id AND sra.user_id = 'xxx' WHERE n.topic_status < 4 AND n.topic_type <> 5
New implementation plan:
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+ | 1 | SIMPLE | m | ref | | idx_message_info | 122 | const | 1 | Using index condition | | 1 | SIMPLE | n | eq_ref | | PRIMARY | 122 | ighbor_id | 1 | Using where | | 1 | SIMPLE | sra | ref | | idx_user_id | 123 | const | 1 | Using where | +----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
6. Push down
External query conditions cannot be pushed down to complex views or subqueries:
- Aggregate subquery;
- Subquery with LIMIT;
- UNION or UNION ALL subquery;
- Sub query in output field;
As shown in the following statement, it can be seen from the execution plan that its conditions act on the aggregation sub query:
SELECT * FROM (SELECT target, Count(*) FROM operation GROUP BY target) t WHERE target = 'rm-xxxx' +----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+ | 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 514 | const | 2 | Using where | | 2 | DERIVED | operation | index | idx_4 | idx_4 | 519 | NULL | 20 | Using index | +----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
Confirm that the semantic query criteria can be pushed down directly and rewritten as follows:
SELECT target, Count(*) FROM operation WHERE target = 'rm-xxxx' GROUP BY target
The implementation plan becomes:
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+ | 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where; Using index | +----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
Please refer to the previous article: MySQL · performance optimization · push down to materialized table under the condition for a detailed explanation of MySQL external conditions that cannot be pushed down http://mysql.taobao.org/monthly/2016/07/08
7. Reduce the scope in advance
Start with the initial SQL statement:
SELECT * FROM my_order o LEFT JOIN my_userinfo u ON o.uid = u.uid LEFT JOIN my_productinfo p ON o.pid = p.pid WHERE ( o.display = 0 ) AND ( o.ostaus = 1 ) ORDER BY o.selltime DESC LIMIT 0, 15
The original meaning of the SQL statement is: first make a series of left connections, and then sort the first 15 records. As can be seen from the execution plan, the last step estimates that the number of sorting records is 900000 and the time consumption is 12 seconds.
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+ | 1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 909119 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL | | 1 | SIMPLE | p | ALL | PRIMARY | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) | +----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
Because the last WHERE condition and sorting are all for the leftmost main table, you can first reduce the amount of data before making a left join for my order sorting. After SQL rewriting, the execution time is reduced to about 1 ms.
SELECT * FROM ( SELECT * FROM my_order o WHERE ( o.display = 0 ) AND ( o.ostaus = 1 ) ORDER BY o.selltime DESC LIMIT 0, 15 ) o LEFT JOIN my_userinfo u ON o.uid = u.uid LEFT JOIN my_productinfo p ON o.pid = p.pid ORDER BY o.selltime DESC limit 0, 15
Check the execution plan again: participate in the JOIN after materializing the sub query (select [type = derived). Although the estimated row scan is still 900000, the actual execution time is very small after using index and LIMIT clause.
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 15 | Using temporary; Using filesort | | 1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL | | 1 | PRIMARY | p | ALL | PRIMARY | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) | | 2 | DERIVED | o | index | NULL | idx_1 | 5 | NULL | 909112 | Using where | +----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
8. Push down the intermediate result set
Let's look at the following example that has been preliminarily optimized (the main table in the left join takes precedence over the query criteria):
SELECT a.*, c.allocated FROM ( SELECT resourceid FROM my_distribute d WHERE isdelete = 0 AND cusmanagercode = '1234567' ORDER BY salecode limit 20) a LEFT JOIN ( SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated FROM my_resources GROUP BY resourcesid) c ON a.resourceid = c.resourcesid
Are there any other problems with this statement? It is not hard to see that sub query c is a full table aggregate query, which will lead to the performance degradation of the whole statement when the number of tables is particularly large.
In fact, for subquery c, the final result set of left connection only concerns the data that can match the main table resourceid. So we can rewrite the statement as follows, the execution time is reduced from 2 seconds to 2 milliseconds.
SELECT a.*, c.allocated FROM ( SELECT resourceid FROM my_distribute d WHERE isdelete = 0 AND cusmanagercode = '1234567' ORDER BY salecode limit 20) a LEFT JOIN ( SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated FROM my_resources r, ( SELECT resourceid FROM my_distribute d WHERE isdelete = 0 AND cusmanagercode = '1234567' ORDER BY salecode limit 20) a WHERE r.resourcesid = a.resourcesid GROUP BY resourcesid) c ON a.resourceid = c.resourcesid
But sub query a appears many times in our SQL statement. This kind of writing method not only has additional overhead, but also makes the whole statement appear complicated. Rewrite again WITH statement:
WITH a AS ( SELECT resourceid FROM my_distribute d WHERE isdelete = 0 AND cusmanagercode = '1234567' ORDER BY salecode limit 20) SELECT a.*, c.allocated FROM a LEFT JOIN ( SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated FROM my_resources r, a WHERE r.resourcesid = a.resourcesid GROUP BY resourcesid) c ON a.resourceid = c.resourcesid
summary
The execution plan generated by database compiler determines the actual execution mode of SQL. But the compiler is just trying to serve, all the compilers of the database are not perfect.
Most of the scenarios mentioned above also have performance problems in other databases. Only by understanding the characteristics of database compiler can we avoid its shortcomings and write high-performance SQL statements.
When programmers design data models and write SQL statements, they should bring in the idea or consciousness of algorithms.
To write complex SQL statements, you should get into the habit of using WITH statements. Simple and clear thinking SQL statement can also reduce the burden of database.