On the analysis of 8 kinds of SQL errors

Posted by Notoriouswow on Tue, 22 Oct 2019 03:12:33 +0200

Execution order of SQL statement:

FROM
<left_table>

ON
<join_condition>

<join_type>
 JOIN
<right_table>

WHERE
<where_condition>

GROUP BY
<group_by_list>

HAVING
<having_condition>

SELECT

DISTINCT
<select_list>

ORDER BY
<order_by_condition>

LIMIT
<limit_number>

1. LIMIT statement

Paging query is one of the most commonly used scenarios, but it is also the most prone to problems. For example, for the following simple statements, the general way DBA s think of is to add a composite index to the fields of type, name, create "time. In this way, conditional sorting can effectively utilize index and improve performance rapidly.

SELECT *
FROM   operation
WHERE  type = 'SQLStats'
       AND name = 'SlowLog'
ORDER  BY create_time
LIMIT  1000, 10;

Well, maybe more than 90% of DBA s solve this problem. But when the LIMIT clause becomes "LIMIT 1000000,10", programmers will still complain: why am I still slow to take only 10 records?

To know that the database does not know where the 1000000 record starts, even if there is an index, it needs to be calculated from the beginning. When this performance problem occurs, programmers are lazy in most cases.

In the scenario of front-end data browsing and page turning, or batch export of big data, the maximum value of the previous page can be regarded as a parameter as a query condition. SQL is redesigned as follows:

SELECT   *
FROM     operation
WHERE    type = 'SQLStats'
AND      name = 'SlowLog'
AND      create_time > '2017-03-16 14:00:00'
ORDER BY create_time limit 10;

Under the new design, the query time is basically fixed and will not change with the growth of data volume.

2. Implicit conversion

Another common error in SQL statements is the mismatch between query variables and field definition types. For example, the following statement:

mysql> explain extended SELECT *
     > FROM   my_balance b
     > WHERE  b.bpn = 14000000123
     >       AND b.isverified IS NULL ;
mysql> show warnings;
| Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field 'bpn'

The field bpn is defined as varchar(20), and MySQL's strategy is to convert strings to numbers and then compare them. Function acts on a table field and the index fails.

This might be a parameter that the application framework automatically fills in, not what the programmer meant. At present, the application framework is very complex, and it is convenient to use, but also be careful that it may dig holes for itself.

3. Association update and deletion

Although MySQL 5.6 introduces the materialization feature, we need to pay special attention to the optimization of query statement. For update or deletion, it needs to be manually rewritten as a JOIN.

For example, in the following UPDATE statement, MySQL actually executes a circular / nested subquery, whose execution time can be imagined.

UPDATE operation o
SET    status = 'applying'
WHERE  o.id IN (SELECT id
                FROM   (SELECT o.id,
                               o.status
                        FROM   operation o
                        WHERE  o.group = 123
                               AND o.status NOT IN ( 'done' )
                        ORDER  BY o.parent,
                                  o.id
                        LIMIT  1) t);

Execution plan:

+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| id | select_type        | table | type  | possible_keys | key     | key_len | ref   | rows | Extra                                               |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| 1  | PRIMARY            | o     | index |               | PRIMARY | 8       |       | 24   | Using where; Using temporary                        |
| 2  | DEPENDENT SUBQUERY |       |       |               |         |         |       |      | Impossible WHERE noticed after reading const tables |
| 3  | DERIVED            | o     | ref   | idx_2,idx_5   | idx_5   | 8       | const | 1    | Using where; Using filesort                         |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+

After rewriting to JOIN, the sub query selection mode changes from DEPENDENT SUBQUERY to DERIVED, which greatly speeds up the execution speed, from 7 seconds to 2 milliseconds.

UPDATE operation o
       JOIN  (SELECT o.id,
                            o.status
                     FROM   operation o
                     WHERE  o.group = 123
                            AND o.status NOT IN ( 'done' )
                     ORDER  BY o.parent,
                               o.id
                     LIMIT  1) t
         ON o.id = t.id
SET    status = 'applying'

The implementation plan is simplified to:

+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra                                               |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| 1  | PRIMARY     |       |      |               |       |         |       |      | Impossible WHERE noticed after reading const tables |
| 2  | DERIVED     | o     | ref  | idx_2,idx_5   | idx_5 | 8       | const | 1    | Using where; Using filesort                         |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+

4. Mixed sorting

MySQL cannot use indexes for mixed sorting. But in some scenarios, there are still opportunities to use special methods to improve performance.

SELECT *
FROM   my_order o
       INNER JOIN my_appraise a ON a.orderid = o.id
ORDER  BY a.is_reply ASC,
          a.appraise_time DESC
LIMIT  0, 20

The execution plan is displayed as a full table scan:

+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
| id | select_type | table | type   | possible_keys     | key     | key_len | ref      | rows    | Extra
+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
|  1 | SIMPLE      | a     | ALL    | idx_orderid | NULL    | NULL    | NULL    | 1967647 | Using filesort |
|  1 | SIMPLE      | o     | eq_ref | PRIMARY     | PRIMARY | 122     | a.orderid |       1 | NULL           |
+----+-------------+-------+--------+---------+---------+---------+-----------------+---------+-+

Since is reply has only two states, i.e. 0 and 1, after rewriting according to the following method, the execution time is reduced from 1.58 seconds to 2 milliseconds.

SELECT *
FROM   ((SELECT *
         FROM   my_order o
                INNER JOIN my_appraise a
                        ON a.orderid = o.id
                           AND is_reply = 0
         ORDER  BY appraise_time DESC
         LIMIT  0, 20)
        UNION ALL
        (SELECT *
         FROM   my_order o
                INNER JOIN my_appraise a
                        ON a.orderid = o.id
                           AND is_reply = 1
         ORDER  BY appraise_time DESC
         LIMIT  0, 20)) t
ORDER  BY  is_reply ASC,
          appraisetime DESC
LIMIT  20;

5. EXISTS statement

When MySQL treats the EXISTS clause, it still uses nested subqueries. Such as the following SQL statement:

SELECT *
FROM   my_neighbor n
       LEFT JOIN my_neighbor_apply sra
              ON n.id = sra.neighbor_id
                 AND sra.user_id = 'xxx'
WHERE  n.topic_status < 4
       AND EXISTS(SELECT 1
                  FROM   message_info m
                  WHERE  n.id = m.neighbor_id
                         AND m.inuser = 'xxx')
       AND n.topic_type <> 5

The execution plan is:

+----+--------------------+-------+------+-----+------------------------------------------+---------+-------+---------+ -----+
| id | select_type        | table | type | possible_keys     | key   | key_len | ref   | rows    | Extra   |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
|  1 | PRIMARY            | n     | ALL  |  | NULL     | NULL    | NULL  | 1086041 | Using where                   |
|  1 | PRIMARY            | sra   | ref  |  | idx_user_id | 123     | const |       1 | Using where          |
|  2 | DEPENDENT SUBQUERY | m     | ref  |  | idx_message_info   | 122     | const |       1 | Using index condition; Using where |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+

Removing exists and changing to join can avoid nested subqueries and reduce the execution time from 1.93 seconds to 1 millisecond.

SELECT *
FROM   my_neighbor n
       INNER JOIN message_info m
               ON n.id = m.neighbor_id
                  AND m.inuser = 'xxx'
       LEFT JOIN my_neighbor_apply sra
              ON n.id = sra.neighbor_id
                 AND sra.user_id = 'xxx'
WHERE  n.topic_status < 4
       AND n.topic_type <> 5

New implementation plan:

+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| id | select_type | table | type   | possible_keys     | key       | key_len | ref   | rows | Extra                 |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
|  1 | SIMPLE      | m     | ref    | | idx_message_info   | 122     | const    |    1 | Using index condition |
|  1 | SIMPLE      | n     | eq_ref | | PRIMARY   | 122     | ighbor_id |    1 | Using where      |
|  1 | SIMPLE      | sra   | ref    | | idx_user_id | 123     | const     |    1 | Using where           |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+

6. Push down

External query conditions cannot be pushed down to complex views or subqueries:

  1. Aggregate subquery;
  2. Subquery with LIMIT;
  3. UNION or UNION ALL subquery;
  4. Sub query in output field;

As shown in the following statement, it can be seen from the execution plan that its conditions act on the aggregation sub query:

SELECT *
FROM   (SELECT target,
               Count(*)
        FROM   operation
        GROUP  BY target) t
WHERE  target = 'rm-xxxx'
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
| id | select_type | table      | type  | possible_keys | key         | key_len | ref   | rows | Extra       |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
|  1 | PRIMARY     | <derived2> | ref   | <auto_key0>   | <auto_key0> | 514     | const |    2 | Using where |
|  2 | DERIVED     | operation  | index | idx_4         | idx_4       | 519     | NULL  |   20 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+

Confirm that the semantic query criteria can be pushed down directly and rewritten as follows:

SELECT target,
       Count(*)
FROM   operation
WHERE  target = 'rm-xxxx'
GROUP  BY target

The implementation plan becomes:

+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where; Using index |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+

Please refer to the previous article: MySQL · performance optimization · push down to materialized table under the condition for a detailed explanation of MySQL external conditions that cannot be pushed down http://mysql.taobao.org/monthly/2016/07/08

7. Reduce the scope in advance

Start with the initial SQL statement:

SELECT *
FROM   my_order o
       LEFT JOIN my_userinfo u
              ON o.uid = u.uid
       LEFT JOIN my_productinfo p
              ON o.pid = p.pid
WHERE  ( o.display = 0 )
       AND ( o.ostaus = 1 )
ORDER  BY o.selltime DESC
LIMIT  0, 15

The original meaning of the SQL statement is: first make a series of left connections, and then sort the first 15 records. As can be seen from the execution plan, the last step estimates that the number of sorting records is 900000 and the time consumption is 12 seconds.

+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref             | rows   | Extra                                              |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
|  1 | SIMPLE      | o     | ALL    | NULL          | NULL    | NULL    | NULL            | 909119 | Using where; Using temporary; Using filesort       |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | o.uid |      1 | NULL                                               |
|  1 | SIMPLE      | p     | ALL    | PRIMARY       | NULL    | NULL    | NULL            |      6 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+

Because the last WHERE condition and sorting are all for the leftmost main table, you can first reduce the amount of data before making a left join for my order sorting. After SQL rewriting, the execution time is reduced to about 1 ms.

SELECT *
FROM (
SELECT *
FROM   my_order o
WHERE  ( o.display = 0 )
       AND ( o.ostaus = 1 )
ORDER  BY o.selltime DESC
LIMIT  0, 15
) o
     LEFT JOIN my_userinfo u
              ON o.uid = u.uid
     LEFT JOIN my_productinfo p
              ON o.pid = p.pid
ORDER BY  o.selltime DESC
limit 0, 15

Check the execution plan again: participate in the JOIN after materializing the sub query (select [type = derived). Although the estimated row scan is still 900000, the actual execution time is very small after using index and LIMIT clause.

+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref   | rows   | Extra                                              |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL  |     15 | Using temporary; Using filesort                    |
|  1 | PRIMARY     | u          | eq_ref | PRIMARY       | PRIMARY | 4       | o.uid |      1 | NULL                                               |
|  1 | PRIMARY     | p          | ALL    | PRIMARY       | NULL    | NULL    | NULL  |      6 | Using where; Using join buffer (Block Nested Loop) |
|  2 | DERIVED     | o          | index  | NULL          | idx_1   | 5       | NULL  | 909112 | Using where                                        |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+

8. Push down the intermediate result set

Let's look at the following example that has been preliminarily optimized (the main table in the left join takes precedence over the query criteria):

SELECT    a.*,
          c.allocated
FROM      (
              SELECT   resourceid
              FROM     my_distribute d
                   WHERE    isdelete = 0
                   AND      cusmanagercode = '1234567'
                   ORDER BY salecode limit 20) a
LEFT JOIN
          (
              SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
              FROM     my_resources
                   GROUP BY resourcesid) c
ON        a.resourceid = c.resourcesid

Are there any other problems with this statement? It is not hard to see that sub query c is a full table aggregate query, which will lead to the performance degradation of the whole statement when the number of tables is particularly large.

In fact, for subquery c, the final result set of left connection only concerns the data that can match the main table resourceid. So we can rewrite the statement as follows, the execution time is reduced from 2 seconds to 2 milliseconds.

SELECT    a.*,
          c.allocated
FROM      (
                   SELECT   resourceid
                   FROM     my_distribute d
                   WHERE    isdelete = 0
                   AND      cusmanagercode = '1234567'
                   ORDER BY salecode limit 20) a
LEFT JOIN
          (
                   SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
                   FROM     my_resources r,
                            (
                                     SELECT   resourceid
                                     FROM     my_distribute d
                                     WHERE    isdelete = 0
                                     AND      cusmanagercode = '1234567'
                                     ORDER BY salecode limit 20) a
                   WHERE    r.resourcesid = a.resourcesid
                   GROUP BY resourcesid) c
ON        a.resourceid = c.resourcesid

But sub query a appears many times in our SQL statement. This kind of writing method not only has additional overhead, but also makes the whole statement appear complicated. Rewrite again WITH statement:

WITH a AS
(
         SELECT   resourceid
         FROM     my_distribute d
         WHERE    isdelete = 0
         AND      cusmanagercode = '1234567'
         ORDER BY salecode limit 20)
SELECT    a.*,
          c.allocated
FROM      a
LEFT JOIN
          (
                   SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
                   FROM     my_resources r,
                            a
                   WHERE    r.resourcesid = a.resourcesid
                   GROUP BY resourcesid) c
ON        a.resourceid = c.resourcesid

summary

The execution plan generated by database compiler determines the actual execution mode of SQL. But the compiler is just trying to serve, all the compilers of the database are not perfect.

Most of the scenarios mentioned above also have performance problems in other databases. Only by understanding the characteristics of database compiler can we avoid its shortcomings and write high-performance SQL statements.

When programmers design data models and write SQL statements, they should bring in the idea or consciousness of algorithms.

To write complex SQL statements, you should get into the habit of using WITH statements. Simple and clear thinking SQL statement can also reduce the burden of database.

Topics: Java SQL MySQL Database Big Data