Cases of data inconsistency caused by MySQL implicit type conversion

Posted by dabbott on Fri, 05 Jun 2020 06:09:46 +0200

1.1 problem description

                  . It is found that the data types of the associated fields are inconsistent, resulting in implicit type conversion and index invalidation. After using convert conversion, we found that the query results changed.

Table structure

###t1###

CREATE TABLE `t1` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(32) NOT NULL,
  `age` int NOT NULL,
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `stamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `test_json` json NOT NULL,
  `num_dec` decimal(10,0) NOT NULL,
  `num_dou` double NOT NULL,
  `num_flo` float NOT NULL,
  `test_test` text NOT NULL,
  `joinnum` bigint NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

###t2###

CREATE TABLE `t2` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(32) NOT NULL,
  `age` int NOT NULL,
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `stamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `test_json` json NOT NULL,
  `num_dec` decimal(10,0) NOT NULL,
  `num_dou` double NOT NULL,
  `num_flo` float NOT NULL,
  `test_test` text NOT NULL,
  `joinnum` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_joinnum` (`joinnum`)
) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Problem SQL

#The SQLa.joinnum The data type is bigint, and the data type of b.joinnum is varchar. There is an implicit type conversion and index cannot be used for Association (because 8.0 is used in this article, hash join is used).
root@mysql8 16:13:  [test]> select count(*) from t1 a join t2 b on a.joinnum=b.joinnum;
+----------+
| count(*) |
+----------+
|   249996 |
+----------+
1 row in set (0.67 sec)
root@mysql8 16:16:  [test]> explain select count(*) from t1 a join t2 b on a.joinnum=b.joinnum;
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+---------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key         | key_len | ref  | rows  | filtered | Extra                                                   |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+---------------------------------------------------------+
|  1 | SIMPLE      | a     | NULL       | ALL   | NULL          | NULL        | NULL    | NULL | 99605 |   100.00 | NULL                                                    |
|  1 | SIMPLE      | b     | NULL       | index | idx_joinnum   | idx_joinnum | 1022    | NULL | 99484 |    10.00 | Using where; Using index; Using join buffer (hash join) |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+---------------------------------------------------------+
2 rows in set, 3 warnings (0.00 sec)
root@mysql8 17:20:  [test]> show warnings;
+---------+------+-------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                               |
+---------+------+-------------------------------------------------------------------------------------------------------+
| Warning | 1739 | Cannot use ref access on index 'idx_joinnum' due to type or collation conversion on field 'joinnum'   |
| Warning | 1739 | Cannot use range access on index 'idx_joinnum' due to type or collation conversion on field 'joinnum' |
+---------+------+-------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Overwrite SQL

# Convert the a.joinnum function to char type and make it index.
root@mysql8 16:16:  [test]> select count(*) from t1 a join t2 b on convert(a.joinnum,char)=b.joinnum;
+----------+
| count(*) |
+----------+
|   100000 |
+----------+
1 row in set (0.37 sec)

root@mysql8 16:17:  [test]> explain select count(*) from t1 a join t2 b on convert(a.joinnum,char)=b.joinnum;
+----+-------------+-------+------------+------+---------------+-------------+---------+------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key         | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+-------+------------+------+---------------+-------------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | a     | NULL       | ALL  | NULL          | NULL        | NULL    | NULL | 99605 |   100.00 | NULL                     |
|  1 | SIMPLE      | b     | NULL       | ref  | idx_joinnum   | idx_joinnum | 1022    | func |     1 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+------+---------------+-------------+---------+------+-------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

Inconsistent data instance SQL

#Some data are inconsistent
root@mysql8 17:34:  [test]> select a.joinnum,b.joinnum from t1 a join t2 b on a.joinnum=b.joinnum limit 10;
+-------------------+-------------------+
| joinnum           | joinnum           |
+-------------------+-------------------+
| 10000000000000001 | 10000000000000001 |
| 10000000000000002 | 10000000000000002 |
| 10000000000000005 | 10000000000000003 |
| 10000000000000004 | 10000000000000003 |
| 10000000000000003 | 10000000000000003 |
| 10000000000000005 | 10000000000000004 |
| 10000000000000004 | 10000000000000004 |
| 10000000000000003 | 10000000000000004 |
| 10000000000000005 | 10000000000000005 |
| 10000000000000004 | 10000000000000005 |
+-------------------+-------------------+
10 rows in set (0.02 sec)

1.2 theoretical basis

https://dev.mysql.com/doc/refman/5.7/en/type-conversion.html

The following rules describe how the comparison operation is transformed:

If one or both arguments are NULL, the result of the comparison is NULL, except for the NULL-safe <=> equality comparison operator. For NULL <=> NULL, the result is true. No conversion is needed.
If one or both parameters are NULL, the result of the comparison is NULL, except for the NULL safe < = > equality comparison operator. For NULL < = > NULL, the result is true. No conversion required.
If both arguments in a comparison operation are strings, they are compared as strings.
If both parameters in the comparison operation are strings, they are compared as strings.
If both arguments are integers, they are compared as integers.
If both parameters are integers, they are compared as integers.
Hexadecimal values are treated as binary strings if not compared to a number.
If not compared to a number, the hexadecimal value is treated as a binary string.
If one of the arguments is a TIMESTAMP or DATETIME column and the other argument is a constant, the constant is converted to a timestamp before the comparison is performed. This is done to be more ODBC-friendly. This is not done for the arguments to IN(). To be safe, always use complete datetime, date, or time strings when doing comparisons. For example, to achieve best results when using BETWEEN with date or time values, use CAST() to explicitly convert the values to the desired data type.
If one of the parameters is a TIMESTAMP or DATETIMEcolumn and the other is a constant, the constant is converted to a TIMESTAMP before the comparison is performed. This is to make ODBC more friendly. The operation did not complete IN() for the parameter of. For security reasons, always use the complete datetime, date, or time string when comparing. For example, to get the best results when using BETWEEN with a date or time value, use CAST() to explicitly convert the value to the desired data type.
A single-row subquery from a table or tables is not considered a constant. For example, if a subquery returns an integer to be compared to a DATETIME value, the comparison is done as two integers. The integer is not converted to a temporal value. To compare the operands as DATETIME values, use CAST() to explicitly convert the subquery value to DATETIME.
Single row subqueries in one or more tables are not treated as constants. For example, if a subquery returns an integer to compare with a DATETIME value, the comparison is done as two integers. Integers are not converted to time values. To compare operands as DATETIME values, use CAST() to explicitly convert subquery values to DATETIME.
If one of the arguments is a decimal value, comparison depends on the other argument. The arguments are compared as decimal values if the other argument is a decimal or integer value, or as floating-point values if the other argument is a floating-point value.
If one of the parameters is a decimal value, the comparison depends on the other. If the other parameter is a decimal or integer value, compare the parameter as a decimal value; if the other parameter is a floating-point value, compare the parameter as a floating-point value.
In all other cases, the arguments are compared as floating-point (real) numbers. For example, a comparison of string and numeric operands takes place as a comparison of floating-point numbers.
In all other cases, compare parameters as floating-point numbers (real numbers). For example, compare a string with a numeric operand as a floating-point comparison.
  1. To avoid implicit type conversion, the types of implicit conversion mainly include inconsistent field types, multiple types of in parameter, inconsistent character set types or proofing rules, etc
  2. Implicit type conversion may lead to the inability to use indexes, inaccurate query results, etc., so it is necessary to pay attention to
  3. It is recommended to define the number type as int or bigint when defining the field. When associating a table, the associated field must keep the same type, character set and proofing rules
  4. When a string is converted to a floating-point number, a problem occurs when the number of digits is at least 17.

   using floating-point numbers can sometimes be confusing because they are approximate rather than stored as exact values. Floating point values written in SQL statements may not be the same as the values represented internally. Trying to treat floating-point values as exact values in a comparison can cause problems. They are also constrained by platform or implementation dependencies. Both the FLOAT and DOUBLE data types suffer from these problems. For DECIMAL columns, MySQL performs 65 DECIMAL digits, which should solve the most common inaccuracy problem.

Topics: Database SQL JSON MySQL odbc