The invalid character encoding of mysql index is inconsistent

Posted by nothingatall on Wed, 01 Dec 2021 14:22:23 +0100

preface

I'm sure you've heard that if the character codes of the two tables are inconsistent, the index field will cause index invalidation during join, but is that true? This article will carefully analyze this problem.

prepare

We first prepare two tables, one using utf8 character set and one using utf8mb4 character set, and insert a piece of data respectively.

-- Table building utf8 character set
CREATE TABLE t_utf8 (
    id INT ( 11 ) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Self increasing id',
    age INT ( 11 ) COMMENT 'Age',
    NAME VARCHAR ( 32 ) COMMENT 'full name',
    PRIMARY KEY ( id ),
    KEY idx_name ( NAME ),
    KEY idx_age ( age ) 
) ENGINE = INNODB DEFAULT CHARSET = utf8;

-- Insert a piece of data
INSERT into t_utf8 (age,name) VALUES (1,'zz');

-- Table building utf8mb4 character set
CREATE TABLE t_utf8m (
    id INT ( 11 ) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Self increasing id',
    age INT ( 11 ) COMMENT 'Age',
    NAME VARCHAR ( 32 ) COMMENT 'full name',
    PRIMARY KEY ( id ),
    KEY idx_name ( NAME ),
    KEY idx_age ( age ) 
) ENGINE = INNODB DEFAULT CHARSET = utf8mb4;

-- Insert a piece of data
INSERT into t_utf8m (age,name) VALUES (1,'zz');

Now the indexes are established in the name and age fields of the two tables. In the following two SQL statements, will a.age = b.age, a.name = b.name go through the index?

EXPLAIN select * from t_utf8m a, t_utf8 b where a.name = b.name and a.name = 'zz';

EXPLAIN select * from t_utf8m a, t_utf8 b where a.name = b.name and b.name = 'zz';

Let's talk about the conclusion directly. The first SQL won't go through the index, and the second SQL will go through the index. Let's analyze a.name = b.name. Why does one go through the index and the other doesn't? Analyze the SQL execution. We can see from the execution plan that the name condition of table a is filtered first and IDX is gone_ Name index, and then match with table B.

We call table a as the driven table and table b as the driven table. The final SQL statement can be understood as follows:

The first step is to perform conditional filtering of table a

select * from t_utf8m a where a.name = 'zz';

The second step is to bring the result set to b

select * from t_utf8 b where b.name = 'a Table filtered values';

Well, now that we know the execution plan of the SQL statement, let's see how to deal with the inconsistency between the character sets of the two tables.

Character set conversion

As we know, the character set utf8mb4 is a superset of utf8, so utf8 will be converted to utf8mb4 according to the principle of subset to superset conversion (precision may be lost if superset is converted to subset). After understanding this rule, the above SQL statement can analyze the following form:

select * from t_utf8 b where CONVERT(b.name USING utf8mb4) = 'a Table filtered values';

Seeing this, you should know what causes the index to fail. Yes, the function operation is performed on the index column. Why can another go

EXPLAIN select * from t_utf8m a, t_utf8 b where a.name = b.name and b.name = 'zz';

We can also see from the execution plan that b is filtered first, and then matched with a.

Therefore, the SQL execution process is as follows:

The first step is to perform conditional filtering of table b

select * from t_utf8 b where b.name = 'zz';

The second step is to bring the result set to a

select * from t_utf8m a where a.name = 'b Table filtered values';

According to the conversion principle, the final execution is analyzed. Obviously, there is no function operation on the index column, so the index can be used.

select * from t_utf8m a where a.name = CONVERT('b Table filtered values' USING utf8mb4);

summary

I think you should now understand the real reason why inconsistent character codes lead to index failure. In fact, there is a function conversion on the index column. We should fundamentally solve this problem and directly make the character sets of the two tables consistent.

Topics: Database MySQL

Programmer Think

The invalid character encoding of mysql index is inconsistent

preface

prepare

Character set conversion

summary

Hot Topics