Talk about the failure of MySQL index

Posted by Romeo20 on Thu, 24 Feb 2022 08:35:52 +0100

1, Index principle

Index is an important method to improve the performance of database query.

Use indexes to quickly find rows in a column that contain specific values. If you do not use an index, you must start reading from the first record. You may have to read the whole table to find the relevant rows.

Using an index is like looking up a dictionary. We can quickly find the words we need according to the directory (index) sorted by pinyin, strokes, partial radicals, etc.

Previous articles on MySQL storage engine( Talk about MySQL's storage engine ), the test compares two storage engines (MyISAM or InnoDB). The efficiency of using primary key index query is dozens of times faster.

Although the index greatly improves the speed of query (select), it will also reduce the speed of updating (insert,update,delete) tables, because when updating tables, the database should not only update and save data, but also update and save index files. Of course, index files also occupy storage space.

2, Scenario of index invalidation

Although the index speeds up the query efficiency, if it is not used properly, the index will fail.

Here are some scenarios of index invalidation.

CREATETABLE customer(
id INTAUTO_INCREMENT,
companyVARCHAR(30),
nameVARCHAR(30),
sex enum('male','female'),
age INT,
phoneVARCHAR(30),
addressVARCHAR(60),
PRIMARYKEY (id));

CREATEINDEX index_company ON customer(company);
CREATEINDEX index_age ON customer(age);
CREATEINDEX index_phone ON customer(phone);
CREATEINDEX index_phone_name ON customer(phone,name);

A customer table, three single column indexes and a combined index are created above. Let's check the indexes:

mysql>show index from customer; # (some information has been removed due to typographical problems)
+------------+------------------+--------------+-------------+------------+
|Non_unique | Key_name         |Seq_in_index | Column_name | Index_type |
+------------+------------------+--------------+-------------+------------+
|          0 | PRIMARY          |           1 | id          | BTREE      |
|          1 | index_company    |           1 | company     | BTREE      |
|          1 | index_age        |            1 | age         | BTREE      |
|          1 | index_phone      |            1 | phone       |BTREE      |
|          1 | index_phone_name |            1 | phone       | BTREE      |
|          1 | index_phone_name |            2 | name        | BTREE      |
+------------+------------------+--------------+-------------+------------+
6 rows inset (0.01 sec)

The above display includes a default primary key index, three single column indexes and two combined index items. The primary key index is a unique index, and the index_type is displayed as BTREE, which is actually a B + tree.

2.1 fuzzy match LIKE starts with%, which will lead to index invalidation.

explain select *from customer where company like '%abc'\G
***************************1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where

Fuzzy matching should be placed on the far right.

explain select * from customer where company like 'abc%'\G
*************************** 1. row***************************
           id: 1
  select_type:SIMPLE
        table:customer
   partitions:NULL
         type:range
possible_keys: index_company
          key:index_company
      key_len: 123
          ref:NULL
         rows: 1
     filtered:100.00
        Extra:Using index condition

2.2 index column calculation will lead to index invalidation.

explain select *from customer where age-1=20\G
***************************1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where

You should put the operation on the right.

explain select *from customer where age=20+1\G
***************************1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ref
possible_keys:index_age
          key: index_age
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL

2.3 the use of functions in index columns will lead to index invalidation.

explain select *from customer where lcase(company)='abc'\G
***************************1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where

2.4 index column type conversion will lead to index invalidation.

explain select *from customer where phone=13012345678\G
***************************1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ALL
possible_keys:index_phone,index_phone_name
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where

You should keep the original type and avoid type conversion.

explain select *from customer where phone='13012345678'\G
***************************1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ref
possible_keys:index_phone,index_phone_name
          key: index_phone
      key_len: 123
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL

2.5 when the character set of index column comparison is inconsistent, the index will become invalid.

Create the supplier and supplier tables of two suppliers according to the format of the customer table above_ utf8, and the character sets are utf8mb4 and utf8 respectively. For the character set, see the previous article( Talk about the character set of MySQL).

CREATETABLE supplier (
id INTAUTO_INCREMENT,
companyVARCHAR(30),
nameVARCHAR(30),
sexenum('male','female'),
age INT,
phoneVARCHAR(30),
addressVARCHAR(60),
PRIMARYKEY (id));


CREATEINDEX index_company ON supplier(company);
CREATEINDEX index_age ON supplier(age);
CREATEINDEX index_phone ON supplier(phone);
CREATEINDEX index_phone_name ON supplier(phone,name);
CREATETABLE supplier_utf8 (
id INTAUTO_INCREMENT,
companyVARCHAR(30),
nameVARCHAR(30),
sexenum('male','female'),
age INT,
phoneVARCHAR(30),
addressVARCHAR(60),
PRIMARYKEY (id)) CHARSET=utf8;
CREATEINDEX index_company ON supplier_utf8(company);
CREATEINDEX index_age ON supplier_utf8(age);
CREATEINDEX index_phone ON supplier_utf8(phone);
CREATEINDEX index_phone_name ON supplier_utf8(phone,name);
explain select *from customer,supplier where customer.company = supplier.company\G
***************************1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ALL
possible_keys:index_company
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where
***************************2. row ***************************
           id: 1
  select_type: SIMPLE
        table: supplier
   partitions: NULL
         type: ref
possible_keys:index_company
          key: index_company
      key_len: 123
          ref: mydb.customer.company
         rows: 1
     filtered: 100.00
        Extra: NULL

You can see that the character set is the same. One table uses an index, and the other table goes through a full table scan.

explain select *from customer,supplier_utf8 where customer.company = supplier_utf8.company\G
***************************1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ALL
possible_keys:index_company
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: NULL
***************************2. row ***************************
           id: 1
  select_type: SIMPLE
        table: supplier_utf8
   partitions: NULL
         type: ALL
possible_keys:index_company
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where; Using join buffer(hash join)

The character sets are different. The two tables are scanned in full.

This situation is quite hidden, and it is often found only after the business is online.

2.6 using or query, whether another item is an index column or not, will lead to index invalidation.

explainselect * from customer where company='abc' or address='abc'\G
explainselect * from customer where company='abc' or age=20\G
***************************1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ALL
possible_keys:index_company
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where

However, if you have to use or to query, you can use the MRR function to sort and optimize the back table query.

2.7 combined index. If the first column index is not used, the index will become invalid.

explain select *from customer where name='abc'\G
***************************1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where

The combined index does not need to be arranged in order. The index can be used below.

explain select * from customer where name ='abc' and phone='13012345678'\G
*************************** 1. row***************************
           id: 1
  select_type:SIMPLE
        table:customer
   partitions:NULL
         type: ref
possible_keys: index_phone,index_phone_name
          key:index_phone
      key_len: 123
          ref:const
         rows: 1
     filtered:100.00
        Extra:Using where

If the database expects to use full table scans faster than indexes, indexes are not used. In the latest version of MySQL, the index judgment has been greatly improved. The previous version used in,! =, < >, is null and is not null will lead to index invalidation. The latest version of the test still uses index query.

MySQL5. The second level of MRR optimization is introduced to optimize the range of MRR.

Its principle is to sort multiple secondary indexes that need to be returned to the table according to the primary key, and then return to the table together. The original random I/O during the return to the table is transformed into sequential I/O, so as to reduce the I/O overhead in the query process and reduce the frequency of data pages being replaced in the buffer pool.

MySQL5.7. The Generated Column is introduced. If you really need to calculate the index column, you can use the method of virtual column. For example, add the corresponding column when creating the customer table, and the company name is lowercase lcase_company, and full_age and index.

CREATE TABLE customer2(
id INT AUTO_INCREMENT,
company VARCHAR(30),
lcase_company VARCHAR(30)as(lcase(company)),
name VARCHAR(30),
sex enum('male','female'),
age INT,
full_age INT as(age-1),
phone VARCHAR(30),
address VARCHAR(60),
PRIMARY KEY (id));
CREATE INDEX index_lcase_company ON customer2(lcase_company);
CREATE INDEX index_full_age ON customer2(full_age);

Then directly query the Cenerated Column, which is equivalent to calculating the index of the column.

3, Usage specification of index

When using the index, we should not only pay attention to avoid index failure, but also follow certain specifications in order to use the index efficiently.

The following summarizes some normative suggestions that can be used for reference, not absolute truth.

3.1 it is recommended that the index number of a single table should not exceed 5, and the field of combined index should not exceed 3 in principle.

3.2 try not to establish an index on a field with a long string. You can set the prefix length of the index field.

3.3 select the columns with high utilization rate in query filtering, such as where, orderby and group by to build indexes.

3.4 do not create an index on a column with low discrimination, such as gender, which will not make use of the index performance.

3.5 do not build indexes on frequently updated columns. Data updates will also update indexes, affecting database performance.

3.6 when building a composite index, the one with the highest discrimination or the highest query frequency shall be placed on the far left.

3.7 make rational use of coverage index to meet query requirements, avoid back table query and reduce I/O overhead.

3.8 delete indexes that are no longer used, less used, or duplicated to reduce the cost of data update.

3.9 use explain to judge whether the query statement uses index or full table scan.

KunlunDB project is open source

[GitHub: ]
https://github.com/zettadb

[Gitee: ]
https://gitee.com/zettadb

END

Topics: MySQL