MySQL8.0.17 - Introduction to Multi-Valued Indexes

Posted by willwill100 on Fri, 16 Aug 2019 04:48:29 +0200

This paper briefly introduces the new function multi-value index introduced in 8.0.17 below. As the name implies, the index can create multiple secondary index items on the same Primary key. In fact, it already supports the basic function of array type (feel that the official future will introduce array column type like pg), and based on arrayTo build a secondary index, which means that the secondary index can have more records than the clustered index records, so the index cannot be used for queries of general meaning, but only through specific interface functions, as illustrated in the following example.

This article does not do a thorough understanding of the code, only records the relevant entry functions, so that you can quickly review them when you encounter future work.A connection to the worklog is attached at the end, and interested friends can read the worklog directly to find out how it works.

Example

Extracted from official documents

root@test 04:08:50>show create table customers\G                                                                                                                                   *************************** 1. row ***************************
       Table: customers
Create Table: CREATE TABLE `customers` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `modified` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `custinfo` json DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `zips` ((cast(json_extract(`custinfo`,_latin1'$.zip') as unsigned array)))
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

root@test 04:08:53>select * from customers;
+----+---------------------+-------------------------------------------------------------------+
| id | modified            | custinfo                                                          |
+----+---------------------+-------------------------------------------------------------------+
|  1 | 2019-08-14 16:08:50 | {"user": "Jack", "user_id": 37, "zipcode": [94582, 94536]}        |
|  2 | 2019-08-14 16:08:50 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} |
|  3 | 2019-08-14 16:08:50 | {"user": "Bob", "user_id": 31, "zipcode": [94477, 94536]}         |
|  4 | 2019-08-14 16:08:50 | {"user": "Mary", "user_id": 72, "zipcode": [94536]}               |
|  5 | 2019-08-14 16:08:50 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]}         |
+----+---------------------+-------------------------------------------------------------------+
5 rows in set (0.00 sec)

The index is available through three functions, member of, json_contains, json_overlaps

root@test 04:09:00>SELECT * FROM customers WHERE 94507 MEMBER OF(custinfo->'$.zipcode');
+----+---------------------+-------------------------------------------------------------------+
| id | modified            | custinfo                                                          |
+----+---------------------+-------------------------------------------------------------------+
|  2 | 2019-08-14 16:08:50 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} |
|  5 | 2019-08-14 16:08:50 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]}         |
+----+---------------------+-------------------------------------------------------------------+
2 rows in set (0.00 sec)

root@test 04:09:41>SELECT * FROM customers  WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
+----+---------------------+-------------------------------------------------------------------+
| id | modified            | custinfo                                                          |
+----+---------------------+-------------------------------------------------------------------+
|  2 | 2019-08-14 16:08:50 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} |
|  5 | 2019-08-14 16:08:50 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]}         |
+----+---------------------+-------------------------------------------------------------------+
2 rows in set (0.00 sec)

root@test 04:09:54>SELECT * FROM customers   WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
+----+---------------------+-------------------------------------------------------------------+
| id | modified            | custinfo                                                          |
+----+---------------------+-------------------------------------------------------------------+
|  1 | 2019-08-14 16:08:50 | {"user": "Jack", "user_id": 37, "zipcode": [94582, 94536]}        |
|  2 | 2019-08-14 16:08:50 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} |
|  5 | 2019-08-14 16:08:50 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]}         |
+----+---------------------+-------------------------------------------------------------------+
3 rows in set (0.00 sec)

Application program interface

multi-value index is an implementation of functional index. The definition of a column is a virtual column whose value is an array taken from json column
If the same value exists on the array, only one is stored on the index.Supported types: DECIMAL, INTEGER, DATETIME,VARCHAR/CHAR.There can only be one multi-value column on the index.

The following is a brief description of the interface functions

Array maximum capacity:
Entry function: ha_innobase::mv_key_capacity

Insert record:
Entry function row_ins_sec_index_multi_value_entry
Build the tuple by classing Multi_value_entry_builder_insert, then call the normal interface function row_ins_sec_index_entry to insert into the secondary index.
The parsed, sorted and unweighted data is stored in struct multi_value_data, and the pointer is in dfield_t::data. The multi_value_data structure is also a memory representation of the multi-value concrete value

Delete records:
Entry function: row_upd_del_multi_sec_index_entry
Build tuple based on class Multi_value_entry_builder_normal and delete it from index in turn

Update Records
Entry function: row_upd_multi_sec_index_entry
Since not all secondary index records may need to be updated, you need to calculate a diff to find out which records you want to update calc_row_difference--> innobase_get_multi_value_and_diff and set a bitmap that needs to be updated

rollback
Correlation function:

row_undo_ins_remove_multi_sec
row_undo_mod_upd_del_multi_sec
row_undo_mod_del_mark_multi_sec

When rolling back, get the value of the multi-value column from the undo log by trx_undo_rec_get_multi_value, and build and store it in the field data by interface Multi_value_logger::read

Record undo log
Function: trx_undo_store_multi_value

Store multi-value information in Undo log through Multi_value_logger::log.'Multi_value_logger'is an auxiliary class that records the value of multi-value column and how to read it out

purge secondary index record
Entry function:

row_purge_del_mark
row_purge_upd_exist_or_extern_func
    |--> row_purge_remove_multi_sec_if_poss

Reference Documents

WL#10604: Create multi-value index
WL#8763: support multi-value functional index for InnoDB
WL#8955: Add support for multi-valued indexes
Official Documents

Topics: Database JSON