Use MySQL, use JSON card well

Posted by escabiz on Fri, 14 Jan 2022 23:13:59 +0100

Click "Lifetime", Focus, Top Public Number

Daily Technical Dry, First Time Delivery!

 
 

Relational structured storage has some drawbacks because it requires all columns and their corresponding types to be predefined. However, the business may need to expand the descriptive capabilities of individual columns as it evolves, and when JSON data types are used well, the boundaries between relational and non-relational data storage can be bridged, providing better architecture options for the business.

Of course, many students will encounter a variety of problems when using JSON data types, the most common mistake is to simply understand the type JSON as a string type. But when you finish this article, you'll really realize the power of JSON data types to better store unstructured data in practice.

JSON data type

JSON (JavaScript Object Notation) is mainly used for data exchange between Internet application services. MySQL supports the JSON specification defined by RFC 7159, mainly of two types: JSON object and JSON array. The following are JSON objects, which are mainly used to store information about pictures:

{
 "Image": {
   "Width": 800,
   "Height": 600,
   "Title": "View from 15th Floor",
   "Thumbnail": {
     "Url": "http://www.example.com/image/481989943",
     "Height": 125,
     "Width": 100
   },
 "IDs": [116, 943, 234, 38793]
 }
}

As you can see, the JSON type is a good description of what the data is about, such as the width, height, title, and so on (the types used here are integer, string).

In addition to string, integer, and date types, JSON-embedded fields also support array types, such as the IDs field in the code above.

Another JSON data type is an array type, such as:

[
   {
     "precision": "zip",
     "Latitude": 37.7668,
     "Longitude": -122.3959,
     "Address": "",
     "City": "SAN FRANCISCO",
     "State": "CA",
     "Zip": "94107",
     "Country": "US"
   },
   {
     "precision": "zip",
     "Latitude": 37.371991,
     "Longitude": -122.026020,
     "Address": "",
     "City": "SUNNYVALE",
     "State": "CA",
     "Zip": "94085",
     "Country": "US"
   }
 ]

The example above demonstrates a JSON array with two JSON objects.

So far, many students may think of JSON as a very large field string type, which is superficially correct. However, in essence, JSON is a new type with its own storage format and the ability to create indexes on each corresponding field for specific optimization, which is not possible with traditional field strings. Another benefit of the JSON type is that fields can be expanded indefinitely without having to predefine them. Columns in traditional relational databases need to be predefined, and ALTER TABLE needs to be executed in order to expand. ADD COLUMN... This is a heavy operation.

It is important to note that JSON types are supported from MySQL version 5.7, while version 8.0 solves the log performance bottleneck for updating JSON. If you are using JSON data types in a production environment, MySQL version 8.0 is highly recommended.

Now that you know the basic concepts of JSON types, let's move on to the real world: how do you use JSON types in your business?

Business table structure design practice

User Login Design

In databases, JSON types are better suited for storing relatively static data with fewer modifications, such as the following for user login information:

DROP TABLE IF EXISTS UserLogin;

CREATE TABLE UserLogin (
    userId BIGINT NOT NULL,
    loginInfo JSON,
    PRIMARY KEY(userId)
);

As the login methods of the current business are more and more diverse, such as one account supporting mobile phone, WeChat, QQ account login, so you can use JSON type to store login information here.

Next, insert the following data:

SET @a = '
{
   "cellphone" : "13918888888",
   "wxchat" : "Bankruptcy Codon",
   "QQ" : "82946772"
}
';

INSERT INTO UserLogin VALUES (1,@a);

SET @b = '
{  
  "cellphone" : "15026888888"
}
';

INSERT INTO UserLogin VALUES (2,@b);

From the example above, you can see that there are three ways for user 1 to log in: mobile authentication code login, WeChat login, QQ login, while user 2 only logs in with mobile authentication code.

If you don't use the JSON data type, you will build the table as follows:

SELECT
    userId,
    JSON_UNQUOTE(JSON_EXTRACT(loginInfo,"$.cellphone")) cellphone,
    JSON_UNQUOTE(JSON_EXTRACT(loginInfo,"$.wxchat")) wxchat
FROM UserLogin;
+--------+-------------+--------------+
| userId | cellphone   | wxchat       |
+--------+-------------+--------------+
|      1 | 13918888888 | Bankruptcy Codon     |
|      2 | 15026888888 | NULL         |
+--------+-------------+--------------+
2 rows in set (0.01 sec)

Of course, every time you write JSON_EXTRACT, JSON_UNQUOTE is cumbersome, and MySQL also provides the -> expression, which works exactly like the above SQL:

SELECT 
    userId,
    loginInfo->>"$.cellphone" cellphone,
    loginInfo->>"$.wxchat" wxchat
FROM UserLogin;

When JSON data is very large and users want to retrieve JSON data effectively, they can use MySQL's function index to index a field in JSON.

For example, in the user login example above, if the user must bind a unique mobile number and want to be able to retrieve the user with the mobile number in the future, you can create the following index:

ALTER TABLE UserLogin ADD COLUMN cellphone VARCHAR(255) AS (loginInfo->>"$.cellphone");

ALTER TABLE UserLogin ADD UNIQUE INDEX idx_cellphone(cellphone);

The above SQL first creates a virtual column, cellphone, which is computed from the function loginInfo - > "$.cellphone". Then create a unique index idx_on this virtual column Cellphone. By querying through the virtual column cellphone, you can see that the optimizer will use the newly created idx_cellphone index:

EXPLAIN SELECT  *  FROM UserLogin 
WHERE cellphone = '13918888888'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: UserLogin
   partitions: NULL
         type: const
possible_keys: idx_cellphone
          key: idx_cellphone
      key_len: 1023
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

Of course, we can create virtual columns and function indexes as soon as we start creating tables. The column cellphone created in the following table corresponds to the content in JSON and is a virtual column. uk_idx_cellphone is the index created on the virtual column cellphone.

CREATE TABLE UserLogin (
    userId BIGINT,
    loginInfo JSON,
    cellphone VARCHAR(255) AS (loginInfo->>"$.cellphone"),
    PRIMARY KEY(userId),
    UNIQUE KEY uk_idx_cellphone(cellphone)
);

User Portrait Design

Some businesses need to make user portraits (that is, label users) and then make corresponding product recommendations through data mining technology based on users'labels. For example:

  • In the electronics industry, recommend the corresponding goods according to the user's clothing preferences;

  • In the music industry, the corresponding songs are recommended according to the user's favorite music style and popular singers.

  • In the financial industry, appropriate financial products are recommended according to the user's risk preferences and investment experience.

In this regard, I strongly recommend that you use the JSON type to store user portrait information in the database, and combine the features of JSON array types and multivalued indexes for efficient queries. Suppose you have a table of portrait definitions:

CREATE TABLE Tags (
    tagId bigint auto_increment,
    tagName varchar(255) NOT NULL,
    primary key(tagId)
);

SELECT * FROM Tags;
+-------+--------------+
| tagId | tagName      |
+-------+--------------+
|     1 | 70 after         |
|     2 | 80 after         |
|     3 | 90 after         |
|     4 | 00 after         |
|     5 | go in for sports       |
|     6 | Highly educated       |
|     7 | Small capitals         |
|     8 | Have a room         |
|     9 | Car         |
|    10 | Frequent movies     |
|    11 | I Wanna Go       |
|    12 | Love takeaway       |
+-------+--------------+

You can see that Tags is a portrait definition table that describes how many tags are currently defined, and then tags each user, such as user David, whose tag is post-80s, highly educated, small-capitalized, owning a house, often watching movies; User Tom, after 90, often watches movies and likes takeaway.

Without tag storage using JSON data types, user tags are usually accessed in one field through a string with a splitter:

+-------+---------------------------------------+
|user    |Label                                   |
+-------+---------------------------------------+
|David  |80 after ; Highly educated ; Small capitals ; Have a room ;Frequent movies   |
|Tom    |90 after ;Frequent movies ; Love takeaway                 |
+-------+---------------------------------------

The disadvantage of this is that it is not good for users to search for specific portraits, and the separator is also a self-convention that allows other data to be stored arbitrarily in the database, resulting in dirty data.

The JSON data type is a good solution to this problem:

DROP TABLE IF EXISTS UserTag;
CREATE TABLE UserTag (
    userId bigint NOT NULL,
    userTags JSON,
    PRIMARY KEY (userId)
);

INSERT INTO UserTag VALUES (1,'[2,6,8,10]');
INSERT INTO UserTag VALUES (2,'[3,10,12]');

The tags stored by userTags are those tag values defined by the table Tags, but are stored using the JSON array type.

MySQL version 8.0.17 started to support Multi-Valued Indexes for creating indexes on JSON arrays and through functions member of, json_contains, json_overlaps to quickly retrieve index data. So you can create Multi-Valued Indexes on the table UserTag:

ALTER TABLE UserTag
ADD INDEX idx_user_tags ((cast((userTags->"$") as unsigned array)));

If you want to query a user whose portrait is a frequent movie watcher, you can use the function MEMBER OF:

EXPLAIN SELECT * FROM UserTag 
WHERE 10 MEMBER OF(userTags->"$")\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: UserTag
   partitions: NULL
         type: ref
possible_keys: idx_user_tags
          key: idx_user_tags
      key_len: 9
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

SELECT * FROM UserTag 
WHERE 10 MEMBER OF(userTags->"$");
+--------+---------------+
| userId | userTags      |
+--------+---------------+
|      1 | [2, 6, 8, 10] |
|      2 | [3, 10, 12]   |
+--------+---------------+
2 rows in set (0.00 sec)

If you want to query people who have portraits of post-80s and watch movies frequently, you can use the function JSON_CONTAINS:

EXPLAIN SELECT * FROM UserTag 
WHERE JSON_CONTAINS(userTags->"$", '[2,10]')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: UserTag
   partitions: NULL
         type: range
possible_keys: idx_user_tags
          key: idx_user_tags
      key_len: 9
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

SELECT * FROM UserTag 
WHERE JSON_CONTAINS(userTags->"$", '[2,10]');
+--------+---------------+
| userId | userTags      |
+--------+---------------+
|      1 | [2, 6, 8, 10] |
+--------+---------------+
1 row in set (0.00 sec)

If you want to query people who have portraits of post-80s, post-90s and often watch movies, you can use the function JSON_OVERLAP:

EXPLAIN SELECT * FROM UserTag 
WHERE JSON_OVERLAPS(userTags->"$", '[2,3,10]')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: UserTag
   partitions: NULL
         type: range
possible_keys: idx_user_tags
          key: idx_user_tags
      key_len: 9
          ref: NULL
         rows: 4
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

SELECT * FROM UserTag 
WHERE JSON_OVERLAPS(userTags->"$", '[2,3,10]');
+--------+---------------+
| userId | userTags      |
+--------+---------------+
|      1 | [2, 6, 8, 10] |
|      2 | [3, 10, 12]   |
+--------+---------------+
2 rows in set (0.01 sec)

summary

JSON type is a new data type added to MySQL version 5.7. Good use of JSON data type can effectively solve many practical problems in business. Finally, let me summarize today's highlights:

  • With JSON data types, MySQL version 8.0.17 or higher is recommended for better performance, while supporting Multi-Valued Indexes.

  • The advantage of JSON data types is that there is no need to predefine columns, and the data itself is descriptive.

  • Do not store significantly relational data in JSON, such as user balance, user name, user ID, etc., which is data that each user must contain;

  • JSON data types recommend static data storage that is not updated frequently

Topics: Database MySQL JSON