When using MySQL, please use JSON as your trump card

Posted by Jenling on Sun, 02 Jan 2022 21:27:21 +0100

Relational structured storage has some disadvantages, because it needs to define all columns and their corresponding types in advance. However, in the process of business development, it may be necessary to expand the description function of a single column. At this time, if the JSON data type can be used well, the boundary between the storage of relational and non relational data can be opened, and a better architecture choice can be provided for the business.

Of course, many students will encounter various problems when using JSON data types, and the easiest mistake is to simply understand JSON type as string type. But after reading this article, you will really realize the power of JSON data types, so as to better store unstructured data in practical work.

JSON data type

JSON (JavaScript Object Notation) is mainly used for data exchange between Internet application services. MySQL supports the JSON specification defined in RFC 7159, which mainly includes two types: JSON object and JSON array. The following is the JSON object, which is mainly used to store information about pictures:

{
 "Image": {
   "Width": 800,
   "Height": 600,
   "Title": "View from 15th Floor",
   "Thumbnail": {
     "Url": "http://www.example.com/image/481989943",
     "Height": 125,
     "Width": 100
   },
 "IDs": [116, 943, 234, 38793]
 }
}

As you can see, JSON type can well describe the relevant contents of data, such as the width, height, title, etc. of the picture (the types used here are integer and string types).

In addition to string, integer and date types, JSON objects also support array types for fields embedded in JSON, such as the IDs field in the above code.

Another JSON data type is array type, such as:

[
   {
     "precision": "zip",
     "Latitude": 37.7668,
     "Longitude": -122.3959,
     "Address": "",
     "City": "SAN FRANCISCO",
     "State": "CA",
     "Zip": "94107",
     "Country": "US"
   },
   {
     "precision": "zip",
     "Latitude": 37.371991,
     "Longitude": -122.026020,
     "Address": "",
     "City": "SUNNYVALE",
     "State": "CA",
     "Zip": "94085",
     "Country": "US"
   }
 ]

The above example demonstrates a JSON array with two JSON objects.

So far, many students may regard JSON as a large field string type. On the surface, there is nothing wrong. But in essence, JSON is a new type with its own storage format. It can also create an index on each corresponding field and make specific optimization, which can not be realized by traditional field strings. Another advantage of JSON type is that there is no need to predefine fields, and fields can be extended indefinitely. The columns of traditional relational databases need to be defined in advance. If you want to expand, you need to execute alter table ADD COLUMN ... This is a heavy operation.

It should be noted that JSON type is a function supported from MySQL version 5.7, and version 8.0 solves the log performance bottleneck of updating JSON. If you want to use JSON data types in a production environment, MySQL version 8.0 is highly recommended.

At this point, you have understood the basic concepts of JSON types. Next, let's enter the practical link: how to make good use of JSON types in business?

Business table structure design practice

User login design

In the database, JSON type is more suitable for storing some relatively static data with few modifications. For example, the storage of user login information is as follows:

DROP TABLE IF EXISTS UserLogin;

CREATE TABLE UserLogin (
    userId BIGINT NOT NULL,
    loginInfo JSON,
    PRIMARY KEY(userId)
);

As the current business login methods are becoming more and more diversified, just like the same account supports mobile phone, wechat and QQ account login, JSON type can be used to store login information here.

Next, insert the following data:

SET @a = '
{
   "cellphone" : "13918888888",
   "wxchat" : "Code farmer",
   "QQ" : "82946772"
}
';

INSERT INTO UserLogin VALUES (1,@a);

SET @b = '
{  
  "cellphone" : "15026888888"
}
';

INSERT INTO UserLogin VALUES (2,@b);

As can be seen from the above example, there are three ways for user 1 to log in: Mobile authentication code login, wechat login and QQ login, while user 2 only logs in with mobile authentication code.

If the JSON data type is not used, the table should be created in the following way:

SELECT
    userId,
    JSON_UNQUOTE(JSON_EXTRACT(loginInfo,"$.cellphone")) cellphone,
    JSON_UNQUOTE(JSON_EXTRACT(loginInfo,"$.wxchat")) wxchat
FROM UserLogin;
+--------+-------------+--------------+
| userId | cellphone   | wxchat       |
+--------+-------------+--------------+
|      1 | 13918888888 | Code farmer     |
|      2 | 15026888888 | NULL         |
+--------+-------------+--------------+
2 rows in set (0.01 sec)

Of course, every time you write JSON_EXTRACT,JSON_UNQUOTE is very troublesome. MySQL also provides - > > expression, which is exactly the same as the above SQL effect:

SELECT 
    userId,
    loginInfo->>"$.cellphone" cellphone,
    loginInfo->>"$.wxchat" wxchat
FROM UserLogin;

When the amount of JSON data is very large and users want to effectively retrieve JSON data, they can index a field in JSON by using MySQL's {function index}.

For example, in the above user login example, assuming that the user must bind a unique mobile phone number and wants to use the mobile phone number for user retrieval in the future, the following index can be created:

ALTER TABLE UserLogin ADD COLUMN cellphone VARCHAR(255) AS (loginInfo->>"$.cellphone");

ALTER TABLE UserLogin ADD UNIQUE INDEX idx_cellphone(cellphone);

The above SQL first creates a virtual column cellphone, which is calculated by the function logininfo - > "$. Cellphone". Then create a unique index idx on this virtual column_ cellphone. At this time, query through the virtual column cellphone, and you can see that the optimizer will use the newly created idx_cellphone index:

EXPLAIN SELECT  *  FROM UserLogin 
WHERE cellphone = '13918888888'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: UserLogin
   partitions: NULL
         type: const
possible_keys: idx_cellphone
          key: idx_cellphone
      key_len: 1023
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

Of course, we can complete the creation of virtual columns and functional indexes at the beginning of creating tables. The column cellphone created in the following table corresponds to the content in JSON and is a virtual column; uk_ idx_ Cell phone is the index created on the virtual column cell phone.

CREATE TABLE UserLogin (
    userId BIGINT,
    loginInfo JSON,
    cellphone VARCHAR(255) AS (loginInfo->>"$.cellphone"),
    PRIMARY KEY(userId),
    UNIQUE KEY uk_idx_cellphone(cellphone)
);

User Portrait Design

Some businesses need to make user portraits (that is, label users), and then make corresponding product recommendations according to users' labels through data mining technology. For example:

  • In the e-commerce industry, recommend corresponding products according to users' wearing preferences;

  • In the music industry, recommend corresponding songs according to the music style users like and the singers they often listen to;

  • In the financial industry, recommend corresponding financial products according to users' risk preferences and investment experience.

Here, I strongly recommend that you use JSON type to store user portrait information in the database, and conduct efficient query in combination with the characteristics of JSON array type and multi value index. Suppose there is a picture definition table:

CREATE TABLE Tags (
    tagId bigint auto_increment,
    tagName varchar(255) NOT NULL,
    primary key(tagId)
);

SELECT * FROM Tags;
+-------+--------------+
| tagId | tagName      |
+-------+--------------+
|     1 | 70 after         |
|     2 | 80 after         |
|     3 | 90 after         |
|     4 | 00 after         |
|     5 | go in for sports       |
|     6 | highly educated       |
|     7 | Petty bourgeoisie         |
|     8 | Have a room         |
|     9 | Have a car         |
|    10 | I often watch movies     |
|    11 | Love online shopping       |
|    12 | Love takeout       |
+-------+--------------+

You can see that the Tags table is a portrait definition table, which is used to describe how many Tags are currently defined, and then tag each user, such as user David. His Tags are post-80s, highly educated, petty bourgeoisie, have a house and often watch movies; User Tom, post-90s, often watches movies and loves takeout.

If the JSON data type is not used for label storage, the user label is usually accessed in one field by means of string and delimiter:

+-------+---------------------------------------+
|user    |label                                   |
+-------+---------------------------------------+
|David  |80 after ; highly educated ; Petty bourgeoisie ; Have a room ;I often watch movies   |
|Tom    |90 after ;I often watch movies ; Love takeout                 |
+-------+---------------------------------------

The disadvantage of this is that it is not easy to search for users of specific portraits. In addition, the separator is also a self convention. In fact, other data can be stored arbitrarily in the database, resulting in dirty data.

This problem can be solved by using JSON data types:

DROP TABLE IF EXISTS UserTag;
CREATE TABLE UserTag (
    userId bigint NOT NULL,
    userTags JSON,
    PRIMARY KEY (userId)
);

INSERT INTO UserTag VALUES (1,'[2,6,8,10]');
INSERT INTO UserTag VALUES (2,'[3,10,12]');

Among them, the Tags stored by userTags are those tag values defined in table Tags, which are only stored by JSON array type.

MySQL version 8.0.17 supports multi valued indexes, which is used to create indexes on JSON arrays through the functions member of and json_contains,json_overlaps to quickly retrieve index data. So you can create multi valued indexes on the UserTag table:

ALTER TABLE UserTag
ADD INDEX idx_user_tags ((cast((userTags->"$") as unsigned array)));

If you want to query that the user portrait is a user who often watches movies, you can use the function MEMBER OF:

EXPLAIN SELECT * FROM UserTag 
WHERE 10 MEMBER OF(userTags->"$")\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: UserTag
   partitions: NULL
         type: ref
possible_keys: idx_user_tags
          key: idx_user_tags
      key_len: 9
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

SELECT * FROM UserTag 
WHERE 10 MEMBER OF(userTags->"$");
+--------+---------------+
| userId | userTags      |
+--------+---------------+
|      1 | [2, 6, 8, 10] |
|      2 | [3, 10, 12]   |
+--------+---------------+
2 rows in set (0.00 sec)

If you want to query the post-80s users who often watch movies, you can use the function JSON_CONTAINS:

EXPLAIN SELECT * FROM UserTag 
WHERE JSON_CONTAINS(userTags->"$", '[2,10]')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: UserTag
   partitions: NULL
         type: range
possible_keys: idx_user_tags
          key: idx_user_tags
      key_len: 9
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

SELECT * FROM UserTag 
WHERE JSON_CONTAINS(userTags->"$", '[2,10]');
+--------+---------------+
| userId | userTags      |
+--------+---------------+
|      1 | [2, 6, 8, 10] |
+--------+---------------+
1 row in set (0.00 sec)

If you want to query users with post-80s and post-90s portraits who often watch movies, you can use the function JSON_OVERLAP:

EXPLAIN SELECT * FROM UserTag 
WHERE JSON_OVERLAPS(userTags->"$", '[2,3,10]')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: UserTag
   partitions: NULL
         type: range
possible_keys: idx_user_tags
          key: idx_user_tags
      key_len: 9
          ref: NULL
         rows: 4
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

SELECT * FROM UserTag 
WHERE JSON_OVERLAPS(userTags->"$", '[2,3,10]');
+--------+---------------+
| userId | userTags      |
+--------+---------------+
|      1 | [2, 6, 8, 10] |
|      2 | [3, 10, 12]   |
+--------+---------------+
2 rows in set (0.01 sec)

summary

JSON type is a new data type in MySQL version 5.7. Making good use of JSON data type can effectively solve many practical problems in business. Finally, let me summarize today's highlights:

  • Using JSON data type, MySQL version above 8.0.17 is recommended, which has better performance and supports multi valued indexes;

  • The advantage of JSON data type is that there is no need to define columns in advance, and the data itself is very descriptive;

  • Do not use JSON to store data with obvious relationship, such as user balance, user name, user ID card, etc., which must be included by each user;

  • JSON data types are recommended for static data stores that are not updated frequently.

I hope it can help you!

For those who need to get free materials, add a little assistant vx: soxwv # to get materials for free!

Topics: Java Back-end Programmer architecture