MySQL character sets -- MySQL from introduction to mastery (3)

Posted by cronus on Wed, 29 Dec 2021 02:38:59 +0100

A character set and its collation (comparison rule) are linked. Changing the character set automatically switches the collation to that character set's default collation. Conversely, changing the collation also changes the character set to the one the collation belongs to.

mysql> SHOW VARIABLES LIKE 'character_set_server';
+----------------------+--------+
| Variable_name        | Value  |
+----------------------+--------+
| character_set_server | latin1 |
+----------------------+--------+
1 row in set (0.03 sec)

mysql> set character_set_server = 'utf8mb4';
Query OK, 0 rows affected (0.01 sec)

mysql> SHOW VARIABLES LIKE 'character_set_server';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| character_set_server | utf8mb4 |
+----------------------+---------+
1 row in set (0.00 sec)

mysql> SHOW VARIABLES LIKE 'collation_server';
+------------------+--------------------+
| Variable_name    | Value              |
+------------------+--------------------+
| collation_server | utf8mb4_general_ci |
+------------------+--------------------+
1 row in set (0.00 sec)

As the output above shows, the server character set starts out as latin1 (MySQL's West European character set, which is based on cp1252 and closely related to ISO 8859-1). After changing it with SET, the variable reports utf8mb4, and the collation has changed with it to utf8mb4_general_ci.

For example, suppose we store a string of two Chinese characters in a column that uses the gbk character set. The string occupies 4 bytes, because gbk encodes each Chinese character in 2 bytes. If the column used the utf8mb4 character set instead, the same two characters would occupy 6 bytes, 3 bytes each.
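The byte counts above can be checked outside MySQL. The following is a minimal Python sketch. The two characters 我们 are an illustrative choice and not necessarily the ones used in the original example, and Python's `gbk` and `utf-8` codecs stand in here for MySQL's gbk and utf8mb4.

```python
# Illustrative two-character string (an assumption, not taken from the article).
text = "我们"

gbk_bytes = text.encode("gbk")      # gbk: 2 bytes per Chinese character
utf8_bytes = text.encode("utf-8")   # UTF-8: 3 bytes per common Chinese character

# Character count, gbk byte count, UTF-8 byte count
print(len(text), len(gbk_bytes), len(utf8_bytes))  # prints: 2 4 6
```

The same text therefore needs a different amount of storage depending on the character set of the column.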

Where garbled text comes from: bytes that were encoded with one character set are decoded with a different one, so the bytes are mapped to the wrong characters.
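A small Python sketch of this failure mode, assuming the writer uses UTF-8 and the reader mistakenly assumes gbk. The sample text is illustrative only.

```python
text = "我们"                      # illustrative sample text (an assumption)
stored = text.encode("utf-8")      # the writer encodes with UTF-8

# The reader wrongly treats the bytes as gbk. errors="replace" substitutes
# U+FFFD for any byte sequence that is not valid gbk instead of raising.
garbled = stored.decode("gbk", errors="replace")

print(garbled == text)                  # False: same bytes, wrong characters
print(stored.decode("utf-8") == text)   # True: decoding with the right set works
```

The bytes never change. Only the character set used to interpret them does, which is why the fix is to make both sides agree on it.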

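The concept of character set conversion: given a byte string, first decode it into a character string using one character set (say utf8), then encode that character string back into a byte string using another (say gbk). The resulting bytes display correctly when interpreted as gbk. This decode-then-encode process is what we call character set conversion.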
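The decode-then-encode step can be sketched in a few lines of Python. The helper name `convert` and the sample text are hypothetical, chosen only for illustration.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from the source character set to the target one."""
    # Decode bytes to characters, then encode the characters again.
    return data.decode(source).encode(target)

utf8_bytes = "我们".encode("utf-8")              # illustrative input
gbk_bytes = convert(utf8_bytes, "utf-8", "gbk")

# The byte sequences differ, but both represent the same characters.
print(utf8_bytes.hex(), gbk_bytes.hex())
print(gbk_bytes.decode("gbk") == utf8_bytes.decode("utf-8"))  # True
```

Conversion only succeeds when every character exists in the target character set. Otherwise the encode step fails or has to substitute a placeholder.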

How do the MySQL client and server convert character sets?

What the client sends to the server is essentially a byte string, and what the server returns to the client is also a byte string. Along the way the data goes through several character set conversions, so more than one character set is involved. Three system variables govern this process:

  1. character_set_client: the character set the server uses to decode the request.
  2. character_set_connection: when processing the request, the server converts the request string from character_set_client to character_set_connection.
  3. character_set_results: the character set the server uses to encode the results it returns to the client.

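The three steps above can be modelled roughly in Python. This is a toy sketch of the flow only, not how MySQL is implemented: `handle_request` and its parameters are hypothetical names, and the "processing" step simply echoes the request back.

```python
def handle_request(request: bytes, client: str, connection: str, results: str) -> bytes:
    # 1. Decode the incoming bytes using character_set_client.
    text = request.decode(client)
    # 2. Convert the request to character_set_connection for processing.
    working = text.encode(connection)
    # (A real server would execute the statement here; this sketch echoes it.)
    # 3. Encode the result with character_set_results before returning it.
    return working.decode(connection).encode(results)

request = "我们".encode("utf-8")   # illustrative request payload
reply = handle_request(request, client="utf-8", connection="gbk", results="utf-8")
print(reply == request)  # True: the text survives the round trip
```

If `client` did not match the encoding the client actually used, step 1 would misread the bytes and the reply would come back garbled.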
mysql> SHOW VARIABLES LIKE 'character_set_client';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| character_set_client | utf8  |
+----------------------+-------+
1 row in set (0.01 sec)

mysql> SHOW VARIABLES LIKE 'character_set_results';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| character_set_results | utf8  |
+-----------------------+-------+
1 row in set (0.00 sec)

These results show that the server treats requests sent by the client as utf8, and that it also uses utf8 for the results it returns to the client. The next article looks at how the conversion works in detail.

Topics: Java MySQL Back-end