Read the summary of chapter 11.3.2 The CHAR and VARCHAR Types in the official MySQL 5.7 document

Posted by Andre_Ng on Wed, 11 Mar 2020 12:59:35 +0100

Author: Wei Xinping, the 5th MySQL practical class student of zhishutang, the 10th MySQL optimization class student, now works as a teaching assistant.

Maximum length of varchar(n) and char(n)

I believe that as a dba, it is often asked this question. What is the maximum length. How many bytes or how many characters. In fact, this is not very clear. A simple understanding is how many words can be put. A letter, a space, a Chinese character and a punctuation mark are all 1 in length. Then varchar(2),char(2) can put two words at most. So how much can this n maximum be set to.

First of all, we need to talk about a big premise, that is, a row of the table can only store 65535 bytes at most, which is not the meaning of how many words can be put. Because of the different character sets, the maximum number of words can be stored is also different. For example, gbk stores two characters for one Chinese character, while utf8 stores three characters for one Chinese character

The setting range of N in char(n) is 0 to 255, that is to say, it can store up to 255 words.

The setting range of N in varchar(n) is different. It does not have a fixed maximum value, which will affect the maximum value of n with the number of bytes occupied by other columns in the table and the set of characters. And varchar itself takes one or two characters to determine the length of the subsequent data bytes.

Here's an example

admin@localhost [weixinping_test]  14:15:01>create table test_char(a char(256));
ERROR 1074  (42000):  Column length too big for column 'a'  (max =  255);  use BLOB or TEXT instead
    
admin@localhost [weixinping_test]  14:15:13>create table test_char(a char(255));
Query OK,  0 rows affected (0.03 sec)

If char is used when creating a table, an error will be reported if the value of n is set to more than 255. Again, verify that the number of bytes is not limited by this 255. We insert 110 Chinese characters, because our character set is utf8mb4, that is to say, a Chinese character is 3 bytes, 110 Chinese characters is 330 bytes, which will definitely exceed the 255 limit. give the result as follows

admin@localhost [weixinping_test]  14:15:45>select length(a)  from test_char;
+-----------+
| length(a)  |
+-----------+
|  330  |
+-----------+
1 row in  set  (0.01 sec)

Maximum value for varchar.

admin@localhost [weixinping_test]  14:39:34>create table test_varchar(a varchar(65533)  not  null  )  default CHARSET latin1;
Query OK,  0 rows affected (0.02 sec)
    
admin@localhost [weixinping_test]  14:42:17>create table test_varchar1(a varchar(65533)  not  null  )  default CHARSET utf8mb4;
ERROR 1074  (42000):  Column length too big for column 'a'  (max =  16383);  use BLOB or TEXT instead
    
admin@localhost [weixinping_test]  14:52:43>create table test_varchar2(a varchar(65530)  not  null,b varchar(65530)  not  null  )  default CHARSET latin1;
ERROR 1118  (42000):  Row size too large.  The maximum row size for the used table type,  not counting BLOBs,  is  65535.  This includes storage overhead, check the manual.  You have to change some columns to TEXT or  BLOBs

Why is the number 65533 here? Because varchar needs to use two bytes (when the number of data bytes exceeds 255) or one byte (when the data is less than or equal to 255) to store the specific data length. Therefore, it needs to subtract 2. In order to remove the influence of null flag bits, it needs to be set to not null. The first table was created successfully, and the second table failed to be created. The only difference is that the last character set uses one byte for each word of latin1, while utf8mb4 is variable in length, and the longest occupied character is 4. The maximum length becomes 65533/4, which is about 16383.

The last statement to create a table failed. The number is obviously less than 65533. As mentioned before, there is a big premise for the maximum length of a row. The maximum length of a row can only be 65535 bytes, which is shared by all columns.

When we have finished the length, let's talk about the space and case.

Space problem

In the char type, the last space is ignored, whether it is stored in the database or taken out. as follows

admin@localhost [weixinping_test]  15:05:24>insert into test_char() values('ac '),('ac     '),('a  c '),('    ac ');
Query OK,  4 rows affected (0.00 sec)
Records:  4  Duplicates:  0  Warnings:  0
    
admin@localhost [weixinping_test]  15:05:58>select LENGTH(a),a from test_char;
+-----------+--------+
| LENGTH(a)  | a |
+-----------+--------+
|  2  | ac |
|  2  | ac |
|  4  | a  c |
|  6  | ac |
+-----------+--------+
4 rows in  set  (0.00 sec)
    
admin@localhost [weixinping_test]  15:06:05>

But in varchar, it will show the truth, what is what. That is to say, spaces will be stored and displayed when they are taken out.

admin@localhost [weixinping_test]  15:07:42>insert into test_varchar() values('ac '),('ac     '),('a  c '),('    ac ');
Query OK,  4 rows affected (0.00 sec)
Records:  4  Duplicates:  0  Warnings:  0
    
admin@localhost [weixinping_test]  15:07:53>select LENGTH(a),a from test_varchar;
+-----------+---------+
| LENGTH(a)  | a |
+-----------+---------+
|  3  | ac |
|  7  | ac |
|  5  | a  c |
|  7  | ac |
+-----------+---------+
4 rows in  set  (0.00 sec)

After inserting data, we will compare these fields in many cases, such as where a = hello. But when it comes to spaces, what are the results.

admin@localhost [weixinping_test]  15:12:03>select  *  from test_char where a =  'ac    ';
+------+
| a |
+------+
| ac |
| ac |
+------+
2 rows in  set  (0.01 sec)
    
admin@localhost [weixinping_test]  15:12:43>select  *  from test_char where a =  'ac';
+------+
| a |
+------+
| ac |
| ac |
+------+
2 rows in  set  (0.00 sec)
    
admin@localhost [weixinping_test]  15:12:48>select  *  from test_char where a like 'ac %';
Empty  set  (0.00 sec)

For char, when you use = again, the spaces in the characters will be ignored, but you can't use like. Since the spaces in the middle and in the front will be stored in, this will also be affected when comparing. Therefore, neither the first nor the second shows the result of "ac". There are only two ac results.

As for varchar, according to the principle, since spaces are stored when it is stored and there will be spaces when it is taken out, that is to say, the final space should also be taken into account in the comparison. Look at the results below

admin@localhost [weixinping_test]  15:19:12>select LENGTH(a),a from test_varchar where a =  'ac';
+-----------+---------+
| LENGTH(a)  | a |
+-----------+---------+
|  3  | ac |
|  7  | ac |
+-----------+---------+
2 rows in  set  (0.00 sec)
    
admin@localhost [weixinping_test]  15:19:34>select LENGTH(a),a from test_varchar where a =  'ac                ';
+-----------+---------+
| LENGTH(a)  | a |
+-----------+---------+
|  3  | ac |
|  7  | ac |
+-----------+---------+
2 rows in  set  (0.00 sec)
    
admin@localhost [weixinping_test]  15:19:49>select LENGTH(a),a from test_varchar where a =  ' ac                ';
Empty  set  (0.00 sec)
    
admin@localhost [weixinping_test]  15:20:05>select LENGTH(a),a from test_varchar where a =  ' a c                ';
Empty  set  (0.00 sec)
    
admin@localhost [weixinping_test]  15:20:15>select LENGTH(a),a from test_varchar where a like 'ac                ';
Empty  set  (0.00 sec)
    
admin@localhost [weixinping_test]  15:23:03>select LENGTH(a),a from test_varchar where a like 'ac    %';
Empty  set  (0.00 sec)

Just like the performance of char, if it's a space after it, no matter how many spaces I put, the same result will appear, two ac's. But if I put spaces in the front and middle, they won't be ignored in comparison. But like always takes the last space into account.

Let's talk about the case

By default, the case in the string is not distinguished without any configuration changes.

admin@localhost [weixinping_test]  15:35:13>insert into test_char() values('a'),('A');
Query OK,  2 rows affected (0.00 sec)
Records:  2  Duplicates:  0  Warnings:  0
    
admin@localhost [weixinping_test]  15:35:20>select  *  from test_char where a =  'a'
->  ;
+------+
| a |
+------+
| a |
| A |
+------+
2 rows in  set  (0.00 sec)
    
admin@localhost [weixinping_test]  15:35:34>select  *  from test_char where a =  'A';
+------+
| a |
+------+
| a |
| A |
+------+
2 rows in  set  (0.00 sec)
    
admin@localhost [weixinping_test]  15:35:53>select  *  from test_char where a like 'a';
+------+
| a |
+------+
| a |
| A |
+------+
2 rows in  set  (0.01 sec)
    
admin@localhost [weixinping_test]  15:41:18>insert into test_varchar(a) values('a'),('A');
Query OK,  2 rows affected (0.00 sec)
Records:  2  Duplicates:  0  Warnings:  0
    
admin@localhost [weixinping_test]  15:42:01>select  *  from test_varchar where a =  'a';
+---+
| a |
+---+
| a |
| A |
+---+
2 rows in  set  (0.00 sec)
    
admin@localhost [weixinping_test]  15:42:16>select  *  from test_varchar where a =  'A';
+---+
| a |
+---+
| a |
| A |
+---+
2 rows in  set  (0.00 sec)
    
admin@localhost [weixinping_test]  15:42:20>select  *  from test_varchar where a like 'A';
+---+
| a |
+---+
| a |
| A |
+---+
2 rows in  set  (0.00 sec)

It can be seen that there is case sensitivity in saving, but when fetching, it is not case sensitive, and results can be obtained. Even like has the same result. The result of varchar is the same as that of char.

So what if I have to be case sensitive. There are two ways, one is to modify the table structure, the other is to modify the sql statement. If you want to modify the table structure, you should take the COLLATE after the modified character set. Let's talk about this next article. Let's talk about a simpler way. Use the binary keyword directly. as follows

admin@localhost [weixinping_test]  15:50:49>select  *  from test_varchar where binary a =  'A';
+---+
| a |
+---+
| A |
+---+
1 row in  set  (0.00 sec)

Finally, let's summarize

  1. The maximum length of char is 255 (the length is character or how many words, not bytes).

  2. The maximum length of varchar is variable. Because the maximum length of line is 65535 bytes, the change of character set will be different.

  3. The space at the end of the string will not be reserved in char, and it will be reserved in varchar, but the space after the condition will be automatically ignored when using = matching. But like will consider the following spaces.

  4. By default, string configuration is case insensitive. The easiest way to modify it is to use the binary keyword when using sql.

Topics: Database MySQL less SQL