Composite data types for MySQL: ENUM and SET

Posted by iRock on Wed, 17 Jul 2019 18:18:35 +0200

Common data types for MySQL include Number/Date/String, which includes simple data types of different lengths, such as Char/Varchar/Binary/blob/text. Sometimes we need more detailed data management, such as enumerations and collections, and composite types ENUM and SET.

ENUM Enumeration Type

ENUM is suitable for scenarios where only one of a set of fixed values can be selected, for example, gender can only be male or female.
The advantages of ENUM are:

  • You can only select from fixed values and limit illegal values at the database level.
  • Data storage uses numbers to store and takes up less space.

However, there are many things we should pay attention to when using it, and you will get wrong results if you are not careful.

Enumerate types using ENUM

mysql> create table test (name varchar(40), sex enum('male', 'female') );
mysql> insert into test (name, sex) values('a', 'male'), ('b', 'female'), ('c', 'male');
mysql> select * from test;
+------+--------+
| name | sex    |
+------+--------+
| a    | male   |
| b    | female |
| c    | male   |
+------+--------+
3 rows in set (0.00 sec)

When creating an enumeration type, we use the keyword enum, followed by a list of enumerable values that must be in the string format or else an error will be reported.If the case of the inserted value does not match, it is automatically converted to an enumerated value.

The actual value of the ENUM type data store is the index value

All our enumerated values are stored according to the index values in the list of enumerated values, such as the sex field above for ENUM ('male','female'):

Literal Stored Value
NULL NULL
'' 0
'male' 1
'female' 2

So if 1,000 records are stored as male, we might think the database stores 4,000 characters, but only 1,000 characters are stored.This encoded number will be converted to the actual value when querying.

We can test it with two examples:

mysql> select * from test where sex=1;
+------+------+
| name | sex  |
+------+------+
| a    | male |
| c    | male |
+------+------+
2 rows in set (0.00 sec)

mysql> select name, sex+0 from test;
+------+-------+
| name | sex+0 |
+------+-------+
| a    |     1 |
| b    |     2 |
| c    |     1 |
+------+-------+
3 rows in set (0.00 sec)

This way of storing and querying results in functions that process numbers and also uses stored values for calculations, such as SUM() and AVG():

mysql> select name, avg(sex) from test;
+------+--------------------+
| name | avg(sex)           |
+------+--------------------+
| a    | 1.3333333333333333 |
+------+--------------------+
1 row in set (0.00 sec)

Do not use numbers when reading or writing

Because of the indexed value storage features described above, don't use the enumeration type to store columns in numeric format, as this can cause a lot of confusion, such as:

mysql> create table test2 (numbers enum('0', '1', '2'));
Query OK, 0 rows affected (0.04 sec)

# In this case, 2 is considered an index value and is therefore'1';'2'is'2';'3' is'2'because it is not a legal value and will try with an index value.
mysql> insert into test2 (numbers) values (2), ('2'), ('3');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from test2;
+---------+
| numbers |
+---------+
| 1       |
| 2       |
| 2       |
+---------+
3 rows in set (0.00 sec)

Default value for enumeration type

Even if a column is set as an enumeration type, there are two additional values that are legal: NULL and''.

When we insert an illegal value, in loose mode, a normal empty character''is inserted with a value of 0.Errors will be reported in strict mode.

When the field is set to allow null, the NULL field can be inserted normally.When null is not allowed, if you do not fill in the value, the default value is used: enumerate the first value, such as male above.

There is no appropriate way to have a column of data insert legitimate enumeration values, except in strict mode.Defaults are not sufficient in many cases.

Sorting of enumeration types

When order by is used routinely, it is sorted alphabetically.However, since enumeration types are stored as index values, they are sorted by index values: NULL <''= 0 < 1 < 2.

If you want to sort by text type, you can use:

order by cast(col as char)
//perhaps
order by concat(col)

Restrictions on Enumerated Value Declarations

When creating a data type, enumeration values are not allowed as expressions, such as:

mysql> create table test (name varchar(40), sex enum('male', concat('fem', 'ale') );
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'concat('fem', 'ale') )' at line 1

Limit on the number of enumerated values

Enumeration values are stored in 1-2 bytes, so the upper limit is 2^16-1=65535.

SET Collection Type

The SET and ENUM types are very similar and are suitable for scenarios where only zero or more of a set of fixed values can be selected, such as one or more of the colors a person likes, such as red, yellow, blue, or neither.

The advantages of SET are similar to ENUM in that:

  • You can only select from fixed values and limit illegal values at the database level.
  • Data storage uses numbers to store and takes up less space.However, this may not be true when the number of enumerated values is large and the number of characters in the enumerated values is small.

Enumerate types using SET

mysql> create table test2 (name varchar(40), color set('red', 'green', 'blue', 'yellow'));
Query OK, 0 rows affected (0.04 sec)

mysql> insert into test2(name,color) values ('a', 'red'), ('b', 'red,green'), ('c', 'green,blue,yellow');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from test2;
+------+-------------------+
| name | color             |
+------+-------------------+
| a    | red               |
| b    | red,green         |
| c    | green,blue,yellow |
+------+-------------------+
3 rows in set (0.00 sec)

When created, we use the keyword set, followed by a list of enumerable values that must be in the string format or else an error will occur.

The actual values stored in the SET type data store are the sum of the index values

All of our enumerated values are stored according to the index values in the list, except by setting the position of the binary number to 1, which is the power of 2.For example, the above SET ('red','blue','green','yellow') has a color field with all values:

enum Binary Value Decimal digits
red 0001 1
blue 0010 2
green 0100 4
yellow 1000 8

When there are multiple values, the stored values are obtained by summing all the values.As a result, fewer data is stored, and when taken out, the encoded numbers are escaped to actual strings.

We can test it with two examples:

mysql> select name,color+0 from test2;
+------+---------+
| name | color+0 |
+------+---------+
| a    |       1 |
| b    |       3 |
| c    |      14 |
+------+---------+
3 rows in set (0.00 sec)

mysql> select name,color from test2 where color=14;
+------+-------------------+
| name | color             |
+------+-------------------+
| c    | green,blue,yellow |
+------+-------------------+
1 row in set (0.00 sec)

This way of storing and querying results in functions that process numbers and also uses stored values for calculations, such as SUM() and AVG():

mysql> select avg(color) from test2;
+------------+
| avg(color) |
+------------+
|          6 |
+------------+
1 row in set (0.00 sec)

Order and number of inserts

When inserting a value, the set type does not care about the order in which you insert it and the number of inserts of an enumerated value. It automatically recapitulates, sums up, and returns the value in the declared order when it is taken out:

mysql> insert into test2(name,color) values ('d', 'yellow,green,red,yellow');
Query OK, 1 row affected (0.00 sec)

mysql> select name,color from test2;
+------+-------------------+
| name | color             |
+------+-------------------+
| d    | red,green,yellow  |
+------+-------------------+
4 rows in set (0.00 sec)

Find Collection Values

Due to the special nature of the set type, there are special lookup functions:

mysql> select * from test2 where find_in_set('red', color);
+------+------------------+
| name | color            |
+------+------------------+
| a    | red              |
| b    | red,green        |
| d    | red,green,yellow |
+------+------------------+
3 rows in set (0.00 sec)

# This method does not work correctly when the lightred color appears
mysql> select * from test2 where color like '%red%';
+------+------------------+
| name | color            |
+------+------------------+
| a    | red              |
| b    | red,green        |
| d    | red,green,yellow |
+------+------------------+
3 rows in set (0.00 sec)

Set values are computed using bitwise operations

Previously, the enumeration values are duplicated and automatically summed just for ease of understanding. In fact, bitwise operations are performed to get the final value, such as 0001 + 0100 = 0101.

So we can look for values in a similar way:

mysql> select name,color from test2 where color & 10;
+------+-------------------+
| name | color             |
+------+-------------------+
| b    | red,green         |
| c    | green,blue,yellow |
| d    | red,green,yellow  |
+------+-------------------+
3 rows in set (0.00 sec)

mysql> select name,color from test2 where color & 12;
+------+-------------------+
| name | color             |
+------+-------------------+
| c    | green,blue,yellow |
| d    | red,green,yellow  |
+------+-------------------+
2 rows in set (0.00 sec)

I can't find out what this & symbol means above, but I guess the meaning of this & symbol is a bit operation, which returns true when both numbers are 1 in one position and false if neither is 1.

Specific can be calculated as follows: a,b,c,d are 0001,0011,1110,1011, 10 is 1100, 12 is 1010, can be calculated to the above results.The results from other numbers also match, so they should match my guess.

Sorting of enumeration types

When order by is used routinely, it is sorted alphabetically.However, since enumeration types are stored as index values, they are sorted by index values: NULL < 0 < 1 < 2.

If you want to sort by text type, you can use:

order by cast(col as char)
//perhaps
order by concat(col)

Limit on the number of enumerated values

Enumeration values are stored in 1-8 bytes, so the upper limit is 8*8=64.

Reference material

  1. 11.4.4 The ENUM Type: https://dev.mysql.com/doc/ref...
  2. 11.8 Data Type Storage Requirements: https://dev.mysql.com/doc/ref...
  3. 11.4.5 The SET Type: https://dev.mysql.com/doc/ref...

Topics: MySQL Database less SQL