Technology sharing | The mysterious problem of "lost" primary keys caused by load data

Posted by wystan on Wed, 15 Dec 2021 19:37:32 +0100

Welcome to the MySQL technical articles shared by the GreatSQL community. If you have any questions or topics you would like to learn about, leave a message in the comment area below and we will answer it when we see it.

  • The original content of the GreatSQL community may not be used without authorization. To reprint, please contact the editor and indicate the source.

  • 1. Finding the problem

  • 2. Reproducing the problem

  • 3. Checking the import file

  • 4. Cause of the problem

  • 5. Fixing the problem

  • 6. Summary

1. Finding the problem

During a data migration task, Xiaoling exported the data from the source database as a CSV file and then imported it into MySQL with load data. To her surprise, the id field seemed to have disappeared, like this:

mysql> select * from t2;
+----+-------+---------------------+
| id | col1  | col2                |
+----+-------+---------------------+
                 |
            || TfdESTA
              |TESTA
          |4 | TEfdfdSTA
 | 5 | TEST5 | TESfddfdsfdsfdsfTA
      |TEST6 | TESffdfdfddTA
+----+-------+---------------------+
6 rows in set (0.00 sec)

The target database version and table structure are as follows:

mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.25    |
+-----------+
1 row in set (0.00 sec)


mysql> show create table t2;
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                           |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (
  `id` int NOT NULL AUTO_INCREMENT,
  `col1` varchar(69) DEFAULT NULL,
  `col2` varchar(79) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+----------------------------------------------------------------------------------

After calming down, Xiaoling used the following statement to verify that the primary key id was not actually lost; it only looked that way because of a display problem:

mysql> select * from t2 where id=1;
+----+-------+------+
| id | col1  | col2 |
+----+-------+------+
  |1 | TEST1 | TA
+----+-------+------+
1 row in set (0.00 sec)

So Xiaoling decided to check the whole operation process and find out the cause of the problem.

2. Reproducing the problem

First, create a table t1:

mysql> CREATE TABLE `t1` (
    ->   `id` int NOT NULL AUTO_INCREMENT,
    ->   `col1` varchar(60)  DEFAULT NULL,
    ->   `col2` varchar(70) DEFAULT NULL,
    ->   PRIMARY KEY (`id`)
    -> );
Query OK, 0 rows affected (0.01 sec)

In a Windows environment, prepare a test file t1.txt with Notepad. Viewed in a Linux environment, t1.txt reads as follows:

great@great-PC:~/Downloads/windows$ cat t1.txt
test1,test1
test2,test2
test3,test3
test4,test4
test5,test5
test6,test6

Load the data and view it. At first glance nothing looks wrong.

mysql> load data infile "/home/great/Downloads/windows/t1.txt" \
       into table t1 FIELDS TERMINATED BY ',' (col1,col2);
Query OK, 6 rows affected (0.01 sec)
Records: 6  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from t1;
+----+-------+--------+
| id | col1  | col2   |
+----+-------+--------+
 | 1 | test1 | test1
 | 2 | test2 | test2
 | 3 | test3 | test3
 | 4 | test4 | test4
 | 5 | test5 | test5
 | 6 | test6 | test6
+----+-------+--------+
6 rows in set (0.00 sec)

It is a little strange, though, that the right-hand border of the output is missing. Insert a row manually, then query the data again.

mysql> insert into t1(col1,col2) values('test7','test7');
Query OK, 1 row affected (0.00 sec)

mysql> select * from t1;
+----+-------+--------+
| id | col1  | col2   |
+----+-------+--------+
 | 1 | test1 | test1
 | 2 | test2 | test2
 | 3 | test3 | test3
 | 4 | test4 | test4
 | 5 | test5 | test5
 | 6 | test6 | test6
|  7 | test7 | test7  |
+----+-------+--------+
7 rows in set (0.00 sec)

mysql> select * from t1 where id=7;
+----+-------+-------+
| id | col1  | col2  |
+----+-------+-------+
|  7 | test7 | test7 |
+----+-------+-------+
1 row in set (0.00 sec)

Parse the binlog:

...

### INSERT INTO `test`.`t1`
### SET
###   @1=6 /* INT meta=0 nullable=0 is_null=0 */
###   @2='test6' /* VARSTRING(240) meta=240 nullable=1 is_null=0 */
###   @3='test6\r' /* VARSTRING(280) meta=280 nullable=1 is_null=0 */
# at 1223
#210923 11:19:09 server id 12345  end_log_pos 1250      Xid = 271
COMMIT/*!*/;

...

### INSERT INTO `test`.`t1`
### SET
###   @1=7 /* INT meta=0 nullable=0 is_null=0 */
###   @2='test7' /* VARSTRING(240) meta=240 nullable=1 is_null=0 */
###   @3='test7' /* VARSTRING(280) meta=280 nullable=1 is_null=0 */
# at 1500
#210923 11:19:15 server id 12345  end_log_pos 1527      Xid = 272
COMMIT/*!*/;

It is clear that when the txt data was loaded, the value stored for test6 was not 'test6' but 'test6\r', while the manually inserted test7 is indeed 'test7'.

Then check the hexadecimal value of the actually inserted data.

mysql> select id,hex(col2) from t1;
+----+--------------+
| id | hex(col2)    |
+----+--------------+
|  1 | 74657374310D |
|  2 | 74657374320D |
|  3 | 74657374330D |
|  4 | 74657374340D |
|  5 | 74657374350D |
|  6 | 74657374360D |
|  8 | 7465737437   |
+----+--------------+
7 rows in set (0.00 sec)

The loaded data has an extra 0D byte compared with the row that was inserted manually.

mysql> select  hex('\r') from t1;
+-----------+
| hex('\r') |
+-----------+
| 0D        |
| 0D        |
| 0D        |
| 0D        |
| 0D        |
| 0D        |
| 0D        |
+-----------+
7 rows in set (0.00 sec)


mysql> select  unhex('0D') from t1;
+-------------+
| unhex('0D') |
+-------------+
           |
           |
           |
           |
           |
           |
           |
+-------------+
7 rows in set (0.00 sec)

It is now fairly clear that the problem is caused by a hidden character in the txt file. The output above looks only slightly off because every col2 value here is exactly 5 characters long; once values of different lengths are imported, the same kind of scrambling seen in the query at the beginning shows up clearly.

mysql> load data infile "/home/great/Downloads/windows/oracle_objects.txt" \
       into table t1 FIELDS TERMINATED BY ',' (col1,col2);
Query OK, 2088 rows affected (0.02 sec)
Records: 2088  Deleted: 0  Skipped: 0  Warnings: 0


mysql> select * from t1   limit 30  ;
+----+-----------------------------+------------+
| id | col1                        | col2       |
+----+-----------------------------+------------+
     | test1                       | test1
     | test2                       | test2
     | test3                       | test3
     | test4                       | test4
     | test5                       | test5
     | test6                       | test6
|  8 | test7                       | test7      |
     | A                           | TABLE
     | A1                          | TABLE
     | AAA                         | TABLE
     | ABC123                      | TABLE
     | ABCDEF                      | TABLE
     | ACTIVE_ALARMS               | TABLE
 |15 | ADDAUTH                     | PROCEDURE
 |16 | ADDROLEAUTH                 | PROCEDURE
     | AGENT_AVAIL_PRIV            | TABLE
     | AGE_STAT_FORBIDTIMELOGIN    | TABLE
     | AGE_STAT_NOTFORBIDTIMELOGIN | TABLE
     | AGE_STAT_QUERYLARGE         | TABLE
     | APP_SCHEDULE_INFO           | TABLE
 |22 | AP_CLEARMUTEXROLL           | PROCEDURE
 |23 | AP_CLEARROLEBYSTSNUM        | PROCEDURE
 |24 | AP_CLEARROLEBYTEL           | PROCEDURE
 |25 | AP_CLEARSUPERROLL           | PROCEDURE
 |26 | AP_CREATE_RDDLV_FILE        | PROCEDURE
 |27 | AP_DEALACTLOG_OPERIP        | PROCEDURE
 |28 | AP_LOC_CRMBI_MSGSEND_LOG    | PROCEDURE
 |29 | AP_MENDCLICK                | PROCEDURE
 |30 | AP_PUB_UNLOCK_VERIFYCODE    | PROCEDURE
 |31 | AP_SETDBUSERANDPASS         | PROCEDURE
+----+-----------------------------+------------+
30 rows in set (0.00 sec)

3. Checking the import file

To confirm that the imported file itself is the problem, let's examine it in hexadecimal.

The VS Code plugin hexdump is used here, with the following result:

  Offset: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000: 74 65 73 74 31 2C 74 65 73 74 31 0D 0A 74 65 73    test1,test1..tes
00000010: 74 32 2C 74 65 73 74 32 0D 0A 74 65 73 74 33 2C    t2,test2..test3,
00000020: 74 65 73 74 33 0D 0A 74 65 73 74 34 2C 74 65 73    test3..test4,tes
00000030: 74 34 0D 0A 74 65 73 74 35 2C 74 65 73 74 35 0D    t4..test5,test5.
00000040: 0A 74 65 73 74 36 2C 74 65 73 74 36 0D 0A          .test6,test6..

The dump shows that, besides 0D, there is also a 0A at the end of each line.

mysql> select  hex('\n') ;
+-----------+
| hex('\n') |
+-----------+
| 0A        |
+-----------+
1 row in set (0.00 sec)

So in the imported txt file, each line ends with \r\n as the line terminator.
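As a cross-check, the raw bytes can also be inspected from inside MySQL. This is only a sketch: it assumes the account has the FILE privilege and that the file sits in a location permitted by secure_file_priv (the same conditions load data infile already required above).

mysql> select hex(load_file('/home/great/Downloads/windows/t1.txt')) as file_hex\G

If the file is readable, the returned hex string shows the same 0D 0A pair at the end of every line as the hexdump above.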

In different systems, there are different representations of line breaks.

The following is from Wikipedia

How application software and operating systems represent newlines:

ASCII-based or compatible character sets use LF (Line Feed, U+000A), CR (Carriage Return, U+000D), or CR+LF; the newline conventions of various systems are listed below (a quick way to map them to the bytes in the hexdump follows the list).

LF: on Unix or Unix compatible systems (GNU/Linux, AIX, Xenix, Mac OS X,...), BeOS, Amiga, RISC OS

CR+LF: DOS (MS-DOS, PC-DOS, etc.), Microsoft Windows, most non-Unix systems

CR: Apple II family, Mac OS up to version 9
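To map these conventions to concrete bytes, MySQL's escape sequences can be queried directly (see String Literals in the reference documents):

mysql> select hex('\n') as lf, hex('\r') as cr, hex('\r\n') as crlf;

This returns 0A, 0D, and 0D0A respectively, which is exactly the 0D 0A pattern at the end of every line of the file.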

4. Cause of the problem

From the above, each line of the txt file ends with \r\n, while Linux uses \n as the line terminator, so the \r is treated as part of the data and inserted into the table.

When the mysql client prints the results, \r is rendered as a carriage return, which moves the cursor back to the start of the line, so the table borders get overwritten and the query output looks broken.
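To see which rows were affected, a quick check is to look for values that end in \r. This is only a sketch and assumes col2 (as in the repro table t1) is the column that received the stray character:

mysql> select count(*) from t1 where col2 like '%\r';
mysql> select id, replace(col2, '\r', '[CR]') as col2_marked from t1 limit 3;

Replacing the invisible \r with a visible marker makes the stray character obvious without letting the terminal overwrite the output.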

5. Fixing the problem

When importing data with load data, you can add lines terminated by '\r\n' to tell MySQL that the whole \r\n sequence is the line terminator.

Let's do it again.

mysql> show master status;
+---------------+----------+--------------+------------------+---------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                           |
+---------------+----------+--------------+------------------+---------------------------------------------+
| binlog.000013 |      192 |              |                  | b5457ec7-f50a-11eb-ac22-2cf05daaf63e:1-6182 |
+---------------+----------+--------------+------------------+---------------------------------------------+
1 row in set (0.00 sec)

mysql> load data infile "/home/great/Downloads/windows/t1.txt" \
       into table t1 fields terminated by ',' lines terminated by '\r\n' (col1,col2);
Query OK, 6 rows affected (0.00 sec)
Records: 6  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from t1;
+----+-------+-------+
| id | col1  | col2  |
+----+-------+-------+
|  1 | test1 | test1 |
|  2 | test2 | test2 |
|  3 | test3 | test3 |
|  4 | test4 | test4 |
|  5 | test5 | test5 |
|  6 | test6 | test6 |
+----+-------+-------+
6 rows in set (0.00 sec)

Parse the binlog again to see what was actually inserted:

...
### INSERT INTO `test`.`t1`
### SET
###   @1=6 /* INT meta=0 nullable=0 is_null=0 */
###   @2='test6' /* VARSTRING(240) meta=240 nullable=1 is_null=0 */
###   @3='test6' /* VARSTRING(280) meta=280 nullable=1 is_null=0 */
# at 532
#210923 14:39:18 server id 12345  end_log_pos 559       Xid = 385
COMMIT/*!*/;
...

This time the inserted values are correct.
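If the bad rows had already been imported and re-loading were inconvenient, the stray character could also be stripped in place. This is just a sketch, assuming col2 is the only affected column (take a backup before running any bulk UPDATE):

mysql> update t1 set col2 = trim(trailing '\r' from col2) where col2 like '%\r';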

6. Summary

The whole problem was caused by the different line-ending conventions of Linux and Windows. When importing files produced on another system, consider such processing differences to avoid unpredictable results.

[Reference documents]

1. String Literals: https://dev.mysql.com/doc/refman/8.0/en/string-literals.html

2. LOAD DATA Statement: https://dev.mysql.com/doc/refman/8.0/en/load-data.html

Enjoy GreatSQL :)

Recommended articles:

Technology sharing | MGR best practices https://mp.weixin.qq.com/s/66u5K7a9u8GcE2KPn4kCaA

Technology sharing | How Wanli Database fixed an MGR bug https://mp.weixin.qq.com/s/IavpeP93haOKVBt7eO8luQ

Compiling Percona on macOS and the differences of some functions on macOS https://mp.weixin.qq.com/s/jAbwicbRc1nQ0f2cIa_2nQ

Technology sharing | Using systemd to manage multiple MySQL instances on a single machine https://mp.weixin.qq.com/s/iJjXwd0z1a6isUJtuAAHtQ

Product | GreatSQL, building a better MGR ecosystem https://mp.weixin.qq.com/s/ByAjPOwHIwEPFtwC5jA28Q

Product | GreatSQL MGR optimization reference https://mp.weixin.qq.com/s/5mL_ERRIjpdOuONian8_Ow

About GreatSQL

GreatSQL is a MySQL branch maintained by Wanli Database. It focuses on improving the reliability and performance of MGR, supports InnoDB parallel query, and is a MySQL branch suitable for financial-grade applications.

Gitee: https://gitee.com/GreatSQL/GreatSQL

GitHub: https://github.com/GreatSQL/GreatSQL

WeChat & QQ groups:

Search for and add the GreatSQL community assistant as a WeChat friend, then send the verification message "add group" to join the GreatSQL/MGR WeChat discussion group.

QQ group: 533341697  WeChat assistant: wanlidbc

