Recently, at 05:00 a.m., I always get warnings from the SQL firewall:
|BASE||1059712468||xx||xx.xx.xx.xx||aea||Query||34||UPDATE approve SET operator = 0', operator_name ='system', comment ='departure', status ='1'WHERE (id = 48311')||updating|
When I first saw this data, the first reaction was probably the affected SQL. I had no time to pay attention to it, but after several early morning warnings, I had to analyze it in depth.
- Symptom characteristics
- Primary key update
- The status is updating
- Execution time is more than 30 seconds
- command is query
All this seems particularly normal.
- MySQL version
mysql Ver 14.14 Distrib 5.6.16, for linux-glibc2.5 (x86_64) using EditLine wrapper
- Table structure
CREATE TABLE `approve` ( `id` int(11) NOT NULL AUTO_INCREMENT, `reim_id` int(11) NOT NULL DEFAULT '0' , `user_name` varchar(20) NOT NULL DEFAULT '', `user_ids` varchar(100) NOT NULL DEFAULT '', `user_email` text COMMENT 'Be used for mail', `status` tinyint(1) NOT NULL DEFAULT '0' , `stagesub` smallint(3) NOT NULL DEFAULT '0' , `stage` smallint(3) NOT NULL DEFAULT '0' , `flag` tinyint(1) NOT NULL DEFAULT '0' , `operator` int(11) NOT NULL DEFAULT '0' , `operator_name` varchar(20) NOT NULL DEFAULT '', `comment` text, `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, `cs_userid` int(11) NOT NULL DEFAULT '0' , `cs_status` tinyint(4) NOT NULL DEFAULT '0' , `is_deficit` tinyint(1) NOT NULL DEFAULT '1' , `approve_type` tinyint(4) NOT NULL DEFAULT '1', PRIMARY KEY (`id`), KEY `list` (`user_ids`,`status`), KEY `next` (`flag`,`status`), KEY `detail` (`reim_id`), KEY `ix_userid` (`cs_userid`) ) ENGINE=InnoDB AUTO_INCREMENT=464885 DEFAULT CHARSET=utf8
III. Analytical process
- Analysis of SQL statement itself
1. This statement is normal, but also the primary key update, execution plan everything is normal. 2. State in show process list, command=Query, state=updating
- Manual execution
No problem, it's done in an instant.
dba> UPDATE approve SET `operator` = '0',`operator_name` = 'system',`comment` = 'Quit',`status` = '1' WHERE (`id` = '49384'); Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0
- Possible causes of the problem
1. Will the problem of splicing SQL statements cause full-table scanning updates in the early morning when SQL has special characters? 1.1 In order to solve this problem, we simulate N times and print out all special characters. This problem is eliminated. 2. Server pressure problem, is it possible that in the early morning io, cpu pressure is particularly high, resulting in slow updating? 2.1 Looking at the monitoring at that time, all indicators are normal, so it is also excluded. 3. The database itself, MySQL has a bug? 3.1 No bug information has been found in this area. 4. Lock problem, the SQL statement was locked at that time? 4.1 No lock s are seen in the show process list.
- Lock-related exclusion
1. Initially, all troubleshooting came from the monitoring system and show processlist，Then check that the lock artifact is not in use, that is show engine innodb status \G ---TRANSACTION 51055827249, ACTIVE 20 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s) MySQL thread id 1060068541, OS thread handle 0x7fba06c6c700, query id 55990809665 xx aea updating UPDATE approve SET `operator` = '0',`operator_name` = 'system',`comment` = 'Quit',`status` = '1' WHERE (`id` = '49384') ------- TRX HAS BEEN WAITING 20 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 746 page no 624 n bits 216 index `PRIMARY` of table `aea`.`approve` trx id 51055827249 lock_mode X locks rec but not gap waiting Record lock, heap no 148 PHYSICAL RECORD: n_fields 19; compact format; info bits 0 0: len 4; hex 8000c0e8; asc ;; 1: len 6; hex 000be32a10cb; asc * ;; 2: len 7; hex 7a000004540557; asc z T W;; 3: len 4; hex 80002884; asc ( ;; 4: len 6; hex e69da8e58b87; asc ;; 5: len 6; hex 3b363430353b; asc ;6405;;; 6: len 19; hex 7979616e6740616e6a756b65696e632e636f6d; asc yy.com;; 7: len 1; hex 81; asc ;; 8: len 2; hex 8015; asc ;; 9: len 2; hex 8001; asc ;; 10: len 1; hex 80; asc ;; 11: len 4; hex 80000001; asc ;; 12: len 6; hex 73797374656d; asc system;; 13: len 6; hex e7a6bbe8818c; asc ;; 14: len 4; hex 59a4c993; asc Y ;; 15: len 4; hex 80000000; asc ;; 16: len 1; hex 80; asc ;; 17: len 1; hex 81; asc ;; 18: len 1; hex 81; asc ;; ------------------ ---TRANSACTION 51055825099, ACTIVE 21 sec 2 lock struct(s), heap size 360, 1 row lock(s), undo log entries 1 MySQL thread id 1060025172, OS thread handle 0x7fba05ad0700, query id 55990809629 xx aea cleaning up 2. The following conclusions can be drawn from the above fragment information 2.1 UPDATE approve Statement waiting for primary key index record lock，lock_mode X locks rec but not gap ， space id 746 page no 624， Recorded as primary key 49384 row 2.2 TRANSACTION 51055827249, ACTIVE 20 sec , This transaction lasts 20 seconds. 2.3 TRANSACTION 51055825099, ACTIVE 21 sec , This transaction lasts for 21 seconds, and according to this information, it is likely that it is held by the transaction. UPDATE approve Needed record lock 2.4 TRANSACTION 51055825099, 1 row lock(s) , Based on this information, the transaction can be further inferred. thread id 1060025172 Hold the record lock. 3. Unfortunately, I don't know what it is. SQL Statement that has been executed
1. How to verify the above inference? 2. How to find which SQL holds locks? 3. First, let's start with the table and find out which tables are written to SQL. Through Performance Schema, two suspicious-looking SQL s were discovered
|0c95e7f2105d7a3e655b8b4462251bf2||UPDATE approve SET operator = ? , operator_name = ? , comment = ? , status = ? WHERE ( id = ? )||15||xx||BASE||20170829|
|591226ca0ece89fe74bc6894ad193d71||UPDATE approve SET STATUS = ? , operator = ? , operator_name = ? , COMMENT = ? WHERE approve . id = ?||15||xx||BASE||20170829|
- Further verification
1. Through the above-mentioned SQL, if they update the same id, it is likely to lead to lock waiting. 2. To satisfy the above conjecture, a necessary condition must also be satisfied: the following statement must be executed before the above statement, and there is no commit 3. We went to the server for tcpdump grabbing and found the following: Capturing on Pseudo-device that captures on all interfaces Aug 29, 2017 10:20:23.560491000 xx.xx.xx.xx UPDATE approve SET status=1, operator=1, operator_name='system', comment='\xe7\xa6\xbb\xe8\x81\x8c' WHERE approve.id = 49384 Aug 29, 2017 10:20:23.589586000 xx.xx.xx.xx UPDATE approve SET `operator` = '0',`operator_name` = 'system',`comment` = '\xe7\xa6\xbb\xe8\x81\x8c',`status` = '1' WHERE (`id` = '49384') It just validates our inferences. 4. Manual simulation of this situation, the phenomenon is consistent with our fault, that is, the cause of the problem and find out.
- Problem solving
1. Get the developed code and find that it's python's code, and there's no auto_commit option. 2. The first transaction does not end and there is no commit, resulting in the next transaction waiting for resources to be locked. 3. Why do we need to update the same record twice? We need to know more about the code and business.
1. Next time you encounter a similar problem, you can avoid being confused by the surface phenomena of processlist and be good at understanding the mechanism of lock and the investigation of lock information. 2. When writing code, try to do small transactions, generally use auto_commit, if you need to show open transactions, you should also try to use as much commit as possible. 3. If MySQL can print out the lock directly in the show process list, the waiting lock status will be more humane. 4. Why can't you see which SQL statement holds the lock when showing engine InnoDB status? MySQL can help DBA diagnose problems better if it can provide such information.