Lao Zhang is not only a fan of Jin Yong, but also a fan of the Three Kingdoms. I love watching the contest between Zhuge Liang of Shu and Sima Yi of Wei; their use of all kinds of clever stratagems is classic. When it comes to managing MySQL databases, Lao Zhang also has many ingenious tricks, which I will introduce to you one by one in the future. Back to the Three Kingdoms: I personally wished Shu could have unified the land, but things went against that wish, and in the end it was Sima Yan of Wei who unified the world. Some attribute Shu's failure to the hopeless Liu Adou, others attribute it to fate, even citing the saying that whoever obtained either Wolong (the Crouching Dragon) or Fengchu (the Young Phoenix) could win the world, while Liu Bei had both. It sounds ridiculous now; in fact, everyone's destiny is in his own hands.
We should learn to do our best and leave the rest to fate, trying to do everything well without missing a single small detail. In the database field especially, we must be careful. A teacher of mine once told me: learn to let your work flow in your blood; only if you truly love it can you study it with your whole heart!
Every time Lao Zhang writes a blog he likes to serve some chicken soup for the soul, which some of you old pals may not like to hear. What I really hope is that no matter what industry you are in, if you do everything with your heart, you will succeed sooner or later.
Lao Zhang's MySQL online course has also officially launched at 51CTO College. Students who want to learn can visit it.
Website: http://edu.51cto.com/course/10681.html
If you have any questions, feel free to reach out to me at any time.
Today I'll share with you an article about the MySQL DBA's essential tools. They can help us manage our databases and make our work more efficient.
This toolkit comes from Percona, an important branch of the MySQL world. It is called percona-toolkit (a sharp Swiss army knife) and is a collection of commands. Today I'd like to introduce the ones we use most often in our production environment.
Download address of toolkit: https://www.percona.com/downloads/percona-toolkit/LATEST/
The installation process is very simple. First decompress:
tar -zxvf percona-toolkit-3.0.3_x86_64.tar.gz
Since it is a binary package, it can be used directly after decompression, from the percona-toolkit-3.0.3/bin directory.
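Optionally, you can add the bin directory to your PATH so the tools can be run from anywhere; a small sketch (the /opt prefix below is an assumption, adjust it to wherever you extracted the tarball):

# Optional: put the tools on PATH for convenience
export PATH=$PATH:/opt/percona-toolkit-3.0.3/bin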
Brilliant trick No. 1:
pt-online-schema-change
Function: alter table structures online, reclaim fragmented space, and add columns and indexes to large tables, avoiding the blocked reads and writes caused by locking the table. For MySQL 5.7 you may not need this command; native online DDL can do the job.
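For comparison, here is a minimal sketch of the native online DDL mentioned above, using the same table and column as the demo below (the explicit ALGORITHM and LOCK clauses are my additions to make the intent clear):

-- Native online DDL in MySQL 5.7+: add a column without blocking reads/writes
ALTER TABLE test.su
    ADD COLUMN city_id INT,
    ALGORITHM=INPLACE, LOCK=NONE;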
The demo process is as follows:
Because this is a test environment, we do not create a table with a huge amount of data; the point is to let you understand the process.
This is the amount of data in the table and its structure:
mysql> select count(*) from su;
+----------+
| count(*) |
+----------+
|   100000 |
+----------+
1 row in set (0.03 sec)

mysql> desc su;
+-------+------------------+------+-----+-------------------+-----------------------------+
| Field | Type             | Null | Key | Default           | Extra                       |
+-------+------------------+------+-----+-------------------+-----------------------------+
| id    | int(10) unsigned | NO   | PRI | NULL              | auto_increment              |
| c1    | int(11)          | NO   |     | 0                 |                             |
| c2    | int(11)          | NO   |     | 0                 |                             |
| c3    | int(11)          | NO   |     | 0                 |                             |
| c4    | int(11)          | NO   |     | 0                 |                             |
| c5    | timestamp        | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| c6    | varchar(200)     | NO   |     |                   |                             |
+-------+------------------+------+-----+-------------------+-----------------------------+
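If you want to reproduce the demo, here is one hypothetical way to build and populate such a table; the original definitions are not shown in this article, and the insert_su procedure name merely matches the call that appears in the slow log later:

-- Hypothetical demo table (columns copied from the DESC output above)
CREATE TABLE su (
    id int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
    c1 int(11) NOT NULL DEFAULT 0,
    c2 int(11) NOT NULL DEFAULT 0,
    c3 int(11) NOT NULL DEFAULT 0,
    c4 int(11) NOT NULL DEFAULT 0,
    c5 timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    c6 varchar(200) NOT NULL DEFAULT ''
);

-- Hypothetical loader procedure; only the name is taken from the article
DELIMITER //
CREATE PROCEDURE insert_su(IN n INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < n DO
        INSERT INTO su (c1, c2, c3, c4, c6) VALUES (i, i, i, i, REPEAT('a', 20));
        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;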
The process of adding fields online:
[root@node3 bin]# ./pt-online-schema-change --user=root --password=root123 --host=localhost --alter="ADD COLUMN city_id INT" D=test,t=su --execute
No slaves found. See --recursion-method if host node3 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `test`.`su`...
Creating new table...
Created new table test._su_new OK.
Altering new table...
Altered `test`.`_su_new` OK.
2017-08-10T14:53:59 Creating triggers...
2017-08-10T14:53:59 Created triggers OK.
2017-08-10T14:53:59 Copying approximately 100163 rows...
2017-08-10T14:54:00 Copied rows OK.
2017-08-10T14:54:00 Analyzing new table...
2017-08-10T14:54:00 Swapping tables...
2017-08-10T14:54:00 Swapped original and new tables OK.
2017-08-10T14:54:00 Dropping old table...
2017-08-10T14:54:00 Dropped old table `test`.`_su_old` OK.
2017-08-10T14:54:00 Dropping triggers...
2017-08-10T14:54:00 Dropped triggers OK.
Successfully altered `test`.`su`.
Checking the result, a new city_id column has been added:
mysql> desc su;
+---------+------------------+------+-----+-------------------+-----------------------------+
| Field   | Type             | Null | Key | Default           | Extra                       |
+---------+------------------+------+-----+-------------------+-----------------------------+
| id      | int(10) unsigned | NO   | PRI | NULL              | auto_increment              |
| c1      | int(11)          | NO   |     | 0                 |                             |
| c2      | int(11)          | NO   |     | 0                 |                             |
| c3      | int(11)          | NO   |     | 0                 |                             |
| c4      | int(11)          | NO   |     | 0                 |                             |
| c5      | timestamp        | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| c6      | varchar(200)     | NO   |     |                   |                             |
| city_id | int(11)          | YES  |     | NULL              |                             |
+---------+------------------+------+-----+-------------------+-----------------------------+
Brilliant trick No. 2:
pt-query-digest
Function: capture the TOP 10 slow SQL statements currently running online.
We all know that most database performance problems are caused by SQL statements, so we have to catch these culprits and optimize them in time.
The demo process is as follows:
Slow SQL statements can be sampled by time interval; the --since value is adjustable:
[root@node3 bin]# ./pt-query-digest --since=24h /data/mysql/slow.log > 1.log
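This assumes the slow query log is already enabled and written to /data/mysql/slow.log; if it is not, here is a minimal sketch of turning it on at runtime (the 1-second threshold is an assumption, tune it for your workload):

-- Enable slow query logging
SET GLOBAL slow_query_log = ON;
SET GLOBAL slow_query_log_file = '/data/mysql/slow.log';
SET GLOBAL long_query_time = 1;  -- log statements slower than 1 second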
Look at the SQL report, which summarizes what the slow statements are and how much time they consume.
The following is only part of the report:
cat 1.log
# Profile
# Rank Query ID           Response time Calls R/Call  V/M   Item
# ==== ================== ============= ===== ======= ===== ==============
#    1 0x040ADBE3A1EED0A2 16.8901 87.2%     1 16.8901  0.00 CALL insert_su
#    2 0x8E44F4ED46297D4C  1.3013  6.7%     3  0.4338  0.18 INSERT SELECT test._su_new test.su
#    3 0x12E7CAFEA3145EEF  0.7431  3.8%     1  0.7431  0.00 DELETE su
# MISC 0xMISC              0.4434  2.3%     3  0.1478   0.0 <3 ITEMS>

# Query 1: 0 QPS, 0x concurrency, ID 0x040ADBE3A1EED0A2 at byte 19060 ____
# Scores: V/M = 0.00
# Time range: all events occurred at 2017-08-02 12:12:07
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          2       1
# Exec time     47     18s     18s     18s     18s     18s       0     18s
# Lock time      0   103us   103us   103us   103us   103us       0   103us
# Rows sent      0       0       0       0       0       0       0       0
# Rows examine   0       0       0       0       0       0       0       0
# Query size     0      21      21      21      21      21       0      21
# String:
# Databases    test
# Hosts        localhost
# Users        root
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+  ################################################################
call insert_su(50000)\G
You can see that the report lists each SQL statement's share of the total response time and its execution time, making it easy to see at a glance which statements are problematic. (Only one SQL statement is detailed here.)
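If you need a precise window rather than "the last 24 hours", pt-query-digest also accepts absolute times via --since and --until; a sketch (the dates here are made up for illustration):

# Analyze only slow queries logged on 2017-08-02
./pt-query-digest --since='2017-08-02 00:00:00' --until='2017-08-03 00:00:00' /data/mysql/slow.log > day.log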
Brilliant trick No. 3:
pt-heartbeat
Function: monitor master-slave delay, i.e., how far the slave library lags behind the master library.
Environment: 192.168.56.132 is the master, 192.168.56.133 is the slave.
The operation is as follows:
Execute on the master:
[root@node3 bin]# ./pt-heartbeat --database test --update --create-table --daemonize -uroot -proot123
Here test is the database used for monitoring synchronization; a heartbeat monitoring table is created in it, and the background daemon keeps updating the table.
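You can also inspect the heartbeat table directly on either side; a small sketch, assuming the ts and server_id columns of the tool's documented schema:

-- The most recent heartbeat written by the master's update daemon
SELECT ts, server_id FROM test.heartbeat;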
Execute on the slave to monitor the master-slave synchronization delay:
Here --master-server-id is the server-id of the master, and -h is the master's IP:
[root@node4 bin]# ./pt-heartbeat --master-server-id=1323306 --monitor --database test -uzs -p123456 -h 192.168.56.132
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
The delay is 0 seconds, so there is currently no lag.
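For one-shot use in monitoring scripts, the tool's --check option prints the current lag once and exits instead of looping; a sketch with the same connection parameters as above:

# Print the slave lag once and exit
./pt-heartbeat --master-server-id=1323306 --check --database test -uzs -p123456 -h 192.168.56.132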
Brilliant trick No. 4:
pt-table-checksum
Function: check master-slave replication consistency.
Principle: checksum queries are executed on the master, generating REPLACE statements into a checksums table; these are passed to the slave through replication, and an UPDATE then fills in the master_crc values. Finally, comparing the this_crc/this_cnt values with the master_crc/master_cnt values on the slave determines whether replication is consistent.
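In fact the comparison can also be run by hand on the slave; this is essentially the query the tool's documentation gives for finding inconsistent chunks in the checksums table:

-- Run on the slave: list chunks whose row count or checksum differs from the master
SELECT db, tbl, chunk, this_cnt, master_cnt, this_crc, master_crc
FROM test.checksums
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc);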
To compare the test database for differences, execute on the master:
[root@node3 bin]# ./pt-table-checksum --no-check-binlog-format --no-check-replication-filters --databases=test --replicate=test.checksums --host=192.168.56.132 -uzs -p123456
            TS ERRORS DIFFS ROWS CHUNKS SKIPPED  TIME TABLE
08-10T16:01:02      0     0    1      1       0 0.013 test.heartbeat
08-10T16:01:02      0     0    0      1       0 0.015 test.su
08-10T16:01:02      0     0    0      1       0 0.011 test.t
You can see that DIFFS is 0, which shows there is no difference in the test database between master and slave.
To find out which tables in the test database differ (--replicate-check-only needs to be added), execute on the master:
[root@node3 bin]# ./pt-table-checksum --no-check-binlog-format --no-check-replication-filters --databases=test --replicate=test.checksums --replicate-check-only --host=192.168.56.132 -uzs -p123456
Differences on node4
TABLE  CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.t     1        1        1
You can see that the master and slave data of table t in the test database are inconsistent.
Brilliant trick No. 5:
pt-slave-restart
Function: monitor replication errors and try to restart MySQL replication.
Note: skipping errors is suited to cases where the slave has extra rows (error code 1062). If the slave is missing rows (error code 1032), skipping the error does not solve the root cause of the synchronization problem; you must first find out what data is missing. If a lot of data is missing, it is better to rebuild the replication environment.
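For context, what pt-slave-restart automates is essentially this manual skip sequence (classic binlog-position replication; GTID setups are handled differently):

-- Run on the slave: skip one errored event and resume replication
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;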
A 1062 error occurred on the slave:
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on table test.t; Duplicate entry '1' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000006, end_log_pos 757482
Execute on the slave:
[root@node4 bin]# ./pt-slave-restart -uroot -proot123 --error-numbers=1062
2017-08-10T16:28:12 p=...,u=root node4-relay-bin.000002 751437 1062
After skipping the error, check the replication status:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Synchronization is back to normal.
Brilliant trick No. 6:
pt-ioprofile
Function: makes it easy to locate I/O problems, by profiling I/O throughput per file.
[root@node3 bin]# ./pt-ioprofile
Thu Aug 10 16:33:47 CST 2017
Tracing process ID 3907
     total      read    pwrite     write     fsync filename
 13.949355  0.839006  0.000000  0.286556 12.823793 /data/mysql/mysql-bin.000006
  7.454844  0.000000  2.913702  0.000000  4.541142 /data/mysql/ib_logfile0
  0.000193  0.000000  0.000000  0.000193  0.000000 /data/mysql/slow.log

Column meanings:
read: read data from a file; the file is identified by a file descriptor, and the data is read into a pre-allocated buffer.
write: write data from the buffer into a file.
pread: because the kernel may suspend the process between separate lseek and read calls, synchronization problems can arise; calling pread is equivalent to calling lseek followed by read as one atomic operation.
pwrite: likewise, pwrite is equivalent to calling lseek followed by write as one atomic operation.
fsync: ensures that all modified contents of the file are synchronized to disk; the call blocks until the device reports the I/O as complete.
filename: the name of the file interacting with the disk.
From this report we can see which file consumes the most I/O time and interacts with the disk most heavily, which makes it easy to pin down I/O problems.
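By default the tool finds a mysqld process and profiles it briefly; if you need to target a specific process or measure something other than time, its documented flags can help. A sketch (verify the flags with --help on your version):

# Profile a specific PID for 30 seconds, aggregating by I/O sizes instead of times
./pt-ioprofile --profile-pid=3907 --run-time=30 --cell=sizes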
There are many more commands in this toolkit; today I have only introduced these. If you are interested in the others, you can study them on your own.
Official address: https://www.percona.com/doc/percona-toolkit/LATEST/index.html
Finally, Lao Zhang hopes everyone can become a "Wolong" or a "Fengchu"; with one of you on board, the company will never have to worry about its databases!!