Welcome to the MySQL technical articles shared by the great SQL community. If you have any questions or want to learn, you can leave a message in the comment area below. After seeing it, you will answer it
In the MGR test, after artificially creating the disk full problem, the node is OO m killed
Problem description
In MySQL 8.0 26 vs GreatSQL 8.0. During the comparison test of 25, one link is to artificially create a disk full scenario to see whether MGR can respond to requests normally.
During the actual measurement, it is finally found that the node with full disk will eventually be given OOM Kill by the OS due to excessive memory consumption after a long enough duration.
I have reported BUG (#104979) for this problem. The following is a detailed record of the process.
First, use dd to copy empty files directly to fill the disk.
MySQL 8.0.26 test process
disk full report process and when it was OO killed
Take a look at MySQL 8.0 26 what is the output of the log when disk full is encountered:
# When disk full is prompted for the first time at 09:44:10.052558, you can actually write to the log, but you can't write data and binlog 2021-09-18T09:44:10.052558+08:00 10 [ERROR] [MY-000035] [Server] Disk is full writing './yejr-mgr3-relay-bin-group_replication_applier.000046' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs. 2021-09-18T09:44:10.052558+08:00 15 [ERROR] [MY-000035] [Server] Disk is full writing '/data/MySQL/binlog.000039' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs. 2021-09-18T09:54:10.109075+08:00 15 [ERROR] [MY-000035] [Server] Disk is full writing '/data/MySQL/binlog.000039' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs. 2021-09-18T09:54:10.109828+08:00 10 [ERROR] [MY-000035] [Server] Disk is full writing './yejr-mgr3-relay-bin-group_replication_applier.000046' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs. 2021-09-18T09:56:15.020870+08:00 152 [ERROR] [MY-010907] [Server] Error writing file '/data/MySQL/slow.log' (errno: 28 - No space left on device) # The last prompt for disk full is 10:04:10.166349. At this time, the log cannot be written at all 2021-09-18T10:04:10.166349+08:00 15 [ERROR] [MY-000035] [Server] Disk is full writing '/data/MySQL/binlog.000039' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs.
Starting from the disk full time, after about 2.5 hours, the memory consumption of mysqld process continued to rise, and finally caused oom kill
Sep 18 12:56:28 mgr3 kernel: docker-containe invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=-500 ... Sep 18 12:56:29 mgr3 kernel: Out of memory: Kill process 9539 (mysqld) score 902 or sacrifice child Sep 18 12:56:29 mgr3 kernel: Killed process 9539 (mysqld), UID 27, total-vm:17020364kB, anon-rss:14644556kB, file-rss:0kB, shmem-rss:0kB
The number of transactions to be authenticated caught at a certain time during this period is actually more than this before being OO killed:
+--------------------------------------+----------+ | id | trx_que | +--------------------------------------+----------+ | dbe4f563-1622-11ec-8cc8-525400e802e2 | 87863918 | +--------------------------------------+----------+
Pay attention to the memory consumption of mysqld process
The following is the memory consumption of mysqld process
# At first, 3G 9539 3144872 /usr/local/mysql-8.0.26-linux-glibc2.12-x86_64/bin/mysqld --defaults-file=/data/MySQL/my.cnf ... # Up to 3.47G 9539 3641984 /usr/local/mysql-8.0.26-linux-glibc2.12-x86_64/bin/mysqld --defaults-file=/data/MySQL/my.cnf ... # Up to 7G 9539 7416908 /usr/local/mysql-8.0.26-linux-glibc2.12-x86_64/bin/mysqld --defaults-file=/data/MySQL/my.cnf ... # It continues to grow until it is finally killed by oom, reaching about 14G 9539 14638256 /usr/local/mysql-8.0.26-linux-glibc2.12-x86_64/bin/mysqld --defaults-file=/data/MySQL/my.cnf
OS layer oom killer related logs:
# mysqld process memory soared to about 14G Sep 18 12:56:29 mgr3 kernel: Out of memory: Kill process 9539 (mysqld) score 902 or sacrifice child Sep 18 12:56:29 mgr3 kernel: Killed process 9539 (mysqld), UID 27, total-vm:17020364kB, anon-rss:14644556kB, file-rss:0kB, shmem-rss:0kB
GreatSQL 8.0.25 test process
For comparison, I use greatsql 8.0 25 did the same test.
# First report disk full 2021-09-18T23:07:49.264992+08:00 89 [ERROR] [MY-000035] [Server] Disk is full writing '/data/GreatSQL/binlog.000085' (OS errno 28 - N o space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs. 2021-09-18T23:07:49.264992+08:00 82 [ERROR] [MY-000035] [Server] Disk is full writing './yejr-mgr3-relay-bin-group_replication_applie r.000147' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 se cs. # An error message is sent, indicating that data/binlog cannot be written because disk is full 2021-09-18T23:07:49.838301+08:00 84 [ERROR] [MY-011071] [Repl] Plugin group_replication reported: 'io full on data or binlog director y' # This node is automatically changed to RO 2021-09-18T23:07:49.852475+08:00 84 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set i nto read only mode after an error was detected.' 2021-09-18T23:07:49.852633+08:00 0 [ERROR] [MY-011486] [Repl] Plugin group_replication reported: 'Message received while the plugin i s not ready, message discarded.' # Next, you are ready to exit the cluster 2021-09-18T23:07:49.856907+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Re-using server node 0 host 17 2.16.16.53' 2021-09-18T23:07:49.856937+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Re-using server node 1 host 17 2.16.16.16' 2021-09-18T23:07:49.856950+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] pid 5211 Installed site start={dba4e34 1394386 1} boot_key={dba4e34 1394375 1} event_horizon=10 node 4294967295 chksum_node_list(&site->nodes) 704906340' 2021-09-18T23:07:55.380111+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Installing leave view.' 2021-09-18T23:07:55.380169+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] ::install_view():: No exchanged data' 2021-09-18T23:07:55.380188+08:00 0 [Note] [MY-011071] [Repl] Plugin group_replication reported: 'on_view_changed is called' # view change occurs and the cluster is officially exited 2021-09-18T23:07:55.380242+08:00 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.' 2021-09-18T23:07:55.382254+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Rejecting this message. The gr oup communication engine has already stopped.' ...The same logs in the middle 2021-09-18T23:07:55.382325+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Rejecting this message. The group communication engine has already stopped.' 2021-09-18T23:07:55.382333+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Rejecting this message. The group....The log was not finished, and the disk was completely filled # The following is the new log after I cleaned up disk space the next day 2021-09-19T07:10:52.071942+08:00 82 [ERROR] [MY-013309] [Repl] Plugin group_replication reported: 'Transaction '1:38481943' does not exist on Group Replication consistency manager while receiving remote transaction prepare.' 2021-09-19T07:10:52.071990+08:00 82 [ERROR] [MY-011452] [Repl] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.' 2021-09-19T07:10:52.072032+08:00 82 [Warning] [MY-011646] [Repl] Plugin group_replication reported: 'Skipping leave operation: member already left the group.' 2021-09-19T07:10:52.072049+08:00 82 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
It can be seen from the log details that when the disk space is full, GreatSQL will actively exit the node from the cluster, with very little impact on the whole cluster.
In addition, after exiting from the cluster, authentication transactions will not be received again, so there is no case that the memory continues to soar and is eventually OO killed. In the actual observation process, it is found that the memory has decreased
# Also ` ` memory consumption in the cluster 5211 2790736 /usr/local/GreatSQL-8.0.25-15-Linux-glibc2.17-x86_64-minimal/bin/mysqld ... # After exiting the cluster, the memory is reduced 5211 2801696 /usr/local/GreatSQL-8.0.25-15-Linux-glibc2.17-x86_64-minimal/bin/mysqld 5211 968876 /usr/local/GreatSQL-8.0.25-15-Linux-glibc2.17-x86_64-minimal/bin/mysqld ... # Since then, the memory has maintained this value 5211 969172 /usr/local/GreatSQL-8.0.25-15-Linux-glibc2.17-x86_64-minimal/bin/mysqld
From this comparison, the reliability of GreatSQL is really OK, and the reliability of the official MySQL MGR needs to be further strengthened.
Enjoy GreatSQL :)
Article recommendation:
GreatSQL MGR FAQ
https://mp.weixin.qq.com/s/J6...
Wan Da #12, how can the MGR cluster automatically select the master without manual intervention
https://mp.weixin.qq.com/s/07...
"2021 data technology Carnival · ON LINE": evolution and practice of MySQL high availability architecture
https://mp.weixin.qq.com/s/u7...
Packet capture analysis of where an sql statement is slow
https://mp.weixin.qq.com/s/AY...
Wan Da #15, what are the conditions that may cause the MGR service to fail to start
https://mp.weixin.qq.com/s/in...
Technology sharing | why MGR consistency mode does not recommend AFTER
https://mp.weixin.qq.com/s/rN...
About GreatSQL
GreatSQL is a MySQL branch maintained by Wanli database. It focuses on improving the reliability and performance of MGR and supports the parallel query feature of InnoDB. It is a branch version of MySQL suitable for financial applications.
Gitee:
https://gitee.com/GreatSQL/Gr...
GitHub:
https://github.com/GreatSQL/G...
Bilibili:
https://space.bilibili.com/13...
Wechat & QQ group:
You can search and add GreatSQL community assistant wechat friends, send verification information "add group" to join GreatSQL/MGR communication wechat group
QQ group: 533341697
Wechat assistant: wanlidbc
This article is composed of blog one article multi posting platform OpenWrite release!