MHA High Availability Configuration and Failover

Posted by XiaoRulez on Mon, 06 Sep 2021 18:54:03 +0200

Catalog

1. Understanding MHA

1. Overview of MHA

2. Composition of MHA

3. Characteristics of MHA

4.MHA cluster architecture diagram

 2. Build MySQL MHA

1. Construction of MHA

2. Fault simulation

 Algorithms for failover alternate primary libraries:

Failure Repair Steps:

1. Understanding MHA

1. Overview of MHA

MHA (MasterHigh Availability) is an excellent set of software for failover and master-slave replication in MySQL highly available environment.

The emergence of MHA is to solve the MySQL single point problem.

During MySQL failover, MHA can automatically complete the failover within 0-30 seconds.

MHA ensures maximum data consistency during failover to achieve true high availability.

2. Composition of MHA

# MHA Node (Data Node)

MHA Node runs on each MySQL server.

** MHA Manager (Management Node)

MHA Manager can be deployed on a separate machine to manage multiple master-slave clusters.You can also deploy on a slave node.

The MHA Manager periodically detects master nodes in the cluster.When the master fails, it can automatically promote the slave of the latest data to the new master, then redirect all other slaves to the new master.The entire failover process is fully transparent to the application.

3. Characteristics of MHA

During automatic failover, MHA tries to save binary logs from the downtime master server to ensure maximum data loss

With semi-synchronous replication, you can greatly reduce the risk of data loss. If only one slave has received the latest binary logs, MHA can apply the latest binary logs to all other slave servers, thereby ensuring data consistency across all nodes.

Currently, MHA supports a master-slave architecture with at least three services, one master-two-slave

4.MHA cluster architecture diagram

 2. Build MySQL MHA

The schematic diagram used in the experiment

 Experimental ideas:

1. MHA architecture
1) Database Installation
2) One master, two subordinates
3) MHA build

2. Fault simulation
1) Main warehouse failure
2) Alternate primary library becomes primary Library
3) Rejoining the original failure master library recovery to MHA as slave Library

Experimental preparation:

Hostoperating systemIP AddressInstallation packages/software/tools
MHAmanagerCentOS7192.168.111.175MHAnode component, MHAmanager component
MasterCentOS7192.168.111.176mysql-boost-5.7.20.tar.gz, MHAnode components
Slave1CentOS7192.168.111.177mysql-boost-5.7.20.tar.gz, MHAnode components
Slave2CentOS7192.168.111.178mysql-boost-5.7.20.tar.gz, MHAnode components

1. Construction of MHA

Turn off all server firewalls and security mechanisms

systemctl stop firewalld
systemctl disable firewalld
setenforce 0

Modify the hostname of Master, Slave1, Slave2 nodes


Modify Master's Mysql Master Profile/etc/my.cnf

 

Modify Mysql main profile/etc/my.cnf for Slave1

 

Modify Mysql main profile/etc/my.cnf for Slave2


 

 Create two soft links on the Master, Slave1, Slave2 nodes (Master, Slave1, Slave2 all need to be configured)

 Configure mysql one master two slaves

(1) mysql authorization for all database nodes (Master, Slave1, Slave2 need to be configured)

 (2) View binaries and synchronization points on the Master node

 (3) Perform synchronization on Slave1, Slave2 nodes

change master to master_host='192.168.111.176',master_user='myslave',master_password='666666',master_log_file='master-bin.000001',master_log_pos=1747; 

 View data synchronization results on Slave1, Slave2 nodes

 

 Two slave libraries must be set to read-only mode (Slave1, Slave2)

 Insert Data Test Database Synchronization

Insert strip data into Master Library to test for synchronization

See if synchronization from server 

 Install MHA software

Install MHA-dependent environments on all servers, starting with epel source (4 servers)

yum install epel-release --nogpgcheck -y


yum install -y perl-DBD-MySQL \
perl-Config-Tiny \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
perl-ExtUtils-CBuilder \
perl-ExtUtils-MakeMaker \
perl-CPAN

 

 Install MHA package

The node component must be installed on all servers first. For each operating system version, CentOS 7.4 must choose version 0.57.

The node component must be installed on all servers before the manager component can be installed on the MHA-manager node because the manager depends on the node component.

Install node components on all servers first

cd /opt
tar zxvf mha4mysql-node-0.57.tar.gz
cd mha4mysql-node-0.57
perl Makefile.PL
make && make install

 Install the manager component on the HA manager node

cd /opt
tar zxvf mha4mysql-manager-0.57.tar.gz
cd mha4mysql-manager-0.57
perl Makefile.PL
make && make install

 Several tools are generated under /usr/local/bin after the manager component is installed, including the following:

masterha_check_sshCheck SHH configuration of MHA
masterha_check_replCheck MySQL replication status
masterha_mangerScript to start manager
masterha_check_statusDetecting the current MHA running state
masterha_master_monitorDetect whether master is down
masterha_master_switchControl failover (automatic or manual)
masterha_conf_hostAdd or remove configured server information
masterha_stopClose manager

Several scripts will also be generated under/usr/local/bin after the node component is installed

Note: These tools are usually triggered by MHAManager scripts and do not require human action:

The main points are as follows:

save_binary_logsSave and copy master's binary log
apply_diff_relay_logsRelay log events that identify differences and apply their differences to other slave s
filter_mysqlbinlogRemove unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logsClear relay logs (will not block SQL threads)

 Configure Password-Free Authentication on All Servers

Password-free authentication configured on the manager node to all database nodes

ssh-keygen -t rsa 				#Press Enter all the way
ssh-copy-id 192.168.111.176
ssh-copy-id 192.168.111.177
ssh-copy-id 192.168.111.178

 Password-free authentication configured on master to database nodes slave1 and slave2

ssh-keygen -t rsa
ssh-copy-id 192.168.111.177
ssh-copy-id 192.168.111.178

Password-free authentication configured on slave1 to database nodes master and slave2

ssh-keygen -t rsa
ssh-copy-id 192.168.111.176
ssh-copy-id 192.168.111.178

Password-free authentication configured to database nodes master and slave1 on slave2

ssh-keygen -t rsa
ssh-copy-id 192.168.111.176
ssh-copy-id 192.168.111.177

 Configure MHA on the manager node

Copy the script on the manager node to the / usr/local/bin directory

cp -rp /opt/mha4mysql-manager-0.57/samples/scripts /usr/local/bin

Four execution files are copied

The role of scripts:

master_ip_failoverScript for VIP management on automatic switch
master_ip_online_changevip management during online switching
power_managerScript to shut down the host after a failure occurs
send_reportScript to send an alert after failover

Copy the above VIP-managed scripts for automatic switching to the / usr/local/bin directory, using master_ip_failover script to manage VIP and failover

cp /usr/local/bin/scripts/master_ip_failover /usr/local/bin

 The modifications are as follows: (Delete the original content, copy the following directly and modify vip-related parameters)

[server default]
manager_log=/var/log/masterha/app1/manager.log      #manager log
manager_workdir=/var/log/masterha/app1.log        #manager working directory
master_binlog_dir=/usr/local/mysql/data/         #The location where Master saves the binlog, where the path matches that of the binlog configured in master so that MHA can find it
master_ip_failover_script=/usr/local/bin/master_ip_failover  #Set the switch script for automatic failover, which is the script above
master_ip_online_change_script=/usr/local/bin/master_ip_online_change  #Set switch script for manual switch
password=666666 		#Set the password for the root user in mysql, which is the password you created earlier to monitor the user
ping_interval=1				#Set the time interval for monitoring the main library to send ping packets, default is 3 seconds, failover automatically when three attempts fail
remote_workdir=/tmp			#Set the location where binlog will be saved for remote mysql when a switch occurs
repl_password=666666		#Set the password for the replication user
repl_user=myslave			#Set up users for replication users
report_script=/usr/local/send_report     #Set the script for alerts sent after a switch occurs
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.80.11 -s 192.168.80.12	#Specify the slave server IP address to check
shutdown_script=""			#Set up the fail host script to shut down after a failure occurs (the main purpose of this script is to shut down the host to prevent brain fissures, which are not used here)
ssh_user=root				#Set ssh login user name
user=mha					#Set monitoring user root

[server1]
hostname=192.168.111.176
port=3306

[server2]
hostname=192.168.111.177
port=3306
candidate_master=1
#Set as candidate master, after setting this parameter, master-slave switching will promote this slave from the library to the master library even if it is not the latest slave in the cluster

check_repl_delay=0
#By default, MHA will not choose a slave as a new master if it falls behind the master's relay logs exceeding 100M, because recovery of the slave will take a long time.By setting check_repl_delay=0, MHA triggered switch ignores replication delay when selecting a new master. This parameter sets candidate_forA host with master=1 is useful because this candidate must be a new master during the switching process

[server3]
hostname=192.168.111.178
port=3306
vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
#############################Add Content Section#########################################
my $vip = '192.168.111.111';								#Specify the address of vip
my $brdc = '192.168.111.255';								#Specify vip broadcast address
my $ifdev = 'ens33';										#Specify vip-bound network card
my $key = '1';												#Specify vip-bound virtual network card serial number
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";		#Represents the value of this variable as ifconfig ens33:1 192.168.80.200
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";		#Represents the value of this variable as ifconfig ens33:1 192.168.80.200 down
my $exit_code = 0;											#Specify exit status code 0
#my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
#my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
##################################################################################
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);

exit &main();

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

if ( $command eq "stop" || $command eq "stopssh" ) {

my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {

my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
## A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

 

 Create the MHA software directory and copy the configuration file, where the app1.cnf configuration file is used to manage the mysql node server
mkdir /etc/masterha

cp /opt/mha4mysql-manager-0.57/samples/conf/app1.cnf /etc/masterha

Also delete the original contents of this file and follow these steps to modify it

[server default]
manager_log=/var/log/masterha/app1/manager.log      #manager log
manager_workdir=/var/log/masterha/app1.log        #manager working directory
master_binlog_dir=/usr/local/mysql/data/         #The location where Master saves the binlog, where the path matches that of the binlog configured in master so that MHA can find it
master_ip_failover_script=/usr/local/bin/master_ip_failover  #Set the switch script for automatic failover, which is the script above
master_ip_online_change_script=/usr/local/bin/master_ip_online_change  #Set switch script for manual switch
password=666666			#Set the password for the root user in mysql, which is the password you created earlier to monitor the user
ping_interval=1				#Set the time interval for monitoring the main library to send ping packets, default is 3 seconds, failover automatically when three attempts fail
remote_workdir=/tmp			#Set the location where binlog will be saved for remote mysql when a switch occurs
repl_password=666666		#Set the password for the replication user
repl_user=myslave			#Set up users for replication users
report_script=/usr/local/send_report     #Set the script for alerts sent after a switch occurs
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.111.177 -s 192.168.111.178	#Specify the slave server IP address to check
shutdown_script=""			#Set up the fail host script to shut down after a failure occurs (the main purpose of this script is to shut down the host to prevent brain fissures, which are not used here)
ssh_user=root				#Set ssh login user name
user=mha					#Set monitoring user root

[server1]
hostname=192.168.111.176
port=3306

[server2]
hostname=192.168.111.177
port=3306
candidate_master=1
#Set as candidate master, after setting this parameter, master-slave switching will promote this slave from the library to the master library even if it is not the latest slave in the cluster

check_repl_delay=0
#By default, MHA will not choose a slave as a new master if it falls behind the master's relay logs exceeding 100M, because recovery of the slave will take a long time.By setting check_repl_delay=0, MHA triggered switch ignores replication delay when selecting a new master. This parameter sets candidate_forA host with master=1 is useful because this candidate must be a new master during the switching process

[server3]
hostname=192.168.111.178
port=3306

Copy the following directly

vim /etc/masterha/app1.cnf						#Delete original content, copy and modify IP address of node server directly
[server default]
manager_log=/var/log/masterha/app1/manager
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/usr/local/mysql/data
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=666666
ping_interval=1
remote_workdir=/tmp
repl_password=666666
repl_user=myslave
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.111.177 -s 192.168.111.178
shutdown_script=""
ssh_user=root
user=mha

[server1]
hostname=192.168.111.176
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.111.177
port=3306

[server3]
hostname=192.168.111.178
port=3306

 The first configuration requires the virtual IP to be manually turned on on the Master node

Tests ssh passwordless authentication on the manager node and outputs success if normal

masterha_check_ssh -conf=/etc/masterha/app1.cnf

Testing the mysql master-slave connection on the manager node resulted in the word MySQL Replication Health is OK indicating normal.

masterha_check_ssh -conf=/etc/masterha/app1.cnf

 

 Note (troubleshooting)

 Start MHA on the manager node

nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &


--remove_dead_master_conf: This parameter represents the old master library's ip This will be removed from the configuration file.
--manger_log: Log storage location.
--ignore_last_failover: By default, if MHA Continuous downtime detected, and less than 8 hours between downtime, will not occur Failover, The reason for this limitation is to avoid ping-pong Effect.This parameter represents ignoring the last time MHA Triggers the file generated by the switch, and by default, MHA When a switch occurs, the log directory is set up above app1.failover.complete File, the next time you switch, you will not be allowed to trigger a switch if you find that the file exists in that directory, unless you receive the deletion of the file after the first switch. For convenience, set this to--ignore_last_failover. 

 Looking at the MHA status, you can see that the current master is a Mysql1 node.

masterha_check_status --conf=/etc/masterha/app1.cnf

 Looking at the MHA log, you can also see that the current master is 192.168.111.176

 

cat /var/log/masterha/app1/manager.log | grep "current master"

View Master's VIP address 192.168.168.111 If it exists, the VIP address will not disappear because the manager node stops the MHA service. (Make sure you turn it on manually before you see it)

MHA is now complete 

To shut down the manager service, you can use the following command

masterha_stop --conf=/etc/masterha/app1.cnf
 Or you can use it directly kill process ID Mode is off.

2. Fault simulation

Monitor observation log records on the manager node

tail -f /var/log/masterha/app1/manager.log

 Stop mysql service on the Master node Mysql1

 After a normal automatic switch, the MHA process exits.HMA automatically modifies the contents of the app1.cnf file and deletes the downtime mysql1 node.See if mysql2 takes over VIP

 Algorithms for failover alternate primary libraries:

1 Generally, position/GTID is used to judge the quality of slave, which is closest to master and becomes the candidate.

2 If the data is consistent, select the alternate primary Library in the configuration file order.

3 Set weights (candidate_master=1), which forces the designation of alternatives by weight.
(1) By default, if a slave falls behind the relay logs of master 100M, it will fail even if it has weight.
(2) If check_Repl_If delay = 0, even logs are far behind, they are forced to be chosen as alternates.

Failure Repair Steps:

1. Repair mysql

2. Repair master-slave

3. Modify the configuration file app1.cnf on the manager node (add this record again because it will disappear automatically when it detects failures)

4. Start MHA on the manager node

Perform synchronization on the original primary library server Master

change master to master_host='192.168.111.177',master_user='myslave',master_password='666666',master_log_file='master-bin.000001',master_log_pos=1747;

 

 

 Start MHA on the manager node

nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

 Solve the problem of incompatibility between Chinese and English

dos2unix /usr/local/bin/master_ip_failover 

Topics: Linux Database MySQL ssh server