Interpretation of Keepalived High Availability Mechanism

Posted by jdlev on Thu, 23 Sep 2021 11:27:43 +0200

For the Keepalived installation, see the previous article: High availability haproxy + keepalived installation and deployment configuration

Preface:
The previous article described how to install the keepalived + HAProxy service. This article examines the high availability mechanism of Keepalived.

1, Environment configuration

Requirements:

  1. Use 3 virtual machines to test high availability;
  2. All 3 machines are configured in standby (backup) mode;
  3. Use haproxy as the load balancer;
  4. Use keepalived to remove haproxy as a single point of failure and so achieve high availability.

IP allocation is as follows:

  • control01: 192.168.187.81
  • control02: 192.168.187.82
  • control03: 192.168.187.83
  • VIP: 192.168.187.84 (the floating IP)

The configuration information of the three hosts is shown in the following table:
Note: priority is the value of the priority option in the keepalived configuration file.

node       IP              MAC                priority  role
control01  192.168.187.81  00:00:00:ad:d2:0a  2         backup
control02  192.168.187.82  00:00:00:11:45:a7  1         backup
control03  192.168.187.83  00:00:00:92:80:ec  3         master

2, Relevant operation basis

1. Linux packet capture analysis

# Packet capture command; -w writes the raw packets in pcap format so that Wireshark can open the file
tcpdump -i ovsbr0 -w arp_all.cap arp

Analysis process:

  1. Use the tcpdump command to capture the ARP packets on network interface ovsbr0;
  2. Save the result to the file arp_all.cap;
  3. Open the file with Wireshark for analysis.
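
If Wireshark is not at hand, the same capture can be inspected with a short script. The following is only a minimal sketch: it assumes the capture is a classic pcap file (for example, one written with tcpdump -w) on an Ethernet link without VLAN tags, and the function name parse_arp_packets is my own, not part of any tool.

```python
import struct

PCAP_GLOBAL_HEADER_LEN = 24
PCAP_RECORD_HEADER_LEN = 16
ETHERTYPE_ARP = 0x0806


def format_mac(raw):
    return ":".join(f"{b:02x}" for b in raw)


def format_ip(raw):
    return ".".join(str(b) for b in raw)


def parse_arp_packets(pcap_bytes):
    """Yield one dict per Ethernet/IPv4 ARP packet found in a classic pcap byte string."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic pcap file")
    offset = PCAP_GLOBAL_HEADER_LEN
    while offset + PCAP_RECORD_HEADER_LEN <= len(pcap_bytes):
        # Per-packet record header: timestamp, captured length, original length
        ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset
        )
        offset += PCAP_RECORD_HEADER_LEN
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
            continue
        if struct.unpack_from("!H", frame, 12)[0] != ETHERTYPE_ARP:
            continue
        _htype, _ptype, hlen, plen, opcode = struct.unpack_from("!HHBBH", frame, 14)
        if (hlen, plen) != (6, 4):  # only MAC/IPv4 ARP is handled here
            continue
        sha, spa, tha, tpa = struct.unpack_from("!6s4s6s4s", frame, 22)
        yield {
            "time": ts_sec,
            "opcode": opcode,  # 1 = request, 2 = reply
            "sender_mac": format_mac(sha),
            "sender_ip": format_ip(spa),
            "target_mac": format_mac(tha),
            "target_ip": format_ip(tpa),
            "gratuitous": spa == tpa,  # sender IP equals target IP
        }
```

To use it, read the capture in binary mode, for example open("arp_all.cap", "rb").read(), pass the bytes to parse_arp_packets, and filter on the gratuitous flag.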

Reference: Linux Basics: capturing packets with tcpdump

2. View arp cache refresh time

Note: replace ovsbr0 in the path with the network card name of your computer.

[root@control01 ~]# cat /proc/sys/net/ipv4/neigh/ovsbr0/base_reachable_time
30

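The default base reachable time is 30s. An ARP cache entry is treated as valid for roughly this long; after that it becomes stale, and the host re-verifies it with an ARP request the next time the entry is used.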
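
The value can also be read from a script. The sketch below rests on one assumption of mine about the Linux neighbour code: the kernel does not use the base value directly but picks the actual reachable time at random between half and one and a half times the base. The proc_root parameter is hypothetical and exists only so that the function can be pointed at a test directory.

```python
from pathlib import Path


def read_base_reachable_time(interface, proc_root="/proc"):
    """Return base_reachable_time (in seconds) for the given network interface."""
    path = Path(proc_root) / "sys/net/ipv4/neigh" / interface / "base_reachable_time"
    return int(path.read_text().strip())


def reachable_time_range(base_seconds):
    """Return the (min, max) window in which the kernel is assumed to pick the real value."""
    return (base_seconds / 2, base_seconds * 3 / 2)
```

With the default of 30, reachable_time_range(30) gives a window of 15 to 45 seconds, so entries on different hosts will not necessarily expire at the same moment.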

3, Keepalived analysis

The analysis process is as follows:

  1. Start the Keepalived service on all three hosts. At this time, the master node is control03;
  2. View the ARP table of the three hosts at this time. The command is: arp -a
  3. Start tcpdump on all three hosts to capture ARP request packets and save them as arp_all_01.cap, arp_all_02.cap and arp_all_03.cap;
  4. Wait for about 2min to collect a reasonable number of ARP request packets;
  5. Stop the Keepalived service on the control03 node. At this time, the control01 node becomes the master node;
  6. View the ARP table of the three hosts again;
  7. Wait for about 2min again to collect a reasonable number of ARP request packets;
  8. Stop tcpdump on all nodes;
  9. Analyze the cap files and the Keepalived service log file of each host.

1. arp table analysis

Before stopping the keepalived service (master node: control03)

# control01
[root@control01 ~]# arp -a
_gateway (192.168.187.254) at 00:00:00:e1:00:01 [ether] on ovsbr0
? (192.168.187.84) at 00:00:00:92:80:ec [ether] on ovsbr0
control02 (192.168.187.82) at 00:00:00:11:45:a7 [ether] on ovsbr0
control03 (192.168.187.83) at 00:00:00:92:80:ec [ether] on ovsbr0
? (192.168.187.50) at 00:00:00:8d:68:d6 [ether] on ovsbr0

# control02
[root@control02 ~]# arp -a
_gateway (192.168.187.254) at 00:00:00:e1:00:01 [ether] on ovsbr0
control01 (192.168.187.81) at 00:00:00:ad:d2:0a [ether] on ovsbr0
control03 (192.168.187.83) at 00:00:00:92:80:ec [ether] on ovsbr0
? (192.168.187.84) at 00:00:00:92:80:ec [ether] on ovsbr0

# control03
[root@control03 ~]# arp -a
_gateway (192.168.187.254) at 00:00:00:e1:00:01 [ether] on ovsbr0
control02 (192.168.187.82) at 00:00:00:11:45:a7 [ether] on ovsbr0
control03 (192.168.187.84) at 00:00:00:ad:d2:0a [ether] on ovsbr0
control01 (192.168.187.81) at 00:00:00:ad:d2:0a [ether] on ovsbr0

After stopping the keepalived service (master node changes from control03 to control01)

# control01
[root@control01 ~]# arp -a
_gateway (192.168.187.254) at 00:00:00:e1:00:01 [ether] on ovsbr0
control01 (192.168.187.84) at 00:00:00:92:80:ec [ether] on ovsbr0
control02 (192.168.187.82) at 00:00:00:11:45:a7 [ether] on ovsbr0
control03 (192.168.187.83) at 00:00:00:92:80:ec [ether] on ovsbr0
? (192.168.187.50) at 00:00:00:8d:68:d6 [ether] on ovsbr0

# control02
[root@control02 ~]# arp -a
_gateway (192.168.187.254) at 00:00:00:e1:00:01 [ether] on ovsbr0
control01 (192.168.187.81) at 00:00:00:ad:d2:0a [ether] on ovsbr0
control03 (192.168.187.83) at 00:00:00:92:80:ec [ether] on ovsbr0
? (192.168.187.84) at 00:00:00:ad:d2:0a [ether] on ovsbr0

# control03
[root@control03 ~]# arp -a
_gateway (192.168.187.254) at 00:00:00:e1:00:01 [ether] on ovsbr0
control02 (192.168.187.82) at 00:00:00:11:45:a7 [ether] on ovsbr0
? (192.168.187.84) at 00:00:00:ad:d2:0a [ether] on ovsbr0
control01 (192.168.187.81) at 00:00:00:ad:d2:0a [ether] on ovsbr0

Comparing the ARP cache tables before and after the keepalived service is stopped, we can see that:

  1. After the keepalived service on control03 is stopped, the VIP moves to the control01 node;
  2. At the same time, the MAC recorded for the VIP is updated to the MAC of control01. In the table of control02, for example, it changes from 00:00:00:92:80:ec to 00:00:00:ad:d2:0a.
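
The comparison can be automated. This is a small sketch that parses arp -a output in the format shown above and looks up which host's MAC is currently recorded for the VIP. The names parse_arp_table, vip_owner and HOST_MACS are my own; the MAC values are taken from the host table in section 1.

```python
import re

# MAC address of each host, from the environment table
HOST_MACS = {
    "control01": "00:00:00:ad:d2:0a",
    "control02": "00:00:00:11:45:a7",
    "control03": "00:00:00:92:80:ec",
}

# Matches lines such as: ? (192.168.187.84) at 00:00:00:92:80:ec [ether] on ovsbr0
ARP_LINE = re.compile(
    r"^(?P<name>\S+) \((?P<ip>[\d.]+)\) at (?P<mac>[0-9a-f:]{17}) \[ether\] on (?P<dev>\S+)$"
)


def parse_arp_table(text):
    """Return a dict mapping IP address to MAC address from `arp -a` output."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.match(line.strip())
        if match:
            table[match["ip"]] = match["mac"]
    return table


def vip_owner(table, vip, host_macs):
    """Return the host whose MAC is recorded for the VIP, or None if unknown."""
    mac = table.get(vip)
    for host, host_mac in host_macs.items():
        if host_mac == mac:
            return host
    return None
```

Run against the two control02 tables above, the owner of 192.168.187.84 should come out as control03 before the stop and control01 after it.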

2. Packet capture analysis

Open the three files arp_all_01.cap, arp_all_02.cap and arp_all_03.cap with Wireshark. The packet capture results are shown in the figures below:

As can be seen from the above three figures:

  1. After the master node fails, GARP request packets appear in the captures on all three hosts;

    analysis:
    combination

  2. GARP request packets are sent in two rounds of 5 packets each, with 5s between the rounds;

    Analysis:
    The number of packets and the interval are default values and can be changed in the keepalived.conf configuration file,
    as follows:
    # Delay in seconds before the second round of GARP packets after the transition to master. The default is 5; 0 disables the second round
    vrrp_garp_master_delay 5
    # Number of GARP packets sent per round after the transition to master. The default is 5
    vrrp_garp_master_repeat 5
    # Interval in seconds at which the master periodically re-sends GARP packets. The default is 0, meaning no periodic refresh
    vrrp_garp_master_refresh 60
    # Number of GARP packets sent at each periodic refresh. The default is 1
    vrrp_garp_master_refresh_repeat 2
    # Delay in seconds between individual GARP packets on an interface; fractional values are allowed (0.01 = 10ms)
    vrrp_garp_interval 0.01

  3. In the captures from all three hosts, the Sender IP and Target IP are the VIP, and the Sender MAC and Target MAC are 00:00:00:ad:d2:0a;

    Questions:
    1. Why does each host node have the VIP address?
    2. Why is the Sender MAC on each host node the MAC of control01?
    3. If all three host nodes hold the VIP, what mechanism does keepalived use to select control01 as the master node?
    Analysis:
    See the keepalived log analysis section below.
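
To make the field layout in item 3 concrete, here is a rough sketch that builds the 28-byte ARP payload of such a packet with the standard library and checks the property that makes it gratuitous (Sender IP equal to Target IP). It mirrors the fields reported in the capture above, where the Target MAC equals the Sender MAC; other senders may fill the Target MAC differently. The function names are hypothetical, and nothing is sent on the network.

```python
import socket
import struct


def mac_to_bytes(mac):
    return bytes.fromhex(mac.replace(":", ""))


def build_garp_payload(vip, mac, target_mac=None):
    """Return the ARP payload of a gratuitous ARP request announcing vip at mac."""
    sender_hw = mac_to_bytes(mac)
    # Assumption taken from the capture: Target MAC defaults to the Sender MAC
    target_hw = mac_to_bytes(target_mac) if target_mac else sender_hw
    ip = socket.inet_aton(vip)
    # Hardware type 1 (Ethernet), protocol 0x0800 (IPv4), lengths 6 and 4, opcode 1 (request)
    header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    return header + sender_hw + ip + target_hw + ip


def is_gratuitous(payload):
    """A gratuitous ARP carries the same address as Sender IP and Target IP."""
    sender_ip = payload[14:18]
    target_ip = payload[24:28]
    return sender_ip == target_ip
```

For example, build_garp_payload("192.168.187.84", "00:00:00:ad:d2:0a") produces the payload that control01 would announce for the VIP under these assumptions.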

3. keepalived log analysis

Note:
Because of problems viewing the keepalived logs on the OpenStack servers, the log analysis in this part uses logs from the local environment configured in the previous blog post, High availability haproxy + keepalived installation and deployment configuration.
HA02 is the master node.
IP allocation is as follows:

  • HA01: 192.168.131.11, priority 1, backup
  • HA02: 192.168.131.12, priority 2, master
  • HA03: 192.168.131.13, priority 3, backup
  • VIP: 192.168.131.20 (the floating IP)

For the complete log of the keepalived service, please refer to the blog post: Keepalived service log
HA01 node keepalived log (excerpt):

Sep 23 14:48:28 HA01 Keepalived_vrrp[1233]: (kolla_internal_vip_50) Entering BACKUP STATE
Sep 23 14:48:32 HA01 Keepalived_vrrp[1233]: (kolla_internal_vip_50) Receive advertisement timeout
Sep 23 14:48:32 HA01 Keepalived_vrrp[1233]: (kolla_internal_vip_50) Entering MASTER STATE
Sep 23 14:48:32 HA01 Keepalived_vrrp[1233]: (kolla_internal_vip_50) setting VIPs.
Sep 23 14:48:32 HA01 Keepalived_vrrp[1233]: (kolla_internal_vip_50) Sending/queueing gratuitous ARPs on ens33 for 192.168.131.20
Sep 23 14:48:32 HA01 Keepalived_vrrp[1233]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:48:32 HA01 Keepalived_vrrp[1233]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:48:32 HA01 Keepalived_vrrp[1233]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:48:32 HA01 Keepalived_vrrp[1233]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:48:32 HA01 Keepalived_vrrp[1233]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:48:32 HA01 Keepalived_vrrp[1233]: (kolla_internal_vip_50) Master received advert from 192.168.131.13 with higher priority 3, ours 1
Sep 23 14:48:32 HA01 Keepalived_vrrp[1233]: (kolla_internal_vip_50) Entering BACKUP STATE
Sep 23 14:48:32 HA01 Keepalived_vrrp[1233]: (kolla_internal_vip_50) removing VIPs.

HA03 node keepalived log (excerpt):

Sep 23 14:04:40 HA03 systemd-logind: New session 9 of user root.
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: (kolla_internal_vip_50) Receive advertisement timeout
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: (kolla_internal_vip_50) Entering MASTER STATE
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: (kolla_internal_vip_50) setting VIPs.
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: (kolla_internal_vip_50) Sending/queueing gratuitous ARPs on ens33 for 192.168.131.20
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: (kolla_internal_vip_50) Received advert from 192.168.131.11 with lower priority 1, ours 3, forcing new election
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: (kolla_internal_vip_50) Sending/queueing gratuitous ARPs on ens33 for 192.168.131.20
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:31 HA03 Keepalived_vrrp[53352]: (kolla_internal_vip_50) Sending/queueing gratuitous ARPs on ens33 for 192.168.131.20
Sep 23 14:06:31 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:31 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:31 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:31 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:31 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:31 HA03 Keepalived_vrrp[53352]: (kolla_internal_vip_50) Sending/queueing gratuitous ARPs on ens33 for 192.168.131.20
Sep 23 14:06:31 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:31 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:31 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:31 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
Sep 23 14:06:31 HA03 Keepalived_vrrp[53352]: Sending gratuitous ARP on ens33 for 192.168.131.20
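
Long excerpts like the two above are easier to follow once the state changes are pulled out. The sketch below assumes the syslog line format shown in these excerpts; the helper names state_transitions and count_garps are my own.

```python
import re

# Matches lines such as:
# Sep 23 14:48:32 HA01 Keepalived_vrrp[1233]: (kolla_internal_vip_50) Entering MASTER STATE
STATE_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) Keepalived_vrrp\[\d+\]: "
    r"\((?P<instance>[^)]+)\) Entering (?P<state>\w+) STATE$"
)


def state_transitions(log_text):
    """Return (timestamp, host, state) for every 'Entering ... STATE' line."""
    transitions = []
    for line in log_text.splitlines():
        match = STATE_LINE.match(line.strip())
        if match:
            transitions.append((match["ts"], match["host"], match["state"]))
    return transitions


def count_garps(log_text):
    """Count individual 'Sending gratuitous ARP' lines, ignoring the 'Sending/queueing' summaries."""
    marker = "]: Sending gratuitous ARP on "
    return sum(1 for line in log_text.splitlines() if marker in line)
```

Applied to the HA01 excerpt, this should report BACKUP, MASTER, BACKUP in that order together with 5 GARP packets, which makes the brief promotion and demotion of HA01 easy to spot.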

By analyzing the logs of nodes HA01 and HA03, we can see that:

  1. Once the advertisement timeout expires, that is, once they stop receiving VRRP advertisements from the master, HA01 and HA03 each promote themselves to master, for example:

    Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: (kolla_internal_vip_50) Receive advertisement timeout
    Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: (kolla_internal_vip_50) Entering MASTER STATE
    Sep 23 14:06:26 HA03 Keepalived_vrrp[53352]: (kolla_internal_vip_50) setting VIPs.

  2. After promoting itself to master, a node immediately broadcasts GARP packets and starts sending VRRP advertisements:

    1. By default, GARP packets are sent in two rounds of 5 packets each, with 5s between the rounds;
    2. The node's priority is carried in its VRRP advertisements ("advert" in the log), not in the GARP packets;

  3. When HA03 receives the advertisement from HA01 with priority 1, it determines that its own priority is higher, forces a new election and immediately sends GARP packets again;
  4. When HA01 receives the advertisement from HA03 with priority 3, it determines that its own priority is lower, enters the backup state and removes the VIP;
  5. The VIP is now bound to the HA03 node.
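
The steps above can be sketched as a toy model. This is not keepalived's code: it only imitates the decision visible in the logs, where every surviving node first promotes itself and then steps down if it sees an advertisement with a higher priority. The timer formula is the one I understand VRRPv2 (RFC 3768) to use, and advert_int = 1 second is an assumption, since the interval is not given in this article. Tie-breaking between equal priorities is not modelled.

```python
def master_down_interval(priority, advert_int=1.0):
    """Seconds a backup waits without advertisements before promoting itself (VRRPv2 formula)."""
    skew_time = (256 - priority) / 256  # higher priority means a slightly shorter wait
    return 3 * advert_int + skew_time


def elect_master(priorities):
    """Simulate one election round among the surviving nodes.

    priorities maps node name to VRRP priority. Returns (masters, final_states).
    """
    # Step 1: every node times out and enters the MASTER state
    states = {node: "MASTER" for node in priorities}
    # Step 2: each node compares its priority with the advertisements it receives
    for receiver, own_priority in priorities.items():
        for sender, sender_priority in priorities.items():
            if sender != receiver and sender_priority > own_priority:
                states[receiver] = "BACKUP"  # saw a higher priority, so step down
    masters = [node for node, state in states.items() if state == "MASTER"]
    return masters, states
```

With the surviving nodes from the log, elect_master({"HA01": 1, "HA03": 3}) leaves HA03 as the only master. The two timers differ by only a few milliseconds (about 3.996s for priority 1 and 3.988s for priority 3), which fits the logs showing both nodes entering the MASTER state within the same second.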

4. keepalived configuration file

Topics: Load Balance OpenStack cloud computing