Principles and experiments of Linux network card bonding (Bond0-6)

Posted by melsi on Sun, 26 Dec 2021 10:35:32 +0100

1. Principle of network card bonding

Under normal circumstances, a network card works in direct (non-promiscuous) mode. In this mode it only receives frames whose destination address is its own MAC address, and it filters out all other frames to reduce the load on the driver. However, network cards also support promiscuous mode, in which they receive every frame on the network.

Bonding runs in this mode. Two network cards are combined into one virtual network card, and the aggregated device looks like a single Ethernet interface. Generally speaking, the two network cards share the same IP address and are linked in parallel to form one logical link. The physical network cards pass the data frames they receive to the bonding driver for processing. Bonding network cards in this way provides redundancy and load balancing.

2. Bonding modes

2.0. Bond0: balance-rr (round-robin policy)

Features: packets are transmitted in turn (the first packet goes out on ens33, the next on ens34, and so on, cycling until the last packet has been sent). This mode provides load balancing.

Disadvantages: if the packets of one connection or session leave through different interfaces and travel over different links, they are likely to arrive out of order at the client. Out-of-order packets may have to be retransmitted, which reduces network throughput.
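A minimal Python sketch of the rotation, not the kernel's implementation: each packet is paired with the next slave in a fixed cycle. The segment names are hypothetical placeholders for consecutive packets of one TCP stream, which makes it visible why neighbouring segments travel over different links.

```python
from itertools import cycle


def round_robin(packets, slaves):
    """balance-rr sketch: pair each packet with the next slave in a fixed rotation."""
    rotation = cycle(slaves)
    return [(packet, next(rotation)) for packet in packets]


# Five consecutive segments of one (hypothetical) TCP stream over ens33/ens34.
schedule = round_robin(["seg1", "seg2", "seg3", "seg4", "seg5"], ["ens33", "ens34"])
for packet, slave in schedule:
    print(packet, "->", slave)
```

Because seg1 and seg2 take different links, any difference in link latency can reorder them at the receiver.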

2.1. Bond1: active-backup (primary/backup policy)

Features: only one slave is active at a time. When the active slave goes down, a backup slave immediately takes over as primary. The bond's MAC address is visible externally on only one port, so from the outside the bond has a single, unique MAC address and the switch is not confused. This mode provides only fault tolerance and high availability of the network connection.

Disadvantages: resource utilization is low, because only one interface works at a time. With N network interfaces, utilization is 1/N.
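The failover rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function name and the link-state dictionary are hypothetical, and it ignores the bonding `primary` and `primary_reselect` options, so it keeps the current slave while its link is up rather than modelling the takeover seen in experiment 4.1.

```python
def select_active(slaves, link_up, current=None):
    """Simplified active-backup choice (hypothetical helper).

    Keep the current active slave while its link is up; otherwise fail over
    to the first slave, in configured order, whose link is up. Return None
    when every link is down.
    """
    if current is not None and link_up.get(current, False):
        return current
    for slave in slaves:
        if link_up.get(slave, False):
            return slave
    return None


slaves = ["ens33", "ens34"]
active = select_active(slaves, {"ens33": True, "ens34": True})
print("active:", active)                     # ens33 carries all traffic
active = select_active(slaves, {"ens33": False, "ens34": True}, active)
print("after ens33 fails:", active)          # fails over to ens34
```

At any moment only one of the N slaves carries traffic, which is where the 1/N utilization comes from.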

2.2. Bond2: balance-xor

Features: packets are transmitted according to the configured transmit policy, which is selected with the xmit_hash_policy option. The default is XOR hash load sharing. This mode provides load balancing and fault tolerance.
XOR hash rule: slave index = (source MAC address XOR destination MAC address) % number of slaves.
How the algorithm works: when transmitting data, the bond uses the XOR hash to choose the slave that carries it. Hashing spreads the data across the slaves as evenly as possible, which achieves load balancing. XOR is used because it keeps the hash values evenly distributed and so reduces hash collisions (in the same spirit as methods such as the mid-square method, which are used to avoid collisions and speed up hash lookups), which further improves the balance.

# How XOR hash load balancing works
For example:
The MAC address of BOND2 is AA, and BOND2 has four slaves, numbered Slave_0, 1, 2 and 3.
BOND2 communicates with a host whose MAC address is BB. The XOR hash gives (AA XOR BB) % 4 = 1:
AA	| 1010 1010
BB	| 1011 1011
XOR	| 0001 0001 -> 16+1 -> 17
17 % 4 = 1
The data is therefore handled by Slave_1, and later data for that MAC address also goes to Slave_1. Data for other MAC addresses is more likely to be assigned to Slave_0, 2 or 3 than to Slave_1 (with four slaves and evenly spread hashes, each slave gets about a quarter of the peers).
For data from different MAC addresses, the XOR hash therefore spreads the load roughly evenly across the network cards.
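The worked example can be checked with a short Python sketch. It uses the article's one-byte shorthand "MACs" AA and BB (real MAC addresses are 48 bits), and the helper name is hypothetical. The last part hashes all 256 possible one-byte peer values against AA to show the even spread over four slaves.

```python
from collections import Counter


def xor_hash_slave(mac_a: int, mac_b: int, slave_count: int) -> int:
    """layer2-style rule from the text: (MAC_a XOR MAC_b) % slave_count."""
    return (mac_a ^ mac_b) % slave_count


h = 0xAA ^ 0xBB
print(f"{0xAA:08b} XOR {0xBB:08b} = {h:08b} = {h}")   # 10101010 XOR 10111011 = 00010001 = 17
print("slave:", xor_hash_slave(0xAA, 0xBB, 4))        # slave: 1

# Spread of all 256 one-byte peer values over 4 slaves.
spread = Counter(xor_hash_slave(0xAA, peer, 4) for peer in range(256))
print(sorted(spread.items()))                          # [(0, 64), (1, 64), (2, 64), (3, 64)]
```

XOR with a fixed value is a one-to-one mapping on bytes, so every slave receives exactly a quarter of the possible peer values.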

Q1: Why XOR rather than AND or OR?
A1: Mathematically, the outputs of AND and OR are unevenly distributed: AND yields 1 only when both bits are 1 (25% of input combinations), and OR yields 0 only when both bits are 0 (25%). XOR yields 0 and 1 with a 50%:50% split, which keeps the hash values more random and lowers the probability of hash collisions.
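The bias is easy to measure. This Python sketch (an illustration, not part of the bonding driver) combines every pair of byte values with AND, OR and XOR and counts which of four "slaves" the result lands on with `% 4`.

```python
import operator
from collections import Counter


def bucket_counts(op, buckets=4):
    """Count how all 256 x 256 byte pairs (a, b) fall into op(a, b) % buckets."""
    return Counter(op(a, b) % buckets for a in range(256) for b in range(256))


for name, op in (("AND", operator.and_), ("OR", operator.or_), ("XOR", operator.xor)):
    counts = bucket_counts(op)
    print(name, [counts[i] for i in range(4)])
# AND [36864, 12288, 12288, 4096]
# OR [4096, 12288, 12288, 36864]
# XOR [16384, 16384, 16384, 16384]
```

AND piles traffic onto slave 0 and OR onto slave 3, while XOR gives each slave exactly a quarter of the 65,536 combinations.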

Q2: Since Bond2 uses the XOR hash for load balancing, how does it provide redundancy?
A2: In Bond2 the XOR hash is taken modulo the number of slaves, which is in fact the number of currently active slaves. Take a Bond2 with four slaves in normal operation: when Slave_1 goes down, the number of slaves drops to 3 and the slaves are renumbered:
[0 1 2 3] normal -> [0 2 3] Slave_1 down -> [0 1 2] new normal state
Every hash value now maps to a surviving slave, so data that used to go to the failed Slave_1 is handled by one of the remaining slaves. Because the modulus changes from 4 to 3, traffic for some other MAC addresses may also move to a different slave.
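A small Python sketch of the renumbering, assuming the hash is simply taken modulo the number of live slaves as described above; the slave names and the hash values 0-7 are hypothetical stand-ins for different peers.

```python
def slave_for(hash_value: int, active_slaves: list) -> str:
    """Map a hash to a live slave: index = hash % number of active slaves."""
    return active_slaves[hash_value % len(active_slaves)]


before = ["slave0", "slave1", "slave2", "slave3"]
after = [s for s in before if s != "slave1"]      # slave1 goes down -> [slave0, slave2, slave3]

for h in range(8):
    print(h, slave_for(h, before), "->", slave_for(h, after))
```

No hash maps to the failed slave any more, but some flows that never touched slave1 (for example hash 3, slave3 -> slave0) are moved as well.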

In Bond2, xmit_hash_policy specifies the slave scheduling policy. The default is the layer2 policy described above, which uses only the MAC addresses to decide how data is distributed. You can also include the IP layer and the transport layer (port numbers), so that data from different IP addresses, or even from different ports, can be assigned to different slaves for better load balancing. Of course, you should choose the policy that best fits the server's actual traffic. Where to set this option is covered in the implementation section later.

xmit_hash_policy values:

xmit_hash_policy=layer2
(source MAC XOR destination MAC) % slave count
Explanation: generates the hash from the XOR of the hardware MAC addresses

xmit_hash_policy=layer2+3
(((source IP XOR destination IP) AND 0xffff) XOR (source MAC XOR destination MAC)) % slave count
Explanation: generates the hash from the hardware MAC addresses and the IP addresses

xmit_hash_policy=layer3+4
((source port XOR destination port) XOR ((source IP XOR destination IP) AND 0xffff)) % slave count
Explanation: generates the hash from the TCP/UDP port numbers and the IP addresses
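The three formulas translate directly into Python. This sketch implements them exactly as written above, which is the simplified documented form rather than the kernel's current code path; all function names, MAC addresses and port numbers are hypothetical examples.

```python
import ipaddress


def mac_int(mac: str) -> int:
    """'00:0c:29:e0:b9:bc' -> 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def ip_int(ip: str) -> int:
    """'192.168.157.113' -> 32-bit integer."""
    return int(ipaddress.IPv4Address(ip))


def layer2(src_mac, dst_mac, slaves):
    return (mac_int(src_mac) ^ mac_int(dst_mac)) % slaves


def layer2_3(src_mac, dst_mac, src_ip, dst_ip, slaves):
    ip_part = (ip_int(src_ip) ^ ip_int(dst_ip)) & 0xFFFF
    return (ip_part ^ (mac_int(src_mac) ^ mac_int(dst_mac))) % slaves


def layer3_4(src_ip, dst_ip, src_port, dst_port, slaves):
    ip_part = (ip_int(src_ip) ^ ip_int(dst_ip)) & 0xFFFF
    return ((src_port ^ dst_port) ^ ip_part) % slaves


# Two SSH connections between the same pair of hosts, differing only in source port.
mac_a, mac_b = "00:0c:29:00:00:01", "00:0c:29:00:00:02"   # hypothetical MACs
for sport in (50000, 50001):
    print(sport,
          layer2(mac_a, mac_b, 2),
          layer3_4("192.168.157.113", "192.168.157.8", sport, 22, 2))
```

Under layer2 both connections share one slave; under layer3+4 the differing source ports send them to different slaves.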

2.3. Bond3: Broadcast

Features: every packet is sent out of all network interfaces. This mode suits industries such as finance, which need a highly reliable network where no failure is tolerated. It requires the switch ports to be configured for static (forced, non-negotiated) aggregation.
Disadvantages: it provides only redundancy, and it wastes network bandwidth and local network device resources.
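A Python sketch of the broadcast behaviour and its cost, with hypothetical helper names: every packet is queued on every slave, so the bytes put on the wire grow linearly with the number of slaves.

```python
def broadcast(packet: bytes, slaves):
    """broadcast-mode sketch: the same packet is queued on every slave."""
    return {slave: packet for slave in slaves}


def wire_bytes(packet_sizes, slave_count):
    """Total bytes transmitted when every packet is duplicated on every slave."""
    return sum(packet_sizes) * slave_count


copies = broadcast(b"icmp echo request", ["ens33", "ens34"])
print(len(copies), "copies")                    # 2 copies
print(wire_bytes([1500] * 1000, 2), "bytes")    # 3000000 bytes for 1,500,000 bytes of packets
```

The duplication is also why the peer answers each request more than once, as experiment 4.3 shows.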

2.4. Bond4: IEEE 802.3ad (dynamic link aggregation)

Features: at startup, an aggregation group is created according to the IEEE 802.3ad specification, using dynamic link aggregation. All slave network cards in the group share the same speed and duplex settings.
Prerequisites:
1. The driver of each slave network card must support retrieving its speed and duplex settings with ethtool;
2. The switch must support IEEE 802.3ad dynamic link aggregation.
The IEEE 802.3ad specification:
IEEE 802.3ad mainly standardizes the Link Aggregation Control Protocol (LACP).

Link aggregation is also known as port aggregation or port bundling. It bundles several low-bandwidth switch ports into one high-bandwidth link and balances the load across those ports to avoid congestion on any one link.
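The first prerequisite can be checked with a short Python sketch. It assumes a Linux host where `/sys/class/net/<iface>/speed` and `/sys/class/net/<iface>/duplex` expose the same values ethtool reports; the helper names are hypothetical, and reading these files may fail or report -1/unknown when a link is down.

```python
from pathlib import Path


def read_link_settings(iface: str, sysfs: Path = Path("/sys/class/net")):
    """Read a NIC's speed (Mb/s) and duplex from sysfs (may raise OSError if the link is down)."""
    base = sysfs / iface
    speed = int((base / "speed").read_text().strip())
    duplex = (base / "duplex").read_text().strip()
    return speed, duplex


def same_speed_and_duplex(settings: dict) -> bool:
    """True when every slave reports an identical (speed, duplex) pair."""
    return len(settings) > 0 and len(set(settings.values())) == 1


# On a real host (slave names assumed):
# settings = {nic: read_link_settings(nic) for nic in ("ens33", "ens34")}
print(same_speed_and_duplex({"ens33": (1000, "full"), "ens34": (1000, "full")}))  # True
print(same_speed_and_duplex({"ens33": (1000, "full"), "ens34": (100, "full")}))   # False
```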

2.5. Bond5: balance-tlb (adaptive transmit load balancing)

Features: for sending, a slave is selected according to the load on each slave; for receiving, the current slave is used. This mode requires the slave interfaces' device drivers to support ethtool; no switch support is needed. Outgoing traffic is distributed across the slaves according to their current load (computed relative to their speed).
Prerequisite: ethtool can retrieve the speed of each slave.
Disadvantages: receive load balancing is not supported.
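A simplified Python model of the transmit decision, not the kernel's balance-tlb code: the helper picks the slave whose load is lowest relative to its speed, and the loop sends six equal-sized packets. Names, loads and speeds are hypothetical.

```python
def pick_tx_slave(load_bytes: dict, speed_mbps: dict) -> str:
    """balance-tlb sketch: choose the slave whose load is lowest relative to its speed."""
    return min(load_bytes, key=lambda slave: load_bytes[slave] / speed_mbps[slave])


load = {"ens33": 0, "ens34": 0}
speed = {"ens33": 1000, "ens34": 1000}
for size in [1500] * 6:             # six equal-sized outgoing packets
    slave = pick_tx_slave(load, speed)
    load[slave] += size              # the chosen slave's load grows
print(load)                          # {'ens33': 4500, 'ens34': 4500}
# Receiving is not modelled: inbound traffic keeps arriving on the current slave.
```

Dividing by speed means a faster slave is allowed to carry proportionally more traffic before it stops being the least-loaded choice.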

2.6. Bond6: balance-alb (adaptive load balancing)

Features: this mode includes everything in Bond5 and adds receive load balancing, which also needs no switch support. Receive load balancing is achieved through ARP negotiation: the bonding driver intercepts ARP replies sent by the local machine and rewrites the source hardware address to the unique hardware address of one of the slaves in the bond, so that different peers communicate with different hardware addresses.
Disadvantages: ARP negotiation has some known issues (not studied in depth here; listed only for completeness).
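A simplified Python sketch of the receive-side idea: peers are handed slave MAC addresses in turn, and each outgoing ARP reply gets the assigned MAC as its sender hardware address. The round-robin assignment is a stand-in for the kernel's own client table, and the MACs are hypothetical placeholders echoing the x:03 and x:f9 addresses seen in experiment 4.6.

```python
from itertools import cycle


def assign_peers(peer_ips, slave_macs):
    """Give each peer one slave MAC, taking the slaves in turn (simplified assignment)."""
    macs = cycle(slave_macs)
    return {ip: next(macs) for ip in peer_ips}


def rewrite_arp_reply(reply: dict, peer_table: dict) -> dict:
    """Return a copy of an outgoing ARP reply whose sender hardware address
    is the slave MAC assigned to the peer the reply is addressed to."""
    rewritten = dict(reply)
    rewritten["sender_mac"] = peer_table[reply["target_ip"]]
    return rewritten


table = assign_peers(["192.168.157.8", "192.168.157.6", "192.168.157.7"],
                     ["00:00:00:00:00:03", "00:00:00:00:00:f9"])   # hypothetical MACs
reply = {"sender_ip": "192.168.157.53", "sender_mac": "bond-mac", "target_ip": "192.168.157.6"}
print(rewrite_arp_reply(reply, table))
```

Each peer caches a different MAC for the bond's single IP address, so inbound traffic from different peers arrives on different slaves.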

3. Implementing the bond

3.1. Modify the configuration files of the physical network cards

# Take bond1 as an example
# vim /etc/sysconfig/network-scripts/ifcfg-ens33
HWADDR=00:0C:29:E0:B9:BC
MACADDR=preserve
TYPE=Ethernet
NAME="bond1 slave 1"
UUID=ef2b8ef8-531f-4f09-902d-2442f096cb12
DEVICE=ens33
ONBOOT=yes
MASTER=bond1
SLAVE=yes
# vim /etc/sysconfig/network-scripts/ifcfg-ens34
HWADDR=00:0C:29:E0:B9:C6
MACADDR=preserve
TYPE=Ethernet
NAME="bond1 slave 2"
UUID=9a471884-2e42-4dd1-a876-1ad7aeedcc2f
DEVICE=ens34
ONBOOT=yes
MASTER=bond1
SLAVE=yes

3.2. Create the bond network card configuration file and configure the bond parameters

# Network part
BOOTPROTO=none
IPADDR=192.168.157.203
PREFIX=24
GATEWAY=192.168.157.2
DNS1=8.8.8.8
# IPV6 settings
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_PRIVACY=no
IPV6_ADDR_GEN_MODE=stable-privacy
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
# bond parameters
BONDING_OPTS="downdelay=0 miimon=1 mode=active-backup updelay=0"
TYPE=Bond
BONDING_MASTER=yes
PROXY_METHOD=none
BROWSER_ONLY=no
NAME="Bond connection 1"
UUID=bacf3665-d467-4059-bd33-98d8919a3e41
DEVICE=bond1
ONBOOT=yes
FAIL_OVER_MAC=2

BONDING_OPTS key parameters:

mode=		:  Bonding mode, 0-6, or the name of the corresponding mode

miimon=		:  MII link monitoring interval, in milliseconds (ms)

downdelay=	:  Time (ms) to wait after a link failure is detected before disabling the slave

updelay=	:  Time (ms) to wait after a link recovery is detected before enabling the slave

xmit_hash_policy= :  Transmit hash policy used to select the slave (see section 2.2)
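To make the option string concrete, here is a hypothetical Python helper (not part of any network tool) that splits a BONDING_OPTS line into its parameters and normalizes `mode`, which accepts either the number or the name. The name-to-number table follows the kernel bonding mode names; balance-rr (0) is used when no mode is given.

```python
import shlex

# Kernel bonding mode names and their numbers (mode=1 and mode=active-backup are equivalent).
MODE_NUMBERS = {
    "balance-rr": 0, "active-backup": 1, "balance-xor": 2, "broadcast": 3,
    "802.3ad": 4, "balance-tlb": 5, "balance-alb": 6,
}


def parse_bonding_opts(line: str) -> dict:
    """Split an ifcfg BONDING_OPTS="k=v k=v" line into a {key: value} dict."""
    key, sep, value = line.partition("=")
    if key.strip() != "BONDING_OPTS" or not sep:
        raise ValueError(f"not a BONDING_OPTS line: {line!r}")
    opts = {}
    for token in " ".join(shlex.split(value)).split():
        name, _, val = token.partition("=")
        opts[name] = val
    return opts


def mode_number(opts: dict) -> int:
    """Accept either mode=1 or mode=active-backup and return the number 0-6."""
    mode = opts.get("mode", "balance-rr")
    return int(mode) if mode.isdigit() else MODE_NUMBERS[mode]


opts = parse_bonding_opts('BONDING_OPTS="downdelay=0 miimon=1 mode=active-backup updelay=0"')
print(opts)                # {'downdelay': '0', 'miimon': '1', 'mode': 'active-backup', 'updelay': '0'}
print(mode_number(opts))   # 1
```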

3.3. Bring up the bond

systemctl restart network

4. Bonding experiments

Experimental preparation

# VMware NAT mode uses static address assignment
# Address assignment
192.168.157.103 bond0
192.168.157.203 bond1
192.168.157.113 bond2
192.168.157.123 bond3
192.168.157.133 bond5
192.168.157.53  bond6
192.168.157.8   pinger
192.168.157.6-7 control group

# ifstat
# TX RATE is the transmission speed and RX RATE is the reception speed

Every 1.0s: ifstat                                      Tue Aug  3 02:22:58 2021
#kernel
Interface        RX Pkts/Rate    TX Pkts/Rate    RX Data/Rate    TX Data/Rate
                 RX Errs/Drop    TX Errs/Drop    RX Over/Rate    TX Coll/Rate
lo                     0 0             0 0             0 0             0 0
                       0 0             0 0             0 0             0 0
ens33                  1 0             0 0            60 0             0 0
                       0 0             0 0             0 0             0 0
ens34                  0 0             1 0             0 0           130 0
                       0 0             0 0             0 0             0 0
bond2                  1 0             1 0            60 0           130 0
                       0 0             0 0             0 0             0 0
                       
The display above refreshes dynamically because it is run as watch -n 1 "ifstat", which re-runs ifstat once per second.
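If ifstat is not installed, similar per-interface numbers can be read from the kernel counters in /proc/net/dev. This is a minimal Python sketch under that assumption (Linux only); the helper names are hypothetical, and it reports byte rates only, not the error and drop columns shown by ifstat.

```python
import time


def parse_proc_net_dev(text: str) -> dict:
    """Parse /proc/net/dev into {iface: {rx_bytes, rx_packets, tx_bytes, tx_packets}}."""
    stats = {}
    for line in text.splitlines()[2:]:          # the first two lines are headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        stats[iface.strip()] = {
            "rx_bytes": int(fields[0]), "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]), "tx_packets": int(fields[9]),
        }
    return stats


def sample_rates(interval: float = 1.0, path: str = "/proc/net/dev") -> dict:
    """Read the counters twice, `interval` seconds apart, and return bytes/s per interface."""
    with open(path) as f:
        before = parse_proc_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_net_dev(f.read())
    return {
        iface: {
            "rx_Bps": (after[iface]["rx_bytes"] - before[iface]["rx_bytes"]) / interval,
            "tx_Bps": (after[iface]["tx_bytes"] - before[iface]["tx_bytes"]) / interval,
        }
        for iface in after if iface in before
    }
```

Calling `print(sample_rates())` in a loop on the bond VM shows the RX/TX rate of each slave and of the bond interface, much like the ifstat view above.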

4.0 Bond0 experiment

4.0.1 Experimental steps

	1. On the bond0 VM, run ping pinger in the background
	2. On the bond0 VM, run watch -n 1 "ifstat" in the foreground

4.0.2 Experimental results

	ifstat shows the two network cards alternately receiving the network data of the SSH connection.

4.1 Bond1 experiment

4.1.1 Experimental steps

	1. On the bond1 VM, run watch -n 1 "ifstat" in the foreground
	2. In another shell on the bond1 VM, run ifdown ens33, observe ifstat, then run ifup ens33
	3. In another shell on the bond1 VM, run ifdown ens34, observe ifstat, then run ifup ens34

4.1.2 Experimental results

	1. After bond1 starts, ifstat shows that only one network card is working: the primary network card.
	2. When the primary network card goes down and is brought back up, the network becomes briefly unresponsive and recovers after a few seconds.
	3. When a network card goes down and is brought back up, the network is briefly unresponsive, and the restored card takes over as the primary network card.
	4. A network card that newly joins the bond automatically becomes the primary network card.


4.2 Bond2 experiment

4.2.1 Experimental steps

	1. On the bond2 VM, run watch -n 1 "ifstat" in the foreground
	2. On the bond2 VM, run ping pinger in the background
	3. Increase the load: on the Win10 host, use Xftp to transfer data to and from bond2

4.2.2 Experimental results

	1. In ifstat, during the ping, ens33 only receives data and does not send any, while ens34 only sends data and does not receive any.
	2. After the load is increased, ens33 both receives and sends data, while ens34 still only sends data.
	3. Extrapolation: if the load keeps increasing, ens33 should both receive and send data, and ens34 should also start both receiving and sending.
	(These observations cannot yet be explained and need further investigation.)

4.3 Bond3 experiment

4.3.1 Experimental steps

	1. On the bond3 VM, run ping pinger in the foreground

4.3.2 Experimental results

	1. Each normal ping reply is immediately followed by a DUP warning.
	(DUP is short for DUPLICATE: ping reports it when it receives more than one reply to the same request.)
	This happens because bond3 sends the ICMP echo request to the other host out of both network cards, so the other host sends back two ICMP replies and ping flags the second one as DUP.
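The DUP label can be reproduced with a short Python sketch: replies are matched by ICMP sequence number, and a sequence number that has already been answered is flagged. This imitates the idea behind ping's duplicate detection; the helper name and output format are illustrative, not ping's source code.

```python
def label_replies(reply_seqs):
    """Label echo replies in arrival order; a sequence number seen before is a duplicate."""
    seen = set()
    labels = []
    for seq in reply_seqs:
        suffix = " (DUP!)" if seq in seen else ""
        labels.append(f"icmp_seq={seq}{suffix}")
        seen.add(seq)
    return labels


# bond3 sends each request out of both NICs, so the peer answers each sequence number twice.
for label in label_replies([1, 1, 2, 2, 3, 3]):
    print(label)
```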

4.4 Bond4 experiment

4.4.1 Experimental steps

	This mode requires switch support, so the experiment could not be carried out for now.

4.4.2 Experimental results

	None

4.5 Bond5 experiment

4.5.1 Experimental steps

	1. On the bond5 VM, run ping pinger in the background
	2. On the bond5 VM, run watch -n 1 "ifstat" in the foreground

4.5.2 Experimental results

	ifstat shows that when bond5 sends data, the traffic is distributed roughly evenly between the two network cards.

4.6 Bond6 experiment

4.6.1 Experimental steps

	1. On the bond6 VM, run ping pinger in the background
	2. On the bond6 VM, run watch -n 1 "ifstat" in the foreground
	3. On the pinger VM, run ping bond6 in the foreground
	4. Capture packets with Wireshark to observe the process

4.6.2 Experimental results

	1. ifstat shows that when bond6 sends and receives data, the traffic is distributed roughly evenly between the two network cards.
	2. The packet capture shows the ARP negotiation that takes place while bond6 transmits data.
# General process of ARP negotiation
IP: 
pinger 192.168.157.11
bond6 192.168.157.53
MAC: 
pinger       x:08
bond6 ens33  x:03 
bond6 ens37  x:f9
--------------------------------------------------------------------------
time line ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  
--------------------------------------------------------------------------
stage     |   ping request    |     reply     |       bond internal
IP        |    11 -> 53       |   53 -> 11    |       /           /
MAC       |    08 -> 03       |   f9 -> 08    |    f9 -> f9    03 -> 03
--------------------------------------------------------------------------
