How to use TCP Wrappers to protect MySQL from service interruption

Posted by Solarpitch on Tue, 18 Feb 2020 11:08:40 +0100

By Ananias Tsalouchidis Translated by: Meng Wei Original text: https://www.percona.com/blog/2020/01/07/how-securing-mysql-with-tcp-wrappers-can-cause-an-outage/

case

Protecting MySQL is always a challenge. There are general best practices for installing reinforcement, but the more complex your settings are, the more likely you are to encounter problems that are difficult to troubleshoot.

We recently looked at a case where MySQL began to become unavailable when active threads were high and exceeded a threshold (but not always the same).

During this period, there are many logs like the following, mysqld does not respond for a few seconds.

2019-11-27T10:26:03.476282Z 7736563 [Note] Got an error writing communication packets
2019-11-27T10:26:03.476305Z 7736564 [Note] Got an error writing communication packets

**"Got an error writing communication packets" * "is a very common log message, which may be caused by many reasons. (for official documents, please refer to the link at the end)

How do we deal with this problem and find the root cause

The first thing to do is to execute a simple loop remotely to determine whether this is a random occurrence, a network problem or a problem related to mysqld itself.

[RDBA] percona@monitoring1: ~ $ time for i in {1..100}; \
do mysql -h 10.0.2.14 -Bsse "show status like '%uptime';"; \
done
Uptime 3540
Uptime 3540
Uptime 3540
Uptime 3541
Uptime 3541
Uptime 3541
Uptime 3541
Uptime 3542
Uptime 3542
Uptime 3542
Uptime 3543
Uptime 3543
Uptime 3543
Uptime 3543
Uptime 3543
Uptime 3544
^C

What I want to do at first is to confirm the behavior of customer reports. Therefore, since all application servers are in remote locations (so the client is linked via TCP), would you like to confirm if any remote connections have been dropped (this is due to network problems)? Or is MySQL not responding for any reason?). You also want to verify if there is a scenario where a connection in X is dropped or dropped after a certain period of time. Identifying scenarios often helps to identify what the root cause is. Another reason to perform this remote connection loop is to verify that the problem occurs only at remote connections or at local connections (local connections will be tested later). In the network layer troubleshooting, no problems were found, so we decided to use another loop to link to mysqld locally through TCP. This test shows that MySQL is really not available (or at least not randomly accessible). Unfortunately, the local connection was not tested through the socket at that time. The network layer is completely bypassed by a socket connection. If you try to connect using a socket, you immediately realize that this is not actually a MySQL problem, because MySQL is always available (so something is blocking the connection at the network level). Here are more details. Continue troubleshooting and netstat shows that many connections are in the time \. Time? Wait indicates that the source side has closed the connection. Here is an example of using netstat to identify a TCP connection in a test environment.

[RDBA] percona@db4-atsaloux: ~ $ sudo netstat -a -t
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:sunrpc          0.0.0.0:*               LISTEN
tcp        0      0 db4-atsaloux:42000      0.0.0.0:*               LISTEN
tcp        0      0 localhost:domain        0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:nrpe            0.0.0.0:*               LISTEN
tcp        0      0 db4-atsaloux:ssh        10.0.2.10:35230         ESTABLISHED
tcp        0     36 db4-atsaloux:ssh        10.0.2.10:39728         ESTABLISHED
tcp        0      0 db4-atsaloux:49154      10.0.2.11:mysql         ESTABLISHED
tcp6       0      0 [::]:mysql              [::]:*                  LISTEN
tcp6       0      0 [::]:sunrpc             [::]:*                  LISTEN
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN
tcp6       0      0 [::]:nrpe               [::]:*                  LISTEN
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50950         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50964         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50938         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50940         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51010         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50994         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50986         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:44110         ESTABLISHED
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50984         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50978         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51030         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50954         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51032         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51042         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50996         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51046         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51000         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50942         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51004         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:44108         ESTABLISHED
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50992         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50988         TIME_WAIT

This makes us realize that we may have exhausted the TCP connections on the TCP layer, because the number of TCP sessions has increased, and these sessions remain open until the time ﹐ wait timeout occurs. We wrote a related blog before (please refer to the link at the end of the article). This gives you a good idea of what a "time \. We initially tried to fine tune the port range IP [local] port [range] and adjust some kernel related options, such as TCP [TW] reuse, but unfortunately, we failed and still had the same problem. When checking network traffic, it was found that the host sent a large number of requests to the DNS server defined in / etc/resolv.conf. When it comes to network traffic detection, because the network infrastructure is not managed by us, we are unable to verify some things in the network layer. We confirmed from the customer's IT department that no errors were found at the network level. What we can do is to detect packets of traffic in and out of MySQL. Tcpdump helps to identify a large number of DNS requests and slow responses. The first command used for packet detection on the DB node is tcpdump dst port 3306 or src port 3306, and then use more specific filtering rules to exclude and filter out useless information, such as the traffic between the master server and the slave server. At that time, another thought occurred to verify that mysqld was trying DNS resolution for any reason. This explains one of the first problems. Check the variable skip ﹣ name ﹣ resolve and we find that it is set to ON, so mysqld should not perform any type of DNS resolution.

db4-atsaloux (none)> select @@skip_name_resolve;
+---------------------+
| @@skip_name_resolve |
+---------------------+
|                   1 |
+---------------------+
1 row in set (0.00 sec)

In order to further debug what MySQL actually does, we strace mysqld process.

Root cause

We noticed that the mysqld process accessed the / etc/hosts.allow and / etc/hosts.deny files too frequently.

root@db4-atsaloux:~# strace -e open,read -p$(pidof mysqld)
strace: Process 693 attached
# /etc/hosts.deny: list of hosts that are _not_ allowed to access the system.
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
# /etc/hosts.allow: list of hosts that are allowed to access the system.
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0

As we can see, some new connections take a long time to connect to MySQL. strace on mysqld pid shows frequent access to / etc/hosts.allow and / etc/hosts.deny. These files are directly related to TCP wrappers! Many system administrators think that TCP wrappers are outdated software (software development has stopped, but there are many alternatives), but they are still widely used. When using TCP wrappers, you must check each new connection based on the ACL and decide whether to allow remote hosts to connect to the service based on this ACL. During troubleshooting, it is found that DNS resolution has nothing to do with MySQL's skip ﹣ name ﹣ resolve function or mysqld itself. We know that this is not actually a mysqld problem. MySQL has started and can handle requests. Personally, I think it's a fatal weakness of "designed" software. Continue to check and find that there is a wrong DNS configuration / etc/resolv.conf, so when DNS response is slow or DNS does not respond, TCP wrappers make the connection to mysql stagnate or be discarded while waiting for DNS response.

summary

1. If you need to use TCP wrappers for any reason, please always note that any DNS problems may cause a pause or interrupt. 2. With TCP wrappers, even if skip ﹣ name ﹣ resolve is enabled, resolution will occur.

  • Even if you don't have TCP wrappers installed, some operating systems, such as some versions of CentOS or Ubuntu, are available by default.
  • If you want to set security rules for MySQL, be careful. Note that not every binary can use them. Binaries should be connected to the library of TCP wrappers. 3. If the MySQL service has TCP wrappers enabled and you really need them, you should ensure that DNS response is fast and secure, even if it is not easy to manage.
  • Add to the / etc/hosts file for hosts connected to mysqld service, so that no real DNS resolution will be done for each connection.
  • There are also some best times to configure DNS resolution. In my opinion, you should configure at least multiple DNS servers in / etc/resolv.conf, and you should use some local servers or servers closest to Linux servers, and possibly enable caching. 4. If you encounter a similar problem, you can check whether mysqld is built for TCP wrappers, execute ldd on mysqld binary, and then check whether it is connected to the TCP wrappers library. If yes, and the same problem is encountered, check the DNS configuration of the system and whether DNS is performing properly.
[RDBA] percona@db4-atsaloux: ~ $ ldd /usr/sbin/mysqld | grep libwrap
libwrap.so.0 => /lib/x86_64-linux-gnu/libwrap.so.0 (0x00007fa80532c000)

Related documents and blog links: <Communication Errors and Aborted Connections> https://dev.mysql.com/doc/refman/5.7/en/communication-errors.html <Application Cannot Open Another Connection to MySQL> https://www.percona.com/blog/2014/12/08/what-happens-when-your-application-cannot-open-yet-another-connection-to-mysql/>)

Topics: Database MySQL DNS network ssh