Fault case | how does lsof "affect" MySQL to calculate the number of open file handles

Posted by orudie on Thu, 09 Dec 2021 13:44:39 +0100

Welcome to the MySQL technical articles shared by the great SQL community. If you have any questions or want to learn, you can leave a message in the comment area below. After seeing it, you will answer it

Different parameters added to lsof will produce different results. Be careful of "stepping on the pit".

1. Background:

It was accidentally found that the database could not be connected. In the err log of the database, there was an error of "Too many open files". We all know that the mysqld process triggered the handle restriction, resulting in the failure to establish a new connection.

Du Niang found the command to count the number of handles

lsof -n|awk '{print $2}'|sort|uniq -c|sort -nr| head -n 10

It is found that the output results far exceed the results of ulimit -n. However, the error reporting phenomenon does not last long. Under normal circumstances, it is impossible to produce so many handles.

In order to study the root cause of this problem, the following tests are simulated on your own server.

The server is configured with 4C8G, MySQL 8.0.22, and lsof version 4.87 (why is lsof version specially introduced? It will be mentioned later, which is very important)

2. Test process:

1. Install MySQL on the server, set the following parameters, and restart the database service

max_connections=150
open_files_limit=1000

At this point, the database process is open_ files_ The limit is 2160

mysql> show variables like '%open%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| have_openssl               | YES   |
| innodb_open_files          | 1000  |
| mysqlx_port_open_timeout   | 0     |
| open_files_limit           | 2160  |
| table_open_cache           | 1000  |
| table_open_cache_instances | 16    |
+----------------------------+-------+
6 rows in set (0.01 sec)

The calculation formula is table_ open_ cache*2+max_ For details of connections + 10, see the official website below

2. At this time, the database has just started. Check the number of handles occupied by the process

[root@greatdb mysql]# ps -ef| grep mysql
root      6239  8644  0 16:25 pts/5    00:00:00 mysql -uroot
root     10134  8260  0 16:42 pts/1    00:00:00 mysql -uroot
root     10177     1  0 16:47 ?        00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --defaults-file=/data/mysql/my.cnf
mysql    11604 10177  8 16:47 ?        00:00:02 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/my.cnf --basedir=/usr/local/mysql --datadir=/data/mysql/data --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=/data/mysql/logs/mysqld-error.log --ope
n-files-limit=1000 --pid-file=/data/mysql/mysqld.pid --socket=/data/mysql/mysql.sock --port=3306root     11696  8572  0 16:47 pts/2    00:00:00 grep --color=auto mysql
[root@greatdb mysql]# 
[root@greatdb mysql]# 
[root@greatdb mysql]# lsof -n|awk '{print $2}'|sort|uniq -c|sort -nr| head -n 10
   5727 11604
   1575 1384
    525 1550
    455 879
    371 561
    135 1101
    130 1001
    120 16780
    114 1304
     88 16894
[root@greatdb mysql]#

You can see that the database just started. The number of handles 5727 counted by the above command has exceeded the set value of 2160, but the database can still be used and connected normally.

2. At this point, enter the database and modify the maximum connection limit to simulate the real situation of exceeding the handle

set global max_connections=10000

3. Create three partition tables in the database to quickly occupy the number of file handles

create database test;
create table test.tb1(id int, c varchar(10)) partition by hash(id) partitions 1024;
create table test.tb2(id int, c varchar(10)) partition by hash(id) partitions 1024;
create table test.tb3(id int, c varchar(10)) partition by hash(id) partitions 1024;

4. At this point, use the above command again to query the number of handles

[root@greatdb mysql]# lsof -n|awk '{print $2}'|sort|uniq -c|sort -nr| head -n 10
  73485 11604
   1575 1384
    525 1550
    455 879
    371 561
    135 1101
    130 1001
    120 16780
    114 1304
     88 16894
[root@greatdb mysql]#

5. You can see that the number of handles "occupied" by mysql processes has reached 73485, far exceeding the actual allowable value. You begin to doubt that this statistical value is wrong.

6. Use another command lsof -p 11604 |wc -l; ls /proc/11604/fd |wc -l to count handles and open files. It is found that it is normal and within a reasonable range

[root@greatdb mysql]# lsof -p 11604 |wc -l;  ls /proc/11604/fd |wc -l
1066
1021
[root@greatdb mysql]#

7. Finally, the difference is found between lsof -n | head and lsof -p 11604 |head. The result of lsof -n has an additional column of TID. After observation, it is found that it is the thread number, its value and performance_ Thread in schema.threads_ ID can correspond to

[root@greatdb mysql]# lsof -p 11604 | head
COMMAND   PID  USER   FD      TYPE             DEVICE   SIZE/OFF    NODE NAME
mysqld  11604 mysql  cwd       DIR              253,1       4096 1310723 /data/mysql/data
mysqld  11604 mysql  rtd       DIR              253,1       4096       2 /
mysqld  11604 mysql  txt       REG              253,1  441771255  927033 /usr/local/mysql/bin/mysqld
mysqld  11604 mysql  DEL       REG               0,11            6526074 /[aio]
mysqld  11604 mysql  DEL       REG               0,11            6526073 /[aio]
mysqld  11604 mysql  DEL       REG               0,11            6526072 /[aio]
mysqld  11604 mysql  DEL       REG               0,11            6526071 /[aio]
mysqld  11604 mysql  DEL       REG               0,11            6526070 /[aio]
mysqld  11604 mysql  DEL       REG               0,11            6526069 /[aio]
[root@greatdb mysql]# 
[root@greatdb mysql]# 
[root@greatdb mysql]# lsof -n | head 
COMMAND     PID   TID    USER   FD      TYPE             DEVICE   SIZE/OFF       NODE NAME
systemd       1          root  cwd       DIR              253,1       4096          2 /
systemd       1          root  rtd       DIR              253,1       4096          2 /
systemd       1          root  txt       REG              253,1    1632744     136135 /usr/lib/systemd/systemd
systemd       1          root  mem       REG              253,1      20064     132877 /usr/lib64/libuuid.so.1.3.0
systemd       1          root  mem       REG              253,1     265576     134383 /usr/lib64/libblkid.so.1.1.0
systemd       1          root  mem       REG              253,1      90176     132861 /usr/lib64/libz.so.1.2.7
systemd       1          root  mem       REG              253,1     157424     132876 /usr/lib64/liblzma.so.5.2.2
systemd       1          root  mem       REG              253,1      23968     133092 /usr/lib64/libcap-ng.so.0.0.0
systemd       1          root  mem       REG              253,1      19896     133076 /usr/lib64/libattr.so.1.1.0

8. After consulting with many parties and consulting materials, it is learned that after lsof version 4.82, the handle of each thread will be counted by default, that is, the - K parameter is added by default. After this implementation, the data counted by the previous command lsof -n|awk '{print }'|sort|uniq -c|sort -nr| head -n 10 is approximately equal to the actual number of handles * the number of threads, that is, the result we saw earlier, The statistics are large.

9. The correct syntax of the new lsof version statistics handle is lsof -n -Ki|awk '{print .}'|sort|uniq -c|sort -nr| head -n 10. The data is normal after statistics

[root@greatdb mysql]# lsof -Ki -n| awk '{print $2}' | sort | uniq -c | sort -nr | head
   1065 11604
    130 1001
     91 879
     84 8643
     84 8641
     84 8622
     84 8603
     84 8570
     84 8258
     84 8156

10. Create database session connections in batch by using the circular method. It is found that after the handle reaches the set limit, the number of handles will exceed 45. After that, manually create the connection and wait all the time. The error "Too many open files" appears in the MySQL err log

for x in {1..1139}; do nohup mysql -uroot -e 'select sleep(240)' & done

11. The number of handles occupied by the query process when a new connection cannot be established

[root@greatdb mysql]# lsof -p 11604 |wc -l
2205

12. So far, we have found the correct method to count the process handle, calculate the upper limit calculation method of MySQL handle, and accurately reproduce the abnormal scene.

3. Summary:

  • lsof counts the handle data, which is closely related to the version. If it is a version after 4.82, the number of process handles needs to be counted, and the parameter - Ki needs to be added
  • The number of handles occupied during MySQL startup is 84, when table_ open_ When the cache is set to 1000, a maximum of 1065 handles are occupied except for connection handles, mainly including shared library so, ibd files, log files, temporary files, etc. each link occupies a separate handle
  • The maximum number of handles of MySQL process is the parameter open_files_limit + 45, this is tested, and the source of the specific code is not found
  • The handle of MySQL is limited by the parameter max_connections,table_open_cache is affected jointly and is not limited by the operating system (when the operating system ulimit -n 1024 is tested, MySQL's open_files_limit can still be 2160)
  • File handles are shared between threads, so opening an ibd file, no matter how many session accesses, will occupy only one handle

Reference articles

https://github.com/rapidoid/r...

https://github.com/draios/sys...

https://dev.mysql.com/doc/ref...

Enjoy GreatSQL :)

Article recommendation:

Technology sharing | MGR best practice
https://mp.weixin.qq.com/s/66...

Technology sharing | the way to repair MGR Bug in Wanli database
https://mp.weixin.qq.com/s/Ia...

Macos system compiler percona and the difference of some functions on Macos system
https://mp.weixin.qq.com/s/jA...

Technology sharing | using systemd to manage single machine and multiple instances of MySQL
https://mp.weixin.qq.com/s/iJ...

Product | GreatSQL to create a better MGR ecology
https://mp.weixin.qq.com/s/By...

Product | GreatSQL MGR optimization reference
https://mp.weixin.qq.com/s/5m...

About GreatSQL

GreatSQL is a MySQL branch maintained by Wanli database. It focuses on improving the reliability and performance of MGR and supports the parallel query feature of InnoDB. It is a branch version of MySQL suitable for financial applications.

Gitee:
https://gitee.com/GreatSQL/Gr...

GitHub:
https://github.com/GreatSQL/G...

Wechat & QQ group:

You can scan code to add GreatSQL community assistant wechat friends, send verification information "add group" to join GreatSQL/MGR exchange wechat group, or directly scan code to join GreatSQL/MGR exchange QQ group.

This article is composed of blog one article multi posting platform OpenWrite release!

Topics: Database MySQL SQL