Nginx open_file_cache module file descriptor cache

Posted by pdaoust on Thu, 10 Mar 2022 05:12:01 +0100

In my previous post I introduced nginx caching. nginx also has another very important caching feature, open_file_cache, which caches only open file handles and file meta information. open_file_cache is very helpful for optimizing nginx performance.

NGINX is already optimized for static content, but on high-traffic websites you can use open_file_cache to improve performance further. This cache stores recently used file descriptors and related metadata (such as modification time and size). It does not store the contents of the requested files.


The following is a general list of what is cached; each cached item corresponds to a field in the nginx source code. The file handle (fd) is cached, which means nginx does not have to open and close the file on every request, so fewer system calls are made.

The file size and modification time are cached as well. Errors encountered while looking up a file, such as a 403, can also be cached, so that the next request does not need to open the file again to find out that it fails. Information about directories is cached too.

The cached file meta information includes:

fd, after the file is opened once, fd remains for use
size
path
last modified time
...
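As a rough illustration of the list above, here is a minimal Python sketch that collects the same kind of information with one open() and one fstat(). The CachedFileInfo record and the open_and_describe helper are hypothetical names made up for this example; they are not nginx's actual data structures.

```python
import os
import stat
import tempfile
from dataclasses import dataclass


@dataclass
class CachedFileInfo:
    """Hypothetical record of what one cache entry holds (not nginx's real struct)."""
    path: str
    fd: int
    size: int
    mtime: float
    is_dir: bool


def open_and_describe(path: str) -> CachedFileInfo:
    # One open() plus one fstat() yields the fd, size and modification time.
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    st = os.fstat(fd)
    return CachedFileInfo(path, fd, st.st_size, st.st_mtime, stat.S_ISDIR(st.st_mode))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(suffix=".html") as tmp:
        tmp.write(b"<h1>hello</h1>")
        tmp.flush()
        info = open_and_describe(tmp.name)
        print(info)
        os.close(info.fd)  # a real cache would keep the fd open for later requests
```

Keeping such a record around is what lets later requests skip the open() and fstat() calls.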


Enabling Nginx's open_file_cache directive caches open file handles and so saves expensive open() system calls. Enlarging the cache can improve the hit rate in production. However, a larger cache is not always better: for example, when the capacity reaches 20000 elements, the lock of shared memory reportedly becomes a bottleneck. (The open_file_cache configuration of Nginx caches the meta information of static files, which can significantly improve performance when these static files are accessed frequently.)


open_file_cache module
1 open_file_cache
Enabling this directive caches the following information:

Open file descriptors and related metadata, such as size and modification time
Errors encountered while looking up files and directories, such as "permission denied" and "file not found" (error caching must be enabled separately with open_file_cache_errors)

The cache has a fixed size and removes the least recently used (LRU) elements on overflow.
The cache also evicts elements after a period of inactivity. This directive is disabled by default.
An example:

http{
open_file_cache max=1000 inactive=20s;
}


The configuration above defines a cache of at most 1000 elements. The inactive parameter sets the time after which an element is removed if it has not been accessed, here 20 seconds. Setting inactive is optional; it defaults to 60 seconds.
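To make the interaction of max and inactive more concrete, here is a toy Python model of the eviction rules described above. It is only a sketch: OpenFileCacheModel is a hypothetical class, time is passed in explicitly as `now`, and no real files are opened. nginx's own implementation differs in its details.

```python
from collections import OrderedDict


class OpenFileCacheModel:
    """Toy model of `open_file_cache max=N inactive=T`; not nginx's implementation."""

    def __init__(self, max_entries: int, inactive: float = 60.0):
        self.max_entries = max_entries
        self.inactive = inactive
        self._entries = OrderedDict()  # path -> last access time, oldest first

    def access(self, path: str, now: float) -> bool:
        """Record an access at time `now`; return True on a cache hit."""
        self._expire(now)
        hit = path in self._entries
        if hit:
            self._entries.move_to_end(path)    # most recently used goes last
        elif len(self._entries) >= self.max_entries:
            self._entries.popitem(last=False)  # overflow: drop the LRU entry
        self._entries[path] = now
        return hit

    def _expire(self, now: float) -> None:
        # Drop entries that have not been accessed for more than `inactive` seconds.
        stale = [p for p, t in self._entries.items() if now - t > self.inactive]
        for p in stale:
            del self._entries[p]


if __name__ == "__main__":
    cache = OpenFileCacheModel(max_entries=1000, inactive=20)
    print(cache.access("/index.html", now=0))   # first access: miss
    print(cache.access("/index.html", now=5))   # accessed again within 20s: hit
    print(cache.access("/index.html", now=60))  # idle for more than 20s: miss
```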

2 open_file_cache_valid

Syntax: open_file_cache_valid time;
Default: open_file_cache_valid 60s;
Context: http, server, location


open_file_cache holds a snapshot of file information. Because the files can change at the source, the snapshot may become stale after a while. The open_file_cache_valid directive defines the time after which elements in open_file_cache are revalidated. By default, an element is rechecked after 60 seconds. An example:

http{
open_file_cache_valid 30s;
}

 

After the configured time (60s by default, 30s in the example above), nginx checks whether the cached entry is still valid and updates it if the file has changed. This matters because other processes, such as users or other services, may modify files without going through nginx, so the fd that nginx has cached may no longer point to the latest file. When the validity time is very large and many file handles are cached, clients are likely to receive stale files. Choose this time so that, once it has elapsed, changes to files on disk are picked up.

If your static files change frequently and freshness matters, set open_file_cache_valid to a smaller value so that changes are detected and picked up in time.
If changes are infrequent, you can set it larger and reload nginx to force the cache to refresh after a change.
If you don't need the error and access logs for static file access, you can turn them off to improve efficiency.
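The stale-descriptor problem described above can be reproduced in a few lines of Python. This is a sketch under assumptions: ValidatedEntry is a hypothetical class, the entry is considered changed when its inode, size or modification time differs, and time is passed in as `now`. Which attributes nginx itself compares is not shown in this article.

```python
import os
import tempfile


class ValidatedEntry:
    """Hypothetical cache entry that is rechecked after `valid` seconds."""

    def __init__(self, path: str, now: float, valid: float = 60.0):
        self.path = path
        self.valid = valid
        self._open(now)

    def _open(self, now: float) -> None:
        self.fd = os.open(self.path, os.O_RDONLY)
        st = os.fstat(self.fd)
        self.signature = (st.st_ino, st.st_size, st.st_mtime_ns)
        self.checked_at = now

    def get_fd(self, now: float) -> int:
        """Return a usable descriptor, revalidating if the entry is too old."""
        if now - self.checked_at >= self.valid:
            st = os.stat(self.path)
            if (st.st_ino, st.st_size, st.st_mtime_ns) != self.signature:
                os.close(self.fd)      # file was replaced or modified on disk
                self._open(now)
            else:
                self.checked_at = now  # still the same file, restart the timer
        return self.fd


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "index.html")
    with open(path, "wb") as f:
        f.write(b"old")
    entry = ValidatedEntry(path, now=0, valid=30)
    # Another process replaces the file without going through the cache.
    with open(path + ".tmp", "wb") as f:
        f.write(b"new content")
    os.replace(path + ".tmp", path)
    print(os.pread(entry.get_fd(now=10), 100, 0))  # still served from the old fd
    print(os.pread(entry.get_fd(now=31), 100, 0))  # revalidated, new file is served
    os.close(entry.fd)
```

Until the validity time has elapsed, the cached descriptor keeps serving the old file even though the path already points to a new one, which is why the value should match how quickly changes must become visible.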

3 open_file_cache_min_uses

Syntax: open_file_cache_min_uses number;
Default: open_file_cache_min_uses 1;
Context: http, server, location

 


This directive sets the minimum number of accesses, within the period configured by the inactive parameter of open_file_cache, that a file needs for its descriptor to remain open in the cache. The default is 1. For example:

http{
open_file_cache_min_uses 4;
}
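The effect of the threshold can be pictured with a small Python sketch. MinUsesModel is a hypothetical class; it only counts accesses and, for simplicity, ignores the inactive window, so it is a simplification rather than a description of nginx's real behavior.

```python
class MinUsesModel:
    """Toy model of open_file_cache_min_uses; not nginx's implementation."""

    def __init__(self, min_uses: int = 1):
        self.min_uses = min_uses
        self.uses = {}  # path -> number of accesses seen so far

    def access(self, path: str) -> bool:
        """Count an access; return True if the descriptor would be kept open."""
        self.uses[path] = self.uses.get(path, 0) + 1
        return self.uses[path] >= self.min_uses


if __name__ == "__main__":
    model = MinUsesModel(min_uses=4)
    # With min_uses 4, the descriptor is only kept from the fourth access on.
    print([model.access("/index.html") for _ in range(5)])
```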

4 open_file_cache_errors

Syntax: open_file_cache_errors on | off;
Default: open_file_cache_errors off;
Context: http, server, location

 

 

This directive controls whether file lookup errors are cached. The default is off.
As mentioned earlier, NGINX can cache errors that occur during file access, but this has to be enabled with the open_file_cache_errors directive. When error caching is enabled, NGINX returns the cached error for subsequent requests to the same resource without looking the file up again.

http{
open_file_cache_errors on;
}
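Error caching is a form of negative caching. The following Python sketch shows the idea: ErrorCachingOpener and try_open are hypothetical names, and the `syscalls` counter exists only to make the saved open() attempts visible. It is an illustration, not nginx's implementation.

```python
import errno
import os


class ErrorCachingOpener:
    """Toy model of `open_file_cache_errors on`; not nginx's implementation."""

    def __init__(self, cache_errors: bool):
        self.cache_errors = cache_errors
        self.errors = {}   # path -> errno of the last failed open
        self.syscalls = 0  # how many times open() was actually attempted

    def open(self, path: str) -> int:
        if self.cache_errors and path in self.errors:
            err = self.errors[path]
            raise OSError(err, os.strerror(err), path)  # answered from the cache
        self.syscalls += 1
        try:
            return os.open(path, os.O_RDONLY)
        except OSError as exc:
            if self.cache_errors:
                self.errors[path] = exc.errno
            raise


def try_open(opener: ErrorCachingOpener, path: str) -> int:
    """Return 0 on success or the errno of the failure."""
    try:
        os.close(opener.open(path))
        return 0
    except OSError as exc:
        return exc.errno


if __name__ == "__main__":
    opener = ErrorCachingOpener(cache_errors=True)
    missing = "/nonexistent-dir-for-demo/missing.html"
    for _ in range(3):
        print(try_open(opener, missing) == errno.ENOENT, opener.syscalls)
```

The trade-off is the same as for successful lookups: a file that appears later stays "not found" until the cached error expires or is revalidated.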


Here is a configuration example:

open_file_cache max=64 inactive=30d;
open_file_cache_min_uses 8;
open_file_cache_valid 3m;


max=64 sets the maximum number of cached files to 64. Beyond this number, Nginx discards cold entries according to the LRU principle.

inactive=30d and open_file_cache_min_uses 8 mean that a file accessed fewer than 8 times within 30 days is removed from the cache.

open_file_cache_valid 3m means that cached file meta information older than 3 minutes is checked against the file on disk and updated if it is out of date.

The generally recommended configuration is:

open_file_cache max=10000 inactive=30s;
open_file_cache_valid 60s;
open_file_cache_min_uses 2;
open_file_cache_errors on;


Why cache only file meta information instead of file content?
The key to this question is sendfile(2).

Nginx uses sendfile(2) when serving static files, provided that you have configured sendfile on. sendfile(2) transfers data directly within kernel space. Compared with read(2)/write(2), it saves two data copies between kernel space and user space. The contents of frequently read static files are also cached in kernel space by the OS. With this mechanism, once the fd and size of the file are in the cache, Nginx can call sendfile(2) directly.

If Nginx cached the contents as well, it would have to read(2) the data from kernel space into user space every time the file changes and keep it there. For each response it would then copy the data from user space back to kernel space to write it to the socket. Compared with sendfile(2), this approach has no advantage.
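The two paths compared above can be sketched in Python, which exposes sendfile(2) as os.sendfile. This is a minimal sketch assuming Linux; the function names send_with_read_write and send_with_sendfile are made up for this example, and a socket pair stands in for a client connection.

```python
import os
import socket
import tempfile


def send_with_read_write(sock: socket.socket, fd: int, size: int) -> int:
    # The read copies file data into a user-space buffer, send copies it back.
    data = os.pread(fd, size, 0)
    sock.sendall(data)
    return len(data)


def send_with_sendfile(sock: socket.socket, fd: int, size: int) -> int:
    # sendfile(2) moves data from the file to the socket inside the kernel.
    sent = 0
    while sent < size:
        n = os.sendfile(sock.fileno(), fd, sent, size - sent)
        if n == 0:
            break
        sent += n
    return sent


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 612)
        f.flush()
        server, client = socket.socketpair()
        print(send_with_sendfile(server, f.fileno(), 612))
        print(len(client.recv(4096)))
        server.close()
        client.close()
```

Note that send_with_sendfile needs only a descriptor and a size, which is exactly what open_file_cache keeps.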

 

With all that said, let's look at the effect.
Without open_file_cache (the directives are commented out):

server {
listen 80;
server_name www.test.com;
charset utf-8;
root html;
location / {
# open_file_cache max=10 inactive=60s;
# open_file_cache_min_uses 1;
# open_file_cache_valid 60s;
# open_file_cache_errors on;
}
}

[root@www ~]# /usr/local/nginx/sbin/nginx -s reload
[root@www ~]# ps -ef | grep nginx | grep -v grep | grep -v master
nginx 57654 55384 0 10:26 ? 00:00:00 nginx: worker process

# strace can trace system calls, so we use it to follow a request. My nginx has only one worker process.
[root@www ~]# strace -p 57654 # No one is visiting yet, so the worker is blocked in epoll_wait
strace: Process 57654 attached
epoll_wait(10,


[root@www ~]# curl localhost:80 # now send a request
[root@www ~]# strace -p 57654
strace: Process 57654 attached
epoll_wait(10,

[{EPOLLIN, {u32=13620784, u64=13620784}}], 512, -1) = 1
accept4(7, {sa_family=AF_INET, sin_port=htons(37808), sin_addr=inet_addr("127.0.0.1")}, [16], SOCK_NONBLOCK) = 4
epoll_ctl(10, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=13621248, u64=13621248}}) = 0
epoll_wait(10, [{EPOLLIN, {u32=13621248, u64=13621248}}], 512, 60000) = 1
recvfrom(4, "GET / HTTP/1.1\r\nUser-Agent: curl"..., 1024, 0, NULL, NULL) = 73
stat("/usr/local/nginx/html/index.html", {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
open("/usr/local/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
setsockopt(4, SOL_TCP, TCP_CORK, [1], 4) = 0
writev(4, [{"HTTP/1.1 200 OK\r\nServer: nginx/1"..., 253}], 1) = 253
sendfile(4, 5, [0] => [612], 612) = 612
write(6, "127.0.0.1 - - [08/Jun/2020:10:29"..., 90) = 90
close(5) = 0
setsockopt(4, SOL_TCP, TCP_CORK, [0], 4) = 0
epoll_wait(10, [{EPOLLIN|EPOLLRDHUP, {u32=13621248, u64=13621248}}], 512, 65000) = 1
recvfrom(4, "", 1024, 0, NULL, NULL) = 0
close(4) = 0
epoll_wait(10,

You can see that epoll_wait returns and accept4 establishes a new TCP connection. recvfrom then reads the request, and stat checks whether the requested home page exists. It does: it is a regular file (S_IFREG) with permissions 0644 and a size of st_size=612 bytes. The file is then opened to obtain a handle, open("/usr/local/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 5, and the data is returned through this handle with sendfile(4, 5, [0] => [612], 612) = 612.

 



Note that sendfile is itself a key optimization, because it is a zero-copy technique: there is no need to read the data from disk into user space and then copy it from user space back into the kernel before sending it to the network card. Instead, the sendfile call is given the file and an offset, and the content is sent to the network card entirely within the kernel, so performance is very high.
However, open("/usr/local/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 5 and close(5) are not necessary for every request: sendfile only needs an open descriptor, so nginx does not have to open and close the file each time (this is the key point of our optimization).
After the data is sent to the client through sendfile, nginx calls epoll_wait(10, [{EPOLLIN|EPOLLRDHUP, {u32=13621248, u64=13621248}}], 512, 65000) = 1 again. Once the connection is closed, it waits in epoll_wait(10, for future requests.

With open_file_cache (the directives are enabled):

server {
listen 80;
server_name www.test.com;
charset utf-8;
root html;
location / {
open_file_cache max=10 inactive=60s;
open_file_cache_min_uses 1;
open_file_cache_valid 60s;
open_file_cache_errors on;
}
}

# reload nginx, then visit for the first time
[root@www ~]# curl localhost:80
[root@www ~]# strace -p 58695
strace: Process 58695 attached
epoll_wait(10, [{EPOLLIN, {u32=13521056, u64=13521056}}], 512, -1) = 1
accept4(7, {sa_family=AF_INET, sin_port=htons(37812), sin_addr=inet_addr("127.0.0.1")}, [16], SOCK_NONBLOCK) = 3
epoll_ctl(10, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=13521520, u64=13521520}}) = 0
epoll_wait(10, [{EPOLLIN, {u32=13521520, u64=13521520}}], 512, 60000) = 1
recvfrom(3, "GET / HTTP/1.1\r\nUser-Agent: curl"..., 1024, 0, NULL, NULL) = 73
open("/usr/local/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 8
fstat(8, {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
setsockopt(3, SOL_TCP, TCP_CORK, [1], 4) = 0
writev(3, [{"HTTP/1.1 200 OK\r\nServer: nginx/1"..., 253}], 1) = 253
sendfile(3, 8, [0] => [612], 612) = 612
write(5, "127.0.0.1 - - [08/Jun/2020:10:48"..., 90) = 90
setsockopt(3, SOL_TCP, TCP_CORK, [0], 4) = 0
epoll_wait(10, [{EPOLLIN|EPOLLRDHUP, {u32=13521520, u64=13521520}}], 512, 65000) = 1
recvfrom(3, "", 1024, 0, NULL, NULL) = 0
close(3) = 0
epoll_wait(10,
# On the first visit nothing is cached yet, so the file is opened:
# open("/usr/local/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 8
# Only the socket is closed with close(3); fd 8 is not closed.
# Visit again:
[root@www ~]# curl localhost:80
[{EPOLLIN, {u32=13521056, u64=13521056}}], 512, -1) = 1
accept4(7, {sa_family=AF_INET, sin_port=htons(37816), sin_addr=inet_addr("127.0.0.1")}, [16], SOCK_NONBLOCK) = 3
epoll_ctl(10, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=13521521, u64=13521521}}) = 0
epoll_wait(10, [{EPOLLIN, {u32=13521521, u64=13521521}}], 512, 60000) = 1
recvfrom(3, "GET / HTTP/1.1\r\nUser-Agent: curl"..., 1024, 0, NULL, NULL) = 73
stat("/usr/local/nginx/html/index.html", {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
setsockopt(3, SOL_TCP, TCP_CORK, [1], 4) = 0
writev(3, [{"HTTP/1.1 200 OK\r\nServer: nginx/1"..., 253}], 1) = 253
sendfile(3, 8, [0] => [612], 612) = 612
write(5, "127.0.0.1 - - [08/Jun/2020:10:50"..., 90) = 90
setsockopt(3, SOL_TCP, TCP_CORK, [0], 4) = 0
epoll_wait(10, [{EPOLLIN|EPOLLRDHUP, {u32=13521521, u64=13521521}}], 512, 65000) = 1
recvfrom(3, "", 1024, 0, NULL, NULL) = 0
close(3)
You can see that on the second visit, after receiving the request, nginx calls sendfile(3, 8, [0] => [612], 612) = 612 directly.
The following calls no longer appear:
open("/usr/local/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 8
fstat(8, {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
System calls are saved here, which helps a great deal when nginx handles a very large number of requests. This kind of caching is not limited to static files served to clients; it also applies to other files nginx opens, such as cache files (log files have a similar descriptor cache, configured with open_log_file_cache). You should therefore check your environment: if resource files are frequently modified by processes other than nginx, the validity time must be set appropriately, to a value the business scenario can tolerate.
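Instead of comparing the traces by eye, the system calls can be counted with a short script. The sketch below is one possible way to do it in Python: count_syscalls is a hypothetical helper, and the two sample strings contain only the file-related lines excerpted from the traces in this article.

```python
import re
from collections import Counter

# A syscall line in strace output starts with the call name followed by "(".
SYSCALL = re.compile(r"^(\w+)\(")


def count_syscalls(trace: str) -> Counter:
    """Count syscall names in strace output, one call per line."""
    matches = (SYSCALL.match(line.strip()) for line in trace.splitlines())
    return Counter(m.group(1) for m in matches if m)


# File-related lines excerpted from the trace without open_file_cache.
NO_CACHE = """\
stat("/usr/local/nginx/html/index.html", {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
open("/usr/local/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
sendfile(4, 5, [0] => [612], 612) = 612
close(5) = 0
"""

# File-related lines excerpted from the second visit with open_file_cache.
CACHED_SECOND_VISIT = """\
stat("/usr/local/nginx/html/index.html", {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
sendfile(3, 8, [0] => [612], 612) = 612
"""

if __name__ == "__main__":
    before = count_syscalls(NO_CACHE)
    after = count_syscalls(CACHED_SECOND_VISIT)
    for name in ("open", "fstat", "close", "sendfile"):
        print(name, before[name], after[name])
```

For a real measurement you would feed it the complete strace output of many requests rather than these excerpts.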

 



from: https://blog.csdn.net/qq_34556414/article/details/106660101

Topics: Nginx