[Java from 0 to architect] Nginx - basic and common configurations (reverse proxy, load balancing, dynamic and static separation)

Posted by manoj_jnics1 on Thu, 07 Oct 2021 23:32:04 +0200

Series index: [Java from 0 to architect] learning record

Nginx basics

Nginx official website: http://nginx.org/

Network requests fall into two kinds, dynamic and static:

Dynamic request: needs backend program logic to produce the response, for example querying a database
Static request: asks for a resource file (html, css, js, png...)

Nginx: a web server that serves static resources itself; it cannot run application code, so dynamic requests have to be forwarded to a backend such as Tomcat
Tomcat: can handle dynamic requests + static requests

With its default settings Tomcat processes at most 200 requests at the same time (the default maxThreads of its connector), while Nginx is commonly quoted as handling up to 50,000 concurrent connections
Tengine is a web server developed by Taobao on top of Nginx

Proxy:

Forward proxy: acts on behalf of the client. For example, YouTube cannot be reached directly from China, so you go through proxy software that bypasses the domestic firewall; that software is a forward proxy
Reverse proxy: acts on behalf of the server. YouTube runs many servers behind the scenes; when a user visits, the request is handed to one of those N backend servers, and the user never needs to know which one

For more networking background, see: Network protocol from entry to bottom principle - agent

Common load balancing options:

Use hardware load balancers, such as F5 and Array
Use software load balancers, such as Nginx and Tengine
Use a cloud load balancer, such as Alibaba Cloud SLB
Use Nginx + Keepalived (Keepalived adds failover between the Nginx nodes)
Use other software load balancing technologies, such as LVS (Linux Virtual Server), HAProxy, etc

Advantages of Nginx:

Nginx compiles and runs on most Unix and Linux systems, and a Windows port is also available
Nginx is open source and free
Nginx can serve up to about 50,000 concurrent connections
For comparison, Tomcat processes at most 200 requests concurrently by default
...

Nginx environment setup

Download the installation package from the Nginx official website: http://nginx.org/

# Install the required dependency libraries
yum install pcre -y
yum install pcre-devel -y
yum install zlib -y
yum install zlib-devel -y

# Extract the installation package
tar -zxvf nginx-1.18.0.tar.gz -C /usr/local/src

# Run configure and check that it reports no errors
cd /usr/local/src/nginx-1.18.0
./configure --prefix=/usr/local/nginx

# Compile and install
make && make install

Note: if you see the error "./configure: error: C compiler cc is not found",
install the build tools first: yum -y install gcc gcc-c++ autoconf automake make

Nginx related operations:

# Start
/usr/local/nginx/sbin/nginx

# Stop immediately (use -s quit for a graceful shutdown)
/usr/local/nginx/sbin/nginx -s stop

# Reload the configuration; this is a graceful reload, not a full restart
/usr/local/nginx/sbin/nginx -s reload

# Reopen the log files
/usr/local/nginx/sbin/nginx -s reopen

# Check that port 80 is being listened on
netstat -tlnp | grep ':80'

Then open http://192.168.52.128 in a browser

In the /usr/local/nginx directory, you can see the following four directories:

conf: configuration files
html: web page files
logs: log files
sbin: main binary program

Nginx basic concepts

Nginx starts one Master process and one or more Worker processes

Main work of the Master process:

Read and validate the configuration, and receive signals from outside (for example from nginx -s reload)
Send signals to the Worker processes
Monitor the status of the Worker processes
Start a new Worker process automatically when one exits

The main work of a Worker process is to accept and handle client connections

In the configuration file, you can configure:

# Number of Worker processes, usually set to the number of CPU cores
worker_processes 2;
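
The Worker count works together with the per-Worker connection limit. Below is a minimal sketch of the two settings side by side; the value auto and the number 1024 are illustrative assumptions, not values taken from this article. Roughly speaking, the ceiling on simultaneous connections is worker_processes × worker_connections, and connections that nginx opens to back-end servers count towards it as well.

```nginx
# Top of nginx.conf (main context)
worker_processes auto;          # let nginx start one Worker per CPU core

events {
    # Maximum number of simultaneous connections per Worker
    # (1024 is only an example value)
    worker_connections 1024;
}
```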

Nginx configuration file structure

main: global settings; directives set here affect all other blocks

events
http
- upstream: upstream server settings, used for load balancing; defines a group of back-end servers
- server: virtual host settings, mainly used to specify the host name and port
  - location: settings for requests whose URL matches a specific pattern

server inherits from main, location inherits from server, and upstream neither inherits settings nor is inherited from
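
To make the nesting concrete, here is a minimal skeleton of an nginx.conf that follows the structure above. It is a sketch only: the upstream name backend and the address 127.0.0.1:8080 are placeholders chosen for illustration.

```nginx
# main (global) context
worker_processes 1;

events {
    worker_connections 1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    # upstream: a group of back-end servers (placeholder address)
    upstream backend {
        server 127.0.0.1:8080;
    }

    # server: one virtual host
    server {
        listen      80;
        server_name localhost;

        # location: settings for URLs that start with "/"
        location / {
            proxy_pass http://backend;
        }
    }
}
```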

Nginx configuration file general syntax

Nginx general syntax:

The configuration file consists of directives (statements ending in ;) and directive blocks ({})
Each directive ends with a semicolon (;); the directive name and its parameters are separated by spaces
Each directive block groups multiple directives inside curly braces {}
The include directive lets you split the configuration across multiple files to improve maintainability
Use the # symbol to add comments to improve readability
Use the $ symbol to reference variables
Some directive parameters support regular expressions
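
The rules above can be seen together in one small file. This is an illustrative sketch: the conf.d directory and the X-Requested-Uri header name are made up for the example.

```nginx
# A comment starts with "#"
worker_processes 1;                   # simple directive: name, parameter, ";"

events {}                             # block directive (may be empty)

http {                                # block directive grouping other directives
    include conf.d/*.conf;            # pull in extra files (hypothetical directory)

    server {
        listen 80;

        # "~" makes the parameter a regular expression
        location ~ \.(png|jpg)$ {
            # $uri is a built-in variable, referenced with "$"
            add_header X-Requested-Uri $uri;
        }
    }
}
```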

nginx.conf common configuration

Virtual host configuration - server

# nginx virtual host configuration
server {
	listen 80; # Listening port
	server_name localhost; # Usually a domain name (one IP can serve several domain names)
	
	# Where to look for the requested resource
	location / {
		root html; # Relative path: the html directory under the nginx directory
		index index.html index.htm; # Page served by default
	}
	}
}
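
Since one IP can serve several domain names, a common pattern is one server block per name. The sketch below assumes two hypothetical domains, a.example.com and b.example.com, and two hypothetical directories under html; nginx picks the block whose server_name matches the Host header of the request.

```nginx
server {
    listen 80;
    server_name a.example.com;      # hypothetical domain

    location / {
        root  html/a;               # hypothetical directory
        index index.html;
    }
}

server {
    listen 80;
    server_name b.example.com;      # hypothetical domain

    location / {
        root  html/b;               # hypothetical directory
        index index.html;
    }
}
```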

Log file - log_format, access_log

The nginx access log is written to logs/access.log. The shipped nginx.conf contains the main format shown below, commented out; uncomment it to use it (without it nginx falls back to the predefined combined format). Custom formats are also supported

# The main format is configured as follows
# log_format belongs in the http block, so it acts as a global setting
# log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
#                  '$status $body_bytes_sent "$http_referer" '
#                  '"$http_user_agent" "$http_x_forwarded_for"';

# access_log  logs/access.log  main;

tail -n 100 -f access.log # Follow the last 100 lines of the log
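
A custom format is useful when the log is later analyzed for slow requests. The following is a sketch of such a format; the format name timed and the file name timed.log are assumptions made for this example. $request_time is the total time nginx spent on the request, and $upstream_response_time is the time spent waiting for the back-end server.

```nginx
http {
    # Custom format that adds timing information to each line
    log_format timed '$remote_addr [$time_local] "$request" $status '
                     '$body_bytes_sent $request_time $upstream_response_time';

    server {
        listen 80;

        # Write this virtual host's log in the custom format
        access_log logs/timed.log timed;
    }
}
```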

Log file segmentation

Once the system is running, we need to analyze the nginx log to find slow URLs, request times, and the request volume and concurrency over a period, and use that analysis to optimize the system. This requires the operations staff to split, analyze and process the nginx logs. Here we back up the log regularly, by day or by hour, with a scheduled task.

How to implement regular log backup:

Step 1: work out how to split the log and write it as a shell script
Step 2: run the script from a scheduled task

1. The log backup script backuplog.sh is placed in the sbin directory

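#!/bin/sh

BASE_DIR=/usr/local/nginx
BASE_FILE_NAME=access.log

CURRENT_PATH=$BASE_DIR/logs
BAK_PATH=$BASE_DIR/datalogs

CURRENT_FILE=$CURRENT_PATH/$BASE_FILE_NAME
# Timestamp used in the backup file name (current clock time, one day earlier)
BAK_TIME=$(/bin/date -d yesterday +%Y%m%d%H%M)
BAK_FILE=$BAK_PATH/$BAK_TIME-$BASE_FILE_NAME

# Make sure the backup directory exists, then move the current log away
mkdir -p "$BAK_PATH"
mv "$CURRENT_FILE" "$BAK_FILE"

# Tell nginx to reopen its log files so that a fresh access.log is created
$BASE_DIR/sbin/nginx -s reopen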

2. Schedule the script: run crontab -e and add the following line (it runs the script every minute, which is handy for testing; use a daily or hourly schedule in practice)

*/1 * * * * sh /usr/local/nginx/sbin/backuplog.sh

Note: if the script was copied from Windows to Linux, running it may fail with $'\r': command not found. Convert the line endings first: yum install dos2unix -y, then dos2unix backuplog.sh
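
As an alternative to the script, nginx itself can write one log file per day. This is a different technique from the backup script above, shown only as a sketch: the variable name $logdate is made up for the example, and the path assumes the /usr/local/nginx install prefix used in this article. When the log path contains variables, the Worker user needs write permission on the log directory and the file is opened and closed on every write unless open_log_file_cache is configured, so measure before using this on a busy site.

```nginx
http {
    # Extract YYYY-MM-DD from $time_iso8601 into $logdate
    map $time_iso8601 $logdate {
        '~^(?<ymd>\d{4}-\d{2}-\d{2})' $ymd;
        default                       'nodate';
    }

    server {
        listen 80;

        # One access log file per day, named after the date
        access_log /usr/local/nginx/logs/access-$logdate.log combined;
    }
}
```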

Resource path matching - location

Syntax: location [=|~|~*|^~] /uri/ { ... }
	=   exact match
	^~  the URI starts with the given plain (non-regex) string; if this is the best prefix match, regular expressions are not checked
		nginx matches against the decoded URI, so a request for /static/%20/aa can be matched by the rule ^~ "/static/ /aa" (note the space)
	~   case-sensitive regular expression match
	~*  case-insensitive regular expression match
	/   generic prefix match; any request matches it

When several location blocks are configured, the matching order is:
=   --->   ^~   --->   ~/~* (in the order they appear in the file)   --->   /

Matching cases:

For example, there are the following matching rules:

location = / {
   # Rule A
}
location = /login {
   # Rule B
}
location ^~ /static/ {
   # Rule C
}
location ~ \.(gif|jpg|png|js|css)$ {
   # Rule D
}
location ~* \.png$ {
   # Rule E
}
location / {
   # Rule H
}

Matching effect:

http://localhost/ # rule A
http://localhost/login # rule B
http://localhost/register # rule H
http://localhost/static/a.html # rule C
http://localhost/a.gif # rule D
http://localhost/b.jpg # rule D
http://localhost/static/c.png # rule C
http://localhost/a.PNG # rule E
http://localhost/a.xhtml # rule H
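
One way to check the table yourself is a throwaway server block in which every location simply reports its own rule letter. This is a sketch for experimenting; port 8081 is an arbitrary choice for the test.

```nginx
server {
    listen 8081;                     # arbitrary test port
    default_type text/plain;

    location = /                       { return 200 "rule A\n"; }
    location = /login                  { return 200 "rule B\n"; }
    location ^~ /static/               { return 200 "rule C\n"; }
    location ~ \.(gif|jpg|png|js|css)$ { return 200 "rule D\n"; }
    location ~* \.png$                 { return 200 "rule E\n"; }
    location /                         { return 200 "rule H\n"; }
}
```

After a reload, requesting the URLs from the list (for example curl http://localhost:8081/static/c.png) should answer with the corresponding rule letter on a Linux server.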

Actual application configuration:

# Match the site root exactly. The home page is requested often through the bare domain name, and an exact match speeds up processing
# Here it is forwarded to the back-end application server; it could also be a static home page
# First required rule
location = / {
    proxy_pass http://192.168.52.128:9999/index;
}

# The second required rule handles static file requests, which is the strength of nginx as an http server
# There are two ways to configure it, by directory or by suffix; use either one or both
# Note: root appends the full request URI, so /static/a.css is read from /webroot/static/static/a.css
location ^~ /static/ {
    root /webroot/static/;
}
location ~* \.(gif|jpg|jpeg|png|css|js|ico)$ {
    root /webroot/res/;
}

# The third rule is the catch-all, which forwards dynamic requests to the back-end application server
# Anything that is not a static file is treated as a dynamic request; adjust this to your situation
# After all, with today's popular frameworks, URLs with .php or .jsp suffixes are rare
location / {
    proxy_pass http://tomcat:8080/;
}
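
When mapping URLs to directories it helps to keep the difference between root and alias in mind. The sketch below contrasts the two; the /assets/ prefix and the directory layout are assumptions made for the example.

```nginx
server {
    listen 80;

    # root: the whole request URI is appended to the path
    # /static/a.css  ->  /webroot/static/a.css
    location ^~ /static/ {
        root /webroot;
    }

    # alias: the matched location prefix is replaced by the path
    # /assets/a.css  ->  /webroot/res/a.css
    location ^~ /assets/ {
        alias /webroot/res/;
    }
}
```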

Global variables

Official documentation: http://nginx.org/en/docs

$args                     # Query string (arguments) of the request
$query_string             # Same as $args
$arg_NAME                 # Value of the argument NAME in the query string
$is_args                  # "?" if the request has arguments, otherwise an empty string
$uri                      # Current URI of the request, without arguments (those are in $args). It can differ from the $request_uri sent by the browser, because internal redirects or the index directive can change it. It does not contain the host name, for example "/foo/bar.html"
$document_uri             # Same as $uri
$document_root            # Value of the root or alias directive for the current request
$host                     # In order of priority: host name from the request line, the "Host" request header field, or the server name matching the request
$hostname                 # Host name of the machine nginx runs on
$https                    # "on" if the connection uses SSL, otherwise an empty string
$binary_remote_addr       # Client address in binary form: 4 bytes for IPv4, 16 bytes for IPv6
$body_bytes_sent          # Number of bytes sent to the client, not counting the response header; compatible with the "%B" parameter of Apache's mod_log_config module
$bytes_sent               # Number of bytes sent to the client
$connection               # Serial number of the connection
$connection_requests      # Number of requests made through the current connection so far
$content_length           # "Content-Length" request header field
$content_type             # "Content-Type" request header field
$cookie_name              # Value of the cookie called name
$limit_rate               # Used to set the speed limit of the response
$msec                     # Current Unix time in seconds, with millisecond resolution
$nginx_version            # nginx version
$pid                      # PID of the worker process
$pipe                     # "p" if the request was pipelined, otherwise "."
$proxy_protocol_addr      # Client address taken from the PROXY protocol header; an empty string if the PROXY protocol is not in use
$realpath_root            # Absolute path of the root or alias directive for the current request, with all symbolic links resolved to real paths
$remote_addr              # Client address
$remote_port              # Client port
$remote_user              # User name supplied with HTTP Basic authentication
$request                  # Full original request line
$request_body             # Client request body. The variable is available in locations handled by proxy_pass, fastcgi_pass, uwsgi_pass and scgi_pass, where the body is passed on to the next server
$request_body_file        # Name of the temporary file that holds the client request body. The file needs to be deleted after processing. To always write the body to a file, enable client_body_in_file_only. If you pass the file name to the back-end server, disable passing the body itself, that is, set proxy_pass_request_body off, fastcgi_pass_request_body off, uwsgi_pass_request_body off, or scgi_pass_request_body off
$request_completion       # "OK" if the request has completed, otherwise an empty string
$request_filename         # File path for the current request, built from the root or alias directive and the request URI
$request_length           # Length of the request (including the request line, the request header and the request body)
$request_method           # HTTP request method, usually "GET" or "POST"
$request_time             # Time spent processing the request, in seconds with millisecond resolution; timing starts when the first bytes are read from the client
$request_uri              # Full original request URI including the arguments; it cannot be modified (see $uri for the URI that can be changed or rewritten). It does not contain the host name, for example "/cnphp/test.php?arg=free"
$scheme                   # Request scheme, "http" or "https"
$server_addr              # Address of the server that accepted the request. Computing it normally needs a system call; to avoid that, specify the address in the listen directive and use the bind parameter
$server_name              # Name of the server that accepted the request
$server_port              # Port of the server that accepted the request
$server_protocol          # Protocol of the request, usually "HTTP/1.0" or "HTTP/1.1"
$status                   # HTTP response status code
$time_iso8601             # Server local time in ISO 8601 format
$time_local               # Server local time in Common Log Format
$cookie_NAME              # Value of the cookie NAME from the request header; the variable name is the prefix "$cookie_" followed by the cookie name
$http_NAME                # Any request header field; the last part of the variable name is the field name in lower case with dashes replaced by underscores. For example, the "Accept-Language" request header is $http_accept_language
$http_cookie              # "Cookie" request header field
$http_post
$http_referer             # "Referer" request header field (the page the request came from)
$http_user_agent          # "User-Agent" request header field (client software information)
$http_x_forwarded_for     # "X-Forwarded-For" request header field: the chain of client and proxy addresses the request passed through. See http://www.cnblogs.com/craig/archive/2008/11/18/1335809.html
$sent_http_NAME           # Any response header field; the last part of the variable name is the field name in lower case with dashes replaced by underscores. For example, the "Content-Length" response header is $sent_http_content_length
$sent_http_cache_control
$sent_http_connection
$sent_http_content_type
$sent_http_keep_alive
$sent_http_last_modified
$sent_http_location
$sent_http_transfer_encoding
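
A quick way to see what these variables contain is a location that echoes a few of them back. This is a debugging sketch only: port 8082, the /vars path and the id argument are arbitrary names chosen for the example.

```nginx
server {
    listen 8082;                     # arbitrary test port
    default_type text/plain;

    # Echo a few built-in variables back to the client
    location /vars {
        return 200 "host=$host uri=$uri args=$args id=$arg_id method=$request_method remote=$remote_addr\n";
    }
}
```

Requesting something like http://localhost:8082/vars?id=7 then shows how the request was split into $uri, $args and $arg_id.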

Reverse proxy configuration - proxy_pass

# Configure reverse proxy
# proxy_pass url;
proxy_pass http://192.168.52.128:8080;

# Note: behind a reverse proxy the backend sees the nginx server's address as the client ip
# To pass on the real client ip, have nginx set it in a request header:
proxy_set_header x-real-ip $remote_addr;
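
In practice the real-ip header is usually set together with a few related headers. The following sketch reuses the backend address from above; which headers the backend actually reads is an assumption that depends on the application.

```nginx
server {
    listen 80;

    location / {
        proxy_pass http://192.168.52.128:8080;

        # Original Host header requested by the client
        proxy_set_header Host              $host;
        # Address of the client that connected to nginx
        proxy_set_header X-Real-IP         $remote_addr;
        # Incoming X-Forwarded-For with the client address appended
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        # Whether the client used http or https
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

On the Java side the application can then read the value with request.getHeader("X-Real-IP") instead of relying on the remote address of the connection.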

Load balancing configuration - upstream

# The list of servers to balance across is configured in an upstream block inside the http block (next to server, not inside it)
# upstream myproject {
	# weight: the weight of the server. The higher the weight, the larger its share of requests
	# max_fails: number of failed attempts within fail_timeout after which the server is considered unavailable. The default is 1; set it to 0 to turn the check off
	# fail_timeout: the period in which max_fails is counted, and also how long nginx then stops sending requests to the server marked unavailable (default 10 seconds)
# }

upstream p2p {   
	ip_hash;
	server 192.168.11.130:8888 weight=1 max_fails=2 fail_timeout=30s;
	server 192.168.11.194:8888 weight=1 max_fails=2 fail_timeout=30s;
	server 192.168.11.225:8888 weight=1 max_fails=2 fail_timeout=30s;
}

location / {
    proxy_pass http://p2p;
}

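Load balancing methods:

Round Robin: requests are distributed in turn; this is the default
Least Connections: the request goes to the server with the fewest active connections
IP Hash: the server is chosen by hashing the client IP address
Generic Hash: the server is chosen by hashing a user-defined key
Random: the request goes to a randomly selected server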
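
Each method is selected with one directive inside the upstream block. The sketch below shows the variants next to each other; the pool names are made up for the example, the addresses are the ones used earlier in this article, and the random directive needs nginx 1.15.1 or later.

```nginx
http {
    # Round Robin: the default, no directive needed
    upstream rr_pool {
        server 192.168.11.130:8888;
        server 192.168.11.194:8888;
    }

    # Least Connections
    upstream lc_pool {
        least_conn;
        server 192.168.11.130:8888;
        server 192.168.11.194:8888;
    }

    # Generic Hash: here the key is the request URI
    upstream hash_pool {
        hash $request_uri consistent;
        server 192.168.11.130:8888;
        server 192.168.11.194:8888;
    }

    # Random: pick two servers at random, then take the less loaded one
    upstream rand_pool {
        random two least_conn;
        server 192.168.11.130:8888;
        server 192.168.11.194:8888;
    }
}
```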

Load balancing can cause session loss: nginx forwards requests to different backend servers, so a session stored on server A is missing when the user's next request lands on server B, and the user appears not to be logged in

Ways to solve this problem:

Scheme 1: always route requests from the same client to the same server
For example, use the ip_hash rule, which hashes the client ip so that the same ip is always handled by the same server
Scheme 2: use Redis + Token as a distributed session solution

Dynamic and static separation

Step 1: store the static resources in a directory that nginx can access
Step 2: configure the directory of the static resources

# Static resource access address 
location ~ .*\.(gif|jpg|jpeg|png|bmp|swf|css|js)$ {
    root /datas/crm/static/;
}
# Dynamic request access address
location / {
    proxy_pass http://localhost:8080;
}
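
Static resources are also good candidates for browser caching. As a sketch building on the configuration above, the static location can add a cache lifetime and skip access logging; the 7 day lifetime is an example value, not a recommendation from this article.

```nginx
server {
    listen 80;

    # Static resources: served by nginx and cached by the browser
    location ~ .*\.(gif|jpg|jpeg|png|bmp|swf|css|js)$ {
        root /datas/crm/static/;
        expires 7d;          # example lifetime, sets Expires and Cache-Control
        access_log off;      # optional: keep static hits out of the access log
    }

    # Dynamic requests: forwarded to the application server
    location / {
        proxy_pass http://localhost:8080;
    }
}
```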

Compression of static resources - gzip

Conditions that must be met:

The request sent by the client must contain the Accept-Encoding header, and its value must include the compression type gzip
Browsers generally send a header such as Accept-Encoding: gzip, deflate, sdch
If the server has gzip compression enabled, the response contains the header Content-Encoding: gzip, which tells the client that the returned content is gzip compressed

# Turn on gzip compression
gzip on;
# Minimum response size to compress
gzip_min_length 1k;
# Compression level (1-9). The higher the level, the more cpu it consumes. The default value is 1
gzip_comp_level 2;
# MIME types to compress. JPG and PNG images are already compressed, so gzip gains little on them
gzip_types text/plain application/javascript application/x-javascript text/css application/xml text/javascript;
# Add "Vary: Accept-Encoding" to the response so that caches keep compressed and uncompressed copies apart
gzip_vary on;

Effect: after compression is enabled, a js resource that was originally 70k is compressed to 18k
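
When nginx sits behind another proxy or CDN, two more gzip directives are often worth a look. The sketch below places the settings in the http block so they apply to every virtual host; whether gzip_proxied any suits your setup is an assumption to verify for your own deployment.

```nginx
http {
    gzip on;
    gzip_min_length 1k;
    gzip_comp_level 2;
    gzip_types text/plain text/css application/javascript application/json application/xml;
    gzip_vary on;

    # Also compress responses to requests that arrive through a proxy
    gzip_proxied any;
    # Lowest request HTTP version that gets a compressed response (default 1.1)
    gzip_http_version 1.1;
}
```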
