Server architecture
The servers run CentOS 7.
First, we need to know the external architecture of the system.
Common architectures:
1. Domain name -> ECS
2. Domain name -> CDN -> object storage (OSS)
3. Domain name -> CDN -> ECS + RDS + Redis cache
4. Domain name -> CDN -> Load Balancing -> ECS + RDS (master-slave) + Redis cache
5. Domain name -> CDN -> WAF Firewall -> Load Balancing -> ECS + RDS (master-slave) + Redis cache
Then troubleshoot step by step according to the actual architecture.
1. Find the problem
First, detect the problem and promptly determine which service is affected, so that it can be located quickly. Identify the corresponding domain name and device. Problems are usually discovered through:
1. Zabbix monitoring alerts sent to DingTalk
2. Alibaba Cloud monitoring SMS alerts
[Alibaba Cloud] Dear ***, Cloud monitoring - Cloud database RDS edition <South China 1 (Shenzhen)-*****-read-only> raised an alarm at <09:54>: CPU utilization (91.88>=80), duration 4 minutes
3. Shell script email alerts
4. Other colleagues
Customer service and marketing colleagues report problems by telephone.
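A shell-script alert of the kind listed in item 3 could be built around a small threshold check. The following is a minimal sketch only: the function name build_alert, the message wording, and the mail recipient are assumptions for illustration, not part of the original setup.

```shell
# build_alert HOST METRIC VALUE THRESHOLD
# Prints an alert line and returns 0 when VALUE >= THRESHOLD;
# prints nothing and returns 1 otherwise.
# awk does the comparison so that decimal values such as 91.88 work.
build_alert() {
    local host=$1 metric=$2 value=$3 threshold=$4
    awk -v h="$host" -v m="$metric" -v v="$value" -v t="$threshold" 'BEGIN {
        if (v + 0 >= t + 0) {
            printf "[ALERT] %s %s %s >= %s\n", h, m, v, t
            exit 0
        }
        exit 1
    }'
}

# Example (assumes a working mail command; ops@example.com is a placeholder):
# msg=$(build_alert "$(hostname)" cpu_util 91.88 80) && echo "$msg" | mail -s "server alert" ops@example.com
```

Run from cron, a script like this sends mail only when the threshold is crossed, which keeps the inbox quiet during normal operation.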
2. Locate the problem quickly
Network bandwidth (check whether the CDN is abnormal)
Check whether the domain name resolves to the CDN or directly to the origin.
Log in to the Alibaba Cloud CDN console to view the corresponding traffic.
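The resolution check can also be done from the command line with dig (part of the bind-utils package on CentOS 7). The sketch below only classifies the short answer that dig prints; the function name resolves_via_cname is made up for illustration, and a CNAME hop is a hint, not proof, that traffic passes through the CDN.

```shell
# resolves_via_cname
# Reads the output of `dig +short DOMAIN` on stdin.
# Returns 0 if any answer line is not a plain IPv4 address (i.e. a CNAME hop),
# and 1 if every line is an IPv4 address or there is no answer at all.
resolves_via_cname() {
    grep -qvE '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Example:
# dig +short www.XXX.com | resolves_via_cname && echo "resolves through a CNAME"
```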
Load balancing
Check whether the load balancer is operating normally and whether the traffic is abnormal.
Application layer server
Check whether the ECS server load is normal, whether CPU or memory usage is too high, and whether disk usage has reached 100%.
Cache server
Check whether the load of the Redis server is normal and how much memory it is using.
Database server
Check whether the number of database connections is normal.
List all connections visible to the current user:
show full processlist;
Generate kill statements for SQL processes that have been running too long (more than 2 minutes here):
select concat('kill ', id, ';') from information_schema.processlist where command != 'Sleep' and time > 2*60 order by time desc;
Send the offending SQL statements to the backend developers for analysis.
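The same filter can be applied from the shell to the output of show full processlist. This is a sketch under the assumption that the mysql client prints tab-separated rows with a header line, which it does when its output is piped; the function name gen_kill_statements is hypothetical.

```shell
# gen_kill_statements [MAX_SECONDS]
# Reads tab-separated `show full processlist` output (header row first) on
# stdin and prints a kill statement for every non-Sleep entry that has been
# running longer than MAX_SECONDS (default 120).
# Columns: Id User Host db Command Time State Info
gen_kill_statements() {
    awk -F'\t' -v max="${1:-120}" \
        'NR > 1 && $5 != "Sleep" && $6 + 0 > max + 0 { print "kill " $1 ";" }'
}

# Example (prints the statements only; review them before running any):
# mysql -e 'show full processlist' | gen_kill_statements 120
```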
Connect to the server remotely
Problem: high CPU, high load, and slow access (database is normal)
System level
View load
View the load, CPU, memory, uptime, and the processes using the most resources.
top
Install htop: yum -y install htop
htop
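The load average shown by top is easier to judge relative to the number of CPU cores. A minimal sketch, assuming the common rule of thumb that a sustained load well above 1.00 per core indicates saturation; the function name load_per_core is illustrative.

```shell
# load_per_core CORES
# Reads a /proc/loadavg-style line on stdin and prints the 1-minute load
# average divided by CORES, rounded to two decimals.
load_per_core() {
    awk -v cores="$1" '{ printf "%.2f\n", $1 / cores }'
}

# Example:
# load_per_core "$(nproc)" < /proc/loadavg
```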
Use top to check server load and memory consumption, and df -h to check disk usage.
top
df -h
View the Nginx logs
If Nginx logs exist, go to the Nginx log directory.
Sort the logs by size.
Check the access records, response times, URLs, and so on.
cd /data/wwwroot/log
ll -Srh
tail -f XXX.XXX.COM-access.log
Analyze the logs to find the IPs and URLs with the most requests. Tools such as GoAccess or ELK can be used to view the logs in their consoles.
View disk usage
df -h
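To spot full disks without reading the whole table, the df output can be filtered by usage. A sketch, assuming the POSIX output format of df -P and mount points without spaces; the function name disks_over is hypothetical.

```shell
# disks_over PERCENT
# Reads `df -P` output on stdin and prints "mountpoint use%" for every
# filesystem whose usage is at or above PERCENT.
# df -P columns: Filesystem 1024-blocks Used Available Capacity Mounted-on
disks_over() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Example:
# df -P | disks_over 90
```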
View the current I/O status of the disks
iostat -x -k 3 3
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.70    0.00    2.25    0.41    0.00   93.64
Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda        0.01    0.83   0.30   1.48    11.34    12.13    26.30     0.01    6.15    7.41    5.89   0.24   0.04
vdb        0.00    0.17   0.02   0.28     0.08     2.75    19.15     0.00    3.22    2.01    3.29   0.26   0.01
vdc        0.10    0.84   3.09   0.56   105.22    20.57    68.94     0.02    7.96    3.29   33.74   1.33   0.49
If a disk is busy, check which PID is responsible.
Install iotop: yum install -y iotop
iotop -o -P -k -d 5
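Busy devices can be picked out of the iostat report by its %util column. The sketch below looks up the column by its header name rather than by position, since the column layout differs between sysstat versions; the function name busy_devices and the threshold are assumptions.

```shell
# busy_devices PERCENT
# Reads `iostat -x` output on stdin and prints "device %util" for every
# device row whose %util is at or above PERCENT.
busy_devices() {
    awk -v limit="$1" '
        # Header row of a device table: remember where %util is.
        /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; indev = 1; next }
        # A blank line ends the device table.
        NF == 0   { indev = 0; next }
        indev && col && $col + 0 >= limit + 0 { print $1, $col }
    '
}

# Example:
# iostat -x -k 3 3 | busy_devices 80
```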
View externally exposed services and ports
netstat -tunpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address     Foreign Address   State    PID/Program name
tcp        0      0 0.0.0.0:62920     0.0.0.0:*         LISTEN   29177/vsftpd
tcp        0      0 0.0.0.0:8080      0.0.0.0:*         LISTEN   4393/httpd
tcp        0      0 0.0.0.0:7300      0.0.0.0:*         LISTEN   4697/php-fpm: maste
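On a server with many listeners it can help to list only the TCP ports that are bound to all addresses, since those are the ones reachable from outside. A sketch that parses the netstat output shown above; the function name public_listeners is hypothetical, and UDP rows are ignored because they have no State column.

```shell
# public_listeners
# Reads `netstat -tnpl` (or -tunpl) output on stdin and prints "port program"
# for every TCP listener bound to 0.0.0.0 or ::.
public_listeners() {
    awk '$1 ~ /^tcp/ && $6 == "LISTEN" {
        n = split($4, part, ":")
        port = part[n]
        # Everything before the final ":port" is the bind address.
        addr = substr($4, 1, length($4) - length(port) - 1)
        if (addr == "0.0.0.0" || addr == "::") print port, $7
    }'
}

# Example:
# netstat -tnpl | public_listeners
```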
Check what a specific PID has open
Install lsof: yum install lsof
lsof -p PID
lsof -p 29177
lsof -p 4697
View the system log
tail -n 400 -f /var/log/messages
tail -f /var/log/messages
tail -n 100 /var/log/messages
head -n 100 /var/log/messages
View a simplified process tree
pstree -a >> /root/pstree.log
Network problems
Ping the domain name
ping www.XXX.com
View the network hops along the route
Install traceroute: yum install -y traceroute
traceroute www.baidu.com
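When ping is run from a script, the packet loss figure can be extracted from its summary line. A minimal sketch, assuming the summary format printed by the Linux iputils ping ("N packets transmitted, M received, X% packet loss"); the function name packet_loss is illustrative.

```shell
# packet_loss
# Reads ping output on stdin and prints the packet loss percentage
# (without the % sign) taken from the summary line.
packet_loss() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Example:
# ping -c 10 www.XXX.com | packet_loss
```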
Problem: low CPU, high load, and slow access (database)
Check the database
1. Slow queries
Check the slow query log; slow queries may be causing the high load. Find where the log is stored from the configuration file setting log_slow_queries (newer MySQL versions use slow_query_log and slow_query_log_file instead).
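The slow-log settings can be pulled out of the configuration file with a small filter. This sketch assumes a my.cnf-style file with name = value lines and accepts both the old and the new option names; the function name slow_log_settings is hypothetical, and /etc/my.cnf is only the usual default location on CentOS 7.

```shell
# slow_log_settings
# Reads a my.cnf-style file on stdin and prints the slow-log related options
# as "name=value". Comment lines are skipped, and dashes in option names are
# normalised to underscores (MySQL accepts both spellings).
slow_log_settings() {
    awk -F= '
        /^[ \t]*[#;]/ { next }
        {
            name = $1
            gsub(/[ \t]/, "", name)
            gsub(/-/, "_", name)
        }
        name ~ /^(log_slow_queries|slow_query_log|slow_query_log_file|long_query_time)$/ {
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            print name "=" value
        }'
}

# Example:
# slow_log_settings < /etc/my.cnf
```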
2. Is there a system bottleneck?
Upgrade the system CPU, memory, and disk.
Optimize the architecture: add master-slave replication, one master with multiple slaves, etc.
3. Are there too many sleeping connections?
show full processlist;
4. View the maximum number of connections
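View the configured maximum number of connections:
show variables like 'max_connections';
Reset the maximum number of connections (this takes effect immediately but is lost on restart unless it is also set in the configuration file):
set GLOBAL max_connections=300;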
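Whether max_connections needs raising depends on how close current usage is to the limit. A small sketch that turns the two numbers into a percentage; the function name conn_usage_pct is hypothetical, and the commented example assumes the mysql client can log in non-interactively.

```shell
# conn_usage_pct CURRENT MAX
# Prints CURRENT as an integer percentage of MAX (rounded down).
conn_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Example:
# current=$(mysql -N -e "show status like 'Threads_connected'" | awk '{print $2}')
# max=$(mysql -N -e "show variables like 'max_connections'" | awk '{print $2}')
# conn_usage_pct "$current" "$max"
```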
Basic Nginx protection commands
If there is abnormal access, you can also put Alibaba Cloud WAF in front of the site.
IP addresses of the real users with the most requests
cat www.XXXX.com-access.log |awk '{print $5}'| awk -F":" '{print $NF}' |sort|uniq -c|sort -nr|head -10
Top 10 most visited URLs
cat www.XXX.com-access.log | awk '{print $10}' | sort | uniq -c | sort -nr | head -n 10
Top 10 requests by execution time (this assumes the log format puts the request time first on each line)
cat www.XXX.com-access.log | sort -nr | head -n 10
View the sources in http_referer:
cat www.XXX.com-access.log | awk -F"from:" '{print $NF}' |sort|uniq -c|sort -nr|head -10
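The counting pipelines above differ only in which field they extract, so they can be wrapped in one helper. A sketch; the function name top_field is hypothetical, and the field numbers (5 for the client IP, 10 for the URL in the commands above) depend entirely on the site's log_format.

```shell
# top_field FIELD [COUNT]
# Reads an access log on stdin and prints the COUNT (default 10) most
# frequent values of whitespace-separated field FIELD, each prefixed by
# its number of occurrences.
top_field() {
    awk -v f="$1" '{ print $f }' | sort | uniq -c | sort -nr | head -n "${2:-10}"
}

# Example:
# top_field 10 < www.XXX.com-access.log
```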
Block the offending IP addresses after checking the specific referer sources.
Block IPs with the server firewall
Block an IP range
/sbin/iptables -I INPUT -s 61.37.80.0/24 -j DROP
In the Nginx configuration, the directive to block a single IP is:
deny 123.45.6.7;
Block an entire range, i.e. 123.0.0.1 to 123.255.255.254:
deny 123.0.0.0/8;
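The per-IP request counts from the log analysis can be turned into firewall rules mechanically. The sketch below only prints the iptables commands so they can be reviewed first; the function name gen_drop_rules and the threshold are assumptions, and the input is expected in the "count ip" form that sort | uniq -c produces.

```shell
# gen_drop_rules MIN_REQUESTS
# Reads "count ip" lines on stdin and prints an iptables DROP command for
# every IPv4 address seen at least MIN_REQUESTS times.
# Nothing is executed; lines whose second field is not an IPv4 address are skipped.
gen_drop_rules() {
    awk -v min="$1" '$1 + 0 >= min + 0 && $2 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
        print "/sbin/iptables -I INPUT -s " $2 " -j DROP"
    }'
}

# Example (review the output before piping it to sh):
# awk '{print $5}' www.XXX.com-access.log | sort | uniq -c | gen_drop_rules 1000
```

Printing instead of executing is deliberate: a busy CDN node or office gateway can easily top the list, and blocking it would cut off legitimate users.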
Prohibit specific User Agents from accessing
if ($http_user_agent ~* (wget|curl|Firefox) ) { return 404; }
Redirect attacks that target a specific URL
rewrite ^/accounts/\+\$str\+ http://127.0.0.1/ redirect;
Control client access by user agent
location / {
    if ($http_user_agent ~ 'bingbot/2.0|MJ12bot/v1.4.2|Spider/3.0|YoudaoBot|Tomato|Gecko/20100315') {
        return 403;
    }
}
Image hotlink protection
valid_referers none blocked *.XXX.com server_names ~\.google\. ~\.baidu\.;
if ($invalid_referer) {
    # return 403;
    rewrite ^/ http://www.XXX.com/daoling.png;
}
Do not allow access with localhost as the host
if ($host = 'localhost') { return 403; }
Do not allow an empty user agent
if ($http_user_agent ~ ^$){ return 403; }
Do not allow requests without an X-Forwarded-For header (direct access that bypasses the proxy)
if ($http_x_forwarded_for ~ ^$){ return 402; }
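None of the Nginx rules above take effect until the configuration is reloaded, and a syntax error in a new rule should not take the site down. A minimal sketch of a guarded reload: nginx -t and nginx -s reload are standard Nginx commands, while the function name reload_nginx and the NGINX_BIN override are assumptions added here for illustration.

```shell
# reload_nginx
# Tests the Nginx configuration and reloads only if the test passes.
# NGINX_BIN (hypothetical override) selects the binary; the default is nginx
# as found on PATH.
reload_nginx() {
    local bin=${NGINX_BIN:-nginx}
    "$bin" -t && "$bin" -s reload
}

# Example:
# reload_nginx || echo "configuration test failed, nothing reloaded"
```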