Adding a monitored host to Nagios
Operation steps:
- 1. Modify the main Nagios configuration file so that the server configuration is kept separate from the configuration of monitored items
- 2. Add a configuration file describing the host to be monitored
- 3. Check the configuration files for syntax errors
- 4. Reload the configuration so that the new file takes effect
- 5. Verify in the web interface that the configuration was applied
1. Modify the main configuration file
After installing Nagios, you can see a locally monitored host named localhost in the web interface. It comes from the sample configuration that Nagios generates at install time, located at /usr/local/nagios/etc/objects/localhost.cfg:
$ tree /usr/local/nagios/etc/    # Nagios configuration directory
/usr/local/nagios/etc/
|-- cgi.cfg              # Web interface configuration file
|-- htpasswd.users       # User names and passwords for logging in to the Nagios web pages
|-- nagios.cfg           # Main configuration file
|-- nrpe.cfg             # Client configuration file
|-- objects              # Directory containing the other configuration files
|   |-- localhost.cfg    # Defines monitoring of the local host
It is not recommended to add new hosts to this file, because once there are many hosts to monitor it becomes bloated and hard to maintain. Instead, you can create a dedicated folder under /usr/local/nagios/etc/ to store host and service configuration, and use subfolders to separate machines by function or platform. For example:
# Organize by cloud platform and function
tree /usr/local/nagios/etc/
|-- cgi.cfg
|-- htpasswd.users
|-- nagios.cfg
|-- nrpe.cfg
|-- objects
`-- servers                          # New directory for service and machine definitions
    |-- aliyun                       # Machines on the aliyun platform
    |   |-- product_Host.cfg
    |   |-- product_Service.cfg
    |   |-- staging_Host.cfg
    |   `-- staging_Service.cfg
    `-- aws
        |-- product_Host.cfg
        |-- product_Service.cfg
        |-- staging_Host.cfg
        `-- staging_Service.cfg
Let's create a new servers folder under /usr/local/nagios/etc:
mkdir /usr/local/nagios/etc/servers
For Nagios to read our configuration files when it starts, we need to modify the main configuration file, /usr/local/nagios/etc/nagios.cfg.
Add a new cfg_dir entry: cfg_dir=/usr/local/nagios/etc/servers
With this entry, Nagios loads every file in this directory (and in its subdirectories) whose name matches *.cfg.
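The edit to nagios.cfg can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name add_cfg_dir is made up for illustration and is not part of Nagios. It appends the cfg_dir line only when that exact line is missing, so running it twice does not create a duplicate entry.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Nagios).
# Appends "cfg_dir=<directory>" to a main configuration file unless the
# exact same line is already there.
#   $1 = path to nagios.cfg
#   $2 = directory to register
add_cfg_dir() {
    cfg_file=$1
    entry="cfg_dir=$2"
    # -x matches whole lines only, -F treats the entry as a fixed string
    if grep -qxF "$entry" "$cfg_file"; then
        echo "already present: $entry"
    else
        printf '%s\n' "$entry" >> "$cfg_file"
        echo "added: $entry"
    fi
}
```

With the paths used in this article, the call would be add_cfg_dir /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/servers, run as a user allowed to write to nagios.cfg.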
Introduction to the main configuration file nagios.cfg
# Path to the configuration file that defines the monitoring commands
cfg_file=/etc/nagios/objects/commands.cfg
# Global variable file, also known as the resource (macro) definition file
resource_file=/etc/nagios/resource.cfg
# File in which Nagios stores the current monitoring status
status_file=/usr/local/nagios/var/status.dat
# How often, in seconds, Nagios updates the status file
status_update_interval=10
# User that the Nagios service runs as
nagios_user=nagios
# Group that the Nagios service runs as
nagios_group=nagios
# Must be enabled if commands are to be sent from the web interface
check_external_commands=1
# Length in seconds of one time unit used by check intervals; 60 by default
interval_length=60
# Whether notifications are enabled: 1 enables them, 0 turns them off
enable_notifications={1|0}
# Timeout in seconds for service check commands; a check that returns nothing in this time is treated as timed out
service_check_timeout=60
# Timeout in seconds for host check commands; a host whose check returns nothing in this time is treated as down
host_check_timeout=30
# Whether active service checks are executed: 1 enabled, 0 disabled
execute_service_checks={1|0}
# Whether passive service check results are accepted: 1 accept, 0 reject
accept_passive_service_checks={1|0}
# Whether active host checks are executed: 1 enabled, 0 disabled
execute_host_checks={1|0}
# Whether passive host check results are accepted: 1 accept, 0 reject
accept_passive_host_checks={1|0}
# Whether event handlers are enabled: 1 enabled, 0 disabled. Event handlers are the actions configured to run when a service fails.
enable_event_handlers=1
# Path of the Nagios log file
log_file=/usr/local/nagios/var/nagios.log
# Log rotation method: n = none, h = hourly, d = daily, w = weekly, m = monthly
log_rotation_method={n|h|d|w|m}
Add the configuration file for the monitored host
To add a host named test, create its configuration file in the host directory we defined earlier: vim /usr/local/nagios/etc/servers/test.cfg
define host {
    use                     linux-server
    host_name               test
    alias                   My first Apache server
    address                 127.0.0.1
    max_check_attempts      3
    check_period            24x7
    notification_interval   30
    notification_period     24x7
}
Here is what each line means:
# The define host keyword tells Nagios that the enclosed block describes a host
define host {
    # use names the template to inherit from (templates are explained later); here it is the linux-server template
    use                     linux-server
    # host_name is the name of the machine, which is also shown in the web interface
    host_name               test
    # alias is a longer name for the machine, generally used as a description
    alias                   My first Apache server
    # address is the IP address of the machine, used by checks to reach it
    address                 127.0.0.1
    # Maximum number of attempts: how many times the check is run before a non-OK result is treated as a real problem
    max_check_attempts      3
    # Time period during which checks are run
    check_period            24x7
    # Interval between repeated notifications while the problem persists (in interval_length units, minutes by default)
    notification_interval   30
    # Time period during which notifications may be sent
    notification_period     24x7
}
This configuration file describes the basic information of a device plus a few customized settings. Other commonly used directives include:
- hostgroups web: the host group the host belongs to, which can be used to manage monitoring content
- check_command check-host-alive: the command used to check the host
- check_interval 5: the interval between regular checks
- retry_interval 1: the interval between retries after a check fails
- contacts shiyanlou: the contact to notify
- contact_groups linux-admins: the contact group, that is, the group of people who receive alerts when the machine has problems
- notification_options d,u,r: the states that trigger a notification. The possible values are d, u, r, f and s; each state and its symbol is listed below:
Parameter | Meaning |
---|---|
d - DOWN | The host is down |
u - UNREACHABLE | The host cannot be reached |
r - UP (host recovery) | The host has recovered |
f - flapping | The host starts or stops flapping (changing state too frequently) |
s | Scheduled downtime starts or ends |
Check the configuration file
Every time we modify a configuration file, we need to check that it is correct before applying it. If we restart or reload directly and the configuration contains errors, the monitoring service can become unavailable. So we run the following check before every restart or reload:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
After checking that the configuration is correct, we reload the nagios service:
service nagios reload
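The two commands above can be chained so that the reload only happens when the check passes. This is a minimal sketch, assuming a POSIX shell; the function name check_and_reload and the three variables are chosen for illustration, with defaults taken from the paths and the reload command used in this article.

```shell
#!/bin/sh
# Defaults follow this article's install layout; override them for other setups.
: "${NAGIOS_BIN:=/usr/local/nagios/bin/nagios}"
: "${NAGIOS_CFG:=/usr/local/nagios/etc/nagios.cfg}"
: "${RELOAD_CMD:=service nagios reload}"

# Hypothetical helper: run the configuration check (-v) and reload only
# if it exits successfully; otherwise report the failure and return 1.
check_and_reload() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        # Left unquoted on purpose so the command and its arguments are split
        $RELOAD_CMD
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}
```

Calling check_and_reload after each change keeps a broken configuration from ever reaching the running service, because the reload is skipped whenever the check reports errors.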
Open the web login interface, click the Hosts tab on the left, and you can see the new host test (currently the local machine):
In summary, adding a monitored host takes two steps:
- 1. Add or modify the host configuration file
- 2. Reload the nagios service (after checking the configuration)
Add a monitored service
Monitoring a host only tells us whether the host is up or down, but we usually care more about whether the business services running on it are working. So we need to add the services we want to monitor to Nagios.
Create the service configuration file
Generally, host definitions and service definitions are kept in two separate configuration files, as in the following file structure:
|-- servers
    |-- test_Host.cfg
    `-- test_Service.cfg
The test_Service.cfg file is the configuration file of the monitoring service:
define service {
    host_name               test
    service_description     Check SSH
    check_command           check_ssh
    max_check_attempts      2
    check_interval          2
    retry_interval          2
    check_period            24x7
    notification_interval   2
    notification_period     24x7
    notifications_enabled   1
    register                1
}
Doesn't it look similar to the host configuration file? Here are the differences:
- host_name: tells Nagios which host this service check applies to
- use: the use keyword is not used here, but services also support templates, and they work the same way as for hosts
- check_command: specifies the command used by this service check (the command must be defined, for example in the commands.cfg configuration file)
- max_check_attempts: specifies the maximum number of attempts when the service check reports a problem
Only a few commonly used items are listed here. For details, please refer to the official manual.
Command configuration file
In the service configuration file, the check_command directive specifies the command used to monitor the service. These commands are defined in /usr/local/nagios/etc/objects/commands.cfg.
Looking at this file, you can see that it defines a series of commands. In each definition:
- command_name: the name of the command, which is the name we refer to in a service definition
- command_line: the script or plug-in that the command runs, together with the arguments passed to it
For example, the command for the ssh service we want to monitor is check_ssh:
The script (or plug-in) it uses is named check_ssh and is located under the path given by $USER1$.
The $USER1$ macro is defined in the configuration file /usr/local/nagios/etc/resource.cfg.
If you are curious, you can open it to have a look:
vi /usr/local/nagios/etc/resource.cfg
You will find that the value of the $USER1$ macro is the path /usr/local/nagios/libexec. In other words, the check_ssh script that actually gets executed is stored in /usr/local/nagios/libexec.
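If you prefer not to open the file in an editor, the value can be read with a one-line filter. This is a small sketch, assuming that the resource file stores macros one per line in the form $USER1$=/some/path; the function name user_macro is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a $USERn$ macro from a resource file.
#   $1 = path to resource.cfg
#   $2 = macro number (1 for $USER1$)
user_macro() {
    # Strip the "$USERn$=" prefix from the matching line and print the rest;
    # lines that do not match (comments, other macros) print nothing.
    sed -n 's/^\$USER'"$2"'\$=//p' "$1"
}
```

With the paths from this article, user_macro /usr/local/nagios/etc/resource.cfg 1 would print the plug-in directory.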
Let's see what else there is in this path:
As it turns out, this path stores the scripts and plug-ins that Nagios actually executes while monitoring. We can therefore develop our own scripts or plug-ins as needed and place them in this path to implement our own checks. They only need to be executable, whether they are written in C, shell, PHP, Python, or anything else.
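To give an idea of what such a script looks like, here is a minimal sketch of the core of a shell plug-in. Nagios plug-ins report their result through the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) and a line of text on standard output. The function name check_value and its three arguments are invented for this example; a real plug-in would first measure something (disk usage, queue length, and so on) and pass the number in.

```shell
#!/bin/sh
# Hypothetical check: compare a measured integer with two thresholds.
#   $1 = measured value, $2 = warning threshold, $3 = critical threshold
# Return codes follow the Nagios plug-in convention:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
check_value() {
    value=$1
    warn=$2
    crit=$3
    # Reject missing or non-numeric arguments
    for n in "$value" "$warn" "$crit"; do
        case $n in
            ''|*[!0-9]*)
                echo "UNKNOWN - expected three non-negative integers"
                return 3
                ;;
        esac
    done
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value (>= $crit)"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value (>= $warn)"
        return 1
    fi
    echo "OK - value is $value"
    return 0
}
```

In a standalone plug-in file, the last lines would call the function with the script's arguments and exit with its return code, so that Nagios sees the status. The script would also need a matching command definition before a service could use it in check_command.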
Check the configuration file
As before, after adding a new monitored service, check that the configuration is correct:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Reload nagios service after confirmation:
service nagios reload
Open the web interface and click the Services tab on the left to see the newly added Check SSH service:
If the status shows PENDING, please wait a moment. The first scheduled check has not run yet.
Add a contact
The default contact for monitoring is nagiosadmin. If you need to change it, edit the file /usr/local/nagios/etc/objects/contacts.cfg:
You can modify it by following the default template, so it is not explained further here.
Add a group
When hosts have similar functions or serve the same business, we can manage them as a group. The syntax for defining a group is as follows:
define hostgroup{
    hostgroup_name  test_servers
    alias           Servers For Test
    members         T1,T2,T3
}
- hostgroup_name: the name of the group
- alias: a description of the group
- members: the hosts that belong to the group
If you do not want the members field to grow into a long list, you can instead add the following directive to the configuration file of each host:
hostgroups test_servers
When a certain service (say, ssh) needs to be monitored on every server in a group, we can use hostgroup_name in the service definition so that the check applies to every member of the group. This saves a lot of repetitive configuration work.
For example, we add a new host named test2, put it in the group test_group together with test, and monitor the ssh service on the group. The steps are as follows:
- 1. First, assign both hosts to the group in their configuration files:
hostgroups test_group
- 2. Next, edit the service configuration file as follows:
# Delete this line
host_name test
# Add this line
hostgroup_name test_group
Then add the definition of the new group in an empty part of the file:
define hostgroup{
    hostgroup_name  test_group
    alias           For test
}
- 3. Check the configuration and reload the nagios service:
(omitted; the commands are the same as in the earlier sections)
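For reference, the resulting service file can be written out in one go. The sketch below assumes a POSIX shell; the function name write_group_service is invented for this example, and the directive values are the ones used earlier in this article. It only generates the file, so the configuration check and reload still have to be run afterwards.

```shell
#!/bin/sh
# Hypothetical helper: write a host group definition plus an SSH service
# check that applies to every member of that group.
#   $1 = output file, $2 = host group name
write_group_service() {
    out_file=$1
    group=$2
    cat > "$out_file" <<EOF
define hostgroup {
    hostgroup_name          $group
    alias                   For test
}

define service {
    hostgroup_name          $group
    service_description     Check SSH
    check_command           check_ssh
    max_check_attempts      2
    check_interval          2
    retry_interval          2
    check_period            24x7
    notification_interval   2
    notification_period     24x7
}
EOF
}
```

For the example above, the call would be write_group_service /usr/local/nagios/etc/servers/test_Service.cfg test_group. Note that this overwrites the target file.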
OK, those are the basic operations of Nagios.