Adding monitored hosts and services in Nagios

Posted by chuspy on Fri, 13 Mar 2020 10:45:33 +0100

Adding a monitored host in Nagios

Operation steps:

  • 1. Modify the main Nagios configuration file so that host and service definitions are kept separate from the rest of the configuration
  • 2. Add a configuration file describing the monitored host
  • 3. Check the configuration files for syntax errors
  • 4. Reload the configuration so that the new file takes effect
  • 5. Confirm in the web interface that the configuration was applied

1. Modify the main configuration file

After installing Nagios, the web interface already shows one monitored host named localhost. It comes from the sample configuration that Nagios installs, /usr/local/nagios/etc/objects/localhost.cfg:

$ tree /usr/local/nagios/etc/     # Nagios configuration directory
/usr/local/nagios/etc/
|-- cgi.cfg                       # Web interface (CGI) configuration
|-- htpasswd.users                # User names and passwords for logging into the Nagios web interface
|-- nagios.cfg                    # Main configuration file
|-- nrpe.cfg                      # NRPE (client agent) configuration
|-- objects                       # Directory holding the object definition files
|   |-- localhost.cfg             # Definitions for monitoring the local machine

It is not advisable to add the new hosts you want to monitor to this file. Once many hosts are monitored, a single file becomes hard to maintain. Instead, create a dedicated directory under /usr/local/nagios/etc/ for host and service definitions, and use subdirectories to separate machines by role or platform. For example:

# Organized by cloud platform and environment (product/staging)
$ tree /usr/local/nagios/etc/
/usr/local/nagios/etc/
|-- cgi.cfg                 
|-- htpasswd.users          
|-- nagios.cfg              
|-- nrpe.cfg                
|-- objects                 
`-- servers         # New directory for host and service definitions
    |-- aliyun      # Machines on the Alibaba Cloud (Aliyun) platform
    |   |-- product_Host.cfg
    |   |-- product_Service.cfg
    |   |-- staging_Host.cfg
    |   `-- staging_Service.cfg
    `-- aws
        |-- product_Host.cfg
        |-- product_Service.cfg
        |-- staging_Host.cfg
        `-- staging_Service.cfg

Create the servers directory under /usr/local/nagios/etc:

mkdir /usr/local/nagios/etc/servers
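
If you adopt the per-platform layout shown above, the subdirectories can be created in one go. The sketch below assumes a POSIX shell. make_server_dirs is a hypothetical helper, and aliyun and aws are just the example platform names from the tree.

```shell
# Sketch: create <base>/servers/<platform> for each platform given.
# make_server_dirs is a hypothetical helper, not part of Nagios.
make_server_dirs() {
    base="$1"                     # e.g. /usr/local/nagios/etc
    shift
    for platform in "$@"; do      # e.g. aliyun aws
        # -p creates missing parents and does not fail if the directory exists
        mkdir -p "$base/servers/$platform" || return 1
    done
}

# Example (run with sufficient privileges):
# make_server_dirs /usr/local/nagios/etc aliyun aws
```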

For Nagios to read our files when it starts, add the new directory to the main configuration file, /usr/local/nagios/etc/nagios.cfg:

cfg_dir=/usr/local/nagios/etc/servers

Nagios then loads every file with a .cfg extension in this directory, including files in its subdirectories.
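
Editing nagios.cfg by hand is fine. If the step is scripted, though, it should not add the same line twice. Below is a minimal sketch, assuming a POSIX shell with grep and tail. ensure_cfg_dir is a hypothetical helper, not part of Nagios.

```shell
# Sketch: append "cfg_dir=<dir>" to nagios.cfg unless that exact line exists.
# ensure_cfg_dir is a hypothetical helper, not part of Nagios.
ensure_cfg_dir() {
    conf="$1"            # path to nagios.cfg
    line="cfg_dir=$2"    # $2: directory whose .cfg files Nagios should load
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if grep -qxF "$line" "$conf"; then
        echo "already present: $line"
        return 0
    fi
    # if the file does not end with a newline, add one so the entry gets its own line
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        echo >> "$conf"
    fi
    printf '%s\n' "$line" >> "$conf"
    echo "added: $line"
}

# Example:
# ensure_cfg_dir /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/servers
```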

Overview of the main configuration file nagios.cfg

# Load an object configuration file; this one holds the check command definitions
cfg_file=/usr/local/nagios/etc/objects/commands.cfg

# Resource file that defines user macros such as $USER1$ (Nagios "global variables")
resource_file=/usr/local/nagios/etc/resource.cfg

# Status data file: Nagios writes the current state of every monitored host and service here
status_file=/usr/local/nagios/var/status.dat

# How often, in seconds, Nagios updates the status data file
status_update_interval=10

# User that runs the Nagios process
nagios_user=nagios

# Group that runs the Nagios process
nagios_group=nagios

# Must be 1 to allow commands (external commands) to be sent from the web interface
check_external_commands=1

# Length of one "time unit" in seconds (default 60); interval directives such as check_interval are counted in these units
interval_length=60

# Whether notifications are enabled: 1 = enabled, 0 = disabled
enable_notifications={1|0}

# Timeout in seconds for a service check command; if it returns no result in time, the check is killed and reported as timed out
service_check_timeout=60

# Timeout in seconds for a host check command; if it returns no result in time, the check is killed and the host is assumed to be DOWN
host_check_timeout=30

# Whether active service checks are executed: 1 = enabled, 0 = disabled
execute_service_checks={1|0}

# Whether passive service check results are accepted: 1 = accept, 0 = reject
accept_passive_service_checks={1|0}

# Whether active host checks are executed: 1 = enabled, 0 = disabled
execute_host_checks={1|0}

# Whether passive host check results are accepted: 1 = accept, 0 = reject
accept_passive_host_checks={1|0}

# Whether event handlers are enabled (1) or disabled (0); event handlers are commands run automatically when a host or service changes state, for example when a service fails
enable_event_handlers=1

# Path of the Nagios log file
log_file=/usr/local/nagios/var/nagios.log

# Log rotation method: n = none, h = hourly, d = daily, w = weekly, m = monthly
log_rotation_method={n|h|d|w|m}

2. Add a configuration file for the monitored host

To monitor a host named test, create a configuration file for it in the directory defined earlier:

vim /usr/local/nagios/etc/servers/test.cfg

define host {
        use                             linux-server
        host_name                       test
        alias                           My first Apache server
        address                         127.0.0.1
        max_check_attempts              3
        check_period                    24x7
        notification_interval           30
        notification_period             24x7
}

Here is what each directive means:

# The define host keyword marks the block between the braces as a host definition
define host {
        # use: the template this host inherits settings from; linux-server is one of the sample templates in objects/templates.cfg
        use                             linux-server

        # host_name: the name of the host, which is also shown in the web interface
        host_name                       test

        # alias: a longer, descriptive name for the host
        alias                           My first Apache server

        # address: the IP address (or resolvable name) that checks are run against
        address                         127.0.0.1

        # max_check_attempts: how many times the check is run when the host is not UP before Nagios treats the problem as confirmed and notifies
        max_check_attempts              3

        # check_period: the time period during which the host is checked (24x7 = always)
        check_period                    24x7

        # notification_interval: how long to wait before re-sending a notification while the problem persists, in time units (minutes with the default interval_length of 60)
        notification_interval           30

        # notification_period: the time period during which notifications may be sent
        notification_period             24x7
}
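
Every monitored host needs a block like the one above. Once the number of hosts grows, it can help to generate that block from a few parameters. This is a sketch only. host_definition is a hypothetical helper, and the fixed values (linux-server, 24x7, 30 and so on) are simply copied from test.cfg.

```shell
# Sketch: print a host definition with the same directives as test.cfg.
# host_definition is a hypothetical helper, not part of Nagios.
host_definition() {
    name="$1"; alias="$2"; address="$3"
    cat <<EOF
define host {
        use                             linux-server
        host_name                       $name
        alias                           $alias
        address                         $address
        max_check_attempts              3
        check_period                    24x7
        notification_interval           30
        notification_period             24x7
}
EOF
}

# Example: recreate the file used in this article
# host_definition test "My first Apache server" 127.0.0.1 > /usr/local/nagios/etc/servers/test.cfg
```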

This file describes the basic information about the device plus any settings specific to it. Other commonly used directives include:

  • hostgroups web: the host group(s) the host belongs to, which makes it easier to manage monitoring for many hosts at once
  • check_command check-host-alive: the command used to check whether the host is up
  • check_interval 5: the interval between regular checks, in time units
  • retry_interval 1: the interval between retries after a check fails
  • contacts shiyanlou: the contact(s) to notify
  • contact_groups linux-admins: the contact group, i.e. the people who are alerted when the machine has a problem
  • notification_options d,u,r: which host states trigger notifications; the possible values are d, u, r, f and s:

parameter   Meaning
d           DOWN: the host is down
u           UNREACHABLE: the host cannot be reached (for example because a parent host is down)
r           Recovery: the host has returned to UP
f           Flapping: the host starts or stops flapping (changing state rapidly)
s           Scheduled downtime starts or ends
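
To double-check a notification_options value against the table, the letters can be expanded into readable states. describe_notification_options is a hypothetical helper written for illustration; it is not a Nagios tool. It covers only the host options listed above, plus n (none).

```shell
# Sketch: expand a host notification_options value such as "d,u,r".
# describe_notification_options is a hypothetical helper, not a Nagios tool.
describe_notification_options() {
    # drop spaces, then split the value on commas
    opts=$(printf '%s' "$1" | tr -d ' ')
    old_ifs=$IFS
    IFS=,
    set -- $opts
    IFS=$old_ifs
    for opt in "$@"; do
        case "$opt" in
            d) echo "d: DOWN" ;;
            u) echo "u: UNREACHABLE" ;;
            r) echo "r: recovery (back to UP)" ;;
            f) echo "f: flapping started or stopped" ;;
            s) echo "s: scheduled downtime started or ended" ;;
            n) echo "n: no notifications" ;;
            *) echo "unknown host notification option: $opt" >&2; return 1 ;;
        esac
    done
}

# Example:
# describe_notification_options "d,u,r"
```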

3. Check the configuration and reload Nagios

Every time you modify the configuration, check that it is valid before applying it. If you restart or reload Nagios while the configuration contains errors, the monitoring service may stop working altogether. So always run the check first:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Once the check reports no errors, reload the Nagios service:

service nagios reload
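
The check and the reload can be combined so that Nagios is reloaded only when the check passes. This works because nagios -v exits with a non-zero status when it finds errors. In this sketch, NAGIOS_BIN, NAGIOS_CFG and RELOAD_CMD are assumptions matching the paths used in this article. On systemd-based systems the reload command may be systemctl reload nagios instead.

```shell
# Sketch: verify the configuration and reload only if verification succeeds.
# The three variables are assumptions about the install layout; override as needed.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
RELOAD_CMD=${RELOAD_CMD:-"service nagios reload"}

check_and_reload() {
    # nagios -v prints a report and exits non-zero on configuration errors
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        # RELOAD_CMD is left unquoted on purpose so it splits into command + args
        $RELOAD_CMD
    else
        echo "configuration check failed; Nagios was not reloaded" >&2
        return 1
    fi
}

# Example:
# check_and_reload
```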

Log in to the web interface and click Hosts in the menu on the left. The new host test (which, for now, is simply the local machine) appears in the list.

In summary, adding a monitored host takes two steps:

  • 1. Add or modify the host's configuration file
  • 2. Check the configuration and reload the Nagios service

Adding a monitored service

Monitoring a host only tells you whether it is up or down. Usually what matters more is whether the services running on it work properly, so we also define in Nagios the services we want to monitor.

Create a service configuration file

Host and service definitions are usually kept in separate files, for example:

|-- servers         
    |-- test_Host.cfg
    `-- test_Service.cfg

test_Service.cfg holds the service definition:

define service {
      host_name                       test
      service_description             Check SSH
      check_command                   check_ssh
      max_check_attempts              2
      check_interval                  2
      retry_interval                  2
      check_period                    24x7
      notification_interval           2
      notification_period             24x7
      notifications_enabled           1
      register                        1
}

This looks much like the host definition. The main directives are:

  • host_name: the host this service belongs to, i.e. which device Nagios should check it on
  • use: not used here, but services support templates too, in the same way as hosts
  • check_command: the command used to check the service; it must be defined as a command object (by default in commands.cfg)
  • max_check_attempts: how many times the check is run when the service has a problem before the problem is treated as confirmed
    Only the most common directives are listed here; see the official Nagios documentation for the full list.
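
A service block names the host it belongs to. Monitoring the same service on several machines therefore means repeating the block once per host. The sketch below generates those blocks. service_definitions is a hypothetical helper, and the directive values are copied from test_Service.cfg. Host groups, described later, avoid this repetition.

```shell
# Sketch: print one "Check SSH" service definition per host name given.
# service_definitions is a hypothetical helper, not part of Nagios.
service_definitions() {
    for host in "$@"; do
        cat <<EOF
define service {
      host_name                       $host
      service_description             Check SSH
      check_command                   check_ssh
      max_check_attempts              2
      check_interval                  2
      retry_interval                  2
      check_period                    24x7
      notification_interval           2
      notification_period             24x7
      notifications_enabled           1
      register                        1
}
EOF
    done
}

# Example:
# service_definitions test > /usr/local/nagios/etc/servers/test_Service.cfg
```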

Command configuration file

The check_command directive in the service definition names the command used to check the service. Commands are defined in /usr/local/nagios/etc/objects/commands.cfg. Open this file and you will find a series of command definitions.

Each definition contains:

  • command_name: the name of the command, which is what a service refers to in check_command
  • command_line: the script or plug-in the command runs, together with the arguments passed to it
    For example, the command for the SSH service we want to monitor is check_ssh.

Its command_line runs the script (plug-in) named check_ssh in the directory given by $USER1$.

$USER1$ is a macro defined in the configuration file /usr/local/nagios/etc/resource.cfg.

If you are curious, open it and have a look:

vi /usr/local/nagios/etc/resource.cfg

You will find that $USER1$ is set to /usr/local/nagios/libexec. In other words, the check_ssh script that actually runs behind the scenes is stored in /usr/local/nagios/libexec.
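
The way $USER1$ ends up in the executed command can be imitated in a few lines. This is an illustration only: Nagios expands macros internally, and the two helpers below (user1_from_resource and expand_command_line) are hypothetical. The command line in the example is simplified; the check_ssh definition in your commands.cfg may also pass $ARG1$.

```shell
# Illustration only: these hypothetical helpers mimic macro expansion; Nagios does it internally.

# Print the value assigned to $USER1$ in a resource.cfg-style file.
user1_from_resource() {
    # \$ makes sed match a literal dollar sign
    sed -n 's/^\$USER1\$=//p' "$1" | head -n 1
}

# Replace $USER1$ and $HOSTADDRESS$ in a command_line string.
expand_command_line() {
    printf '%s\n' "$1" |
        sed -e 's|\$USER1\$|'"$2"'|g' \
            -e 's|\$HOSTADDRESS\$|'"$3"'|g'
}

# Example:
# user1=$(user1_from_resource /usr/local/nagios/etc/resource.cfg)
# expand_command_line '$USER1$/check_ssh $HOSTADDRESS$' "$user1" 127.0.0.1
```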

Let's see what else is in this directory:

ls /usr/local/nagios/libexec

This directory holds the scripts and plug-ins that Nagios actually runs when it performs its checks. You can therefore write your own plug-ins for your specific needs and place them here. A plug-in only needs execute permission; it can be written in C, shell, PHP, Python or any other language.
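
A custom plug-in only has to follow the plug-in convention: print one line of status text, then exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Nagios maps that exit code to the service state. The sketch below is a hypothetical check_path plug-in, written as a shell function so that it can be tested. It checks whether a file exists and is non-empty.

```shell
# Sketch of a custom plug-in following the Nagios exit-code convention.
# check_path is a hypothetical example, not a plug-in shipped with Nagios.
STATE_OK=0; STATE_WARNING=1; STATE_CRITICAL=2; STATE_UNKNOWN=3

check_path() {
    if [ -z "$1" ]; then
        echo "UNKNOWN - usage: check_path <file>"
        return $STATE_UNKNOWN
    fi
    if [ ! -e "$1" ]; then
        echo "CRITICAL - $1 does not exist"
        return $STATE_CRITICAL
    fi
    if [ ! -s "$1" ]; then
        echo "WARNING - $1 exists but is empty"
        return $STATE_WARNING
    fi
    echo "OK - $1 exists and is not empty"
    return $STATE_OK
}

# To use it as a standalone plug-in, save this as /usr/local/nagios/libexec/check_path,
# start the file with #!/bin/sh, append the line below, and run chmod +x on it:
#   check_path "$1"; exit $?
```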

Check the configuration

As before, after adding a new service, check that the configuration is valid:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If no errors are reported, reload the Nagios service:

service nagios reload

Open the web interface and click Services in the menu on the left to see the newly added Check SSH service.

If the service shows the PENDING status, wait a moment: it has not been checked yet because its first scheduled check has not run.

Add a contact

The default contact is nagiosadmin. To change it, edit /usr/local/nagios/etc/objects/contacts.cfg.

The existing definition in that file can be copied and adapted, so it is not explained in detail here.
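
A contact definition follows the same pattern as hosts and services. The sketch below prints one based on the generic-contact template from the sample templates.cfg. contact_definition is a hypothetical helper, and the e-mail address in the example is a placeholder. To receive alerts, the contact must also be referenced from a host or service, for example through the contacts directive.

```shell
# Sketch: print a contact definition that inherits from generic-contact.
# contact_definition is a hypothetical helper, not part of Nagios.
contact_definition() {
    cat <<EOF
define contact {
        contact_name                    $1
        use                             generic-contact
        alias                           $2
        email                           $3
}
EOF
}

# Example (placeholder address):
# contact_definition shiyanlou "Ops on-call" ops@example.com >> /usr/local/nagios/etc/objects/contacts.cfg
```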

Add a group

Hosts with similar roles, or hosts that run the same business services, can be managed as a group. A host group is defined as follows:

define hostgroup{
    hostgroup_name        test_servers
    alias                Servers For Test
    members                T1,T2,T3
    }

  • hostgroup_name: the name of the group
  • alias: a description of the group
  • members: the hosts in the group, given as a comma-separated list of host_name values

If you would rather not maintain a long members list, you can instead add the following directive to each host's definition:

hostgroups test_servers

When the same service (say, SSH) must be monitored on every server in a group, put hostgroup_name in the service definition instead of host_name. The service is then checked on every member of that group, which saves a lot of repetitive configuration.

For example, suppose we add a new host named test2 and want to monitor SSH on both test and test2 through a group named test_group. The steps are:

  • 1. First, assign both hosts to the group by adding this line to each of their host definitions:

hostgroups test_group

  • 2. Next, edit the service configuration file. In the service definition:

# Remove this line
host_name              test

# Add this line
hostgroup_name         test_group

Then add a definition of the group, outside the existing definition blocks:

define hostgroup{
    hostgroup_name          test_group
    alias                   For test
}
  • 3. Check the configuration and reload the Nagios service, as described earlier.
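
The group definition and the group-wide service can be combined into a single file. write_group_example is a hypothetical helper, and the directive values are copied from test_Service.cfg. The hosts still need the hostgroups test_group line from step 1.

```shell
# Sketch: write the test_group hostgroup and its SSH service to one file.
# write_group_example is a hypothetical helper; pass a path under the cfg_dir directory.
write_group_example() {
    out="$1"
    # quoted 'EOF': the block is written literally, with no shell expansion
    cat > "$out" <<'EOF'
define hostgroup {
    hostgroup_name          test_group
    alias                   For test
}

define service {
      hostgroup_name                  test_group
      service_description             Check SSH
      check_command                   check_ssh
      max_check_attempts              2
      check_interval                  2
      retry_interval                  2
      check_period                    24x7
      notification_interval           2
      notification_period             24x7
}
EOF
}

# Example:
# write_group_example /usr/local/nagios/etc/servers/test_group.cfg
```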

That covers the basic operations of Nagios.
