A brief introduction to Prometheus

Posted by chintupintu03 on Mon, 27 May 2019 20:41:26 +0200

Prometheus monitoring system

Our company recently needed to build a big data platform, and at the consultants' request we replaced our monitoring system with Prometheus.
Prometheus official website: https://prometheus.io/
My own understanding (not necessarily right): Prometheus monitoring consists of three parts: the Prometheus server, exporters (agents), and Alertmanager (alerting).
At its core, the Prometheus server is a time series database: it scrapes and stores metrics, and we retrieve the data we need with the query language that Prometheus defines (PromQL). An exporter is essentially a small web endpoint: it exposes metric values on a plain-text page that it keeps up to date, and the server scrapes that page. Alertmanager is the alerting component: it receives the alerts pushed by Prometheus and sends out the notifications. The alert conditions themselves are defined as rules on the Prometheus server.
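To make the exporter side concrete: the page an exporter serves is plain text with one sample per line, so any HTTP client can read it. The sketch below uses made-up sample values and a hypothetical helper, metric_value, to pull one value out of such a page. The commented curl line assumes a node_exporter listening on localhost:9100.

```shell
#!/bin/sh
# Sample of the plain-text format an exporter serves on /metrics.
# The values are made up for illustration.
sample_metrics() {
cat <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_memory_MemTotal Memory information field MemTotal.
# TYPE node_memory_MemTotal gauge
node_memory_MemTotal 8.201367552e+09
EOF
}

# metric_value NAME: read a metrics page on stdin and print the value
# of the unlabelled metric called NAME (hypothetical helper).
metric_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

sample_metrics | metric_value node_load1    # prints 0.42

# Against a live exporter (host and port are assumptions):
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```

Lines starting with # are metadata (HELP and TYPE), so the helper skips them simply because their first field is never a metric name.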
As mentioned above, Prometheus is at heart a database, so to display the data we pair it with Grafana, which provides attractive dashboards. That part will be covered in the next article.
The advantage of Prometheus is that it is a service-oriented monitoring and alerting system: different exporters collect different metrics for different services. I am new to it and have not yet used exporters other than node_exporter, so readers who want to know more can visit the official website.
Below are some simple configurations, with comments on the important settings. They are enough to build an initial Prometheus monitoring system that collects basic host information.

Server side

Deployment

cd /usr/local/
wget http://1.1.17.28/software/linux/prometheus/prometheus-1.7.1.linux-amd64.tar.gz
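tar -zxvf prometheus-1.7.1.linux-amd64.tar.gz
cd prometheus-1.7.1.linux-amd64
# Start in the background; prometheus.yml is read from the current directory
nohup ./prometheus &
# Start at boot (change directory first so that prometheus.yml is found)
echo "cd /usr/local/prometheus-1.7.1.linux-amd64 && nohup ./prometheus &" >> /etc/rc.local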

Main configuration file: prometheus.yml

[root@prometheus local]# cat  prometheus-1.7.1.linux-amd64/prometheus.yml
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

  # Attach these extra labels to all timeseries collected by this Prometheus instance.
  external_labels:
    monitor: 'codelab-monitor'
# Alerting rule files
rule_files:
  - 'prometheus.rules'

scrape_configs:
# Prometheus scrapes its own metrics; this job is optional
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']
# node_exporter targets: scrape basic host information (CPU, memory, etc.); jobs and labels can be defined per service
  - job_name:       'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:

      - targets: ['1.1.17.28:9100']
        labels:
          severity: 'all'
          group: 'tool'
          hostname: 'yum-server'

      - targets: ['1.1.11.27:9100']
        labels: 
          severity: 'all'
          group: 'dev'
          hostname: 'app1'

      - targets: ['1.1.11.28:9100']
        labels: 
          severity: 'all'
          group: 'dev'
          hostname: 'app2'    
      - targets: ['1.1.11.15:9100']
        labels:
          severity: 'all'
          group: 'hadoop'
          hostname: 'hadoop1'
      - targets: ['1.1.11.16:9100']
        labels:
          severity: 'all'
          group: 'hadoop'
          hostname: 'hadoop2'
      - targets: ['1.1.11.17:9100']
        labels:
          severity: 'all'
          group: 'hadoop'
          hostname: 'hadoop3'

      - targets: ['1.1.10.12:9100']
        labels:
          severity: 'all'
          group: 'db_anl'
          hostname: 'DB_ETL'
# Alertmanager configuration
alerting:
   alertmanagers: 
   - scheme: http
     static_configs:
     - targets: 
        - "1.1.17.17:9093"
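
Since every node_exporter target in the 'node' job repeats the same label layout, the entries can be generated rather than typed by hand. A minimal sketch, assuming a hypothetical helper named target_block and the node_exporter port 9100 used throughout this article:

```shell
#!/bin/sh
# target_block IP GROUP HOSTNAME
# Prints one static_configs entry in the layout used by the 'node' job.
# (Hypothetical helper for illustration; not part of Prometheus.)
target_block() {
    printf "      - targets: ['%s:9100']\n" "$1"
    printf "        labels:\n"
    printf "          severity: 'all'\n"
    printf "          group: '%s'\n" "$2"
    printf "          hostname: '%s'\n" "$3"
}

# Emit entries for a short host list (one "IP group hostname" per line).
while read -r ip group host; do
    target_block "$ip" "$group" "$host"
done <<'EOF'
1.1.11.27 dev app1
1.1.11.28 dev app2
EOF
```

The output can be pasted under static_configs. The indentation in the printf strings matters, because YAML nesting depends on it.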

Alerting rules: prometheus.rules

# Load alarm: node_load1 is the 1-minute load average, not a CPU percentage
ALERT cpu_overload
  IF node_load1 >= 80
  FOR 3m
  LABELS { severity = "all" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} load1 at or above 80 for 3 minutes",
    description = "{{ $labels.instance }} of job {{ $labels.job }} load1 at or above 80 for 3 minutes.",
  }


# Memory alarm
ALERT memory_overload
  IF (node_memory_MemTotal-node_memory_MemFree)/node_memory_MemTotal >= 0.8
  FOR 3m
  LABELS { severity = "all" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} memory_load over 80% for 3 minutes",
    description = "{{ $labels.instance }} of job {{ $labels.job }} memory_load over 80% for 3 minutes.",
  }
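
As a worked example of the memory rule's arithmetic: with 8000 units of memory in total and 1000 free, (8000 - 1000) / 8000 = 0.875, which is at or above the 0.8 threshold. The sketch below reproduces that calculation outside Prometheus. The function names and figures are illustrative only. The real rule also requires the condition to hold for 3 minutes (FOR 3m) before it fires, which this sketch does not model.

```shell
#!/bin/sh
# mem_used_ratio TOTAL FREE: print (TOTAL - FREE) / TOTAL with three decimals.
mem_used_ratio() {
    awk -v total="$1" -v free="$2" 'BEGIN { printf "%.3f\n", (total - free) / total }'
}

# mem_overloaded TOTAL FREE: succeed when the ratio is at or above 0.8,
# the same comparison as the IF clause of the memory_overload rule.
mem_overloaded() {
    awk -v total="$1" -v free="$2" 'BEGIN { exit !((total - free) / total >= 0.8) }'
}

mem_used_ratio 8000 1000          # prints 0.875
if mem_overloaded 8000 1000; then
    echo "memory_overload condition holds"
fi
```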

node_exporter deployment

node_exporter only exposes metrics on a web endpoint. It needs no configuration and can be started right after installation.

cd /usr/local/
wget http://1.1.17.28/software/linux/prometheus/node_exporter-0.14.0.linux-amd64.tar.gz
tar -zxvf  node_exporter-0.14.0.linux-amd64.tar.gz 
cd node_exporter-0.14.0.linux-amd64
nohup ./node_exporter &
# Start at boot (in the background, so that rc.local can finish)
echo "nohup /usr/local/node_exporter-0.14.0.linux-amd64/node_exporter &" >> /etc/rc.local
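
Once node_exporter is running, a quick HTTP request confirms that the metrics page is being served. A minimal sketch, assuming curl is installed and the exporter listens on port 9100 as in the configuration in this article; exporter_status is a hypothetical helper name:

```shell
#!/bin/sh
# exporter_status HOST: print the HTTP status code returned by
# the /metrics page of the node_exporter on HOST (port 9100 assumed).
exporter_status() {
    curl -s -o /dev/null -w '%{http_code}' "http://$1:9100/metrics"
}

# Example (run after starting node_exporter on the monitored host):
#   exporter_status localhost
# A status of 200 means the page is being served.
```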

Alertmanager

Deployment

cd /usr/local
wget http://1.1.17.28/software/linux/prometheus/alertmanager-0.8.0.linux-amd64.tar.gz
tar -zxvf   alertmanager-0.8.0.linux-amd64.tar.gz
cd alertmanager-0.8.0.linux-amd64
nohup ./alertmanager &
# Start at boot (change directory first so that alertmanager.yml is found)
echo "cd /usr/local/alertmanager-0.8.0.linux-amd64 && nohup ./alertmanager &" >> /etc/rc.local

Alert notification configuration file

Only email alerts are configured.

[root@prometheus local]# cat alertmanager-0.8.0.linux-amd64/alertmanager.yml
global:
  smtp_smarthost: 'smtp.xxx.com:25'
  resolve_timeout: 5m
  smtp_from: '123@xxx.com'
  smtp_auth_username: '123@xxx.com'
  smtp_auth_password: '123123123'
  smtp_require_tls: false
#templates: 
#- '/usr/local/alertmanager-0.8.0.linux-amd64/alert_templates/123.tmpl'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h 
  receiver: 'hwj'
  routes:
 # - match_re:
 #     service: ^(foo1|foo2|baz)$
 #   receiver: hwj
 #   routes:
  - match:
      severity: 'all' 
    receiver: 'hwj'
receivers:
- name: 'hwj'
  email_configs:
  - to: '123@xxx.com'
    send_resolved: true
  - to: '456@xxx.com'
    send_resolved: true
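
To check the mail route end to end without waiting for a real alert, a hand-made alert can be posted to Alertmanager. This is a sketch under assumptions: it targets the v1 HTTP API (/api/v1/alerts) of the Alertmanager 0.8.0 release used here, which later releases replaced with a v2 API, and alert_payload and send_test_alert are hypothetical helper names. The severity label is set to 'all' in the example so that the alert matches the route above.

```shell
#!/bin/sh
# alert_payload NAME SEVERITY SUMMARY: print a JSON array holding a single alert.
# Arguments are inserted as-is, so they must not contain double quotes.
alert_payload() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"%s"}}]\n' "$1" "$2" "$3"
}

# send_test_alert HOST:PORT NAME SEVERITY SUMMARY: post the alert to Alertmanager.
send_test_alert() {
    alert_payload "$2" "$3" "$4" |
        curl -s -X POST -H 'Content-Type: application/json' \
             --data-binary @- "http://$1/api/v1/alerts"
}

# Example, using the Alertmanager address from prometheus.yml:
#   send_test_alert 1.1.17.17:9093 TestAlert all "manual test alert"
```

With group_wait set to 30s, the notification mail should go out roughly half a minute after the alert is accepted.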
