Install monitoring prometheus

Posted by tnewton on Tue, 25 Jan 2022 03:08:34 +0100

23. Monitoring prometheus

Official documents: https://prometheus.io/docs
https://github.com/coreos/prometheus-operator
https://www.qikqiak.com/k8strain/monitor/prometheus/

I brief introduction

1. Component architecture

Prometheus Server

The core component of the service pulls and stores monitoring data from the Exporter through pull metrics, and provides a set of flexible query language (PromQL).

Pushgateway

Similar to a transfer station, it receives the data actively pushed and waits for the Prometheus Server to pull the data.
Some nodes can only push data in push mode for some reasons. For example, the pull action of Prometheus Server has intervals. If some containers or tasks survive for a short time, they may end before Prometheus Server comes to find them. Therefore, these containers or tasks need to actively send their own monitoring information to Prometheus, The Pushgateway receives and temporarily transfers these active push data.

Jobs/Exporter

It is responsible for collecting the performance data of the target object (host, container...) and providing it to the Prometheus Server through the HTTP interface.
Export er means that some applications may not have their own / metrics interface for Prometheus. In this case, we need to use additional exporter services to provide indicator data for Prometheus. For more exports, please see: https://prometheus.io/docs/instrumenting/exporters/

Service Discovery

Prometheus supports a variety of service discovery mechanisms: file, DNS, Consul,Kubernetes,OpenStack,EC2, etc.
The process of service discovery is not complicated. Through the interface provided by a third party, Prometheus queries the list of targets to be monitored, and then polls these targets to obtain monitoring data.

Alertmanager

After receiving alerts from the Prometheus server, it will remove duplicate data, group, route to the other party's acceptance method, and send an alarm. Common receiving methods include email, pageduty, etc.

2. Brief description of Prometheus configuration file

Binary can be installed through/ prometheus --config.file=prometheus.yml to specify the configuration file to run Prometheus. In k8s, the configuration file is generally saved in the form of configmap, and then used by mounting
The basic configuration of the configuration file is as follows

global:
  scrape_interval:     15s
  evaluation_interval: 15s
rule_files:
  # - "first.rules"
  # - "second.rules"
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

The global module controls the global configuration of Prometheus Server:
scrape_interval: indicates the frequency at which prometheus grabs indicator data. The default value is 15s. We can override this value
evaluation_interval: used to control the frequency of evaluating rules. prometheus uses rules to generate new time series data or generate alerts
rule_files: Specifies the location of the alarm rule. prometheus can load the rule according to this configuration to generate new time series data or alarm information. At present, we have not configured any alarm rules.
scrape_configs is used to control which resources prometheus monitors. A job monitoring prometheus itself is configured here. If other resources need to be monitored, add the configuration here.
If prometheus is mentioned below YML refers to the prometheus configuration file. Other detailed configuration usage will be described below.

Binary installation can be through https://prometheus.io/download Download Prometheus. There is no extension here

II k8s deployment configuration prometheus

Create namespace: Kube mon in advance

1.prometheus-cm.yaml see

Configmap saves the prometheus configuration, and then modifies the prometheus configuration by modifying configmap

2.prometheus-rbac.yaml see

3.prometheus-dp.yaml see

prometheus performance and data persistence. Here, we will directly persist data through hostPath
--storage.tsdb.path=/prometheus specify the data directory in the container, and then mount the directory declaration to the host / data/prometheus
The nodeSelector pins the Pod to a node with a monitor=prometheus tag
Label the target node: kubectl label node 172.10.10.21 monitor=prometheus

4.prometheus-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: prometheus-nodeport
  namespace: kube-mon
  labels:
    app: prometheus
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: web
      port: 9090
      targetPort: http
      nodePort: 8003

Create a NodePort type service to access the prometheus page
http://172.10.10.21:8003

The current prometheus is a single node, which can meet the basic use. How to build a highly available prometheus? In the future, we will try to build prometheus-operator

III Using prometheus

1. Monitor coreDns service

When we deployed coreDns, we configured prometheus: 9153 in the configMap. You must have guessed that coreDns itself provides the / metrics interface. We can directly configure prometheus for monitoring

prometheus.yaml addition

    scrape_configs:
    - job_name: 'coredns'
      static_configs:
      - targets: ['kube-dns.kube-system:9153']

Use Kube DNS Kube system is because prometheus and coreDns are not in the same namespace, so the namespace is brought when writing the service domain name

2. Monitoring traefik services

traefik add parameter to enable metrics

--metrics=true
--metrics.prometheus=true
--metrics.prometheus.buckets=0.100000, 0.300000, 1.200000, 5.000000
--metrics.prometheus.entryPoint=traefik

Test metrics interface
curl http://192.168.177.109:8080/metrics
prometheus configuration job

    - job_name: 'traefik'
      static_configs:
        - targets: ['traefiktcp.default:8080']

Here's traefik TCP Default is the domain name of traefik svc in coreDns
curl -XPOST http://172.20.101.3:9090/ -/Reload overload prometheus configuration

Grab a screenshot of grafana to show the effect, and then explain the configuration and use of grafana in detail

In the next section, let's see how to use grafana. After grafana is ready, we can monitor the k8s cluster

3. Monitor ingress nginx

Add metrics service
The ingress nginx installed in the previous sections has its own metrics interface. We only need to expose it through svc

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-metrics
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  type: ClusterIP
  ports:
    - name: metrics
      port: 10254
      targetPort: 10254
      protocol: TCP
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

View svc and test

➜  ~ kubectl get svc -n ingress-nginx
NAME                    TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)                    AGE
ingress-nginx           NodePort    192.168.175.255   <none>        80:8480/TCP,443:8443/TCP   59m
ingress-nginx-metrics   ClusterIP   192.168.201.61    <none>        10254/TCP                  21m
➜  ~ curl http://192.168.201.61:10254/metrics
go_gc_duration_seconds{quantile="0"} 9.336e-06
go_gc_duration_seconds{quantile="0.25"} 1.7991e-05
go_gc_duration_seconds{quantile="0.5"} 3.8044e-05
go_gc_duration_seconds{quantile="0.75"} 0.000102869
......

Configure prometheus monitoring

    - job_name: 'ingress-nginx'
      static_configs:
        - targets: ['ingress-nginx-metrics.ingress-nginx:10254']

reload prometheus

Later, extend grafana to display monitoring statistics

Think about a problem. If you start traefik or ingress nginx of multiple nodes, the monitoring interface configured here obtains the monitoring information of different nodes every time, and its statistical data is inaccurate. The ideal statistical data is the summary of the data of multiple nodes. How to deal with this problem?

appendix

prometheus-cm.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-mon
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']

prometheus-rbac.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-mon
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
    - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-mon

prometheus-dp.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: kube-mon
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      nodeSelector:
        monitor: prometheus    # Deploy to the specified node to prevent pod drift
      containers:
      - image: prom/prometheus:v2.26.0
        name: prometheus
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"  # Specify tsdb data path
        - "--storage.tsdb.retention.time=24h"
        - "--web.enable-admin-api"  # Controls access to the admin HTTP API, including functions such as deleting time series
        - "--web.enable-lifecycle"  # Support hot update. Execute localhost:9090/-/reload directly and take effect immediately
        - "--web.console.libraries=/usr/share/prometheus/console_libraries"
        - "--web.console.templates=/usr/share/prometheus/consoles"
        ports:
        - containerPort: 9090
          name: http
        volumeMounts:
        - mountPath: "/etc/prometheus"
          name: config-volume
        - mountPath: "/prometheus"
          name: data
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
          limits:
            cpu: 100m
            memory: 512Mi
      securityContext:
        runAsUser: 0    # runAsUser=0 specifies that the running user is root, otherwise the container may not have permission to use the host volumes mounted below
      volumes:
      - name: data
        hostPath:
          path: /data/prometheus/
      - name: config-volume
        configMap:
          name: prometheus-config

Topics: Kubernetes Prometheus

Programmer Think