k8s-1.15 deploy kube-prometheus-0.3.0

Posted by noisenet on Wed, 11 Mar 2020 11:41:02 +0100



kube-prometheus is mentioned on the prometheus-operator page: it is a project built around prometheus-operator that defines the cluster-level monitoring resources, and it is the recommended way to deploy them.

This post is based on kube-prometheus and makes the following changes:

  • Move the storage of grafana, prometheus and alertmanager to local pvc
  • Add ingress access
  • Add a serviceMonitor for ingress-nginx (and verify that the added monitoring works)
  • Fix the custom metrics error and get hpa v2 working

Deploy Kube Prometheus

Download related files

# Download and extract kube-prometheus v0.3.0
wget https://github.com/coreos/kube-prometheus/archive/v0.3.0.tar.gz
tar -xzf v0.3.0.tar.gz

Kube Prometheus project composition

Kube Prometheus is roughly divided into the following parts

  • grafana
  • kube-state-metrics
  • alertmanager
  • node-exporter
  • prometheus-adapter
  • prometheus
  • serviceMonitor

It bundles the kube-state-metrics and prometheus-adapter projects; prometheus-adapter is covered separately later.

Submission of resources

Following the documentation, submit the resources under setup/ first.

Submit the files in setup/

[root@docker-182 manifests]# k apply -f setup/
namespace/monitoring created
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com configured
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com configured
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created

[root@bj-k8s-master-56 ~]# k -n monitoring get all
NAME                                      READY   STATUS    RESTARTS   AGE
pod/prometheus-operator-6685db5c6-fsfsp   1/1     Running   0          80s


NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
service/prometheus-operator   ClusterIP   None         <none>        8080/TCP   81s


NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-operator   1/1     1            1           81s

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-operator-6685db5c6   1         1         1       81s

Submit the files in manifests/

[root@docker-182 manifests]# k apply -f .
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-pods created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader configured
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-operator created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created

CRD resources created by kube-prometheus

[root@bj-k8s-master-56 ~]# k get crd -o wide
NAME                                    CREATED AT
alertmanagers.monitoring.coreos.com     2019-11-26T03:48:24Z
podmonitors.monitoring.coreos.com       2020-03-04T07:11:14Z
prometheuses.monitoring.coreos.com      2019-11-26T03:48:24Z
prometheusrules.monitoring.coreos.com   2019-11-26T03:48:24Z
servicemonitors.monitoring.coreos.com   2019-11-26T03:48:24Z

The prometheus resource defines how the prometheus service should run

[root@bj-k8s-master-56 ~]# k -n monitoring get prometheus
NAME   AGE
k8s    36m

Similarly, the alertmanager resource defines how the alertmanager service runs.

[root@bj-k8s-master-56 ~]# kubectl -n monitoring get alertmanager
NAME   AGE
main   37m

Both prometheus and alertmanager are managed as statefulsets:

[root@bj-k8s-master-56 ~]# k -n monitoring get statefulset -o wide
NAME                READY   AGE   CONTAINERS                                                       IMAGES
alertmanager-main   3/3     34m   alertmanager,config-reloader                                     quay.io/prometheus/alertmanager:v0.18.0,quay.io/coreos/configmap-reload:v0.0.1
prometheus-k8s      1/2     33m   prometheus,prometheus-config-reloader,rules-configmap-reloader   quay.io/prometheus/prometheus:v2.11.0,quay.io/coreos/prometheus-config-reloader:v0.34.0,quay.io/coreos/configmap-reload:v0.0.1

Modify grafana

By default grafana has no configmap for its configuration file; it stores a sqlite database under /var/lib/grafana, which is mounted as an emptyDir.

  • Create a local-type pvc for grafana and mount it at /var/lib/grafana
  • Create a configmap for the configuration file and mount it
  • Create an ingress resource

Create the directory on 32.94, then create the pv and pvc

[root@bj-k8s-node-84 ~]# mkdir /data/apps/data/pv/monitoring-grafana
[root@bj-k8s-node-84 ~]# chown 65534:65534 /data/apps/data/pv/monitoring-grafana

[root@docker-182 grafana]# k apply -f grafana-local-pv.yml,grafana-local-pvc.yml 
persistentvolume/grafana-pv created
persistentvolumeclaim/grafana-pvc created
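
The two yaml files are not reproduced in this post; a minimal sketch of what grafana-local-pv.yml and grafana-local-pvc.yml could look like is shown below (the node name, path, size and the pre-existing local-storage StorageClass are assumptions based on the outputs later in this post):

# grafana-local-pv.yml (sketch)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-pv
spec:
  capacity:
    storage: 16Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /data/apps/data/pv/monitoring-grafana
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - bj-k8s-node-84.tmtgeo.com
---
# grafana-local-pvc.yml (sketch)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  storageClassName: local-storage
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 16Gi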

Create the grafana database in MariaDB

MariaDB [(none)]> create database k8s_55_grafana default character set utf8;
Query OK, 1 row affected (0.01 sec)

MariaDB [(none)]> grant all on k8s_55_grafana.* to grafana@'%';
Query OK, 0 rows affected (0.05 sec)

MariaDB [(none)]> flush privileges;
Query OK, 0 rows affected (0.03 sec)

Create the svc and endpoints for grafana-mysql

[root@docker-182 grafana]# k apply -f grafana-mysql_endpoint.yaml 
service/grafana-mysql created
endpoints/grafana-mysql created
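
grafana-mysql_endpoint.yaml is not shown either; since the MariaDB instance presumably lives outside the cluster, it would be a selector-less Service plus a manually maintained Endpoints object, roughly like this (the address is a placeholder, not the real one):

# grafana-mysql_endpoint.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: grafana-mysql
  namespace: monitoring
spec:
  ports:
  - name: mysql
    port: 3306
    targetPort: 3306
---
apiVersion: v1
kind: Endpoints
metadata:
  name: grafana-mysql
  namespace: monitoring
subsets:
- addresses:
  - ip: 10.111.32.200    # placeholder for the external MariaDB address
  ports:
  - name: mysql
    port: 3306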

Create cm for grafana.ini

[root@docker-182 grafana]# k55 apply -f grafana-config_cm.yaml 
configmap/grafana-config created
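
The content of grafana-config_cm.yaml is not listed; the important part is a grafana.ini that points the database at the MySQL service created above. A sketch under that assumption (the password is a placeholder):

# grafana-config_cm.yaml (sketch)
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-config
  namespace: monitoring
data:
  grafana.ini: |
    [database]
    type = mysql
    host = grafana-mysql:3306
    name = k8s_55_grafana
    user = grafana
    password = CHANGEME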

Modify the grafana deployment to mount the configmap and the pvc

[root@docker-182 grafana]# cp /data/apps/soft/ansible/kubernetes/kube-prometheus-0.3.0/manifests/grafana-deployment.yaml ./

[root@docker-182 grafana]# diff /data/apps/soft/ansible/kubernetes/kube-prometheus-0.3.0/manifests/grafana-deployment.yaml ./grafana-deployment.yaml 
35a36,38
>         - mountPath: /etc/grafana/grafana.ini
>           name: grafana-ini
>           subPath: grafana.ini
124c127,129
<       - emptyDir: {}
---
>       #- emptyDir: {}
>       - persistentVolumeClaim:
>           claimName: grafana-pvc
203a209,211
>       - configMap:
>           name: grafana-config
>         name: grafana-ini

Resubmit the grafana deployment

[root@docker-182 grafana]# k apply -f grafana-deployment.yaml 
deployment.apps/grafana configured

Create an ingress for grafana

[root@docker-182 ingress-nginx]# cat ../kube-prometheus/grafana/grafana_ingress.yaml 
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    # use the shared ingress-nginx
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/rewrite-target: /$2
    #nginx.ingress.kubernetes.io/app-root: /
spec:
  rules:
  - http:
      paths:
      - path: /mygrafana(/|$)(.*)
        backend:
          serviceName: grafana
          servicePort: 3000
[root@docker-182 grafana]# k apply -f grafana_ingress.yaml 
ingress.networking.k8s.io/grafana-ingress created

Create a service for kube-scheduler

By default only the serviceMonitor for kube-scheduler is created; the Service it targets, kube-system/kube-scheduler, does not exist, so one has to be added manually.

[root@bj-k8s-master-56 ~]# k -n monitoring get servicemonitor kube-scheduler
NAME             AGE
kube-scheduler   19h

[root@docker-182 kube-prometheus]# k apply -f prometheus-kubeSchedulerService.yaml 
service/kube-scheduler created
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
endpoints/kube-scheduler configured
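
prometheus-kubeSchedulerService.yaml is not reproduced here. Judging by the output above (a Service plus an Endpoints object), it presumably contains a headless, selector-less Service carrying the k8s-app: kube-scheduler label that the shipped serviceMonitor selects, plus manually maintained Endpoints; a sketch (the master address is a placeholder):

# prometheus-kubeSchedulerService.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
subsets:
- addresses:
  - ip: 10.111.32.56    # placeholder: address of the master running kube-scheduler
  ports:
  - name: http-metrics
    port: 10251

The kube-controller-manager service in the next step presumably follows the same pattern, with port 10252 and the k8s-app: kube-controller-manager label.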

Create the service for kube-controller-manager

[root@docker-182 kube-prometheus]# k apply -f prometheus-kubeControllerManagerService.yaml
service/kube-controller-manager created
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
endpoints/kube-controller-manager configured

There is also a proxy dashboard in grafana

This should be monitoring for kube-proxy, but no resource file defining a kube-proxy target is shipped, so it is skipped for now.

Change the storage of prometheus to local pvc

I wanted to find out how to change the way prometheus-operator builds the underlying resources, i.e. modify the generated statefulset before the resources are submitted, rather than patching them after they reach the cluster.

It turns out the kube-prometheus project is written in jsonnet; according to its documentation, customizations should be made in jsonnet and the yaml files regenerated from it.

Of course, the generated yaml files under manifests/ can also be edited directly, which is the approach used here.

First, try modifying the prometheus resource file to change prometheus's runtime arguments.

# Added under spec
containers:
  - name: prometheus
    args:
    - --web.console.templates=/etc/prometheus/consoles
    - --web.console.libraries=/etc/prometheus/console_libraries
    - --config.file=/etc/prometheus/config_out/prometheus.env.yaml
    - --storage.tsdb.path=/prometheus
    - --storage.tsdb.retention.time=360h # It used to be 24 hours
    - --web.enable-lifecycle
    - --storage.tsdb.no-lockfile
    - --web.route-prefix=/

[root@docker-182 manifests]# k apply -f prometheus-prometheus.yaml
prometheus.monitoring.coreos.com/k8s configured

This approach works.

Next, change prometheus's storage volume to a pvc so that the data is not wiped when the pod is rebuilt.

Change the storage of prom to local pvc

  1. Create pv
  2. Create pvc, prometheus-k8s-db-prometheus-k8s-0 and prometheus-k8s-db-prometheus-k8s-1
  3. Reference them from the statefulset (the snippet below shows how a statefulset references volumeClaimTemplates; in our case the equivalent is added to the prometheus resource, where it looks slightly different)
volumeClaimTemplates:
- metadata:
    name: db
  spec:
    accessModes: ["ReadWriteOnce"]
    resources:
      requests:
        storage: 1Gi

The pvc name is volume_name + '-' + pod_name. In the existing statefulset the data volume is named prometheus-k8s-db, which is how the two names above are derived, and they must be exactly these two names.
Normally a statefulset can generate pvcs dynamically through a storageClass, but there is no such storage backend here, so the pvcs are created manually and then referenced.

Create directory and pvc

# 1000 and 1001 are uid and gid of the user running prom on the host
[root@docker-182 kube-prometheus]# ansible 10.111.32.94 -m file -a "path=/data/apps/data/pv/prometheus-k8s-db-prometheus-k8s-0 state=directory owner=1000 group=1001"

[root@docker-182 kube-prometheus]# ansible 10.111.32.178 -m file -a "path=/data/apps/data/pv/prometheus-k8s-db-prometheus-k8s-1 state=directory owner=1000 group=1001"


# Create pv and pvc
[root@docker-182 kube-prometheus]# k55 apply -f prometheus-k8s-db-prometheus-k8s-0_pv.yml 
persistentvolume/prometheus-k8s-db-prometheus-k8s-0 created
[root@docker-182 kube-prometheus]# k55 apply -f prometheus-k8s-db-prometheus-k8s-0_pvc.yml 
persistentvolumeclaim/prometheus-k8s-db-prometheus-k8s-0 created
[root@docker-182 kube-prometheus]# k55 apply -f prometheus-k8s-db-prometheus-k8s-1_pv.yml 
persistentvolume/prometheus-k8s-db-prometheus-k8s-1 created
[root@docker-182 kube-prometheus]# k55 apply -f prometheus-k8s-db-prometheus-k8s-1_pvc.yml 
persistentvolumeclaim/prometheus-k8s-db-prometheus-k8s-1 created
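
The pv manifests follow the same pattern as the grafana one sketched earlier (local path, nodeAffinity to the node holding the directory, local-storage class). The critical part is the pvc name; a sketch for the first replica:

# prometheus-k8s-db-prometheus-k8s-0_pvc.yml (sketch)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # must be <volume name>-<pod name>, i.e. prometheus-k8s-db + prometheus-k8s-0
  name: prometheus-k8s-db-prometheus-k8s-0
  namespace: monitoring
spec:
  storageClassName: local-storage
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 200Gi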

Submit update

# prometheus-prometheus.yaml added under spec
  storage:
    volumeClaimTemplate:
      metadata:
        name: prometheus-k8s-db
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 200Gi


[root@docker-182 kube-prometheus]# k apply -f prometheus-prometheus.yaml

Check the pvc status and the volume information in the pods; the change took effect.

[root@bj-k8s-master-56 ~]# k -n monitoring get pvc
NAME                                 STATUS   VOLUME                               CAPACITY   ACCESS MODES   STORAGECLASS    AGE
grafana-pvc                          Bound    grafana-pv                           16Gi       RWO            local-storage   5d1h
prometheus-k8s-db-prometheus-k8s-0   Bound    prometheus-k8s-db-prometheus-k8s-0   200Gi      RWO            local-storage   3m43s
prometheus-k8s-db-prometheus-k8s-1   Bound    prometheus-k8s-db-prometheus-k8s-1   200Gi      RWO            local-storage   3m33s


# The original volume definition in the generated statefulset was
      - emptyDir: {}
        name: prometheus-k8s-db

# After the change, the pvc replaces the original emptyDir
  volumes:
  - name: prometheus-k8s-db
    persistentVolumeClaim:
      claimName: prometheus-k8s-db-prometheus-k8s-0

Change the storage of alertmanager to local pvc

The pvc names have to be alertmanager-main-db-alertmanager-main-0, alertmanager-main-db-alertmanager-main-1 and alertmanager-main-db-alertmanager-main-2.

Create the directories, pv and pvc

[root@docker-182 kube-prometheus]# ansible 10.111.32.94 -m file -a "path=/data/apps/data/pv/alertmanager-main-db-alertmanager-main-0 state=directory owner=1000 group=1001"

[root@docker-182 kube-prometheus]# ansible 10.111.32.94 -m file -a "path=/data/apps/data/pv/alertmanager-main-db-alertmanager-main-1 state=directory owner=1000 group=1001"

[root@docker-182 kube-prometheus]# ansible 10.111.32.178 -m file -a "path=/data/apps/data/pv/alertmanager-main-db-alertmanager-main-2 state=directory owner=1000 group=1001"


# Submit pv and pvc resources

[root@docker-182 alertmanager]# ls -1r |while read line; do k apply -f ${line};done
persistentvolume/alertmanager-main-db-alertmanager-main-2 created
persistentvolumeclaim/alertmanager-main-db-alertmanager-main-2 created
persistentvolume/alertmanager-main-db-alertmanager-main-1 created
persistentvolumeclaim/alertmanager-main-db-alertmanager-main-1 created
persistentvolume/alertmanager-main-db-alertmanager-main-0 created
persistentvolumeclaim/alertmanager-main-db-alertmanager-main-0 created

Modify the alertmanager resource file and submit changes

# Add storage parameter under spec
  storage:
    volumeClaimTemplate:
      metadata:
        name: alertmanager-main-db
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
            
# Submit changes
[root@docker-182 alertmanager]# k apply -f alertmanager-alertmanager.yaml 
alertmanager.monitoring.coreos.com/main configured

Verify that it worked:

[root@bj-k8s-master-56 ~]# k -n monitoring get pvc |grep alertmanager
alertmanager-main-db-alertmanager-main-0   Bound    alertmanager-main-db-alertmanager-main-1   10Gi       RWO            local-storage   4h1m
alertmanager-main-db-alertmanager-main-1   Bound    alertmanager-main-db-alertmanager-main-2   10Gi       RWO            local-storage   4h1m
alertmanager-main-db-alertmanager-main-2   Bound    alertmanager-main-db-alertmanager-main-0   10Gi       RWO            local-storage   4h1m

[root@bj-k8s-master-56 ~]# k -n monitoring get statefulset alertmanager-main -o yaml
...
  volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: alertmanager-main-db
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      volumeMode: Filesystem
    status:
      phase: Pending
...

[root@bj-k8s-master-56 ~]# k -n monitoring get pod -o wide |grep alertmanager 
alertmanager-main-0                   2/2     Running   0          3m56s   10.20.60.180    bj-k8s-node-84.tmtgeo.com     <none>           <none>
alertmanager-main-1                   2/2     Running   0          3m56s   10.20.245.249   bj-k8s-node-178.tmtgeo.com    <none>           <none>
alertmanager-main-2                   2/2     Running   0          3m56s   10.20.60.179    bj-k8s-node-84.tmtgeo.com     <none>           <none>

Create the serviceMonitor of ingress nginx

[root@docker-182 ingress-nginx]# cat ingress-serviceMonitor.yaml 
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: ingress-nginx
  name: ingress-nginx
  namespace: monitoring
spec:
  endpoints:
  - interval: 15s
    port: "10254"
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    matchNames:
    - ingress-nginx
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx

The default rbac permissions are insufficient, so prom cannot access the resources

After submitting, prom reported an error

level=error ts=2020-03-10T10:33:21.196Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:263: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"ingress-nginx\""
level=error ts=2020-03-10T10:33:22.197Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:264: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"ingress-nginx\""
level=error ts=2020-03-10T10:33:22.198Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:265: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"ingress-nginx\""

At first glance this is clearly a permissions problem, but then why can the resources in the default and kube-system namespaces be scraped? (The default setup creates prometheus-k8s Roles and RoleBindings only in the default, kube-system and monitoring namespaces, which is why a namespace such as ingress-nginx is not covered.)

Create a clusterRole and bind it to the prometheus-k8s serviceaccount (the original clusterRole configuration could also be changed directly).

[root@docker-182 kube-prometheus]# cat my-prometheus-clusterRoleBinding.yml 
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: my-prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs:
  - get 
  - list 
  - watch
- apiGroups: [""]
  resources:
  - configmaps
  verbs: 
  - get
- nonResourceURLs: 
  - /metrics
  verbs: 
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: my-prometheus
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring

[root@docker-182 kube-prometheus]# k apply -f my-prometheus-clusterRoleBinding.yml
clusterrole.rbac.authorization.k8s.io/my-prometheus created
clusterrolebinding.rbac.authorization.k8s.io/my-prometheus created

After the clusterRoleBinding is configured, prometheus can discover the ingress-nginx endpoints, but the endpoints only contain ports 80 and 443; the metrics port 10254 is not declared in the ingress-nginx service, so the target still cannot be scraped.

[root@bj-k8s-master-56 ~]# k -n ingress-nginx get endpoints -o wide
NAME            ENDPOINTS                                                        AGE
ingress-nginx   10.111.32.178:80,10.111.32.94:80,10.111.32.178:443 + 1 more...   4d17h

Add the following under spec.ports in ingress-nginx-svc

- name: metrics
  port: 10254
  targetPort: 10254

Submit changes

[root@docker-182 ingress-nginx]# k apply -f ingress-nginx-svc.yaml
service/ingress-nginx configured


# 10254 already exists in endpoints
[root@bj-k8s-master-56 ~]# k -n ingress-nginx get endpoints
NAME            ENDPOINTS                                                          AGE
ingress-nginx   10.111.32.178:80,10.111.32.94:80,10.111.32.178:10254 + 3 more...   4d17h

Modify the serviceMonitor of ingress-nginx to reference the port by name

endpoints:
  - interval: 15s
    port: metrics
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token

With that, the ingress-nginx target is added in prometheus.

Add dashboard of ingress nginx in grafana

Load the dashboard from https://github.com/kubernetes/ingress-nginx/tree/master/deploy/grafana/dashboards

Many of the labels used in the Request Handling Performance dashboard are outdated and no longer exist.

The NGINX Ingress controller dashboard also contains quite a few panels that are not usable.

The metrics api in kube-prometheus

Kube state metrics and Prometheus adapter are included in Kube Prometheus.

The apiservice of Prometheus adapter is v1beta1.metrics.k8s.io.

[root@bj-k8s-master-56 ~]# k get apiservice |grep prome
v1beta1.metrics.k8s.io                 monitoring/prometheus-adapter   True        54d

This is not right: the v1beta1.metrics.k8s.io apiservice should belong to kubernetes' metrics-server. Since kube-prometheus has taken it over, applications that depend on this api, such as hpa resources, run into problems.

[root@bj-k8s-master-56 ~]# kubectl -n kube-system get pod -o wide |grep metrics
metrics-server-7ff49d67b8-mczv8           1/1     Running   2          51d     10.20.245.239   bj-k8s-node-178.tmtgeo.com    <none>           <none>

Test hpav2

hpa v2

[root@docker-182 hpa]# k apply -f .
horizontalpodautoscaler.autoscaling/metrics-app-hpa created
deployment.apps/metrics-app created
service/metrics-app created
servicemonitor.monitoring.coreos.com/metrics-app created
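
The manifests under the hpa directory are not listed; the autoscaler is presumably an autoscaling/v2beta1 object scaling the deployment on a per-pod http_requests custom metric, along the lines of the sketch below (the names, replica bounds and the 800m target are taken from the events and the get hpa output further down, the rest is assumed):

# metrics-app-hpa (sketch)
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: metrics-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: metrics-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: http_requests        # served by prometheus-adapter via custom.metrics.k8s.io
      targetAverageValue: 800m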

Errors are reported

 Type     Reason                        Age                 From                       Message
  ----     ------                        ----                ----                       -------
  Warning  FailedComputeMetricsReplicas  18m (x12 over 21m)  horizontal-pod-autoscaler  Invalid metrics (1 invalid out of 1), last error was: failed to get object metric value: unable to get metric http_requests: unable to fetch metrics from custom metrics API: no custom metrics API (custom.metrics.k8s.io) registered
  Warning  FailedGetPodsMetric           73s (x80 over 21m)  horizontal-pod-autoscaler  unable to get metric http_requests: unable to fetch metrics from custom metrics API: no custom metrics API (custom.metrics.k8s.io) registered

Submit the resources under experimental/custom-metrics-api

First repair the apiservice of metrics-server

[root@docker-182 metrics-server]# k apply -f metrics-apiservice.yaml
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured
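
metrics-apiservice.yaml points the v1beta1.metrics.k8s.io APIService back at the metrics-server in kube-system; roughly like this (a sketch, the TLS and priority fields are assumptions):

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  service:
    name: metrics-server
    namespace: kube-system
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100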

[root@docker-182 custom-metrics-api]# ls *.yaml |while read line; do k apply -f ${line};done
clusterrolebinding.rbac.authorization.k8s.io/custom-metrics-server-resources created
apiservice.apiregistration.k8s.io/v1beta1.custom.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/custom-metrics-server-resources created
configmap/adapter-config configured
clusterrolebinding.rbac.authorization.k8s.io/hpa-controller-custom-metrics created
servicemonitor.monitoring.coreos.com/sample-app created
service/sample-app created
deployment.apps/sample-app created
horizontalpodautoscaler.autoscaling/sample-app created
[root@docker-182 custom-metrics-api]# pwd
/data/apps/soft/ansible/kubernetes/kube-prometheus-0.3.0/experimental/custom-metrics-api

v1beta1.metrics.k8s.io returns to normal, and v1beta1.custom.metrics.k8s.io reports an error

[root@bj-k8s-master-56 ~]# k get apiservices |grep metric
v1beta1.custom.metrics.k8s.io          monitoring/prometheus-adapter   False (FailedDiscoveryCheck)   3m5s
v1beta1.metrics.k8s.io                 kube-system/metrics-server      True                           54d

The error message is

Status:
  Conditions:
    Last Transition Time:  2020-03-11T07:45:25Z
    Message:               failing or missing response from https://10.20.60.171:6443/apis/custom.metrics.k8s.io/v1beta1: bad status from https://10.20.60.171:6443/apis/custom.metrics.k8s.io/v1beta1: 404
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:    <none>


[root@bj-k8s-master-56 ~]# k -n monitoring get pod -o wide |grep adap
prometheus-adapter-68698bc948-qmpvr   1/1     Running   0          7d     10.20.60.171    bj-k8s-node-84.tmtgeo.com     <none>           <none>

The relevant api paths are indeed not served by the adapter:

[root@bj-k8s-master-56 ~]# curl -i -k https://10.20.60.171:6443/apis/custom.metrics.k8s.io
HTTP/1.1 404 Not Found
Content-Type: application/json
Date: Wed, 11 Mar 2020 07:56:10 GMT
Content-Length: 229

{
  "paths": [
    "/apis",
    "/apis/metrics.k8s.io",
    "/apis/metrics.k8s.io/v1beta1",
    "/healthz",
    "/healthz/ping",
    "/healthz/poststarthook/generic-apiserver-start-informers",
    "/metrics",
    "/version"
  ]
}

Try a newer image on 36.55

There is no image with the latest tag

[root@bj-k8s-node-84 ~]# docker pull quay.io/coreos/k8s-prometheus-adapter-amd64:latest
Error response from daemon: manifest for quay.io/coreos/k8s-prometheus-adapter-amd64:latest not found
  • Available tags are listed at https://quay.io/repository/coreos/k8s-prometheus-adapter-amd64

A newer tag, v0.6.0, was found on that page (the image field could simply be changed and kubelet would pull it by itself, but the network is poor, so the image was pulled manually on every node).

[root@bj-k8s-node-84 ~]# docker pull quay.io/coreos/k8s-prometheus-adapter-amd64:v0.6.0

Change the image and resubmit

[root@docker-182 adapter]# grep image: prometheus-adapter-deployment.yaml
        image: quay.io/coreos/k8s-prometheus-adapter-amd64:v0.6.0
[root@docker-182 adapter]# k apply -f prometheus-adapter-deployment.yaml 
deployment.apps/prometheus-adapter configured

It's back to normal

[root@bj-k8s-master-56 ~]# k get apiservices |grep custom
v1beta1.custom.metrics.k8s.io          monitoring/prometheus-adapter   True        117m

[root@bj-k8s-master-56 ~]# k -n monitoring get pod -o wide | grep adapter
prometheus-adapter-7b785b6685-z6gfp   1/1     Running   0          91s    10.20.60.183    bj-k8s-node-84.tmtgeo.com     <none>           <none>


[root@bj-k8s-master-56 ~]# curl -i -k https://10.20.60.183:6443/apis/custom.metrics.k8s.io
HTTP/1.1 200 OK
Content-Type: application/json
Date: Wed, 11 Mar 2020 09:41:45 GMT
Content-Length: 303

{
  "kind": "APIGroup",
  "apiVersion": "v1",
  "name": "custom.metrics.k8s.io",
  "versions": [
    {
      "groupVersion": "custom.metrics.k8s.io/v1beta1",
      "version": "v1beta1"
    }
  ],
  "preferredVersion": {
    "groupVersion": "custom.metrics.k8s.io/v1beta1",
    "version": "v1beta1"
  }
}

hpav2 is back to normal

[root@bj-k8s-master-56 ~]# k get hpa
NAME              REFERENCE                TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
metrics-app-hpa   Deployment/metrics-app   36133m/800m   2         10        4          159m
myapp             Deployment/myapp         23%/60%       2         5         2          178m
sample-app        Deployment/sample-app    400m/500m     1         10        1          126m

Thinking back: when kube-prometheus was first deployed, prometheus-adapter took over v1beta1.metrics.k8s.io and yet the top command still worked. The reason is that the quay.io/coreos/k8s-prometheus-adapter-amd64:v0.5.0 image only serves /apis/metrics.k8s.io/v1beta1, which is exactly the path visible in the curl output above.

Reference resources

  • https://github.com/coreos/prometheus-operator : prometheus-operator github page
  • https://github.com/coreos/kube-prometheus : kube-prometheus github page
  • https://github.com/coreos/prometheus-operator/blob/master/documentation/api.md : prometheus-operator API documentation
  • https://www.cnblogs.com/skyflag/p/11480988.html : kubernetes monitoring ultimate solution - kube-promethues (a deployment example)

Topics: Docker Nginx Kubernetes ansible