Cluster network monitoring using KubeNurse

Posted by OuchMedia on Wed, 12 Jan 2022 18:54:50 +0100

WeChat Public Number: Operations Development Story by Joke

Preface

In Kubernetes, networks are provided through third-party network plug-ins, whose implementation is so complex that they often run into difficulties when troubleshooting network problems. So what is the way to monitor all the network connections in a cluster?

kubenurse is one such project that monitors all network connections in a cluster and provides monitoring metrics for Prometheus to collect.

Kubenurse

The deployment of kubenurse is simple, using Daemonset to deploy on cluster nodes, and Yaml files are located in the example directory of the project.

After successful deployment, a check request is sent to/alive every 5 seconds, and then a variety of methods are run inside it to detect the cluster network in an all-round way. To prevent excessive network traffic, the check results are cached for 3 seconds. Its detection mechanism is as follows:From the figure above, we can see that kubenurse detects ingress, dns, apiserver, kube-proxy.

All checks create public metrics that can be used to detect:

  • SDN network latency and errors

  • Network latency and errors between Kubelet s

  • Pod and apiserver communication problems

  • Ingress round-trip network latencies and errors

  • Service round-trip network latencies and errors (kube-proxy)

  • Kube-apiserver problem

  • Kube-dns (CoreDns) error

  • External DNS Resolution Error (ingress url Resolution)

Then these data are mainly reflected by two monitoring indicators:

  • kubenurse_errors_total: error counters by error type

  • kubenurse_request_duration: Distribution of request time by type

These indicators are identified by Type, corresponding to several different detection targets:

  • api_server_direct: Detect API Server directly from node

  • api_server_dns: Detect API Server from node through DNS

  • me_ingress: Detect the Service through Ingress

  • me_service: Use Service to detect this service service

  • path_$KUBELET_HOSTNAME: mutual detection between nodes

Then these indicators are divided by P50, P90, P99 quantiles, which can confirm the status of cluster network according to different situations.

Install Deployment

Deployment is done directly here using official deployment files. But there are a few changes that need to be made. (1) First clone the code locally

git clone https://github.com/postfinance/kubenurse.git

(2) Enter the example directory and modify ingress.yaml configuration, mainly add the domain name, as follows.

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: kubenurse
  namespace: kube-system
spec:
  rules:
  - host: kubenurse-test.coolops.cn
    http:
      paths:
      - backend:
          serviceName: kubenurse
          servicePort: 8080


(2) Update daemonset.yaml configuration, mainly changes ingress entry domain name, as follows.

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: kubenurse
  name: kubenurse
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kubenurse
  template:
    metadata:
      labels:
        app: kubenurse
      annotations:
        prometheus.io/path: "/metrics"
        prometheus.io/port: "8080"
        prometheus.io/scheme: "http"
        prometheus.io/scrape: "true"
    spec:
      serviceAccountName: nurse
      containers:
      - name: kubenurse
        env:
        - name: KUBENURSE_INGRESS_URL
          value: kubenurse-test.coolops.cn      # Where to change
        - name: KUBENURSE_SERVICE_URL
          value: http://kubenurse.kube-system.svc.cluster.local:8080
        - name: KUBENURSE_NAMESPACE
          value: kube-system
        - name: KUBENURSE_NEIGHBOUR_FILTER
          value: "app=kubenurse"
        image: "postfinance/kubenurse:v1.2.0"
        ports:
        - containerPort: 8080
          protocol: TCP
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Equal
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Equal


(4) Create a new ServiceMonitor to obtain the indicator data as follows:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubenurse
  namespace: monitoring
  labels:
    k8s-app: kubenurse
spec:
  jobLabel: k8s-app
  endpoints:
  - port: "8080-8080" 
    interval: 30s
    scheme: http
  selector:
    matchLabels:
     app: kubenurse
  namespaceSelector:
    matchNames:
    - kube-system

(5) Deploy the application and execute the following commands in the example directory.

kubectl apply -f .

(6) Wait for all applications to become running as follows.

# kubectl get all -n kube-system -l app=kubenurse
NAME                  READY   STATUS    RESTARTS   AGE
pod/kubenurse-fznsw   1/1     Running   0          17h
pod/kubenurse-n52rq   1/1     Running   0          17h
pod/kubenurse-nwtl4   1/1     Running   0          17h
pod/kubenurse-xp92p   1/1     Running   0          17h
pod/kubenurse-z2ksz   1/1     Running   0          17h

NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/kubenurse   ClusterIP   10.96.229.244   <none>        8080/TCP   17h

NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/kubenurse   5         5         5       5            5           <none>          17h


(7) Go to prometheus to see if you are getting the data normally.Check to see if the indicator is working.(8) Now you can draw a picture on grafana showing the monitoring data as follows.

Reference Documents

[1]https://github.com/postfinance/kubenurse[2]https://github.com/postfinance/kubenurse/tree/master/examples

Public Number: Operations and Maintenance Development Story

github: https://github.com/orgs/sunsharing-note/dashboard

Love life, love operations

If you think the article is good, click on the top right corner to select Send to Friends or Forward to Friends Circle. Your support and encouragement are my greatest motivation. Please pay attention to me if you like it.

Scavenging 2D Code

Focus on me and maintain premium content irregularly

Reminder

If you like this article, please share it with your circle of friends and follow me for more information.

                                          ........................

Topics: Docker Kubernetes network