k8s log collection and deployment: EFK (Elasticsearch + Fluentd + Kibana)

Posted by pestilence669 on Tue, 26 Oct 2021 07:14:08 +0200

After a k8s cluster is built, pods are spread across different nodes, which makes viewing logs more complicated. When there are only a few pods, you can query logs with the kubectl logs command, but as the number of pods grows, so does the effort of querying logs, and locating problems becomes extremely difficult.
A cluster-wide log collection system is therefore urgently needed. There are currently two mainstream stacks:
ELK: Filebeat (collection), Logstash (filtering), Kafka (buffering), Elasticsearch (storage), Kibana (display)
EFK: Fluentd (collection), Elasticsearch (storage), Kibana (display)
EFK is also the officially recommended scheme. This post summarizes how to build and deploy EFK and some of the pitfalls encountered along the way.

  1. Preface and environment
    Fluentd is a popular open source data collector. We will install Fluentd on the Kubernetes cluster nodes, where it reads container log files, filters and transforms the log data, and then ships it to the Elasticsearch cluster to be indexed and stored.
    Elasticsearch is a real-time, distributed and scalable search engine that supports full-text and structured search as well as log analysis. It is typically used to index and search large volumes of log data, but it can also be used to search many other kinds of documents.
    Kibana is Elasticsearch's powerful data visualization dashboard. Kibana lets you browse Elasticsearch log data through a web interface and build custom queries to quickly retrieve the log data stored in Elasticsearch.
$ kubectl get node -o wide

NAME                STATUS   ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                 CONTAINER-RUNTIME
k8s-elasticsearch   Ready    <none>                 86m   v1.21.0   172.16.66.169   <none>        CentOS Linux 8   4.18.0-305.19.1.el8_4.x86_64   docker://20.10.9
k8s-master          Ready    control-plane,master   86m   v1.21.0   172.16.66.167   <none>        CentOS Linux 8   4.18.0-305.19.1.el8_4.x86_64   docker://20.10.9
k8s-node1           Ready    <none>                 86m   v1.21.0   172.16.66.168   <none>        CentOS Linux 8   4.18.0-305.19.1.el8_4.x86_64   docker://20.10.9
k8s-node2           Ready    <none>                 86m   v1.21.0   172.16.66.170   <none>        CentOS Linux 8   4.18.0-305.19.1.el8_4.x86_64   docker://20.10.9

# node1 and node2 host two Node.js Express web applications
$ kubectl get pod -o wide

NAME                                  READY   STATUS    RESTARTS   AGE   IP               NODE        NOMINATED NODE   READINESS GATES
websvr1-deployment-67fd6cf9d4-9fcfv   1/1     Running   0          62m   10.244.36.65     k8s-node1   <none>           <none>
websvr1-deployment-67fd6cf9d4-bdhn8   1/1     Running   0          62m   10.244.169.129   k8s-node2   <none>           <none>
websvr1-deployment-67fd6cf9d4-n6xt2   1/1     Running   0          62m   10.244.169.130   k8s-node2   <none>           <none>
websvr2-deployment-67dfc4f674-79wrd   1/1     Running   0          62m   10.244.36.68     k8s-node1   <none>           <none>
websvr2-deployment-67dfc4f674-bwdwx   1/1     Running   0          62m   10.244.36.67     k8s-node1   <none>           <none>
websvr2-deployment-67dfc4f674-ktfml   1/1     Running   0          62m   10.244.36.66     k8s-node1   <none>           <none>

Because an Elasticsearch cluster consumes a lot of memory, it should be isolated from business containers so they do not compete for resources. (The Elasticsearch cluster can also be deployed separately, even on the company intranet; all that is required is that Fluentd can reach Elasticsearch over the network.) In a production environment, the Elasticsearch cluster should run on at least three physical machines, each with more than 2G of memory. Here, as a test, Elasticsearch is deployed on the single k8s-elasticsearch node, which has 8G of memory.
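This article pins the relevant pods to that node with nodeName. An alternative way to keep business pods off the Elasticsearch node, not used below, is a taint plus matching tolerations; a minimal sketch (the taint key and value are arbitrary examples):

$ kubectl taint node k8s-elasticsearch dedicated=elasticsearch:NoSchedule
#Pods that should still run there (es-cluster, kibana, nfs-provisioner) would then need a toleration such as:
#  tolerations:
#  - key: "dedicated"
#    operator: "Equal"
#    value: "elasticsearch"
#    effect: "NoSchedule"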

  2. Namespace

To keep it separate from business workloads, create a new namespace in which to deploy Elasticsearch.

$ kubectl create ns kube-log
namespace/kube-log created

$ kubectl get ns

NAME              STATUS   AGE
default           Active   3h37m
ingress-nginx     Active   3h5m
kube-log          Active   39s
kube-node-lease   Active   3h37m
kube-public       Active   3h37m
kube-system       Active   3h37m
  3. Create a headless service

Consider a cluster with the components pod-a, svc-b, pod-b1 and pod-b2. When pod-a wants to reach the application in pod-b, it first calls svc-b, and svc-b randomly forwards the request to pod-b1 or pod-b2.
Now suppose pod-a needs to connect to pod-b1 and pod-b2 individually at the same time; forwarding through svc-b obviously cannot satisfy that. How can pod-a obtain the IP addresses of pod-b1 and pod-b2? By using a headless service, as shown below.
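For example, once the es-cluster pods defined later in this article are running, resolving the headless Service name returns one record per pod instead of a single virtual IP. A quick way to see this (busybox:1.28 is used only because its nslookup output is reliable):

$ kubectl run dns-test -it --rm --restart=Never --image=busybox:1.28 -n kube-log -- nslookup elasticsearch
#Returns one address per pod, e.g. es-cluster-0.elasticsearch.kube-log.svc.cluster.local, es-cluster-1..., es-cluster-2...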

$ vim headlessSvc.yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: kube-log
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node

$ kubectl apply -f headlessSvc.yaml

$ kubectl get svc -n kube-log

NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
elasticsearch   ClusterIP   None         <none>        9200/TCP,9300/TCP   76s

#Here CLUSTER-IP is None, which indicates a headless service
  4. Installing nfs

Install NFS on the node where Elasticsearch is deployed; here that is the k8s-elasticsearch node.

$ yum install -y nfs-utils
$ systemctl start nfs-server    #On older nfs versions: systemctl start nfs
$ chkconfig nfs-server on       #Older versions: chkconfig nfs on
$ systemctl enable nfs-server   #Older versions: systemctl enable nfs

#Create nfs shared directory
$ mkdir /data/eslog -p
$ vim /etc/exports
> /data/eslog *(rw,no_root_squash)                  #Set the IP address allowed to access the directory, which can be set to *, that is, all IP addresses are allowed
$ exportfs -arv
#The export configuration is now in effect
$ systemctl restart nfs-server                      #The old version is systemctl restart nfs
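Before moving on, it is worth confirming that the export is actually visible; the output should look roughly like the following:

$ showmount -e 172.16.66.169
Export list for 172.16.66.169:
/data/eslog *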
  5. Create a ServiceAccount and authorize it
$ vim serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-provisioner
$ kubectl apply -f serviceaccount.yaml
$ vim rbac.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get"]
  - apiGroups: ["extensions"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["nfs-provisioner"]
    verbs: ["use"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-provisioner
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-provisioner
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-provisioner
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-nfs-provisioner
  apiGroup: rbac.authorization.k8s.io
$ kubectl apply -f rbac.yaml
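Whether the bindings took effect can be verified with kubectl auth can-i, impersonating the service account (which lives in the default namespace, matching the bindings above); both checks should answer yes:

$ kubectl auth can-i create persistentvolumes --as=system:serviceaccount:default:nfs-provisioner
yes
$ kubectl auth can-i update persistentvolumeclaims --as=system:serviceaccount:default:nfs-provisioner
yes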
  6. Create a pod and run nfs provisioner (deployed on the node with nfs installed)
$ vim npv.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nfs-provisioner
spec:
  selector:
    matchLabels:
      app: nfs-provisioner
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-provisioner
    spec:
      nodeName: k8s-elasticsearch                  #Pin the pod to the k8s-elasticsearch node. If the es cluster is spread across different physical machines, nodeSelector + labels can be used instead
      serviceAccount: nfs-provisioner
      containers:
        - name: nfs-provisioner
          image: registry.cn-hangzhou.aliyuncs.com/open-ali/nfs-client-provisioner:latest
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: eslog/nfs                      #PROVISIONER_NAME is eslog/nfs; it must match the provisioner field of the StorageClass created below
            - name: NFS_SERVER
              value: 172.16.66.169                  #The IP address of the NFS server; here it is the k8s-elasticsearch node
            - name: NFS_PATH
              value: /data/eslog                    #share directory
      volumes:
        - name: nfs-client-root
          nfs:
            server: 172.16.66.169                   #The IP address of the NFS server; replace with your own
            path: /data/eslog                        
$ kubectl apply -f npv.yaml
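If dynamic provisioning misbehaves later (for example PVCs stuck in Pending), the provisioner's log is the first place to look:

$ kubectl logs deployment/nfs-provisioner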
  7. Create storageclass
$ vim class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: es-block-storage
provisioner: eslog/nfs
$ kubectl apply -f class.yaml

$ kubectl get pod -o wide

NAME                                  READY   STATUS              RESTARTS   AGE     IP               NODE                NOMINATED NODE   READINESS GATES
nfs-provisioner-75cf88b6c9-wg6b6      0/1     Running             0          6m41s   <none>           k8s-elasticsearch   <none>           <none>
websvr1-deployment-67fd6cf9d4-9fcfv   1/1     Running             0          5h20m   10.244.36.65     k8s-node1           <none>           <none>
websvr1-deployment-67fd6cf9d4-bdhn8   1/1     Running             0          5h20m   10.244.169.129   k8s-node2           <none>           <none>
websvr1-deployment-67fd6cf9d4-n6xt2   1/1     Running             0          5h20m   10.244.169.130   k8s-node2           <none>           <none>
websvr2-deployment-67dfc4f674-79wrd   1/1     Running             0          5h19m   10.244.36.68     k8s-node1           <none>           <none>
websvr2-deployment-67dfc4f674-bwdwx   1/1     Running             0          5h19m   10.244.36.67     k8s-node1           <none>           <none>
websvr2-deployment-67dfc4f674-ktfml   1/1     Running             0          5h19m   10.244.36.66     k8s-node1           <none>           <none>

$ kubectl get storageclass

NAME               PROVISIONER   RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
es-block-storage   eslog/nfs     Delete          Immediate           false                  55m
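Optionally, dynamic provisioning can be sanity-checked with a throwaway PVC before deploying Elasticsearch (the claim name below is just an example). If the claim stays Pending, check the provisioner log and the RemoveSelfLink note in the Elasticsearch section below:

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: es-block-storage
  resources:
    requests:
      storage: 1Gi
EOF
$ kubectl get pvc test-claim        #Should become Bound once the provisioner creates a PV
$ kubectl delete pvc test-claim     #Clean up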
  8. Deploy elasticsearch

Elasticsearch is stateful, so it is deployed as a StatefulSet, which gives each pod a stable identity and ordered deployment.

$ vim es.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: kube-log
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      nodeName: k8s-elasticsearch                        #Pin the pods to the k8s-elasticsearch node. If the es cluster is spread across different physical machines, nodeSelector + labels can be used instead
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
        imagePullPolicy: IfNotPresent
        resources:
            limits:
              cpu: 1000m                                 #A single container can use up to 1 CPU
            requests:
              cpu: 100m                                  #A single container is guaranteed to have at least 0.1 CPU
        ports:
        - containerPort: 9200
          name: rest                                     #Consistent with the headless service
          protocol: TCP
        - containerPort: 9300
          name: inter-node
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        env:
          - name: cluster.name                           #Cluster name
            value: k8s-logs   
          - name: node.name                              #Node name, obtained from metadata.name
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: discovery.seed_hosts                   #How nodes in the elasticsearch cluster discover each other. Since they are all in the same namespace, this can be shortened to es-cluster-[0,1,2].elasticsearch
            value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
          - name: cluster.initial_master_nodes 
            value: "es-cluster-0,es-cluster-1,es-cluster-2"
          - name: ES_JAVA_OPTS
            value: "-Xms512m -Xmx512m"                   #Tell the JVM to use a minimum and maximum heap of 512MB
      initContainers:                                    #Several init containers that run before the main application. They are executed in the defined order, and the main container is not started until all of them have completed.
      - name: fix-permissions
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
       #The first init container, fix-permissions, runs chown to change the owner and group of the Elasticsearch data directory to 1000:1000 (the UID/GID of the elasticsearch user).
       #By default, Kubernetes mounts the data directory as root, which leaves Elasticsearch unable to access it
      - name: increase-vm-max-map
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
       #The second init container, increase-vm-max-map, raises the operating system's limit on mmap counts. By default this value may be too low, leading to out-of-memory errors
      - name: increase-fd-ulimit
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["sh", "-c", "ulimit -n 65536"]
        securityContext:
          privileged: true
       #The last init container runs ulimit to raise the maximum number of open file descriptors.
       #In addition, Elastic's notes for production use recommend disabling swap for performance reasons; swap should be disabled on Kubernetes nodes anyway
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: [ "ReadWriteOnce" ]                               #It can only be mount ed to a single node for reading and writing
      storageClassName: es-block-storage                             #This object needs to be created in advance. We use NFS as the storage backend, so we need to install a corresponding provisioner driver
      resources:
        requests:
          storage: 10Gi                                              #Each PV size is set to 10G
$ kubectl apply -f es.yaml

$ kubectl get pod -owide -n kube-log

NAME           READY   STATUS     RESTARTS   AGE   IP       NODE                NOMINATED NODE   READINESS GATES
es-cluster-0   0/1     Init:0/3   0          10m   <none>   k8s-elasticsearch   <none>           <none>

#You can see that it is stuck initializing; this is caused by the docker.elastic.co/elasticsearch/elasticsearch:7.2.0 image failing to pull. You can pull it manually on the node where es is deployed:
$ docker pull elasticsearch:7.2.0
#Re-tag it with the image name used in the yaml:
$ docker tag 0efa6a3de177 docker.elastic.co/elasticsearch/elasticsearch:7.2.0

Checking the running status again, the pod is still stuck initializing. After consulting a lot of documentation, it turns out that on Kubernetes 1.20 and later the kube-apiserver manifest has to be modified manually: the old nfs-client-provisioner depends on the deprecated SelfLink field, so the PVCs stay Pending until that field is re-enabled via a feature gate. Modify it on the master node:

$ vim /etc/kubernetes/manifests/kube-apiserver.yaml

#Add at the end of spec.containers.command:
- --feature-gates=RemoveSelfLink=false

#Restart kubelet so the kube-apiserver static pod is recreated with the new flag
$ service kubelet restart

#Review es status again:
$ kubectl get pod -owide -n kube-log

NAME           READY   STATUS    RESTARTS   AGE     IP              NODE                NOMINATED NODE   READINESS GATES
es-cluster-0   1/1     Running   0          21m     10.244.117.10   k8s-elasticsearch   <none>           <none>
es-cluster-1   1/1     Running   0          2m11s   10.244.117.11   k8s-elasticsearch   <none>           <none>
es-cluster-2   1/1     Running   0          115s    10.244.117.12   k8s-elasticsearch   <none>           <none>

$ kubectl get svc -n kube-log

NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
elasticsearch   ClusterIP   None         <none>        9200/TCP,9300/TCP   3h48m

At this point, Elasticsearch has been deployed successfully.
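Optionally, cluster health can be confirmed through a local port-forward, and the dynamically provisioned volumes can be listed; with three nodes the status should be green:

$ kubectl port-forward -n kube-log es-cluster-0 9200:9200 &
$ curl -s 'http://localhost:9200/_cluster/health?pretty'
#Expect "cluster_name" : "k8s-logs", "status" : "green" and "number_of_nodes" : 3
$ kubectl get pvc -n kube-log                       #One Bound 10Gi PVC per es-cluster pod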

  9. Kibana deployment
$ vim kibana.yaml
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: kube-log
  labels:
    app: kibana
spec:
  type: NodePort                                                #For test convenience, we set the Service to NodePort type
  ports:
  - port: 5601
  selector:
    app: kibana
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: kube-log
  labels:
    app: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      nodeName: k8s-elasticsearch                            #Pin the pod to the k8s-elasticsearch node. If the es cluster is spread across different physical machines, nodeSelector + labels can be used instead
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:7.2.0         #The Kibana version should match the es version
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        env:
          - name: ELASTICSEARCH_URL
            value: http://elasticsearch:9200                  #The DNS name of the headless elasticsearch service
        ports:
        - containerPort: 5601
$ kubectl apply -f kibana.yaml

#If the kibana image also cannot be pulled for a long time, pull it manually from Docker Hub and re-tag it, as was done for the es image above.
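For reference, the equivalent commands look like this, assuming the Docker Hub mirror carries the same tag:

$ docker pull kibana:7.2.0
$ docker tag kibana:7.2.0 docker.elastic.co/kibana/kibana:7.2.0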
$ kubectl get pod -o wide -n kube-log

NAME                      READY   STATUS    RESTARTS   AGE     IP              NODE                NOMINATED NODE   READINESS GATES
es-cluster-0              1/1     Running   0          33m     10.244.117.10   k8s-elasticsearch   <none>           <none>
es-cluster-1              1/1     Running   0          13m     10.244.117.11   k8s-elasticsearch   <none>           <none>
es-cluster-2              1/1     Running   0          13m     10.244.117.12   k8s-elasticsearch   <none>           <none>
kibana-5dd9f479dc-gbprl   1/1     Running   0          4m59s   10.244.117.13   k8s-elasticsearch   <none>           <none>

$ kubectl get svc -n kube-log -owide

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE     SELECTOR
elasticsearch   ClusterIP   None             <none>        9200/TCP,9300/TCP   3h57m   app=elasticsearch
kibana          NodePort    10.102.222.139   <none>        5601:32591/TCP      5m11s   app=kibana

At this point the Kibana log management system can be reached over the public network on NodePort 32591 of the elasticsearch server (a quick check is shown below). Finally, we need to deploy Fluentd to ship the logs of every pod to the Elasticsearch service.
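A quick liveness check against Kibana's status endpoint, run from any machine that can reach a cluster node (the NodePort, 32591 here, is assigned automatically and will differ per cluster):

$ curl -s http://172.16.66.169:32591/api/status
#Returns a JSON document; the overall state should be reported as green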

  10. Fluentd deployment

Deploying the Fluentd component with a DaemonSet controller ensures that every node in the cluster runs exactly one Fluentd pod replica, so logs from every node in the k8s cluster are collected. In a k8s cluster, the stdout/stderr of container applications is redirected to JSON files on the node; Fluentd tails and filters these logs, converts them into the specified format, and sends them to the Elasticsearch cluster. Besides container logs, Fluentd can also collect the logs of kubelet, kube-proxy and docker.
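Concretely, on every node the docker json-file logging driver writes one JSON file per container under /var/lib/docker/containers, and kubelet maintains human-readable symlinks to them under /var/log/containers; these are exactly the paths the DaemonSet below mounts and tails:

$ ls /var/log/containers/ | head -n 3               #symlinks named <pod>_<namespace>_<container>-<id>.log
$ tail -n 1 /var/lib/docker/containers/*/*-json.log | head -n 5
#Each line is a JSON record of the form {"log":"<what the app wrote>","stream":"stdout","time":"..."}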

$ vim fluentd.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-log
  labels:
    app: fluentd
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
  labels:
    app: fluentd
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - namespaces
  verbs:
  - get
  - list
  - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: kube-log
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-log
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1
        imagePullPolicy: IfNotPresent
        env:
          - name:  FLUENT_ELASTICSEARCH_HOST
            value: "elasticsearch.kube-logging.svc.cluster.local"
          - name:  FLUENT_ELASTICSEARCH_PORT
            value: "9200"
          - name: FLUENT_ELASTICSEARCH_SCHEME
            value: "http"
          - name: FLUENTD_SYSTEMD_CONF
            value: disable
        resources:
          limits:
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
$ kubectl apply -f fluentd.yaml

$ kubectl get pod -owide -n kube-log

NAME                      READY   STATUS    RESTARTS   AGE    IP               NODE                NOMINATED NODE   READINESS GATES
es-cluster-0              1/1     Running   0          20h    10.244.117.10    k8s-elasticsearch   <none>           <none>
es-cluster-1              1/1     Running   0          19h    10.244.117.11    k8s-elasticsearch   <none>           <none>
es-cluster-2              1/1     Running   0          19h    10.244.117.12    k8s-elasticsearch   <none>           <none>
fluentd-65ngd             1/1     Running   0          141m   10.244.36.69     k8s-node1           <none>           <none>
fluentd-h8j2z             1/1     Running   0          141m   10.244.117.14    k8s-elasticsearch   <none>           <none>
fluentd-prsgv             1/1     Running   0          141m   10.244.169.131   k8s-node2           <none>           <none>
fluentd-wtsf9             1/1     Running   0          141m   10.244.235.193   k8s-master          <none>           <none>
kibana-5f64ccf544-4wjwv   1/1     Running   0          66m    10.244.117.15    k8s-elasticsearch   <none>           <none>

So far, the log collection cluster has been deployed.
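To confirm that log records are actually reaching Elasticsearch before configuring anything in Kibana, list the indices; with this fluentd-kubernetes-daemonset image the default settings produce Logstash-style index names (logstash-YYYY.MM.DD), one per day:

$ kubectl port-forward -n kube-log es-cluster-0 9200:9200 &
$ curl -s 'http://localhost:9200/_cat/indices?v'
#Expect green/open logstash-* indices whose docs.count keeps growing as Fluentd ships logs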

  11. Verification
    You can now access the Kibana log management system through Kibana's NodePort. How to use Kibana itself will be covered separately in a later post.
    The steps in this article have been run through end to end; if there are any omissions or errors, corrections are welcome.

Topics: Docker ElasticSearch Kubernetes