k8s log collection and deployment: EFK (Elasticsearch + Fluentd + Kibana)
After a k8s cluster is built, viewing logs becomes more complicated because the pods are spread across different nodes. When the number of pods is small, you can query logs with the `kubectl logs` command provided by kubectl. As the number of pods grows, the complexity of querying logs grows with it, and locating a problem becomes extremely difficult.
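For reference, this is roughly what per-pod log inspection looks like; the pod name is taken from the environment listing further down, and the `app=websvr1` label is an assumption about how the test deployments are labeled:

```bash
# Tail the logs of a single pod (name will differ in your cluster)
kubectl logs -f websvr1-deployment-67fd6cf9d4-9fcfv

# With many replicas you end up repeating this per pod, or querying by label
kubectl logs -l app=websvr1 --all-containers=true --tail=100
```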
There is an urgent need to build a cluster log collection system. At present, there are two mainstream systems:
ELK: Filebeat (collection), Logstash (filtering), Kafka (buffering), Elasticsearch (storage), Kibana (display)
EFK: Fluentd (collection), Elasticsearch (storage), Kibana (display)
EFK is also the officially recommended scheme. This article summarizes how to build and deploy EFK and some pitfalls encountered along the way.
- Technical stack required for this article: k8s, docker, node, express
- linux system version: CentOS 8.4
- k8s version: 1.21
- For k8s deployment under CentOS 8.X, please refer to: k8s installation and deployment (latest verification of CentOS 8.X, hands-on teaching)
- For the express web application used for testing, please refer to: k8s deploy node express web application and map public network access with ingress-nginx (latest verification of CentOS 8.X, hands-on teaching)
- Preface and environment
Fluentd is a popular open-source data collector. We will install Fluentd on the Kubernetes cluster nodes, where it reads container log files, filters and transforms the log data, and then delivers the data to the Elasticsearch cluster, where it is indexed and stored.
Elasticsearch is a real-time, distributed, scalable search engine that supports full-text and structured search as well as log analysis. It is usually used to index and search large amounts of log data, but it can also be used to search many other kinds of documents.
Kibana is Elasticsearch's powerful data visualization dashboard. It lets you browse Elasticsearch log data through a web interface, or build custom queries to quickly retrieve the log data stored in Elasticsearch.
```
$ kubectl get node -o wide
NAME                STATUS   ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                 CONTAINER-RUNTIME
k8s-elasticsearch   Ready    <none>                 86m   v1.21.0   172.16.66.169   <none>        CentOS Linux 8   4.18.0-305.19.1.el8_4.x86_64   docker://20.10.9
k8s-master          Ready    control-plane,master   86m   v1.21.0   172.16.66.167   <none>        CentOS Linux 8   4.18.0-305.19.1.el8_4.x86_64   docker://20.10.9
k8s-node1           Ready    <none>                 86m   v1.21.0   172.16.66.168   <none>        CentOS Linux 8   4.18.0-305.19.1.el8_4.x86_64   docker://20.10.9
k8s-node2           Ready    <none>                 86m   v1.21.0   172.16.66.170   <none>        CentOS Linux 8   4.18.0-305.19.1.el8_4.x86_64   docker://20.10.9

# node1 and node2 run two node express web applications
$ kubectl get pod -o wide
NAME                                  READY   STATUS    RESTARTS   AGE   IP               NODE        NOMINATED NODE   READINESS GATES
websvr1-deployment-67fd6cf9d4-9fcfv   1/1     Running   0          62m   10.244.36.65     k8s-node1   <none>           <none>
websvr1-deployment-67fd6cf9d4-bdhn8   1/1     Running   0          62m   10.244.169.129   k8s-node2   <none>           <none>
websvr1-deployment-67fd6cf9d4-n6xt2   1/1     Running   0          62m   10.244.169.130   k8s-node2   <none>           <none>
websvr2-deployment-67dfc4f674-79wrd   1/1     Running   0          62m   10.244.36.68     k8s-node1   <none>           <none>
websvr2-deployment-67dfc4f674-bwdwx   1/1     Running   0          62m   10.244.36.67     k8s-node1   <none>           <none>
websvr2-deployment-67dfc4f674-ktfml   1/1     Running   0          62m   10.244.36.66     k8s-node1   <none>           <none>
```
Because an Elasticsearch cluster consumes a lot of memory, it should be isolated from the business containers to avoid competing with them for resources. (The Elasticsearch cluster can also be deployed separately, even on the company intranet; Fluentd only needs network connectivity to Elasticsearch.) In production, the Elasticsearch cluster should run on at least three physical machines, each with more than 2G of memory. Here, as a test, Elasticsearch is deployed only on the k8s-elasticsearch node, which has 8G of memory.
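A hedged sketch of one way to achieve this isolation (the label and taint key/values below are my own examples, not part of the original setup): dedicate nodes to logging with a label and a taint, then schedule Elasticsearch there with a matching nodeSelector and toleration instead of the hard-coded nodeName used later in this article.

```bash
# Label the dedicated node(s); key/value are arbitrary examples
kubectl label node k8s-elasticsearch role=logging

# Optionally taint the node so ordinary business pods are not scheduled onto it
kubectl taint node k8s-elasticsearch dedicated=logging:NoSchedule

# The es pods would then need nodeSelector "role: logging" plus a toleration
# for dedicated=logging:NoSchedule in their pod spec.
```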
- Namespace
To keep it separate from the business workloads, create a new namespace for deploying Elasticsearch.

```
$ kubectl create ns kube-log
namespace/kube-log created
$ kubectl get ns
NAME              STATUS   AGE
default           Active   3h37m
ingress-nginx     Active   3h5m
kube-log          Active   39s
kube-node-lease   Active   3h37m
kube-public       Active   3h37m
kube-system       Active   3h37m
```
- Create a headless service
Suppose a cluster has these components: pod-a, svc-b, pod-b1 and pod-b2. When pod-a wants to access the application in pod-b, it first calls svc-b, and svc-b then forwards the request at random to pod-b1 or pod-b2.
Now suppose there is a requirement that pod-a connect to pod-b1 and pod-b2 directly at the same time. Forwarding through svc-b obviously cannot satisfy this. How can pod-a obtain the IP addresses of pod-b1 and pod-b2? A headless service makes this possible.
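A minimal sketch of the difference, reusing the hypothetical svc-b names above: for a headless service (clusterIP: None), an ordinary DNS lookup of the service name returns one A record per ready backend pod rather than a single virtual IP.

```bash
# Run from inside pod-a (or any pod in the same namespace).
# For a normal ClusterIP service this returns one virtual IP;
# for a headless service it returns the IPs of pod-b1 and pod-b2 directly.
nslookup svc-b

# Fully qualified form, assuming svc-b lives in the default namespace
nslookup svc-b.default.svc.cluster.local
```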
vim headlessSvc.yaml

```yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: kube-log
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node
```

```
$ kubectl apply -f headlessSvc.yaml
$ kubectl get svc -n kube-log
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
elasticsearch   ClusterIP   None         <none>        9200/TCP,9300/TCP   76s
# CLUSTER-IP is None, which marks this as a headless service
```
- Installing nfs
Install nfs on the node where Elasticsearch is deployed; here that is the k8s-elasticsearch node.

```
$ yum install -y nfs-utils
$ systemctl start nfs-server    # older nfs versions: systemctl start nfs
$ chkconfig nfs-server on       # older versions: chkconfig nfs on
$ systemctl enable nfs-server   # older versions: systemctl enable nfs

# Create the nfs shared directory
$ mkdir /data/eslog -p
$ vim /etc/exports
/data/eslog *(rw,no_root_squash)   # restrict access to specific client IPs if needed; * allows all IPs
$ exportfs -arv                    # make the configuration take effect
$ systemctl restart nfs-server     # older versions: systemctl restart nfs
```
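To sanity-check the export before moving on, a small sketch (showmount also requires nfs-utils on the client node, and the test mount point below is arbitrary):

```bash
# Verify the export list on the NFS server (or from any client node)
showmount -e 172.16.66.169

# Optional: test-mount the share from another node, then unmount
mkdir -p /mnt/eslog-test
mount -t nfs 172.16.66.169:/data/eslog /mnt/eslog-test
umount /mnt/eslog-test
```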
- Create the sa (ServiceAccount) and authorize it
$ vim serviceaccount.yaml

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-provisioner
```
$ kubectl apply -f serviceaccount.yaml
$ vim rbac.yaml
```yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get"]
  - apiGroups: ["extensions"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["nfs-provisioner"]
    verbs: ["use"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-provisioner
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-provisioner
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-provisioner
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-nfs-provisioner
  apiGroup: rbac.authorization.k8s.io
```
$ kubectl apply -f rbac.yaml
- Create a pod to run the nfs provisioner (deployed on the node where nfs is installed)
$ vim npv.yaml
```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nfs-provisioner
spec:
  selector:
    matchLabels:
      app: nfs-provisioner
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-provisioner
    spec:
      nodeName: k8s-elasticsearch   # pinned to the k8s-elasticsearch node; if the es cluster spans several machines, use nodeSelector + labels instead
      serviceAccount: nfs-provisioner
      containers:
        - name: nfs-provisioner
          image: registry.cn-hangzhou.aliyuncs.com/open-ali/nfs-client-provisioner:latest
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: eslog/nfs        # must match the provisioner field of the StorageClass below
            - name: NFS_SERVER
              value: 172.16.66.169    # IP address of the nfs server, here the k8s-elasticsearch node
            - name: NFS_PATH
              value: /data/eslog      # shared directory
      volumes:
        - name: nfs-client-root
          nfs:
            server: 172.16.66.169     # IP address of your own nfs server
            path: /data/eslog
```
$ kubectl apply -f npv.yaml
- Create storageclass
$ vim class.yaml
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: es-block-storage
provisioner: eslog/nfs
```

```
$ kubectl apply -f class.yaml
$ kubectl get pod -o wide
NAME                                  READY   STATUS    RESTARTS   AGE     IP               NODE                NOMINATED NODE   READINESS GATES
nfs-provisioner-75cf88b6c9-wg6b6      0/1     Running   0          6m41s   <none>           k8s-elasticsearch   <none>           <none>
websvr1-deployment-67fd6cf9d4-9fcfv   1/1     Running   0          5h20m   10.244.36.65     k8s-node1           <none>           <none>
websvr1-deployment-67fd6cf9d4-bdhn8   1/1     Running   0          5h20m   10.244.169.129   k8s-node2           <none>           <none>
websvr1-deployment-67fd6cf9d4-n6xt2   1/1     Running   0          5h20m   10.244.169.130   k8s-node2           <none>           <none>
websvr2-deployment-67dfc4f674-79wrd   1/1     Running   0          5h19m   10.244.36.68     k8s-node1           <none>           <none>
websvr2-deployment-67dfc4f674-bwdwx   1/1     Running   0          5h19m   10.244.36.67     k8s-node1           <none>           <none>
websvr2-deployment-67dfc4f674-ktfml   1/1     Running   0          5h19m   10.244.36.66     k8s-node1           <none>           <none>
$ kubectl get storageclass
NAME               PROVISIONER   RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
es-block-storage   eslog/nfs     Delete          Immediate           false                  55m
```
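To confirm that dynamic provisioning works end to end, a throwaway PVC can be created against the new StorageClass. This is a hedged sketch; the claim name `test-claim` is arbitrary and the PVC can be deleted afterwards:

```bash
# Create a small test claim bound to es-block-storage, check it, then clean up
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: es-block-storage
  resources:
    requests:
      storage: 1Gi
EOF

kubectl get pvc test-claim     # STATUS should become Bound and a directory should appear under /data/eslog
kubectl delete pvc test-claim
```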
- Deploy elasticsearch
Elasticsearch is deployed as a StatefulSet, so its pods are created in a stable, ordered fashion.
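Because the StatefulSet is backed by the headless elasticsearch service, each replica gets a stable, predictable DNS name; this is what the discovery.seed_hosts values in the manifest below rely on. A sketch, assuming the kube-log namespace and the default cluster domain, to be run once the pods are up:

```bash
# Stable per-pod DNS names of the StatefulSet replicas:
#   es-cluster-0.elasticsearch.kube-log.svc.cluster.local
#   es-cluster-1.elasticsearch.kube-log.svc.cluster.local
#   es-cluster-2.elasticsearch.kube-log.svc.cluster.local
# Inside the same namespace they can be shortened to es-cluster-0.elasticsearch, etc.
kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -n kube-log \
  -- nslookup elasticsearch.kube-log.svc.cluster.local
```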
$ vim es.yaml
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: kube-log
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      nodeName: k8s-elasticsearch   # pinned to the k8s-elasticsearch node; if the es cluster spans several machines, use nodeSelector + labels instead
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: 1000m   # a single container may use at most 1 CPU
            requests:
              cpu: 100m    # a single container is guaranteed at least 0.1 CPU
          ports:
            - containerPort: 9200
              name: rest         # consistent with the headless service
              protocol: TCP
            - containerPort: 9300
              name: inter-node
              protocol: TCP
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
          env:
            - name: cluster.name          # cluster name
              value: k8s-logs
            - name: node.name             # node name, taken from metadata.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: discovery.seed_hosts  # how the elasticsearch nodes discover each other; since they share a namespace, the names can be shortened to es-cluster-[0,1,2].elasticsearch
              value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
            - name: cluster.initial_master_nodes
              value: "es-cluster-0,es-cluster-1,es-cluster-2"
            - name: ES_JAVA_OPTS
              value: "-Xms512m -Xmx512m"  # tell the JVM to use a minimum and maximum heap of 512MB
      initContainers:   # these init containers run in the order defined here, before the main container starts
        - name: fix-permissions
          image: busybox
          imagePullPolicy: IfNotPresent
          command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
          # fix-permissions chowns the Elasticsearch data directory to 1000:1000 (the UID of the Elasticsearch user).
          # By default Kubernetes mounts the data directory as root, which leaves Elasticsearch unable to access it.
        - name: increase-vm-max-map
          image: busybox
          imagePullPolicy: IfNotPresent
          command: ["sysctl", "-w", "vm.max_map_count=262144"]
          securityContext:
            privileged: true
          # increase-vm-max-map raises the operating system's mmap count limit; the default may be too low and lead to out-of-memory errors
        - name: increase-fd-ulimit
          image: busybox
          imagePullPolicy: IfNotPresent
          command: ["sh", "-c", "ulimit -n 65536"]
          securityContext:
            privileged: true
          # increase-fd-ulimit raises the maximum number of open file descriptors.
          # The elastic production notes also recommend disabling swap for performance; for Kubernetes clusters swap should be disabled anyway.
  volumeClaimTemplates:
    - metadata:
        name: data
        labels:
          app: elasticsearch
      spec:
        accessModes: [ "ReadWriteOnce" ]    # can only be mounted by a single node for read-write
        storageClassName: es-block-storage  # must exist in advance; we use NFS as the storage backend, so the corresponding provisioner driver is required
        resources:
          requests:
            storage: 10Gi                   # each PV is 10G
```
```
$ kubectl apply -f es.yaml
$ kubectl get pod -owide -n kube-log
NAME           READY   STATUS     RESTARTS   AGE   IP       NODE                NOMINATED NODE   READINESS GATES
es-cluster-0   0/1     Init:0/3   0          10m   <none>   k8s-elasticsearch   <none>           <none>

# The pod is stuck initializing because the elasticsearch:7.2.0 image pull failed.
# Pull it manually on the node where es is deployed:
$ docker pull elasticsearch:7.2.0
# and retag it to the image name used in the yaml:
$ docker tag 0efa6a3de177 docker.elastic.co/elasticsearch/elasticsearch:7.2.0
```

Checking the status again, the pod is still initializing. After digging through a lot of material, it turns out that on CentOS 8 the kube-apiserver manifest has to be edited manually to re-enable the RemoveSelfLink feature gate (the older nfs-client provisioner still relies on selfLink). Make the change on the master node:

```
$ vim /etc/kubernetes/manifests/kube-apiserver.yaml
# add at the end of spec.containers.command:
    - --feature-gates=RemoveSelfLink=false

# Restart kubelet
$ service kubelet restart

# Check the es status again:
$ kubectl get pod -owide -n kube-log
NAME           READY   STATUS    RESTARTS   AGE     IP              NODE                NOMINATED NODE   READINESS GATES
es-cluster-0   1/1     Running   0          21m     10.244.117.10   k8s-elasticsearch   <none>           <none>
es-cluster-1   1/1     Running   0          2m11s   10.244.117.11   k8s-elasticsearch   <none>           <none>
es-cluster-2   1/1     Running   0          115s    10.244.117.12   k8s-elasticsearch   <none>           <none>
$ kubectl get svc -n kube-log
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
elasticsearch   ClusterIP   None         <none>        9200/TCP,9300/TCP   3h48m
```
At this point, Elasticsearch has been deployed successfully.
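A quick way to double-check the cluster from a workstation is to port-forward one of the pods and hit the health API (a sketch; the local port 9200 is arbitrary):

```bash
# Forward local port 9200 to the es-cluster-0 pod (Ctrl+C to stop)
kubectl port-forward es-cluster-0 9200:9200 -n kube-log &

# The cluster should report 3 nodes and a green or yellow status
curl "http://localhost:9200/_cluster/health?pretty"
curl "http://localhost:9200/_cat/nodes?v"
```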
- kibana deployment
$ vim kibana.yaml

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: kube-log
  labels:
    app: kibana
spec:
  type: NodePort   # NodePort type for test convenience
  ports:
    - port: 5601
  selector:
    app: kibana
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: kube-log
  labels:
    app: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      nodeName: k8s-elasticsearch   # pinned to the k8s-elasticsearch node; if the es cluster spans several machines, use nodeSelector + labels instead
      containers:
        - name: kibana
          image: docker.elastic.co/kibana/kibana:7.2.0   # kibana version should match the es version
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: 1000m
            requests:
              cpu: 100m
          env:
            - name: ELASTICSEARCH_URL
              value: http://elasticsearch:9200   # DNS name of the headless service
          ports:
            - containerPort: 5601
```

```
$ kubectl apply -f kibana.yaml
# If the kibana image cannot be pulled for a long time, pull it manually on the node, as with the es image above
$ kubectl get pod -o wide -n kube-log
NAME                      READY   STATUS    RESTARTS   AGE     IP              NODE                NOMINATED NODE   READINESS GATES
es-cluster-0              1/1     Running   0          33m     10.244.117.10   k8s-elasticsearch   <none>           <none>
es-cluster-1              1/1     Running   0          13m     10.244.117.11   k8s-elasticsearch   <none>           <none>
es-cluster-2              1/1     Running   0          13m     10.244.117.12   k8s-elasticsearch   <none>           <none>
kibana-5dd9f479dc-gbprl   1/1     Running   0          4m59s   10.244.117.13   k8s-elasticsearch   <none>           <none>
$ kubectl get svc -n kube-log -owide
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE     SELECTOR
elasticsearch   ClusterIP   None             <none>        9200/TCP,9300/TCP   3h57m   app=elasticsearch
kibana          NodePort    10.102.222.139   <none>        5601:32591/TCP      5m11s   app=kibana
```
At this point, the Kibana log management system can be reached from the public network on port 32591 of the elasticsearch server. The last step is to deploy Fluentd and ship each pod's logs to the Elasticsearch service.
- Fluentd deployment
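Before opening the browser, the NodePort can be checked with curl (a sketch; replace the IP with any reachable node address and 32591 with the NodePort shown by kubectl get svc):

```bash
# Kibana's status API should answer on the NodePort
curl -s http://172.16.66.169:32591/api/status | head -c 300
```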
Deploying the Fluentd component with a DaemonSet controller guarantees that every node in the cluster runs exactly one Fluentd pod replica, so the logs of every node in the k8s cluster are collected. In Kubernetes, the stdout/stderr of container applications is redirected to JSON log files on the node; Fluentd can tail and filter those logs, convert them into the specified format, and forward them to the Elasticsearch cluster. Besides container logs, Fluentd can also collect the logs of kubelet, kube-proxy, and docker.
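To see the raw material Fluentd tails, look at the node itself: container stdout/stderr ends up as JSON files under /var/log/containers, whose symlinks ultimately point into /var/lib/docker/containers (hence the two hostPath mounts in the manifest below). A small sketch, run on any worker node with the docker runtime:

```bash
# Each running container gets a <pod>_<namespace>_<container>-<id>.log symlink here
ls -l /var/log/containers/ | head

# The targets are docker's JSON log files, one JSON object per log line
tail -n 2 /var/lib/docker/containers/*/*-json.log | head
```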
$ vim fluentd.yaml

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-log
  labels:
    app: fluentd
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
  labels:
    app: fluentd
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - namespaces
    verbs:
      - get
      - list
      - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: kube-log
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-log
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1
          imagePullPolicy: IfNotPresent
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch.kube-log.svc.cluster.local"   # headless service of the es cluster; note it lives in the kube-log namespace created above
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
            - name: FLUENT_ELASTICSEARCH_SCHEME
              value: "http"
            - name: FLUENTD_SYSTEMD_CONF
              value: disable
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 200Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
```

```
$ kubectl apply -f fluentd.yaml
$ kubectl get pod -owide -n kube-log
NAME                      READY   STATUS    RESTARTS   AGE    IP               NODE                NOMINATED NODE   READINESS GATES
es-cluster-0              1/1     Running   0          20h    10.244.117.10    k8s-elasticsearch   <none>           <none>
es-cluster-1              1/1     Running   0          19h    10.244.117.11    k8s-elasticsearch   <none>           <none>
es-cluster-2              1/1     Running   0          19h    10.244.117.12    k8s-elasticsearch   <none>           <none>
fluentd-65ngd             1/1     Running   0          141m   10.244.36.69     k8s-node1           <none>           <none>
fluentd-h8j2z             1/1     Running   0          141m   10.244.117.14    k8s-elasticsearch   <none>           <none>
fluentd-prsgv             1/1     Running   0          141m   10.244.169.131   k8s-node2           <none>           <none>
fluentd-wtsf9             1/1     Running   0          141m   10.244.235.193   k8s-master          <none>           <none>
kibana-5f64ccf544-4wjwv   1/1     Running   0          66m    10.244.117.15    k8s-elasticsearch   <none>           <none>
```
So far, the log collection cluster has been deployed.
- verification
Now you can access the Kibana log management system through the kibana NodePort. How to use Kibana itself will be covered in a separate post.
Everything in this article has been run end to end. If there are omissions or errors, corrections are welcome.
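Before creating an index pattern in Kibana, it is worth confirming that Fluentd has actually started writing indices into Elasticsearch. A sketch, reusing the port-forward from the Elasticsearch verification step; with the daemonset image's default configuration the indices are typically named logstash-YYYY.MM.DD:

```bash
kubectl port-forward es-cluster-0 9200:9200 -n kube-log &

# Fluentd-created indices should show up here with growing doc counts
curl "http://localhost:9200/_cat/indices?v"
```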