Pod troubleshooting in K8s operation and maintenance

Posted by DJ_CARO on Mon, 01 Nov 2021 07:01:12 +0100


Kubernetes (K8s) is an open source system for managing containerized applications across multiple hosts in a cloud platform. It aims to make deploying containerized applications simple and efficient.

Core advantages of K8s:
 1. Automated creation and deletion of containers based on YAML files
 2. Faster elastic horizontal scaling of services
 3. Dynamic discovery of newly scaled containers, which are automatically made available to users
 4. Simpler and faster upgrades and rollbacks of business code

Generally speaking, if a Pod is in an abnormal state, you can run the following commands to check its status:

kubectl get pod <pod-name> -o yaml   #View the Pod's configuration
kubectl get pod <pod-name> -o wide   #View the node the Pod runs on and other info
kubectl describe pod <pod-name>      #View the Pod's events
kubectl logs <pod-name>              #View the Pod's logs

Introduction to Pod

 1. A Pod is the smallest unit in K8s
 2. A Pod can run one container or multiple containers
 3. When running multiple containers, those containers are scheduled together
 4. A Pod's life cycle is short; it does not heal itself and is destroyed once it is done
 5. Pods are usually created and managed through a Controller
 
Pod life cycle: init containers, start-up hooks (postStart), readiness probes, liveness probes, and operations when the Pod is deleted (preStop)
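For illustration only (the image names, commands, and probe paths below are placeholders, not taken from this article), a Pod spec that touches each of these lifecycle stages might look roughly like this:

apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo             # hypothetical name for illustration
spec:
  initContainers:
  - name: init-wait                # init container runs before the app container starts
    image: busybox
    command: ["sh", "-c", "sleep 5"]
  containers:
  - name: app
    image: nginx                   # placeholder image
    lifecycle:
      postStart:                   # hook executed right after the container starts
        exec:
          command: ["sh", "-c", "echo started > /tmp/started"]
      preStop:                     # hook executed before the container is terminated
        exec:
          command: ["sh", "-c", "sleep 2"]
    readinessProbe:                # readiness probe: controls whether the Pod receives traffic
      httpGet:
        path: /
        port: 80
    livenessProbe:                 # liveness probe: restarts the container when it fails
      httpGet:
        path: /
        port: 80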

Pod creation process in K8S

1. The client sends a Pod creation request to the apiserver
2. After the apiserver receives the creation request, it generates the object (YAML) containing the creation information
3. The apiserver writes the information to the etcd database
4. The scheduler assigns a node host to the Pod
5. The kubelet on that node detects the newly scheduled Pod and runs it through the container runtime
6. The kubelet obtains the Pod status and updates it in the apiserver
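As a small illustration of this flow (pod.yaml and the Pod name are placeholders), you can create a Pod and watch it move through scheduling and startup:

kubectl apply -f pod.yaml                                             #Client submits the create request to the apiserver
kubectl get pod <pod-name> -o wide --watch                            #Watch the Pod get a node assigned and its containers start
kubectl get events --field-selector involvedObject.name=<pod-name>    #Scheduled / Pulling / Started events from the scheduler and kubelet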

Pod is always Pending

Pending indicates that the Pod has not yet been scheduled to a Node.

 You can view it with the following command
 kubectl describe pod <pod-name>  

Possible causes:

 1. Insufficient resources: no Node in the cluster can satisfy the CPU, memory, or ephemeral-storage requests of this Pod. The solution is to reduce resource usage, delete unused Pods, or add a new Node (kubectl describe node shows a node's resource usage).
 2. The HostPort is already occupied. It is generally recommended to expose the service port through a Service instead of HostPort.
 3. The nodeSelector is not satisfied. If the Pod has a nodeSelector specifying labels a node must carry, the scheduler only considers nodes with those labels; if no node has them, or the nodes with those labels do not meet the Pod's other conditions, the Pod cannot be scheduled.
 4. Affinity rules are not satisfied (see the sketch below):
    - nodeAffinity: node affinity, an enhanced nodeSelector that restricts Pods to a subset of nodes.
    - podAffinity: Pod affinity, used to schedule related Pods to the same place, i.e. the same node or nodes in the same availability zone.
    - podAntiAffinity: Pod anti-affinity, used to avoid scheduling Pods of the same type to the same place and prevent single points of failure. For example, schedule all replicas of the cluster DNS service to different nodes, so that one failed node does not break DNS resolution for the whole cluster.
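As a hedged sketch of the scheduling constraints described above (all labels, names, and values are placeholders), they show up in a Pod spec roughly like this:

apiVersion: v1
kind: Pod
metadata:
  name: sched-demo                 # hypothetical name
  labels:
    app: dns                       # placeholder label used by the anti-affinity rule
spec:
  nodeSelector:
    disktype: ssd                  # only nodes carrying this label are considered
  affinity:
    podAntiAffinity:               # keep replicas of the same app off the same node
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: dns
        topologyKey: kubernetes.io/hostname
  containers:
  - name: app
    image: nginx                   # placeholder image
    resources:
      requests:
        cpu: "100m"                # requests must fit on some node, otherwise the Pod stays Pending
        memory: "128Mi"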


Pod is always in Waiting or ContainerCreating state

First, view it with the following command:
kubectl describe pod <pod-name>

Possible causes:

1. Image pull failure, e.g. the image name is misconfigured, the kubelet cannot access the image registry, the private registry credentials are misconfigured, or the image is too large and pulling times out.
2. CNI network error. Usually you need to check the CNI network plug-in configuration, e.g. the Pod network cannot be configured or an IP address cannot be assigned.
3. The container cannot start. Check whether the correct image was packaged and whether the correct container parameters are configured.
4. The storage volume the container depends on cannot be created.
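A hedged set of checks matching the causes above (assuming a systemd-managed kubelet; names in angle brackets are placeholders):

kubectl describe pod <pod-name>              #Events show which step fails: image pull, volume mount, or network setup
kubectl get pods -n kube-system -o wide      #Confirm the CNI plug-in Pods are Running on the affected node
journalctl -u kubelet | grep -i <pod-name>   #On the node: kubelet logs often contain the CNI or volume error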

Pod in ImagePullBackOff state

This is usually caused by a misconfigured image name, which makes the image impossible to pull. Use docker pull <image> on the node to verify whether the image can be pulled normally.
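A small hedged sketch of that check (the jsonpath query is just one convenient way to read the configured image name; the Pod and image names are placeholders):

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].image}'   #See exactly which image the Pod references
docker pull <image>                                                     #Run on the node to confirm the image can be pulled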

Pod is in Terminating or Unknown state

When a Node is lost, Kubernetes does not delete the Pods running on it, but marks them as Terminating or Unknown.

You can delete Pods in these states in three ways:

1. Delete the Node from the cluster. On public clouds, kube-controller-manager automatically deletes the corresponding Node after the VM is deleted. In clusters deployed on physical machines, the administrator needs to delete the Node manually (e.g. kubectl delete node <node-name>).
2. The Node recovers. The kubelet re-establishes communication with kube-apiserver, confirms the expected state of these Pods, and then decides whether to delete them or keep them running.
3. Forced deletion by the user. The user can run kubectl delete pods <pod> --grace-period=0 --force to force-delete the Pod. Unless you know for sure that the Pod is really stopped (e.g. the VM or physical machine hosting the Node has been shut down), this method is not recommended. For Pods managed by a StatefulSet in particular, forced deletion can easily lead to split-brain or data loss.
4. A Pod stuck in Terminating is usually deleted automatically once the kubelet resumes normal operation. Sometimes, however, it cannot be deleted, and even kubectl delete pods <pod> --grace-period=0 --force fails to remove it. This is usually caused by finalizers and can be resolved by removing the finalizers with kubectl edit.
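A hedged sketch of that last case: besides kubectl edit, the finalizers field can also be cleared with kubectl patch (the Pod name is a placeholder):

kubectl delete pods <pod-name> --grace-period=0 --force                #Try forced deletion first
kubectl patch pod <pod-name> -p '{"metadata":{"finalizers":null}}'     #If it still hangs, clear the finalizers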

Pod is always in CrashLoopBackOff state

The CrashLoopBackOff status indicates that the container started but exited abnormally. In this case the Pod's restart count is usually greater than 0. Check the container logs first.

Possible causes include the container process exiting on its own, failing health checks, and so on. Useful commands:

kubectl get pod <pod-name> -o yaml
kubectl describe pod <pod-name>
kubectl logs <pod-name>
kubectl exec -it <pod-name> bash     #Go into the container and check
kubectl get pod <pod-name> -o wide   #Check which node the Pod runs on, then check that node's system log
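One extra flag worth knowing here (standard kubectl, though not listed above): because the container keeps restarting, the useful logs are often those of the previous, crashed instance:

kubectl logs <pod-name> --previous   #Logs of the last terminated container instance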

Pod is in Error state

The Error status indicates that an Error occurred during Pod startup

Possible causes:
1. A ConfigMap, Secret, or PV that the Pod depends on does not exist
2. The requested resources exceed the limits set by the administrator, e.g. they exceed the LimitRange
3. The container does not have permission to operate on resources in the cluster; for example, after RBAC is enabled, a role binding must be configured for the ServiceAccount
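A hedged sketch of checking these causes (namespace and ServiceAccount names are placeholders):

kubectl get configmap,secret,pvc -n <namespace>                                          #Verify the objects the Pod references actually exist
kubectl auth can-i list pods --as=system:serviceaccount:<namespace>:<serviceaccount>     #Test what the ServiceAccount is allowed to do after RBAC is enabled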

Node is in NotReady state

kubectl get nodes                        #Check node status
kubectl describe node <node-name>        #View node conditions and events
kubectl logs -n kube-system <pod-name>   #View logs of kube-system components
journalctl -l -u kubelet                 #View kubelet logs on the node

A Node in NotReady state is mostly caused by PLEG (Pod Lifecycle Event Generator) problems; the related community issue is still unresolved.

Common problems and repair methods are:

  1. The kubelet is not started or hangs abnormally: restart the kubelet
  2. The CNI network plug-in is not deployed: deploy the CNI plug-in
  3. Docker is abnormal: restart docker
  4. Insufficient disk space: clean up disk space, e.g. images and files
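A hedged sketch of these fixes, assuming a systemd-managed kubelet and a Docker runtime:

systemctl restart kubelet        #1. restart a kubelet that is down or hung
systemctl restart docker         #3. restart an abnormal Docker daemon
df -h                            #4. check whether the disk is full
docker image prune -a            #4. reclaim space from unused images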

Cluster troubleshooting

Troubleshooting cluster status exceptions usually starts from the status of the Nodes and the Kubernetes services.
 Common causes are: 
 A virtual machine or physical machine is down
 A Kubernetes service did not start properly
 Operation errors (configuration errors, etc.)

kube-apiserver cannot start:
 The cluster becomes inaccessible, but existing Pods keep running normally
  
etcd cluster exception:
 The apiserver cannot read or write the cluster state normally, and the kubelet cannot update its status periodically

kube-controller-manager/kube-scheduler exception:
 The controllers do not work, so Deployments, Services, etc. become abnormal, and newly created Pods cannot be scheduled

The Node machine cannot start, or the kubelet cannot start:
 Pods on that node cannot run normally
 Pods that are already running cannot be terminated normally
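A hedged sketch of checking these components (exact commands depend on how the cluster was deployed; on kubeadm clusters the control-plane components run as static Pods in kube-system):

kubectl get nodes                       #Node health
kubectl get pods -n kube-system         #apiserver, controller-manager, scheduler and etcd Pods on kubeadm clusters
kubectl get componentstatuses           #Legacy component health view (deprecated, but still informative where available)
journalctl -u kubelet                   #On a misbehaving node: kubelet logs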

Topics: Operation & Maintenance Kubernetes