Pod troubleshooting for Kubernetes operations and maintenance
Kubernetes (K8s) is an open-source platform for managing containerized applications across multiple hosts. It aims to make deploying containerized applications simple and efficient.
Core advantages of Kubernetes:
1. Containers are created and deleted automatically, driven by declarative YAML files.
2. Elastic horizontal scaling of workloads is fast.
3. Newly scaled containers are discovered dynamically and automatically exposed to users.
4. Application code upgrades and rollbacks are simpler and faster.
Generally speaking, if a Pod is in an abnormal state, you can run the following commands to inspect it:
kubectl get pod <pod-name> -o yaml       # view the Pod's configuration
kubectl get pod <pod-name> -o wide       # view the node the Pod runs on, its IP, and other details
kubectl describe pod <pod-name>          # view the Pod's events
kubectl logs <pod-name>                  # view the Pod's logs
Introduction to Pod
1. A Pod is the smallest deployable unit in Kubernetes.
2. A Pod can run a single container or multiple containers.
3. When a Pod runs multiple containers, those containers are scheduled together.
4. A Pod is short-lived and does not heal itself; it is a disposable entity that is destroyed when it is no longer needed.
5. We usually create and manage Pods through a Controller.
Pod life cycle: init containers, post-start operations, readiness probe, liveness probe, and pre-delete operations.
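As a minimal sketch of how these life-cycle phases appear in a Pod spec (the names, image, and commands below are illustrative assumptions, not from the original text):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  initContainers:                  # run to completion before the app container starts
  - name: init-wait
    image: busybox
    command: ["sh", "-c", "sleep 5"]
  containers:
  - name: app
    image: nginx
    lifecycle:
      postStart:                   # runs right after the container starts
        exec:
          command: ["sh", "-c", "echo started > /tmp/started"]
      preStop:                     # runs before the container is terminated
        exec:
          command: ["sh", "-c", "nginx -s quit"]
    readinessProbe:                # gates whether the Pod receives traffic
      httpGet:
        path: /
        port: 80
    livenessProbe:                 # restarts the container when it fails
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
```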
Pod creation process in K8S
1. The client sends a create-Pod request to the apiserver.
2. After receiving the request, the apiserver generates the Pod's specification from the submitted YAML.
3. The apiserver writes that specification into the etcd database.
4. The scheduler assigns the Pod to a node.
5. The kubelet on that node detects the newly scheduled Pod and runs it through the container runtime.
6. The kubelet reports the Pod's status back to the apiserver.
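You can watch these steps play out by creating a Pod and following its events (the Pod name nginx is just an example):

```shell
kubectl run nginx --image=nginx                # step 1: the client asks the apiserver to create the Pod
kubectl get events --watch \
  --field-selector involvedObject.name=nginx   # steps 4-6 show up as Scheduled, Pulling, Created, Started
```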
Pod is always Pending
Pending means the Pod has not yet been scheduled onto a Node.
You can inspect it with the following command: kubectl describe pod <pod-name>
Possible causes:
1. Insufficient resources: no Node in the cluster can satisfy the CPU, memory, or ephemeral-storage requests of this Pod. The solution is to reduce resource usage, delete unused Pods, or add new Nodes. kubectl describe node shows each node's resource usage.
2. The HostPort is already occupied. It is generally recommended to expose services through a Service rather than a HostPort.
3. The nodeSelector is not satisfied. If the Pod has a nodeSelector specifying labels a node must carry, the scheduler only considers nodes with those labels; if no node has them, or the nodes that do fail other conditions, the Pod cannot be scheduled.
4. Affinity rules are not satisfied:
- nodeAffinity: an enhanced nodeSelector that restricts a Pod to a subset of nodes.
- podAffinity: schedules related Pods together, on the same node or on nodes in the same availability zone.
- podAntiAffinity: keeps certain Pods apart to avoid single points of failure; for example, scheduling all replicas of the cluster DNS service onto different nodes, so that one node going down does not break DNS resolution for the whole cluster and interrupt the business.
A sketch of these scheduling-related fields appears below.
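The manifest below sketches the resource requests, nodeSelector, and podAntiAffinity discussed above (the label keys disktype and app, and all values, are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  nodeSelector:
    disktype: ssd                  # only consider nodes labeled disktype=ssd
  affinity:
    podAntiAffinity:               # keep replicas of app=web on different nodes
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web
        topologyKey: kubernetes.io/hostname
  containers:
  - name: web
    image: nginx
    resources:
      requests:                    # the scheduler needs a node with this much free capacity
        cpu: 500m
        memory: 256Mi
```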
Pod is always in Waiting or ContainerCreating state
First, inspect the Pod with the following command:
kubectl describe pod <pod-name>
Possible causes:
1. Image pull failure, e.g., a misconfigured image name, the kubelet being unable to reach the registry, a misconfigured private-registry key, or an image so large that the pull times out.
2. CNI network error. Check the configuration of the CNI network plug-in, e.g., the Pod network cannot be configured or no IP address can be allocated.
3. The container cannot start. Check whether the correct image was packaged and whether the correct container arguments are configured.
4. A storage volume that the container depends on cannot be created.
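For the private-registry case in point 1, a sketch of wiring an image-pull secret into the Pod (the secret name regcred, the registry URL, and the image are assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-app
spec:
  imagePullSecrets:
  - name: regcred                              # a docker-registry Secret created beforehand, e.g. with
                                               # kubectl create secret docker-registry regcred ...
  containers:
  - name: app
    image: registry.example.com/team/app:1.0   # placeholder private image
```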
Pod in ImagePullBackOff state
This is usually caused by a misconfigured image name, which makes the image impossible to pull. Use docker pull <image> to verify whether the image can be pulled normally.
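A quick way to confirm, assuming Docker is the runtime on the node (the image tag is an example):

```shell
kubectl describe pod <pod-name>   # the Events section shows ErrImagePull / ImagePullBackOff with the reason
docker pull nginx:1.25            # run on the node to check whether the image is reachable at all
```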
Pod is in Terminating or Unknown state
When a Node is lost, Kubernetes does not delete the Pods running on it; instead it marks them as Terminating or Unknown.
There are three ways to remove Pods in these states:
1. Delete the Node from the cluster. When using a public cloud, kube-controller-manager automatically deletes the corresponding Node after the VM is deleted. In clusters deployed on physical machines, the administrator must delete the Node manually (e.g., kubectl delete node <node-name>).
2. The Node recovers. The kubelet communicates with kube-apiserver again, confirms the expected state of these Pods, and then decides whether to delete them or keep them running.
3. The user force-deletes them. The user can run kubectl delete pods <pod> --grace-period=0 --force to force-delete a Pod. Unless you know the Pod is truly stopped (e.g., the VM or physical machine hosting its Node is shut down), this method is not recommended. For Pods managed by a StatefulSet in particular, force deletion can easily cause split-brain or data loss.
In addition, a Pod stuck in Terminating is normally deleted automatically once the kubelet returns to normal operation. Sometimes, however, it cannot be deleted, and even kubectl delete pods <pod> --grace-period=0 --force fails. This is usually caused by finalizers and can be resolved by removing the finalizers with kubectl edit.
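A non-interactive alternative to kubectl edit for clearing the finalizers (a sketch; inspect them before removing them):

```shell
kubectl get pod <pod-name> -o jsonpath='{.metadata.finalizers}'    # see which finalizers block deletion
kubectl patch pod <pod-name> -p '{"metadata":{"finalizers":null}}' # clear them with a merge patch
```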
Pod is always in CrashLoopBackOff state
The CrashLoopBackOff status indicates that the container started but exited abnormally; the Pod's restart count is usually greater than 0. Check the container logs first.
Possible causes include the container process exiting on its own, the health check failing and killing the container, and so on. Useful commands:
kubectl get pod <pod-name> -o yaml
kubectl describe pod <pod-name>
kubectl logs <pod-name>                 # add --previous to see the logs of the last crashed container
kubectl exec -it <pod-name> -- bash     # enter the container and investigate
kubectl get pod <pod-name> -o wide      # find the node the Pod runs on, then check that node's system logs
Pod is in Error state
The Error status indicates that an error occurred during Pod startup.
Possible causes:
1. The dependent ConfigMap, Secret or PV does not exist
2. The requested resource exceeds the limit set by the administrator, such as exceeding the LimitRange
3. The container does not have permission to operate on resources in the cluster. For example, after RBAC is enabled, a RoleBinding must be configured for the Pod's ServiceAccount.
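For cause 3, a sketch of granting a ServiceAccount access via a RoleBinding (all names here are placeholders, and the Role pod-reader is assumed to exist):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-reader
  namespace: default
subjects:
- kind: ServiceAccount
  name: app-sa                         # the ServiceAccount the Pod runs as
  namespace: default
roleRef:
  kind: Role
  name: pod-reader                     # an existing Role granting the required verbs
  apiGroup: rbac.authorization.k8s.io
```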
Node is in NotReady state
kubectl get nodes
kubectl describe node <node-name>
kubectl logs -n kube-system <pod-name>   # logs of control-plane or CNI Pods
journalctl -l -u kubelet                 # kubelet logs on the node itself
A Node in the NotReady state is most often caused by PLEG (Pod Lifecycle Event Generator) problems; the related upstream community issue is still unresolved.
Common problems and repair methods are:
- Kubelet does not start or hangs abnormally: restart kubelet
- CNI network plug-in not deployed: deploy CNI plug-in
- Docker daemon abnormal or hung: restart docker
- Insufficient disk space: clean up disk space, e.g., unused images and files
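A sketch of the corresponding repairs, run on the affected node (assumes systemd and Docker as the container runtime):

```shell
systemctl restart kubelet                  # restart a kubelet that is down or hung
systemctl restart docker                   # restart an abnormal Docker daemon
docker system prune -a                     # reclaim disk space from unused images (use with care)
df -h /var/lib/docker /var/lib/kubelet     # verify how much space is left
```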
**Cluster troubleshooting**
Troubleshooting cluster-level anomalies usually starts from the status of the Nodes and the Kubernetes core services. Common causes are:
- A virtual machine or physical machine is down.
- A Kubernetes service did not start normally.
- An operational error (misconfiguration, etc.).
Typical symptoms by component:
- kube-apiserver cannot start: the cluster becomes inaccessible, but existing Pods keep running normally.
- etcd cluster anomaly: the apiserver cannot read or write cluster state normally, and the kubelet cannot update status periodically.
- kube-controller-manager / kube-scheduler anomaly: controllers stop working, so Deployments, Services, and the like misbehave, and newly created Pods cannot be scheduled.
- A Node machine cannot start, or the kubelet cannot start: Pods on that node cannot run properly, and already-running Pods cannot be terminated normally.
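A quick first pass over the core services (componentstatuses is deprecated on newer clusters but still informative where available):

```shell
kubectl cluster-info                  # control-plane endpoints
kubectl get nodes                     # are all Nodes Ready?
kubectl get pods -n kube-system       # apiserver, controller-manager, scheduler, etcd, CNI pods
kubectl get componentstatuses         # health of scheduler, controller-manager, etcd (older clusters)
```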