While dynamically provisioning PVs on Ceph RBD storage, I ran into a problem where pods could not be deleted (I still have not figured out the exact cause). All of the pods were stuck in the Terminating state. They can be force-deleted with the following command:
kubectl delete pods <pod> --grace-period=0 --force
However, after the pod is force-deleted, the PVC that was dynamically provisioned for it also fails to delete cleanly, so I force-deleted the PVC and the PV manually as well.
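For the record, a minimal sketch of how a stuck PVC/PV can be force-removed (the names and namespace are placeholders; clearing finalizers bypasses the normal cleanup path, so treat it strictly as a last resort):

# Clear the protection finalizers so the objects can actually go away, then delete them
kubectl patch pvc <pvc-name> -n <namespace> -p '{"metadata":{"finalizers":null}}' --type=merge
kubectl delete pvc <pvc-name> -n <namespace> --grace-period=0 --force
kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}' --type=merge
kubectl delete pv <pv-name> --grace-period=0 --force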
After all this thrashing around, it turns out that the RBD images in Ceph are never released: checking with ceph df shows the used space unchanged.
Running rbd ls <poolname> shows that the dynamically created images are still sitting in the pool. Since everything so far has been handled the brute-force way anyway, the next step is simply to force-delete them too:

rbd rm <poolname>/<imagename>
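As a sanity check before removing images, it helps to see which of them actually account for the space that ceph df reports; a couple of commands useful for that (pool and image names are placeholders):

rbd du -p <poolname>                 # per-image provisioned vs. used size in the pool
rbd info <poolname>/<imagename>      # details of a single image before deciding to remove it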
If an image cannot be removed this way, rbd reports an error along the lines of check_image_watchers: image has watchers - not removing. According to the related documentation, this happens because the storage volume has not been released by its client yet. The fix is, once again, the brute-force one:
1. Check which client is watching the image:

rbd status ceph-block/csi-vol-0068f225-14f7-11eb-ac08-2a0aff2a8247
Watchers:
    watcher=10.244.2.0:0/1036319188 client.974190 cookie=18446462598732840961

2. Add the watcher's address to the OSD blacklist:

ceph osd blacklist add 10.244.2.0:0/1036319188
blacklisting 10.244.2.0:0/1036319188 until 2020-10-31T08:47:43.513987+0000 (3600 sec)

3. Delete the image again:

rbd rm ceph-block/csi-vol-0068f225-14f7-11eb-ac08-2a0aff2a8247
Removing image: 100% complete...done.

4. Remove the address that was just added from the blacklist:

ceph osd blacklist rm 10.244.2.0:0/1036319188
un-blacklisting 10.244.2.0:0/1036319188

# View the current blacklist
ceph osd blacklist ls
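One side note: on newer Ceph releases (Pacific and later), if memory serves, the blacklist subcommand has been renamed to blocklist, so the equivalent workflow there would use:

ceph osd blocklist add 10.244.2.0:0/1036319188
ceph osd blocklist rm 10.244.2.0:0/1036319188
ceph osd blocklist ls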
At this point, ceph df shows the used space back to normal. But now, trying to deploy a pod that dynamically uses a PVC, the storage volume cannot be used and the pod does not run properly. kubectl describe shows events saying the volume's file system is read-only and unusable. Adding the -o wide parameter reveals which node the pod was scheduled to; after logging on to that node, almost every disk-related command reports an Input/output error.
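Concretely, those two checks look roughly like this (the pod name and namespace are placeholders):

kubectl describe pod <pod-name> -n <namespace>      # events mention the read-only file system
kubectl get pod <pod-name> -n <namespace> -o wide   # the NODE column shows where the pod landed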
Checking with lsblk:

NAME                      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0                       7:0    0 55.5M  1 loop /snap/core18/2074
loop1                       7:1    0 55.4M  1 loop /snap/core18/2128
loop2                       7:2    0 67.6M  1 loop /snap/lxd/20326
loop3                       7:3    0 70.3M  1 loop /snap/lxd/21029
loop5                       7:5    0 32.3M  1 loop /snap/snapd/12704
loop6                       7:6    0 32.3M  1 loop /snap/snapd/12883
sda                         8:0    0  1.1T  0 disk
├─sda1                      8:1    0    1M  0 part
├─sda2                      8:2    0    1G  0 part /boot
└─sda3                      8:3    0  1.1T  0 part
  └─ubuntu--vg-ubuntu--lv 253:0    0  200G  0 lvm  /
sr0                        11:0    1 1024M  0 rom
rbd0                      252:0    0   10G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd1                      252:16   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd2                      252:32   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd3                      252:48   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd4                      252:64   0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd5                      252:80   0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd6                      252:96   0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd7                      252:112  0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd8                      252:128  0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd9                      252:144  0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd10                     252:160  0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount

This shows that the mount directories of the original RBD block devices are still there and were never unmounted, which is most likely the cause of the errors above.
The next step is to unmount these stale mount directories. Be careful, though: some of the directories mounted here may still be valid and in use, and those must not be unmounted. Which ones are still in use can be identified as follows.
kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                     STORAGECLASS      REASON   AGE
pvc-6fc729c1-6f50-4afd-ab8e-b5acddbf64fc   8Gi        RWO            Delete           Bound    gitlab/gitlab-prometheus-server           ceph-gitlab-rbd            10d
pvc-74015690-054d-48be-a8c0-af8a895750e7   10Gi       RWO            Delete           Bound    gitlab/gitlab-minio                       ceph-gitlab-rbd            10d
pvc-7545aadc-829d-4a5f-ab81-736c6fc9ac7b   8Gi        RWO            Delete           Bound    gitlab/data-gitlab-postgresql-0           ceph-gitlab-rbd            10d
pvc-c9fd941e-d99d-437b-963e-6e7a1cb20050   8Gi        RWO            Delete           Bound    gitlab/redis-data-gitlab-redis-master-0   ceph-gitlab-rbd            10d
pvc-f288093c-d60a-4f49-8d10-c112b927dcf4   8Gi        RWO            Delete           Bound    jenkins/jenkins                           ceph-rbd                   10d
Any mounted image that does not correspond to a Bound PV in this list should be safe to unmount.
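A quick way to match each /dev/rbdN device against the PV list above is rbd showmapped on the node, which prints the pool and image behind every mapped device (the exact output columns may vary a bit with the rbd version):

rbd showmapped    # id, pool, image, snap, device for each mapped RBD image
# An image here whose PVC no longer shows up in kubectl get pv is a stale mapping.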
1. Find the exact mount directory as follows (rbd9 is the block device number seen above):

mount | grep rbd9
/dev/rbd9 on /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-7abd663f-06da-11ec-bfb1-da58ba56442c type ext4 (rw,relatime,stripe=16)

2. Unmount it:

sudo umount /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-7abd663f-06da-11ec-bfb1-da58ba56442c

3. Check with lsblk again:

NAME                      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0                       7:0    0 55.5M  1 loop /snap/core18/2074
loop1                       7:1    0 55.4M  1 loop /snap/core18/2128
loop2                       7:2    0 67.6M  1 loop /snap/lxd/20326
loop3                       7:3    0 70.3M  1 loop /snap/lxd/21029
loop5                       7:5    0 32.3M  1 loop /snap/snapd/12704
loop6                       7:6    0 32.3M  1 loop /snap/snapd/12883
sda                         8:0    0  1.1T  0 disk
├─sda1                      8:1    0    1M  0 part
├─sda2                      8:2    0    1G  0 part /boot
└─sda3                      8:3    0  1.1T  0 part
  └─ubuntu--vg-ubuntu--lv 253:0    0  200G  0 lvm  /
sr0                        11:0    1 1024M  0 rom
rbd0                      252:0    0   10G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/gitlab.rbd-image-kubernetes-dynamic-pvc-294bd568-00d2-11ec-8d41-0e03797b96fa
rbd1                      252:16   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/gitlab.rbd-image-kubernetes-dynamic-pvc-2a5ba93a-00d2-11ec-8d41-0e03797b96fa
rbd2                      252:32   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/gitlab.rbd-image-kubernetes-dynamic-pvc-2942b322-00d2-11ec-8d41-0e03797b96fa
rbd3                      252:48   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/gitlab.rbd-image-kubernetes-dynamic-pvc-2a6b2a58-00d2-11ec-8d41-0e03797b96fa
rbd4                      252:64   0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-79993372-06da-11ec-bfb1-da58ba56442c
rbd5                      252:80   0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-79a09b13-06da-11ec-bfb1-da58ba56442c
rbd6                      252:96   0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-79dbb137-06da-11ec-bfb1-da58ba56442c
rbd7                      252:112  0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-7afaaf63-06da-11ec-bfb1-da58ba56442c
rbd8                      252:128  0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-79a017aa-06da-11ec-bfb1-da58ba56442c
rbd9                      252:144  0    1G  0 disk
rbd10                     252:160  0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-image-kubernetes-dynamic-pvc-63c3c453-00fc-11ec-8d41-0e03797b96fa

4. Unmap the device:

sudo rbd unmap /dev/rbd9

Checking with lsblk once more, the corresponding RBD block device has been released normally.
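If several devices turn out to be stale, the same two steps can be scripted; a rough sketch (the device list is just an example, and every entry should be double-checked against the PV list before running it):

for dev in /dev/rbd9; do                   # only devices confirmed to be stale
    mnt=$(findmnt -n -o TARGET "$dev")     # the device's mount point, if it is still mounted
    [ -n "$mnt" ] && sudo umount "$mnt"
    sudo rbd unmap "$dev"
done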
Deploying pods works normally again now.
Looking back on it, this whole mess was most likely caused by doing the steps in the wrong order. Had the order been reversed, unmounting and unmapping the corresponding block devices on the node first, deleting the images on the Ceph side should not have caused any problems.
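In other words, the cleanup order that should avoid all of this looks roughly like the following (paths and names are placeholders):

# 1. On the node that still has the volume attached, release the block device first
sudo umount /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/<image-mount-dir>
sudo rbd unmap /dev/rbdX
# 2. Only then remove the image on the Ceph side
rbd rm <poolname>/<imagename>
# 3. Finally clean up any leftover Kubernetes objects (PVC/PV) if they are still around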