K8s learning notes -- Ceph storage volume unmount exception

Posted by tmaiden on Sat, 18 Dec 2021 16:46:42 +0100

While using dynamically provisioned PVs on Ceph RBD storage, I ran into pods that could not be deleted when I tried to remove them a second time (I have not yet figured out the exact cause). All of them stayed in the Terminating state, so I force-deleted them with the following command:

kubectl delete pods <pod> --grace-period=0 --force
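If many pods are stuck at once, it can be easier to pick out the Terminating ones first. The sketch below is only an illustration: the function name terminating_pods and the namespace demo are hypothetical, and it assumes the default kubectl get pods --no-headers column order (NAME READY STATUS RESTARTS AGE) for a single namespace.

```shell
#!/bin/sh
# Print names of pods whose STATUS column reads "Terminating".
# Input: `kubectl get pods --no-headers` output for ONE namespace on stdin
# (columns: NAME READY STATUS RESTARTS AGE).
terminating_pods() {
  awk '$3 == "Terminating" { print $1 }'
}

# Usage (needs a cluster; "demo" is a hypothetical namespace):
#   kubectl get pods -n demo --no-headers | terminating_pods |
#     xargs -r -I{} kubectl delete pod -n demo {} --grace-period=0 --force
```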

Once the pods were gone, however, the PVCs that had been dynamically provisioned for them could not be deleted normally either, so I force-deleted the PVCs and PVs manually as well.
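The post does not say exactly which commands were used to force-delete the PVCs and PVs. One common approach, shown here only as a hedged sketch, is to clear the protection finalizers (kubernetes.io/pvc-protection, kubernetes.io/pv-protection) and then delete the object. Be aware that bypassing the normal deletion path like this is exactly what can leave the backing RBD image behind. The function name force_delete and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# force_delete KIND NAME [NAMESPACE]
# Clears metadata.finalizers on a stuck PVC/PV, then force-deletes it.
# Set DRY_RUN=1 to print the kubectl commands instead of running them.
force_delete() {
  kind="$1" name="$2" ns="${3:-}"
  run=""
  if [ -n "${DRY_RUN:-}" ]; then run="echo"; fi
  # PVs are cluster-scoped, so only pass -n when a namespace was given.
  if [ -n "$ns" ]; then
    set -- -n "$ns"
  else
    set --
  fi
  $run kubectl patch "$kind" "$name" "$@" -p '{"metadata":{"finalizers":null}}'
  $run kubectl delete "$kind" "$name" "$@" --grace-period=0 --force
}
```

For example, DRY_RUN=1 force_delete pvc data-0 gitlab prints the two commands for a PVC in the gitlab namespace without touching the cluster.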

After all this, the Ceph RBD images were not released: ceph df showed that the used space was unchanged.

Running rbd ls <poolName> shows that the dynamically created images are still in the pool. Since we have already been heavy-handed, one more forced step hardly matters, so remove them directly:

rbd rm <poolName>/<imageName>
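Before force-removing images, it is worth checking which ones are really orphaned, that is, present in the pool but referenced by no PV. A minimal sketch, assuming the in-tree kubernetes.io/rbd provisioner, where each PV stores its image name in .spec.rbd.image (with ceph-csi it is typically .spec.csi.volumeAttributes.imageName instead). The helper orphan_images and the file names are hypothetical.

```shell
#!/bin/sh
# orphan_images POOL_IMAGES_FILE PV_IMAGES_FILE
# Prints image names listed in the first file (one per line, e.g. from
# `rbd ls <poolName>`) that do not appear in the second file.
orphan_images() {
  # -v: invert match, -x: whole line, -F: fixed strings, -f: patterns from file
  grep -vxF -f "$2" "$1" || true
}

# Usage (needs cluster access):
#   rbd ls <poolName> > pool.txt
#   kubectl get pv -o jsonpath='{range .items[*]}{.spec.rbd.image}{"\n"}{end}' > pv.txt
#   orphan_images pool.txt pv.txt
```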

If the manual deletion fails with the error check_image_watchers: image has watchers - not removing, the documentation explains that the volume has not been released yet: some client still holds the image open. The fix is, again, a forceful one:

1. Check which client is watching the image:
rbd status ceph-block/csi-vol-0068f225-14f7-11eb-ac08-2a0aff2a8247
        watcher= client.974190 cookie=18446462598732840961

2. Add the watcher's address to the OSD blacklist:
ceph osd blacklist add <watcher_addr>
blacklisting until 2020-10-31T08:47:43.513987+0000 (3600 sec)

3. Delete the image again:
rbd rm ceph-block/csi-vol-0068f225-14f7-11eb-ac08-2a0aff2a8247
Removing image: 100% complete...done.

4. Remove the address you just added from the blacklist:
ceph osd blacklist rm <watcher_addr>
# List the current blacklist entries
ceph osd blacklist ls
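The four steps above can be strung together roughly as follows. This is a sketch under several assumptions: the rbd and ceph CLIs run with admin credentials, every watcher that rbd status reports is stale (blacklisting a live client cuts off its I/O), and the helper names watcher_addrs and force_remove_image are hypothetical. On Ceph Pacific and later the command is spelled ceph osd blocklist; the blacklist spelling may still be accepted as a deprecated alias. If rbd rm still reports watchers right after fencing, retrying after a short wait may help.

```shell
#!/bin/sh
# Print the watcher address(es) (ip:port/nonce) from `rbd status` output on stdin.
watcher_addrs() {
  sed -n 's/.*watcher=\([^ ]*\).*/\1/p'
}

# force_remove_image POOL/IMAGE
# Step 1: find watchers; 2: blacklist them; 3: remove image; 4: un-blacklist.
force_remove_image() {
  spec="$1"
  addrs=$(rbd status "$spec" | watcher_addrs)
  for a in $addrs; do
    ceph osd blacklist add "$a"   # step 2: fence the stale client
  done
  rbd rm "$spec"                  # step 3: delete the image
  rc=$?
  for a in $addrs; do
    ceph osd blacklist rm "$a"    # step 4: lift the fence again
  done
  return $rc
}
```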

At this point, ceph df shows that the used space is back to normal. But if you now deploy a pod that uses a dynamically provisioned PVC, the volume turns out to be unusable and the pod does not run properly. kubectl describe shows that the volume's file system is read-only. Use kubectl get pod -o wide to find the node the pod was scheduled to, then log on to that node: almost every disk-related command reports an Input/output error.
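To check the read-only symptom from the node side, one option is to look for rbd-backed file systems whose mount options start with ro. A minimal sketch; the function name ro_rbd_mounts is hypothetical, and it assumes the standard /proc/mounts field order (device, mount point, type, options, dump, pass).

```shell
#!/bin/sh
# ro_rbd_mounts: print "DEVICE MOUNTPOINT" for /dev/rbdN file systems that
# are mounted read-only. Reads /proc/mounts-style lines on stdin.
ro_rbd_mounts() {
  awk '$1 ~ /^\/dev\/rbd[0-9]+$/ && ($4 == "ro" || $4 ~ /^ro,/) { print $1, $2 }'
}

# Usage (<pod> and <namespace> are placeholders):
#   kubectl get pod <pod> -n <namespace> -o jsonpath='{.spec.nodeName}'
#   # then, on that node:
#   ro_rbd_mounts < /proc/mounts
```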

Running lsblk shows:
loop0    7:0    0 55.5M  1 loop /snap/core18/2074
loop1    7:1    0 55.4M  1 loop /snap/core18/2128
loop2    7:2    0 67.6M  1 loop /snap/lxd/20326
loop3    7:3    0 70.3M  1 loop /snap/lxd/21029
loop5    7:5    0 32.3M  1 loop /snap/snapd/12704
loop6    7:6    0 32.3M  1 loop /snap/snapd/12883
sda      8:0    0  1.1T  0 disk
├─sda1   8:1    0    1M  0 part
├─sda2   8:2    0    1G  0 part /boot
└─sda3   8:3    0  1.1T  0 part
  └─ubuntu--vg-ubuntu--lv 253:0    0  200G  0 lvm  /
sr0     11:0    1 1024M  0 rom
rbd0   252:0    0   10G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd1   252:16   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd2   252:32   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd3   252:48   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd4   252:64   0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd5   252:80   0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd6   252:96   0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd7   252:112  0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd8   252:128  0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd9   252:144  0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd10  252:160  0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount

The mount points of the old rbd block devices are still there; they were never unmounted. This is most likely the cause of the errors above.

Unmounting these directories should fix the problem. Be careful, though: some of these mounts may belong to volumes that are still valid and in use, and those must not be unmounted. You can tell them apart by listing the PVs:

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                     STORAGECLASS      REASON   AGE
pvc-6fc729c1-6f50-4afd-ab8e-b5acddbf64fc   8Gi        RWO            Delete           Bound    gitlab/gitlab-prometheus-server           ceph-gitlab-rbd            10d
pvc-74015690-054d-48be-a8c0-af8a895750e7   10Gi       RWO            Delete           Bound    gitlab/gitlab-minio                       ceph-gitlab-rbd            10d
pvc-7545aadc-829d-4a5f-ab81-736c6fc9ac7b   8Gi        RWO            Delete           Bound    gitlab/data-gitlab-postgresql-0           ceph-gitlab-rbd            10d
pvc-c9fd941e-d99d-437b-963e-6e7a1cb20050   8Gi        RWO            Delete           Bound    gitlab/redis-data-gitlab-redis-master-0   ceph-gitlab-rbd            10d
pvc-f288093c-d60a-4f49-8d10-c112b927dcf4   8Gi        RWO            Delete           Bound    jenkins/jenkins                           ceph-rbd                   10d

Mounts whose images do not belong to any PV in this list should be safe to unmount.
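Matching mounts against the PV list can be scripted. The kubelet mount path embeds the RBD image name rather than the PV name, so the comparison has to be done on image names. A sketch, assuming the in-tree kubernetes.io/rbd plugin layout seen in the lsblk output (.../rbd/mounts/<pool>-image-<image>) and image names taken from .spec.rbd.image of each PV; the function stale_rbd_mounts and the file in-use.txt are hypothetical. It only prints candidates and does not unmount anything.

```shell
#!/bin/sh
# stale_rbd_mounts IN_USE_FILE
# Reads /proc/mounts-style lines on stdin and prints "DEVICE MOUNTPOINT" for
# kubelet rbd global mounts whose image name is NOT listed in IN_USE_FILE
# (one image name per line).
stale_rbd_mounts() {
  in_use="$1"
  if [ ! -r "$in_use" ]; then
    echo "cannot read $in_use" >&2
    return 2
  fi
  awk '$1 ~ /^\/dev\/rbd[0-9]+$/ && $2 ~ /\/kubernetes\.io\/rbd\/mounts\// { print $1, $2 }' |
  while read -r dev dir; do
    image=${dir##*-image-}   # strip the ".../<pool>-image-" prefix
    if ! grep -qxF "$image" "$in_use"; then
      printf '%s %s\n' "$dev" "$dir"
    fi
  done
}

# Usage (needs cluster access and a shell on the node):
#   kubectl get pv -o jsonpath='{range .items[*]}{.spec.rbd.image}{"\n"}{end}' > in-use.txt
#   stale_rbd_mounts in-use.txt < /proc/mounts
```

Listing every PV rather than only Bound ones errs on the safe side: more images count as in use, so fewer mounts are flagged.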

1. Find the exact mount point of the device:
mount | grep rbd9    # replace rbd9 with the device name found with lsblk above
/dev/rbd9 on /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-7abd663f-06da-11ec-bfb1-da58ba56442c type ext4 (rw,relatime,stripe=16)


2. Unmount it:
sudo umount /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-7abd663f-06da-11ec-bfb1-da58ba56442c

3. Check again with lsblk:
loop0                       7:0    0 55.5M  1 loop /snap/core18/2074
loop1                       7:1    0 55.4M  1 loop /snap/core18/2128
loop2                       7:2    0 67.6M  1 loop /snap/lxd/20326
loop3                       7:3    0 70.3M  1 loop /snap/lxd/21029
loop5                       7:5    0 32.3M  1 loop /snap/snapd/12704
loop6                       7:6    0 32.3M  1 loop /snap/snapd/12883
sda                         8:0    0  1.1T  0 disk
├─sda1                      8:1    0    1M  0 part
├─sda2                      8:2    0    1G  0 part /boot
└─sda3                      8:3    0  1.1T  0 part
  └─ubuntu--vg-ubuntu--lv 253:0    0  200G  0 lvm  /
sr0                        11:0    1 1024M  0 rom
rbd0                      252:0    0   10G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/gitlab.rbd-image-kubernetes-dynamic-pvc-294bd568-00d2-11ec-8d41-0e03797b96fa
rbd1                      252:16   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/gitlab.rbd-image-kubernetes-dynamic-pvc-2a5ba93a-00d2-11ec-8d41-0e03797b96fa
rbd2                      252:32   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/gitlab.rbd-image-kubernetes-dynamic-pvc-2942b322-00d2-11ec-8d41-0e03797b96fa
rbd3                      252:48   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/gitlab.rbd-image-kubernetes-dynamic-pvc-2a6b2a58-00d2-11ec-8d41-0e03797b96fa
rbd4                      252:64   0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-79993372-06da-11ec-bfb1-da58ba56442c
rbd5                      252:80   0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-79a09b13-06da-11ec-bfb1-da58ba56442c
rbd6                      252:96   0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-79dbb137-06da-11ec-bfb1-da58ba56442c
rbd7                      252:112  0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-7afaaf63-06da-11ec-bfb1-da58ba56442c
rbd8                      252:128  0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-79a017aa-06da-11ec-bfb1-da58ba56442c
rbd9                      252:144  0    1G  0 disk
rbd10                     252:160  0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-image-kubernetes-dynamic-pvc-63c3c453-00fc-11ec-8d41-0e03797b96fa

4. Unmap the block device:
sudo rbd unmap /dev/rbd9

Running lsblk again shows that the corresponding rbd block device has been released.
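Steps 2 and 4 can be wrapped in a small helper so the unmap only happens after a successful unmount. A sketch; release_rbd and the DRY_RUN switch are hypothetical names, not part of any tool.

```shell
#!/bin/sh
# release_rbd DEVICE MOUNTPOINT
# Unmounts a stale kubelet rbd mount, then unmaps the block device.
# Set DRY_RUN=1 to print the commands instead of running them.
release_rbd() {
  dev="$1" dir="$2"
  run=""
  if [ -n "${DRY_RUN:-}" ]; then run="echo"; fi
  # Unmap only after the unmount has succeeded.
  $run sudo umount "$dir" && $run sudo rbd unmap "$dev"
}
```

For example, DRY_RUN=1 release_rbd /dev/rbd9 <mount point> prints both commands so they can be reviewed before running them for real.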

Pods now deploy normally again.

In hindsight, the problem was most likely caused by doing the steps in the wrong order. Had the block devices been unmounted and unmapped on the node first, deleting the images in Ceph should not have caused any trouble.
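Following that order, a guard that refuses to delete an image while it still has watchers would have caught the problem early. A minimal sketch; safe_rbd_rm is a hypothetical helper, and it simply treats any watcher= line in rbd status output as "still in use".

```shell
#!/bin/sh
# safe_rbd_rm POOL/IMAGE
# Deletes an RBD image only when `rbd status` reports no watchers, i.e. after
# the image has been unmounted and unmapped on every node.
safe_rbd_rm() {
  spec="$1"
  if rbd status "$spec" | grep -q 'watcher='; then
    echo "refusing to remove $spec: it still has watchers; unmount and unmap it on the node first" >&2
    return 1
  fi
  rbd rm "$spec"
}
```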
