To deploy stateful services, we need to provide k8s with a persistent storage scheme. We use ceph as the underlying storage. Generally, there are two ways to connect k8s to ceph:
- Deploy and manage ceph through rook, letting k8s itself provide the ceph services. The official rook documentation is very detailed and also covers fixes for common problems; it worked smoothly for me all the way, so I won't repeat it here. The documents are as follows:
https://rook.io/docs/rook/v1.3/ceph-quickstart.html
https://rook.io/docs/rook/v1.3/ceph-toolbox.html
https://rook.io/docs/rook/v1.3/ceph-cluster-crd.html#storage-selection-settings
https://rook.io/docs/rook/v1.3/ceph-block.html
- Connect k8s to an external ceph cluster.
This post mainly records the approach, and the problems encountered, when connecting a k8s cluster to an external ceph cluster; I ran into quite a few issues along the way.
Environment preparation
The k8s and ceph environments used here are described in:
https://blog.51cto.com/leejia/2495558
https://blog.51cto.com/leejia/2499684
Static persistent volume
Every time storage space is needed, the storage administrator must manually create the corresponding image on the storage backend before k8s can use it.
Create ceph secret
A secret for accessing ceph needs to be added to k8s; it is mainly used by k8s when mapping rbd images.
1. On the ceph master node, run the following command to obtain the base64-encoded key of the admin user (in production, a dedicated user for k8s can be created instead):
# ceph auth get-key client.admin | base64
QVFCd3BOQmVNMCs5RXhBQWx3aVc3blpXTmh2ZjBFMUtQSHUxbWc9PQ==
2. Create the secret in k8s via a manifest:
# vim ceph-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: QVFCd3BOQmVNMCs5RXhBQWx3aVc3blpXTmh2ZjBFMUtQSHUxbWc9PQ==
# kubectl apply -f ceph-secret.yaml
Create image
By default, the pool used after a ceph installation is rbd. Use the following command to create an image, either on a client where ceph is installed or directly on the ceph master node:
# rbd create image1 -s 1024
# rbd info rbd/image1
rbd image 'image1':
        size 1024 MB in 256 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.374d6b8b4567
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
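Note: on newer ceph releases (Luminous and later) the rbd pool is no longer created automatically. If rbd create complains that the pool is missing, it can be created and initialized first; a minimal sketch (the placement-group count 64 is only an example):
# ceph osd lspools
# ceph osd pool create rbd 64
# rbd pool init rbd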
Create persistent volume
Create on k8s via manifest:
# vim pv.yaml apiVersion: v1 kind: PersistentVolume metadata: name: ceph-pv spec: capacity: storage: 1Gi accessModes: - ReadWriteOnce - ReadOnlyMany rbd: monitors: - 172.18.2.172:6789 - 172.18.2.178:6789 - 172.18.2.189:6789 pool: rbd image: image1 user: admin secretRef: name: ceph-secret fsType: ext4 persistentVolumeReclaimPolicy: Retain # kubectl apply -f pv.yaml persistentvolume/ceph-pv created # kubectl get pv NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE ceph-pv 1Gi RWO,ROX Retain Available 76s
The main fields are explained below:
1. accessModes:
RWO (ReadWriteOnce): the volume can be mounted read-write by a single node; ROX (ReadOnlyMany): the volume can be mounted read-only by many nodes; RWX (ReadWriteMany): the volume can be mounted read-write by many nodes.
2. fsType:
If the PersistentVolume's volumeMode is Filesystem, this field specifies the file system to use when mounting the volume. If the volume has not been formatted and formatting is supported, this value is used to format the volume.
3. persistentVolumeReclaimPolicy:
There are three reclaim policies.
Delete: the default policy for dynamically provisioned PersistentVolumes; when the user deletes the corresponding PersistentVolumeClaim, the dynamically provisioned volume is deleted automatically.
Retain: suitable when the volume contains important data. With Retain, deleting the PersistentVolumeClaim does not delete the corresponding PersistentVolume; it moves to the Released state instead, and all the data can be recovered manually.
Recycle: when the user deletes the PersistentVolumeClaim, the data on the volume is deleted but the volume itself is not.
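For example, the reclaim policy of an existing pv can be changed after creation with kubectl patch (a hedged example using the ceph-pv defined above):
# kubectl patch pv ceph-pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'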
Create persistent volume claim
Create on k8s via manifest:
# vim pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-claim
spec:
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
  resources:
    requests:
      storage: 1Gi
# kubectl apply -f pvc.yaml
After the claim is created, k8s matches the most appropriate pv and binds it to the claim: the capacity of the persistent volume must satisfy the claim's request, and the volume's access modes must include those specified in the claim. Therefore the pvc above is bound to the pv we just created.
To view the binding of pvc:
# kubectl get pvc
NAME         STATUS   VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ceph-claim   Bound    ceph-pv   1Gi        RWO,ROX                       13m
pod uses persistent volumes
Create on k8s via manifest:
# vim ubuntu.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ceph-pod
spec:
  containers:
    - name: ceph-ubuntu
      image: phusion/baseimage
      command: ["sh", "/sbin/my_init"]
      volumeMounts:
        - name: ceph-mnt
          mountPath: /mnt
          readOnly: false
  volumes:
    - name: ceph-mnt
      persistentVolumeClaim:
        claimName: ceph-claim
# kubectl apply -f ubuntu.yaml
pod/ceph-pod created
Checking the status of the pod, we find it stuck in the ContainerCreating stage, and kubectl describe shows the following errors:
# kubectl get pods
NAME       READY   STATUS              RESTARTS   AGE
ceph-pod   0/1     ContainerCreating   0          75s
# kubectl describe pods ceph-pod
Events:
  Type     Reason       Age                   From            Message
  ----     ------       ----                  ----            -------
  Warning  FailedMount  48m (x6 over 75m)     kubelet, work3  Unable to attach or mount volumes: unmounted volumes=[ceph-mnt], unattached volumes=[default-token-tlsjd ceph-mnt]: timed out waiting for the condition
  Warning  FailedMount  8m59s (x45 over 84m)  kubelet, work3  MountVolume.WaitForAttach failed for volume "ceph-pv" : fail to check rbd image status with: (executable file not found in $PATH), rbd output: ()
  Warning  FailedMount  3m13s (x23 over 82m)  kubelet, work3  Unable to attach or mount volumes: unmounted volumes=[ceph-mnt], unattached volumes=[ceph-mnt default-token-tlsjd]: timed out waiting for the condition
This problem occurs because k8s relies on kubelet to attach (rbd map) and detach (rbd unmap) RBD images, and kubelet runs on every k8s node; therefore each k8s node must have the ceph-common package installed so that kubelet has the rbd command available.
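A minimal sketch of the install, assuming CentOS 7 nodes and Alibaba Cloud's ceph mirror (the repo layout and the rpm-nautilus/el7 path are assumptions; match them to your ceph release):
# cat > /etc/yum.repos.d/ceph.repo <<'EOF'
[ceph]
name=ceph packages
baseurl=https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/
gpgcheck=0
enabled=1
EOF
# yum install -y ceph-common
After installing the ceph repo and ceph-common on every node, the pod still fails to start, now with a different error: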
# kubectl describe pods ceph-pod
Events:
  Type     Reason       Age   From            Message
  ----     ------       ----  ----            -------
  MountVolume.WaitForAttach failed for volume "ceph-pv" : rbd: map failed exit status 6, rbd output:
  2020-06-02 17:12:18.575338 7f0171c3ed80 -1 did not load config file, using default settings.
  2020-06-02 17:12:18.603861 7f0171c3ed80 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
  rbd: sysfs write failed
  2020-06-02 17:12:18.620447 7f0171c3ed80 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
  RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd feature disable".
  In some cases useful info is found in syslog - try "dmesg | tail" or so.
  rbd: map failed: (6) No such device or address
  Warning  FailedMount  15s  kubelet, work3  MountVolume.WaitForAttach failed for volume "ceph-pv" : rbd: map failed exit status 6, rbd output:
  2020-06-02 17:12:19.257006 7fc330e14d80 -1 did not load config file, using default settings.
We keep digging for the cause and find that there are two problems to solve:
1) The kernel version on the k8s nodes is older than that on the ceph cluster, and some of the features enabled on the rbd image are not supported by the older kernel, so they have to be disabled with the following command:
# rbd info rbd/image1
rbd image 'image1':
        size 1024 MB in 256 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.374d6b8b4567
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
# rbd feature disable rbd/image1 exclusive-lock object-map fast-diff deep-flatten
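To avoid disabling features by hand on every future image, the ceph.conf on the client that creates the images can reduce the default feature set; a hedged sketch, assuming only layering (feature value 1) is wanted for the older kernel (existing images are not affected):
# appended to /etc/ceph/ceph.conf on the client that creates rbd images
[client]
rbd default features = 1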
2) The "unable to find a keyring" error occurs because the k8s node has to interact with ceph when mapping the image locally, so ceph's client.admin keyring file must be placed in the /etc/ceph directory of every k8s node for authentication during the mapping. We therefore create /etc/ceph on each node and distribute the keyring with a small script (a sketch of such a script follows the command below).
# scp /etc/ceph/ceph.client.admin.keyring root@k8s-node:/etc/ceph
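Since the same keyring has to land on every node, a small distribution script can be used; a minimal sketch, with hypothetical node names that must be replaced with your own:
# cat distribute-keyring.sh
#!/bin/bash
# node names below are placeholders for the k8s nodes
for node in work1 work2 work3; do
    ssh root@${node} "mkdir -p /etc/ceph"
    scp /etc/ceph/ceph.client.admin.keyring root@${node}:/etc/ceph/
done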
Checking the pod status again, it is finally running:
# kubectl get pods
NAME       READY   STATUS    RESTARTS   AGE
ceph-pod   1/1     Running   0          29s
Enter the ubuntu container and check the mounts; the image has been mounted and formatted:
# kubectl exec ceph-pod -it sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
# df -hT
Filesystem               Type     Size  Used  Avail  Use%  Mounted on
overlay                  overlay   50G  3.6G    47G    8%  /
tmpfs                    tmpfs     64M     0    64M    0%  /dev
tmpfs                    tmpfs    2.9G     0   2.9G    0%  /sys/fs/cgroup
/dev/rbd0                ext4     976M  2.6M   958M    1%  /mnt
/dev/mapper/centos-root  xfs       50G  3.6G    47G    8%  /etc/hosts
shm                      tmpfs     64M     0    64M    0%  /dev/shm
tmpfs                    tmpfs    2.9G   12K   2.9G    1%  /run/secrets/kubernetes.io/serviceaccount
tmpfs                    tmpfs    2.9G     0   2.9G    0%  /proc/acpi
tmpfs                    tmpfs    2.9G     0   2.9G    0%  /proc/scsi
tmpfs                    tmpfs    2.9G     0   2.9G    0%  /sys/firmware
On the k8s node where the pod runs, the rbd mount can also be seen with df:
# df -hT | grep rbd
/dev/rbd0  ext4  976M  2.6M  958M  1%  /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/rbd-image-image2
Dynamic persistent volume
With dynamic provisioning, the storage images used by k8s are created automatically, without intervention from the storage administrator; storage space is requested and created on demand. One or more StorageClasses must be defined first, and each StorageClass must be configured with a provisioner, which decides which volume plugin provisions the PVs. When a persistent volume claim requests a StorageClass, the StorageClass uses its provisioner to create the persistent volume on the corresponding storage.
The volume plugins officially supported by k8s are listed at: https://kubernetes.io/zh/docs/concepts/storage/storage-classes/
Create an ordinary user for k8s to map rbd images
Create a k8s dedicated pool and user in the ceph cluster:
# ceph osd pool create kube 8192
# ceph auth get-or-create client.kube mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=kube' -o ceph.client.kube.keyring
Create kube user's secret in k8s cluster:
# ceph auth get-key client.kube | base64
QVFBS090WmVDcUxvSHhBQWZma1YxWUNnVzhuRTZUcjNvYS9yclE9PQ==
# vim ceph-kube-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ceph-kube-secret
  namespace: default
data:
  key: QVFBS090WmVDcUxvSHhBQWZma1YxWUNnVzhuRTZUcjNvYS9yclE9PQ==
type: kubernetes.io/rbd
# kubectl create -f ceph-kube-secret.yaml
# kubectl get secret
NAME               TYPE                DATA   AGE
ceph-kube-secret   kubernetes.io/rbd   1      68s
Create a StorageClass or use a StorageClass that has already been created
# vim sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/rbd
parameters:
  monitors: 172.18.2.172:6789,172.18.2.178:6789,172.18.2.189:6789
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: default
  pool: kube
  userId: kube
  userSecretName: ceph-kube-secret
  userSecretNamespace: default
  fsType: ext4
  imageFormat: "2"
  imageFeatures: "layering"
# kubectl apply -f sc.yaml
# kubectl get storageclass
NAME                 PROVISIONER         RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ceph-rbd (default)   kubernetes.io/rbd   Delete          Immediate           false                  6s
The main parameters are explained below:
1. storageclass.beta.kubernetes.io/is-default-class:
If set to true, this is the default StorageClass; when a pvc requests storage without specifying a StorageClass, it is provisioned from the default one (see the example after this list).
2. adminId: the ceph client ID used to create images in the ceph pool. The default is "admin".
3. userId: the ceph client ID used to map rbd images. Defaults to the same value as adminId.
4. imageFormat: the ceph rbd image format, "1" or "2". The default is "1".
5. imageFeatures: optional, and only usable when imageFormat is set to "2". Currently only layering is supported. The default is "", i.e. no features are turned on.
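As noted in item 1, an existing StorageClass can also be marked as the default with kubectl patch instead of editing its manifest; a hedged example using the same beta annotation as the manifest above (newer clusters use the storageclass.kubernetes.io/is-default-class annotation instead):
# kubectl patch storageclass ceph-rbd -p '{"metadata": {"annotations": {"storageclass.beta.kubernetes.io/is-default-class": "true"}}}'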
Create persistent volume claim
Since a default StorageClass has been set, we can create the pvc directly; after creation it sits in Pending status while the provisioner creates the volume:
# vim pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-sc-claim
spec:
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
  resources:
    requests:
      storage: 500Mi
# kubectl apply -f pvc.yaml
persistentvolumeclaim/ceph-sc-claim created
# kubectl get pvc
NAME            STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ceph-sc-claim   Pending                                      ceph-rbd       50s
After creating the pvc, we find that it is never bound to a pv and stays in Pending status. Checking the pvc's events shows the following problem:
# kubectl describe pvc ceph-sc-claim
Name:          ceph-sc-claim
Namespace:     default
StorageClass:  ceph-rbd
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/rbd
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    <none>
Events:
  Type     Reason              Age                From                         Message
  ----     ------              ----               ----                         -------
  Warning  ProvisioningFailed  5s (x7 over 103s)  persistentvolume-controller  Failed to provision volume with StorageClass "ceph-rbd": failed to get admin secret from ["default"/"ceph-secret"]: failed to get secret from ["default"/"ceph-secret"]: Cannot get secret of type kubernetes.io/rbd
From the error we know that the k8s controller failed to obtain ceph's admin secret: the ceph-secret we created earlier lives in the default namespace, while the controller runs in kube-system and cannot read it from there. We therefore recreate the ceph secret in kube-system (this time with type kubernetes.io/rbd), delete the pvc and StorageClass, update the StorageClass configuration, and recreate the StorageClass and pvc:
# cat ceph-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
  namespace: kube-system
data:
  key: QVFCd3BOQmVNMCs5RXhBQWx3aVc3blpXTmh2ZjBFMUtQSHUxbWc9PQ==
type: kubernetes.io/rbd
# kubectl apply -f ceph-secret.yaml
# kubectl get secret ceph-secret -n kube-system
NAME          TYPE                DATA   AGE
ceph-secret   kubernetes.io/rbd   1      19m
# kubectl delete pvc ceph-sc-claim
# kubectl delete sc ceph-rbd
# vim sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/rbd
parameters:
  monitors: 172.18.2.172:6789,172.18.2.178:6789,172.18.2.189:6789
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-kube-secret
  userSecretNamespace: default
  fsType: ext4
  imageFormat: "2"
  imageFeatures: "layering"
# kubectl apply -f sc.yaml
# kubectl apply -f pvc.yaml
# kubectl describe pvc ceph-sc-claim
Name:          ceph-sc-claim
Namespace:     default
StorageClass:  ceph-rbd
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/rbd
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    <none>
Events:
  Type     Reason              Age                  From                         Message
  ----     ------              ----                 ----                         -------
  Warning  ProvisioningFailed  33s (x59 over 116m)  persistentvolume-controller  Failed to provision volume with StorageClass "ceph-rbd": failed to create rbd image: executable file not found in $PATH, command output:
Binding the pv still fails, so we keep looking. We have already installed ceph-common on every node of the k8s cluster, so why can't the rbd command be found? After more digging, the reason is as follows:
When k8s uses a StorageClass to dynamically provision ceph storage, the controller-manager needs the rbd command to interact with the ceph cluster, but the default controller-manager image, k8s.gcr.io/kube-controller-manager, does not include the ceph rbd client. Kubernetes upstream recommends using an external provisioner to solve this; these are standalone programs that follow the specification defined by k8s.
Following the official recommendation, we deploy an external rbd provisioner. The following operations are performed on the k8s master:
# git clone https://github.com/kubernetes-incubator/external-storage.git
# cd external-storage/ceph/rbd/deploy
# sed -r -i "s/namespace: [^ ]+/namespace: kube-system/g" ./rbac/clusterrolebinding.yaml ./rbac/rolebinding.yaml
# kubectl -n kube-system apply -f ./rbac
# kubectl describe deployments.apps -n kube-system rbd-provisioner
Name:               rbd-provisioner
Namespace:          kube-system
CreationTimestamp:  Wed, 03 Jun 2020 18:59:14 +0800
Labels:             <none>
Annotations:        deployment.kubernetes.io/revision: 1
Selector:           app=rbd-provisioner
Replicas:           1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:       Recreate
MinReadySeconds:    0
Pod Template:
  Labels:           app=rbd-provisioner
  Service Account:  rbd-provisioner
  Containers:
   rbd-provisioner:
    Image:      quay.io/external_storage/rbd-provisioner:latest
    Port:       <none>
    Host Port:  <none>
    Environment:
      PROVISIONER_NAME:  ceph.com/rbd
    Mounts:              <none>
  Volumes:               <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   rbd-provisioner-c968dcb4b (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  6m5s  deployment-controller  Scaled up replica set rbd-provisioner-c968dcb4b to 1
Change the provisioner of the StorageClass to the newly deployed one:
# vim sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: ceph.com/rbd
parameters:
  monitors: 172.18.2.172:6789,172.18.2.178:6789,172.18.2.189:6789
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-kube-secret
  userSecretNamespace: default
  fsType: ext4
  imageFormat: "2"
  imageFeatures: "layering"
# kubectl delete pvc ceph-sc-claim
# kubectl delete sc ceph-rbd
# kubectl apply -f sc.yaml
# kubectl apply -f pvc.yaml
Wait about three minutes for the provisioner to allocate storage and bind the pv to the pvc; the binding finally succeeds:
# kubectl get pvc
NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ceph-sc-claim   Bound    pvc-0b92a433-adb0-46d9-a0c8-5fbef28eff5f   2Gi        RWO            ceph-rbd       7m49s
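To double-check from the ceph side, the dynamically created image can be listed in the kube pool (the image name is generated by the provisioner, so it will differ in each environment):
# rbd ls kube
# rbd info kube/<auto-generated-image-name>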
pod uses persistent volumes
Create a pod and view the mount status:
# vim ubuntu.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ceph-sc-pod
spec:
  containers:
    - name: ceph-sc-ubuntu
      image: phusion/baseimage
      command: ["/sbin/my_init"]
      volumeMounts:
        - name: ceph-sc-mnt
          mountPath: /mnt
          readOnly: false
  volumes:
    - name: ceph-sc-mnt
      persistentVolumeClaim:
        claimName: ceph-sc-claim
# kubectl apply -f ubuntu.yaml
# kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
ceph-sc-pod   1/1     Running   0          24s
# kubectl exec ceph-sc-pod -it sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
# df -h
Filesystem               Size  Used  Avail  Use%  Mounted on
overlay                   50G  3.8G    47G    8%  /
tmpfs                     64M     0    64M    0%  /dev
tmpfs                    2.9G     0   2.9G    0%  /sys/fs/cgroup
/dev/rbd0                2.0G  6.0M   1.9G    1%  /mnt
/dev/mapper/centos-root   50G  3.8G    47G    8%  /etc/hosts
shm                       64M     0    64M    0%  /dev/shm
tmpfs                    2.9G   12K   2.9G    1%  /run/secrets/kubernetes.io/serviceaccount
tmpfs                    2.9G     0   2.9G    0%  /proc/acpi
tmpfs                    2.9G     0   2.9G    0%  /proc/scsi
tmpfs                    2.9G     0   2.9G    0%  /sys/firmware
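As a quick sanity check of persistence, a file can be written into /mnt, the pod deleted and recreated, and the file read back; since the data lives on the rbd image rather than inside the container, the last command should print the test string again (a hedged sketch reusing the manifests above):
# kubectl exec ceph-sc-pod -- sh -c 'echo hello-ceph > /mnt/test.txt'
# kubectl delete pod ceph-sc-pod
# kubectl apply -f ubuntu.yaml
# kubectl exec ceph-sc-pod -- cat /mnt/test.txt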
After all these twists and turns, we have finally connected k8s to the external ceph cluster.
Summary
1. k8s relies on kubelet to attach (rbd map) and detach (rbd unmap) RBD images, and kubelet runs on every k8s node; therefore each k8s node must have the ceph-common package installed to provide the rbd command to kubelet.
2. When k8s uses a StorageClass to dynamically provision ceph storage, the controller-manager needs the rbd command to interact with the ceph cluster, but the default controller-manager image, k8s.gcr.io/kube-controller-manager, does not include the ceph rbd client. Kubernetes upstream recommends using an external provisioner, a standalone program that follows the specification defined by k8s.
References
https://kubernetes.io/zh/docs/concepts/storage/storage-classes/
https://kubernetes.io/zh/docs/concepts/storage/volumes/
https://groups.google.com/forum/#!topic/kubernetes-sig-storage-bugs/4w42QZxboIA
Reprinted from https://blog.51cto.com/leejia/2501080