Ceph operation
1. Prepare the machine
- CentOS 7
- Four virtual machines (4 CPU / 4 GB RAM / 50 GB system disk / 50 GB data disk) are used here for practice; size them according to your actual production requirements
2. The virtual machines are assigned as follows
host name | role | ip | NAT |
---|---|---|---|
CephAdmin | ceph-deploy+client | 192.168.3.189 | 192.168.122.189 |
ceph01 | mon+osd | 192.168.3.190 | 192.168.122.190 |
ceph02 | mon+osd | 192.168.3.191 | 192.168.122.191 |
ceph03 | mon+osd | 192.168.3.192 | 192.168.122.192 |
3. Modify yum source
The default CentOS yum mirrors are not always nearby, so online installation and updates can be slow. In that case, switch the yum repository to a local mirror site (Alibaba Cloud is used here).
```shell
yum install wget -y
```

Switch to the Alibaba Cloud mirror:

```shell
cd /etc/yum.repos.d
mv CentOS-Base.repo CentOS-Base.repo.bak
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
yum makecache
```
4. CentOS7 kernel upgrade
```shell
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum --disablerepo=\* --enablerepo=elrepo-kernel repolist
yum --disablerepo=\* --enablerepo=elrepo-kernel install kernel-ml.x86_64 -y
yum remove kernel-tools-libs.x86_64 kernel-tools.x86_64 -y
yum --disablerepo=\* --enablerepo=elrepo-kernel install kernel-ml-tools.x86_64 -y
awk -F \' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg
grub2-editenv list
grub2-set-default 0
reboot
uname -r
```
5. Install the EPEL repository

```shell
yum update -y && yum install epel-release -y
```
6. Set host name
```shell
# Run the corresponding command on each machine
hostnamectl set-hostname cephadmin
hostnamectl set-hostname ceph01
hostnamectl set-hostname ceph02
hostnamectl set-hostname ceph03
```
7. Set hosts file
```shell
sudo vim /etc/hosts
# The contents are as follows
192.168.3.189 cephadmin
192.168.3.190 ceph01
192.168.3.191 ceph02
192.168.3.192 ceph03
```
8. Create a user and set up passwordless login
- Create the user (run on all four machines)

```shell
useradd -d /home/admin -m admin
echo "123456" | passwd admin --stdin
echo "admin ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/admin
sudo chmod 0440 /etc/sudoers.d/admin
```
- Set up passwordless login (run only on the cephadmin node)

```shell
su - admin
ssh-keygen
ssh-copy-id admin@ceph01
ssh-copy-id admin@ceph02
ssh-copy-id admin@ceph03
```
9. Set the time zone and time synchronization (run on all four machines)

```shell
sudo cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
sudo yum install ntp -y
sudo systemctl enable ntpd
sudo systemctl start ntpd
sudo ntpstat
```
10. Install ceph-deploy and the Ceph packages
- Configure the Ceph yum repository

```shell
cat > /etc/yum.repos.d/ceph.repo<<'EOF'
[Ceph]
name=Ceph packages for $basearch
baseurl=https://mirror.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://mirror.tuna.tsinghua.edu.cn/ceph/keys/release.asc
priority=1

[Ceph-noarch]
name=Ceph noarch packages
baseurl=https://mirror.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://mirror.tuna.tsinghua.edu.cn/ceph/keys/release.asc
priority=1

[ceph-source]
name=Ceph source packages
baseurl=https://mirror.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://mirror.tuna.tsinghua.edu.cn/ceph/keys/release.asc
priority=1
EOF
```
- Install ceph-deploy (only on cephadmin)

```shell
sudo yum install ceph-deploy -y
```
- Install epel-release on all nodes

```shell
sudo yum install epel-release -y
```
11. Initialize the mon nodes (executed only on cephadmin)

```shell
# Switch to the admin user
su - admin
mkdir my-cluster
cd my-cluster
```
- Create the cluster

```shell
ceph-deploy new {initial-monitor-node(s)}
```
For example:

```shell
ceph-deploy new cephadmin ceph01 ceph02 ceph03
```
The output looks like the following:
```
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/admin/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy new cephadmin ceph01 ceph02 ceph03
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ]  username : None
[ceph_deploy.cli][INFO ]  func : <function new at 0x7f8a22d452a8>
[ceph_deploy.cli][INFO ]  verbose : False
[ceph_deploy.cli][INFO ]  overwrite_conf : False
[ceph_deploy.cli][INFO ]  quiet : False
[ceph_deploy.cli][INFO ]  cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f8a22d60ef0>
[ceph_deploy.cli][INFO ]  cluster : ceph
[ceph_deploy.cli][INFO ]  ssh_copykey : True
[ceph_deploy.cli][INFO ]  mon : ['cephadmin', 'ceph01', 'ceph02', 'ceph03']
[ceph_deploy.cli][INFO ]  public_network : None
[ceph_deploy.cli][INFO ]  ceph_conf : None
[ceph_deploy.cli][INFO ]  cluster_network : None
[ceph_deploy.cli][INFO ]  default_release : False
[ceph_deploy.cli][INFO ]  fsid : None
[ceph_deploy.new][DEBUG ] Creating new cluster named ceph
[ceph_deploy.new][INFO ] making sure passwordless SSH succeeds
[cephadmin][DEBUG ] connection detected need for sudo
[cephadmin][DEBUG ] connected to host: cephadmin
[cephadmin][DEBUG ] detect platform information from remote host
[cephadmin][DEBUG ] detect machine type
[cephadmin][DEBUG ] find the location of an executable
[cephadmin][INFO ] Running command: sudo /usr/sbin/ip link show
[cephadmin][INFO ] Running command: sudo /usr/sbin/ip addr show
[cephadmin][DEBUG ] IP addresses found: [u'192.168.124.1', u'192.168.3.189', u'192.168.122.189']
[ceph_deploy.new][DEBUG ] Resolving host cephadmin
[ceph_deploy.new][DEBUG ] Monitor cephadmin at 192.168.3.189
[ceph_deploy.new][INFO ] making sure passwordless SSH succeeds
[ceph01][DEBUG ] connected to host: cephadmin
[ceph01][INFO ] Running command: ssh -CT -o BatchMode=yes ceph01
[ceph01][DEBUG ] connection detected need for sudo
[ceph01][DEBUG ] connected to host: ceph01
[ceph01][DEBUG ] detect platform information from remote host
[ceph01][DEBUG ] detect machine type
[ceph01][DEBUG ] find the location of an executable
[ceph01][INFO ] Running command: sudo /usr/sbin/ip link show
[ceph01][INFO ] Running command: sudo /usr/sbin/ip addr show
[ceph01][DEBUG ] IP addresses found: [u'192.168.122.190', u'192.168.124.1', u'192.168.3.190']
[ceph_deploy.new][DEBUG ] Resolving host ceph01
[ceph_deploy.new][DEBUG ] Monitor ceph01 at 192.168.3.190
[ceph_deploy.new][INFO ] making sure passwordless SSH succeeds
[ceph02][DEBUG ] connected to host: cephadmin
[ceph02][INFO ] Running command: ssh -CT -o BatchMode=yes ceph02
[ceph02][DEBUG ] connection detected need for sudo
[ceph02][DEBUG ] connected to host: ceph02
[ceph02][DEBUG ] detect platform information from remote host
[ceph02][DEBUG ] detect machine type
[ceph02][DEBUG ] find the location of an executable
[ceph02][INFO ] Running command: sudo /usr/sbin/ip link show
[ceph02][INFO ] Running command: sudo /usr/sbin/ip addr show
[ceph02][DEBUG ] IP addresses found: [u'192.168.122.191', u'192.168.3.191', u'192.168.124.1']
[ceph_deploy.new][DEBUG ] Resolving host ceph02
[ceph_deploy.new][DEBUG ] Monitor ceph02 at 192.168.3.191
[ceph_deploy.new][INFO ] making sure passwordless SSH succeeds
[ceph03][DEBUG ] connected to host: cephadmin
[ceph03][INFO ] Running command: ssh -CT -o BatchMode=yes ceph03
[ceph03][DEBUG ] connection detected need for sudo
[ceph03][DEBUG ] connected to host: ceph03
[ceph03][DEBUG ] detect platform information from remote host
[ceph03][DEBUG ] detect machine type
[ceph03][DEBUG ] find the location of an executable
[ceph03][INFO ] Running command: sudo /usr/sbin/ip link show
[ceph03][INFO ] Running command: sudo /usr/sbin/ip addr show
[ceph03][DEBUG ] IP addresses found: [u'192.168.3.192', u'192.168.124.1', u'192.168.122.192']
[ceph_deploy.new][DEBUG ] Resolving host ceph03
[ceph_deploy.new][DEBUG ] Monitor ceph03 at 192.168.3.192
[ceph_deploy.new][DEBUG ] Monitor initial members are ['cephadmin', 'ceph01', 'ceph02', 'ceph03']
[ceph_deploy.new][DEBUG ] Monitor addrs are ['192.168.3.189', '192.168.3.190', '192.168.3.191', '192.168.3.192']
[ceph_deploy.new][DEBUG ] Creating a random mon key...
[ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...
```
- Modify ceph.conf

```shell
vim /home/admin/my-cluster/ceph.conf
# Add the following
public network = 192.168.3.0/24
cluster network = 192.168.122.0/24
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 128
osd pool default pgp num = 128
osd pool default crush rule = 0
osd crush chooseleaf type = 1
max open files = 131072
ms bind ipv6 = false

[mon]
mon clock drift allowed = 10
mon clock drift warn backoff = 30
mon osd full ratio = .95
mon osd nearfull ratio = .85
mon osd down out interval = 600
mon osd report timeout = 300
mon allow pool delete = true

[osd]
osd recovery max active = 3
osd max backfills = 5
osd max scrubs = 2
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=1024
osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog
filestore max sync interval = 5
osd op threads = 2
```
12. Install the Ceph software on the specified nodes

```shell
ceph-deploy install --no-adjust-repos cephadmin ceph01 ceph02 ceph03
```

The --no-adjust-repos option makes ceph-deploy use the locally configured repository directly instead of writing the official upstream repository files.
Deploy the initial monitors and obtain the keys:

```shell
ceph-deploy mon create-initial
```
After this step, you will see the following keyrings in the current directory:
```shell
ls -al /home/admin/my-cluster
drwxrwxr-x 2 admin admin   4096 Oct 27 10:46 .
drwx------ 7 admin admin    177 Oct 27 10:36 ..
-rw------- 1 admin admin    113 Oct 27 10:46 ceph.bootstrap-mds.keyring
-rw------- 1 admin admin    113 Oct 27 10:46 ceph.bootstrap-mgr.keyring
-rw------- 1 admin admin    113 Oct 27 10:46 ceph.bootstrap-osd.keyring
-rw------- 1 admin admin    113 Oct 27 10:46 ceph.bootstrap-rgw.keyring
-rw------- 1 admin admin    151 Oct 27 10:46 ceph.client.admin.keyring
-rw-rw-r-- 1 admin admin   1107 Oct 27 10:36 ceph.conf
-rw-rw-r-- 1 admin admin 237600 Oct 27 10:46 ceph-deploy-ceph.log
-rw------- 1 admin admin     73 Oct 27 10:20 ceph.mon.keyring
```
Copy the configuration file and key to each node of the cluster
The configuration file is the generated ceph.conf, and the key is ceph.client.admin.keyring, the default keyring used when connecting to the Ceph cluster with a Ceph client. Copy both to all nodes with the following command:

```shell
ceph-deploy admin cephadmin ceph01 ceph02 ceph03
```
13. Deploy ceph-mgr

```shell
# The manager daemon was added in the Ceph Luminous (L) release. Deploy a manager daemon with the following command
[admin@cephadmin my-cluster]$ ceph-deploy mgr create cephadmin
```
14. Create OSD
Execute the following command on cephadmin
```shell
# Usage: ceph-deploy osd create --data {device} {ceph-node}
ceph-deploy osd create --data /dev/sdb cephadmin
ceph-deploy osd create --data /dev/sdb ceph01
ceph-deploy osd create --data /dev/sdb ceph02
ceph-deploy osd create --data /dev/sdb ceph03
```
15. Check osd status
```shell
sudo ceph health
sudo ceph -s
```
By default, the permission of /etc/ceph/ceph.client.admin.keyring is 600 and its owner and group are root. If you run the ceph command directly as the admin user on a cluster node, you will be told that /etc/ceph/ceph.client.admin.keyring cannot be read because of insufficient permissions.

The problem does not occur with sudo ceph. For convenience, set the permission to 644 so the ceph command can be used directly. Run the following as the admin user on the cluster nodes:

```shell
sudo chmod 644 /etc/ceph/ceph.client.admin.keyring
ceph -s
```
16. View osds
sudo ceph osd tree
17. Start MGR monitoring module
Install ceph-mgr-dashboard on every node that runs a mgr daemon (all four machines):
yum install ceph-mgr-dashboard -y
- Method 1: enable via the command line

```shell
ceph mgr module enable dashboard
```
- Method 2: enable via the configuration file

```shell
vim /home/admin/my-cluster/ceph.conf
# The contents are as follows
[mon]
mgr initial modules = dashboard

# Push the configuration
ceph-deploy --overwrite-conf config push cephadmin ceph01 ceph02 ceph03

# Restart mgr
systemctl restart ceph-mgr@cephadmin ceph-mgr@ceph01 ceph-mgr@ceph02 ceph-mgr@ceph03
```
18. Web login configuration
By default, all HTTP connections to the dashboard are protected using SSL/TLS
- Method 1 (self-signed SSL certificate)

```shell
# To quickly get the dashboard up and running, generate and install a self-signed certificate with the following built-in command (run with root privileges)
[root@cephadmin my-cluster]# ceph dashboard create-self-signed-cert
# Create a user with the administrator role
[root@cephadmin my-cluster]# ceph dashboard set-login-credentials admin Shanghai711
# View the ceph-mgr services
[root@cephadmin my-cluster]# ceph mgr services
{
    "dashboard": "https://cephadmin:8443/"
}
```
- Method 2 (disable SSL for intranet use)

```shell
ceph config-key set mgr/dashboard/server_port 8080        # Set the port to 8080
ceph config-key set mgr/dashboard/server_addr 192.168.3.189   # Set the bind IP
ceph config set mgr mgr/dashboard/ssl false                # Turn off SSL because the dashboard is only used on the intranet
# Restart the dashboard
ceph mgr module disable dashboard
ceph mgr module enable dashboard
ceph dashboard set-login-credentials admin Shanghai711     # Set the username and password
```
19. Run Ceph as a service
If you deployed Argonaut or Bobtail with ceph-deploy, Ceph can be run as a service (sysvinit can also be used).
Start all Daemons
To start your Ceph cluster, pass start when invoking the ceph service. Syntax:

```shell
sudo service ceph [options] [start|restart] [daemonType|daemonID]
```

For example:

```shell
sudo service ceph -a start
```
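On the Mimic packages installed earlier in this guide, the daemons are managed by systemd rather than the old service script. A minimal sketch of the equivalent commands (the instance names assume the hosts used above):

```shell
# Start or restart every Ceph daemon on this host
sudo systemctl start ceph.target
# Manage a daemon type or a single instance
sudo systemctl restart ceph-osd.target
sudo systemctl status ceph-mon@cephadmin
sudo systemctl status ceph-osd@0
```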
20. Monitoring the cluster
After the cluster is running, you can monitor it with the ceph tool. Typical monitoring includes checking OSD status, monitor status, placement group status, and metadata server status.
- Interactive mode

To run the ceph tool in interactive mode, start it without arguments:

```shell
ceph
ceph> health
ceph> status
ceph> quorum_status
ceph> mon_status
```
- Check cluster health

After starting the cluster and before reading or writing data, check its health with the following command:

```shell
ceph health
```
While the cluster is coming up you may see health warnings such as HEALTH_WARN XXX num placement groups stale; they normally clear as peering completes. When the cluster is ready, ceph health reports HEALTH_OK, and you can start using the cluster.
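If a warning lingers, a quick way to see which placement groups or OSDs are involved is:

```shell
# Expanded health report listing the affected PGs/OSDs
ceph health detail
# List placement groups stuck in stale, inactive, or unclean states
ceph pg dump_stuck
```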
- Watch the cluster

To watch what is happening in the cluster, open a new terminal and enter:

```shell
ceph -w
```
Ceph prints each event as it occurs. For example, a small Ceph cluster with one monitor and two OSDs would print its health and placement group state changes as they happen.
21. Common usage
```shell
# Check cluster status
ceph status
ceph -s
ceph> status

# Check OSD status
ceph osd stat
ceph osd dump
ceph osd tree

# Check monitor status
ceph mon stat
ceph mon dump
ceph quorum_status

# Check MDS status
ceph mds stat
ceph mds dump
```
22. Create storage pool
When you create a storage pool, Ceph creates the specified number of placement groups. While creating one or more placement groups Ceph shows creating; once they are created, the OSDs in each placement group's Acting Set peer with each other; when peering completes, the placement group state becomes active+clean, which means Ceph clients can write data to it.
```shell
# List storage pools
ceph osd lspools

# Default pg/pgp numbers (ceph.conf settings)
osd pool default pg num = 100
osd pool default pgp num = 100

# Create a storage pool
ceph osd pool create test 2

#################
# Choosing pg_num is mandatory because it cannot be calculated automatically. Common values:
#   fewer than 5 OSDs: set pg_num to 128
#   5 to 10 OSDs: set pg_num to 512
#   10 to 50 OSDs: set pg_num to 4096
#   more than 50 OSDs: you need to understand the trade-offs and calculate pg_num yourself
#   the pgcalc tool can help when calculating pg_num
#################

# Delete a storage pool
ceph osd pool delete test test --yes-i-really-really-mean-it

# Rename a storage pool
ceph osd pool rename {current-pool-name} {new-pool-name}

# View usage statistics for the storage pools
rados df

# Take a snapshot of a storage pool
ceph osd pool mksnap {pool-name} {snap-name}

# Delete a snapshot of a storage pool
ceph osd pool rmsnap {pool-name} {snap-name}

# Adjust storage pool option values
ceph osd pool set {pool-name} {key} {value}

# http://docs.ceph.org.cn/rados/operations/pools/
```
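To confirm that a newly created pool's placement groups have reached active+clean as described above, a simple check is:

```shell
# Summary of placement group states across the cluster
ceph pg stat
# Cluster status also shows the pool and PG summary
ceph -s
```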
23. Ceph expansion
Add a new node (server) to the Ceph cluster.

Original layout:
host name | role | ip | NAT |
---|---|---|---|
CephAdmin | ceph-deploy+client | 192.168.3.189 | 192.168.122.189 |
ceph01 | mon+osd | 192.168.3.190 | 192.168.122.190 |
ceph02 | mon+osd | 192.168.3.191 | 192.168.122.191 |
ceph03 | mon+osd | 192.168.3.192 | 192.168.122.192 |
After adding the new node:
host name | role | ip | NAT |
---|---|---|---|
CephAdmin | ceph-deploy+client | 192.168.3.189 | 192.168.122.189 |
ceph01 | mon+osd | 192.168.3.190 | 192.168.122.190 |
ceph02 | mon+osd | 192.168.3.191 | 192.168.122.191 |
ceph03 | mon+osd | 192.168.3.192 | 192.168.122.192 |
ceph04 | mon+osd | 192.168.3.193 | 192.168.122.193 |
In production, data backfilling is usually not started immediately after a new node joins the Ceph cluster, because backfilling affects cluster performance. Set the following flags to postpone it:
```shell
ceph osd set noin        # Set the noin flag
ceph osd set nobackfill  # Set the no-backfill flag
```
During off-peak hours, unset these flags and the cluster will start its rebalancing tasks:
```shell
ceph osd unset noin        # Unset the noin flag
ceph osd unset nobackfill  # Unset the no-backfill flag
```
- Modify the hosts file on all nodes and add the new node 192.168.3.193 ceph04

```shell
vim /etc/hosts
# The contents are as follows
192.168.3.189 cephadmin
192.168.3.190 ceph01
192.168.3.191 ceph02
192.168.3.192 ceph03
192.168.3.193 ceph04
```
- Modify the host name and hosts file of ceph04

```shell
hostnamectl set-hostname ceph04
```
- Create the user

```shell
useradd -d /home/admin -m admin
echo "123456" | passwd admin --stdin
echo "admin ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/admin
sudo chmod 0440 /etc/sudoers.d/admin
```
- Set up passwordless login (run only on the cephadmin node)

```shell
su - admin
ssh-copy-id admin@ceph04
```
- Set the time zone and time synchronization (run on the new machine)

```shell
sudo cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
sudo yum install ntp -y
sudo systemctl enable ntpd
sudo systemctl start ntpd
sudo ntpstat
```
- Configure the Ceph yum repository

```shell
cat > /etc/yum.repos.d/ceph.repo<<'EOF'
[Ceph]
name=Ceph packages for $basearch
baseurl=https://mirror.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://mirror.tuna.tsinghua.edu.cn/ceph/keys/release.asc
priority=1

[Ceph-noarch]
name=Ceph noarch packages
baseurl=https://mirror.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://mirror.tuna.tsinghua.edu.cn/ceph/keys/release.asc
priority=1

[ceph-source]
name=Ceph source packages
baseurl=https://mirror.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://mirror.tuna.tsinghua.edu.cn/ceph/keys/release.asc
priority=1
EOF
```
- Install ceph and ceph-radosgw on the ceph04 node

```shell
yum install ceph ceph-radosgw -y
```
- Modify ceph.conf on the **cephadmin** node

```shell
vim /home/admin/my-cluster/ceph.conf
# The amendments are as follows
mon_initial_members = cephadmin, ceph01, ceph02, ceph03, ceph04   # Added ceph04
```
- Add the monitor to the existing cluster

```shell
ceph-deploy --overwrite-conf mon add ceph04 --address 192.168.3.193
```
- Extend the rgw

```shell
ceph-deploy --overwrite-conf rgw create ceph04
```
- Extend the mgr

```shell
ceph-deploy --overwrite-conf mgr create ceph04
```
- View ceph.conf

```shell
cat /home/admin/my-cluster/ceph.conf
# The contents are as follows
[global]
fsid = 7218408f-9951-49d7-9acc-857f63369a84
mon_initial_members = cephadmin, ceph01, ceph02, ceph03, ceph04
mon_host = 192.168.3.189,192.168.3.190,192.168.3.191,192.168.3.192,192.168.3.193
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 192.168.3.0/24
cluster network = 192.168.122.0/24
```
- From the management node, copy the configuration file and admin key to the Ceph nodes

```shell
ceph-deploy --overwrite-conf admin ceph01 ceph02 ceph03 ceph04
```
- Create an OSD, adding /dev/sdb of the new node ceph04 to the cluster

```shell
ceph-deploy osd create --data /dev/sdb ceph04
```
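After the new OSD is created, it is worth confirming that ceph04 has joined the cluster and that the flags set earlier are still in place; a minimal check is:

```shell
# The new host and its OSD should appear in the CRUSH tree
ceph osd tree
# Cluster status shows the mon/mgr/osd counts and any flags such as noin,nobackfill
ceph -s
```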
24. Ceph file storage system
The Ceph file system (CephFS) is a POSIX-compatible file system that stores its data in a Ceph storage cluster. CephFS uses the same Ceph storage cluster as Ceph block devices, Ceph object storage (with its S3 and Swift APIs), and the native library (librados).
25. Ceph block devices
A block is a sequence of bytes (for example, a 512 byte data block). Block based storage interfaces are the most common methods of storing data. They are based on rotating media, such as hard disk, CD, floppy disk, and even traditional 9-track magnetic tape. The ubiquitous block device interface makes virtual block devices an ideal choice for interacting with mass storage systems such as Ceph
Ceph block devices are thin provisioned, adjustable in size, and stripe data to multiple OSDs in the cluster. Ceph block devices take advantage of many capabilities of RADOS, such as snapshot, replication and consistency. Ceph's RADOS block device (RBD) interacts with OSD using kernel modules or librbd libraries
26. Ceph block storage
After deploying the Ceph cluster, how do you actually store data in it? Ceph provides three interfaces for users:
- rbd: block storage, consumed as block devices; typically combined with virtualization such as KVM to provide block devices to virtual machines
- Object storage: radosgw exposes an object storage API that lets users upload (put) and download (get) object files
- CephFS: file storage, used by mounting CephFS as a file system
Here we create an RBD block image in the Ceph cluster for users. To use Ceph you first need a resource pool. A pool is Ceph's abstraction for data storage; it is made up of placement groups (PGs) and PGPs. You specify the number of PGs when creating the pool, and the count is normally a power of 2 (2^n). First create a pool as follows:
1. Create a pool named test with 128 PGs/PGPs

```shell
ceph osd pool create test 128 128
```
2. You can now view the pool information, such as the cluster's pool list (lspools), pg_num, pgp_num, and the replica count (size)

```shell
# View the pool list
ceph osd lspools
# View the pg and pgp counts
ceph osd pool get test pg_num
ceph osd pool get test pgp_num
# View the replica count; the default is three copies
ceph osd pool get test size
```
3. Now that the pool exists, RBD images can be created in it with the rbd command, for example a 10G block image:

```shell
rbd create -p test --image ceph-rbd-demo.img --size 10G
```
This creates an RBD image named ceph-rbd-demo.img with a size of 10G. You can view the list of RBD images and their details with ls and info:

```shell
# View the RBD image list
rbd -p test ls
# View RBD details; the image consists of 2560 objects, each object is 4M, and the object names start with rbd_data.10b96b8b4567
rbd -p test info ceph-rbd-demo.img
```
4. The RBD image has now been created. In a virtualized environment you would attach it to a virtual machine and write data to it; for direct use, rbd provides a map tool that maps an RBD image to a local block device, which greatly simplifies usage. The kernel client used by rbd map does not support the exclusive-lock, object-map, fast-diff, and deep-flatten features, so they must be disabled first, otherwise rbd map reports an error.

```shell
# Disable the default features that the kernel client does not support
rbd -p test --image ceph-rbd-demo.img feature disable deep-flatten && \
rbd -p test --image ceph-rbd-demo.img feature disable fast-diff && \
rbd -p test --image ceph-rbd-demo.img feature disable object-map && \
rbd -p test --image ceph-rbd-demo.img feature disable exclusive-lock

# Verify the feature information
rbd -p test info ceph-rbd-demo.img

# Map the RBD image locally; after mapping it appears as the local /dev/rbd0 device
rbd map -p test --image ceph-rbd-demo.img
ls -l /dev/rbd0
```
5. The RBD image is now mapped to the local /dev/rbd0 device, so it can be formatted and used like any disk

```shell
# rbd device list shows the RBD block devices mapped on the current machine
rbd device list
ls -l /dev/rbd0

# The device can be used like a local disk, so format it
mkfs.xfs /dev/rbd0
blkid /dev/rbd0

# Mount the disk
mkdir /mnt/test-rbd
mount /dev/rbd0 /mnt/test-rbd
df -h /mnt/test-rbd
cd /mnt/test-rbd
echo "testfile for ceph rbd" > rbd.log
```
6. Ceph block devices deliver high performance with unlimited scalability to kernel modules and to KVM-based hypervisors such as QEMU; cloud computing systems such as OpenStack and CloudStack can integrate with Ceph block devices through libvirt and QEMU. You can run the Ceph RADOS gateway, the Ceph file system, and Ceph block devices on the same cluster at the same time.
- Common block device commands

The rbd command can create, list, introspect, and delete block device images, as well as clone images, create snapshots, roll back to snapshots, view snapshots, and so on. For details of rbd usage, see "RBD - manage RADOS block device images".
```shell
# Create a block device image: before adding a block device to a node, you must first create an image in the Ceph cluster
rbd create --size {megabytes} {pool-name}/{image-name}
# For example, to create a 1GB image named foo in the storage pool test
rbd create --size 1024 test/foo

# List block device images in a storage pool
rbd ls {poolname}
# For example
rbd ls test

# Retrieve image information for a specific image, replacing {image-name} with the image name
rbd info {pool-name}/{image-name}
# For example
rbd info test/foo

# Resize a block device image: images are thin provisioned and only take up physical space once you start writing data;
# their maximum capacity is the --size value you set. To increase (or decrease) the maximum size:
rbd resize --size 2048 foo                  # to increase
rbd resize --size 2048 foo --allow-shrink   # to decrease

# Delete a block device image, replacing {image-name} with the image name
rbd rm {image-name}
# For example
rbd rm foo
```
27. Block device commands

```shell
# Create a block device image: before adding a block device to a node, you must first create an image in the Ceph storage cluster
rbd create --size {megabytes} {pool-name}/{image-name}
# For example, to create a 1GB image named bar in the storage pool swimmingpool, execute
rbd create --size 1024 swimmingpool/bar

# List the block device images in a storage pool (rbd is the default pool name when none is given)
rbd ls {poolname}
# For example
rbd ls test

# Retrieve image information for a specific image, replacing {image-name} with the image name
rbd info {image-name}
rbd info {pool-name}/{image-name}

# Resize a block device image: images are thin provisioned and only take up physical space once you start writing data;
# their maximum capacity is the --size value you set. To increase (or decrease) the maximum size:
rbd resize --size 2048 foo                  # to increase
rbd resize --size 2048 foo --allow-shrink   # to decrease

# Delete a block device image, replacing {image-name} with the image name
rbd rm {pool-name}/{image-name}
rbd rm swimmingpool/bar
```
28. Kernel module operations

```shell
# Get the image list: to mount a block device image, first list all images
rbd list {pool-name}

# Map a block device: use rbd to map an image through the kernel module. You must specify the image name, the storage pool name, and the user name.
# If the rbd kernel module is not yet loaded, the rbd command loads it automatically.
sudo rbd map {pool-name}/{image-name} --id {user-name}
# For example
sudo rbd map rbd/myimage --id admin
# If cephx authentication is enabled, you must also provide a key, either via a keyring or a key file
sudo rbd map rbd/myimage --id admin --keyring /path/to/keyring
sudo rbd map rbd/myimage --id admin --keyfile /path/to/file

# View mapped block devices: the showmapped option of the rbd command shows the block device images mapped through the kernel module
rbd showmapped

# Unmap a block device: specify the unmap option and the device name (by convention the same as the image name)
sudo rbd unmap /dev/rbd/{poolname}/{imagename}
# For example
sudo rbd unmap /dev/rbd/rbd/foo
```
29. RBD snapshots
A snapshot is a read-only copy of an image at a specific point in time. An advanced feature of Ceph block devices is that you can create snapshots of images to preserve their history. Ceph also supports snapshot layering, which lets you clone images (such as VM images) quickly and easily. Ceph's snapshot capability works with the rbd command and with a variety of higher-level interfaces, including QEMU, libvirt, OpenStack, and CloudStack.
To use the RBD snapshot function, you must have a running Ceph cluster.
Note
If the image is still in I/O operation when taking a snapshot, the snapshot may not obtain the accurate or latest data of the image, and the snapshot may have to be cloned into a new mountable image. Therefore, we recommend stopping the I/O operation before taking a snapshot. If the image contains a file system, make sure that the file system is in a consistent state before taking a snapshot. To stop I/O operations, use the fsfreeze command. For details, please refer to the manual page of fsfreeze(8). For virtual machines, QEMU guest agent is used to automatically freeze the file system when taking snapshots.
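As a hedged illustration of that advice, for the image mapped and mounted earlier (/mnt/test-rbd, pool test, image ceph-rbd-demo.img), a consistent snapshot could be taken like this; the snapshot name is only an example:

```shell
# Freeze the filesystem so pending writes are flushed and new I/O is blocked
fsfreeze -f /mnt/test-rbd
# Take the snapshot while I/O is quiesced
rbd snap create test/ceph-rbd-demo.img@before-upgrade
# Unfreeze the filesystem to resume I/O
fsfreeze -u /mnt/test-rbd
```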
When cephx is enabled (the default), you must specify a user name or ID and the path to its keyring. For details, see user management. You can also set the CEPH_ARGS environment variable to avoid re-entering these parameters:

```shell
rbd --id {user-ID} --keyring=/path/to/secret [commands]
rbd --name {username} --keyring=/path/to/secret [commands]
# For example
rbd --id admin --keyring=/etc/ceph/ceph.keyring [commands]
rbd --name client.admin --keyring=/etc/ceph/ceph.keyring [commands]
```
```shell
# Create a snapshot: use the snap create option with the storage pool name and the image name
rbd snap create {pool-name}/{image-name}@{snap-name}
# For example
rbd snap create test/test@test1

# List snapshots of an image: specify the storage pool name and the image name
rbd snap ls {pool-name}/{image-name}
# For example
rbd snap ls rbd/foo

# Roll back to a snapshot: use the snap rollback option with the storage pool name, image name, and snapshot name
rbd snap rollback {pool-name}/{image-name}@{snap-name}
# For example
rbd snap rollback rbd/foo@snapname
```
Note
Rolling back an image to a snapshot means overwriting the current version of the image with the data in the snapshot. The larger the image, the longer the process takes. Cloning from a snapshot is faster than rolling back to a snapshot, which is also the preferred method to return to the previous state.
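A hedged sketch of that preferred path, continuing the example snapshot above (image and snapshot names are illustrative):

```shell
# A snapshot must be protected before it can be cloned
rbd snap protect test/ceph-rbd-demo.img@before-upgrade
# Create a copy-on-write clone that starts from the snapshot's state
rbd clone test/ceph-rbd-demo.img@before-upgrade test/ceph-rbd-demo-restore.img
# List the snapshot's children to verify the clone
rbd children test/ceph-rbd-demo.img@before-upgrade
```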
```shell
# Delete a snapshot: use the snap rm option with the storage pool name, image name, and snapshot name
rbd snap rm {pool-name}/{image-name}@{snap-name}
# For example
rbd snap rm rbd/foo@snapname
```
Note
Ceph OSDs delete data asynchronously, so disk space is not released immediately after a snapshot is deleted.
```shell
# Purge snapshots: to delete all snapshots of an image, use the snap purge option with the storage pool name and image name
rbd snap purge {pool-name}/{image-name}
# For example
rbd snap purge test/test
```
30. RBD mirroring
RBD images can be asynchronously mirrored between two Ceph clusters. This capability uses the RBD journaling image feature to ensure crash-consistent replicas between the clusters. Mirroring is configured per pool on the peer clusters, and you can choose to mirror all images in a pool automatically or only a specific subset. Mirroring is configured with the rbd command. The rbd-mirror daemon is responsible for pulling image updates from the remote peer cluster and applying them to the corresponding image in the local cluster.

Pool configuration: the following procedures show basic management tasks for configuring mirroring with the rbd command. Mirroring is configured at the pool level within the Ceph clusters. The pool configuration steps must be performed on both peer clusters. For clarity, these procedures assume the two clusters are named "local" and "remote", and that a single host has access to both.
```shell
# Enable mirroring: to enable mirroring on a pool with rbd, specify the mirror pool enable command, the pool name, and the mirroring mode
rbd mirror pool enable {pool-name} {mode}
```
The mirror mode can be pool or image:

- pool: in pool mode, every image in the pool that has the journaling feature enabled is mirrored.
- image: in image mode, mirroring must be explicitly enabled on each image.

For example:

```shell
rbd --cluster local mirror pool enable test pool
rbd --cluster remote mirror pool enable test pool
```
```shell
# Disable mirroring: specify the mirror pool disable command and the pool name
rbd mirror pool disable {pool-name}
# For example
rbd --cluster local mirror pool disable test
rbd --cluster remote mirror pool disable test-gw

# Add a peer cluster: for the rbd-mirror daemon to discover its peer cluster, the peer must be registered with the pool.
# Specify the mirror pool peer add command, the pool name, and a peer cluster specification
rbd mirror pool peer add {pool-name} {client-name}@{cluster-name}
# For example
rbd --cluster local mirror pool peer add test client.remote@remote
rbd --cluster remote mirror pool peer add test client.local@local

# Remove a peer cluster: specify the mirror pool peer remove command, the pool name, and the peer UUID (available from the rbd mirror pool info command)
rbd mirror pool peer remove {pool-name} {peer-uuid}
# For example
rbd --cluster local mirror pool peer remove image-pool 55672766-c02b-4729-8567-f13a66893445
rbd --cluster remote mirror pool peer remove image-pool 60c0e299-b38f-4234-91f6-eed0a367be08
```
- Image configuration

Unlike pool configuration, image configuration only needs to be performed against a single mirroring peer Ceph cluster.

Mirrored RBD images are designated as either primary or secondary. This is a property of the image, not of the pool. An image designated as secondary cannot be modified.

When mirroring is first enabled on an image (implicitly, if the pool's mirroring mode is pool and the image has the journaling feature enabled, or explicitly via the rbd command), the image is automatically promoted to primary.
Enable journaling support on an image

RBD mirroring uses the RBD journaling feature to guarantee crash consistency between image replicas. Journaling must be enabled before an image can be mirrored to a peer cluster. The feature can be enabled at image creation time by passing the --image-feature exclusive-lock,journaling option to the rbd command.

Alternatively, journaling can be enabled dynamically on an existing image. Use the feature enable command with the pool name, image name, and feature name:

```shell
rbd feature enable {pool-name}/{image-name} {feature-name}
# For example
rbd --cluster local feature enable image-pool/image-1 journaling
```
```shell
# Enable image mirroring: if the pool's mirroring is configured in image mode, mirroring must be explicitly enabled for each image in the pool.
# Specify the mirror image enable command, the pool name, and the image name
rbd mirror image enable {pool-name}/{image-name}

# Disable image mirroring: specify the mirror image disable command, the pool name, and the image name
rbd mirror image disable {pool-name}/{image-name}
# For example
rbd --cluster local mirror image disable test/image-1
```
31. QEMU and block devices
One of the most common uses of Ceph block devices is as block device images for virtual machines. For example, a user can create a "golden" image with the operating system and related software installed and configured, take a snapshot of the image, and then clone the snapshot (usually many times); see the snapshot section for details. Copy-on-write clones of snapshots mean that Ceph can provision block device images to virtual machines quickly, because the client does not have to download an entire image each time it starts a new virtual machine.

Ceph block devices can be integrated with QEMU virtual machines. For background, see the QEMU open source processor emulator and its manual; for how to install it, see the installation documentation.
- Usage

The QEMU command line expects you to specify the storage pool name, the image name, and optionally a snapshot name.

QEMU assumes that the Ceph configuration file is in the default location (/etc/ceph) and that commands run as the default admin user, unless you explicitly specify another Ceph configuration file path or another user. When specifying a user, QEMU only needs the ID part, not the full TYPE:ID; see user management for details. Do not prepend the client type (i.e. client.) to the user ID, otherwise authentication fails. You should also keep the keyring of the admin user (or of the user given with the :id={user} option) in the default path (/etc/ceph) or in the local directory, and fix the ownership and permissions of the keyring file. The command format is as follows:

```shell
qemu-img {command} [options] rbd:{pool-name}/{image-name}[@snapshot-name][:option1=value1][:option2=value2...]
# For example, specify the id and conf options like this
qemu-img {command} [options] rbd:test/ceph-rbd-demo.img:id=glance:conf=/etc/ceph/ceph.conf
# Tip: if a value contains the characters : @ =, escape the character with a leading backslash \
```
- Creating images with QEMU

You can create block device images with QEMU. You must specify rbd, the storage pool name, the name of the image to create, and the image size:

```shell
qemu-img create -f raw rbd:{pool-name}/{image-name} {size}
# For example
qemu-img create -f raw rbd:test/vm-test 10G
# raw is the only sensible format option to use with RBD. Technically you could use other QEMU-supported formats (such as qcow2 or vmdk),
# but doing so adds overhead and makes live migration of the virtual machine unsafe when caching is in cache=on mode (see below).
```
- Changing image size with QEMU

You can resize a block device through QEMU. You must specify rbd, the storage pool name, the name of the image to adjust, and the new image size:

```shell
qemu-img resize rbd:{pool-name}/{image-name} {size}
# For example
qemu-img resize rbd:test/vm-test 12G
```
- Retrieving image information with QEMU

You can retrieve block device image information with QEMU. You must specify rbd, the storage pool name, and the image name:

```shell
qemu-img info rbd:{pool-name}/{image-name}
# For example
qemu-img info rbd:test/vm-test
```
- Running QEMU via RBD

QEMU can pass a block device on the host through to a guest, but since QEMU 0.15 there is no need to map the image to a block device on the host: QEMU can access the image directly as a virtual block device through librbd. This performs better because it avoids an extra context switch and can take advantage of RBD caching.

You can use qemu-img to convert an existing virtual machine image into a Ceph block device image:

```shell
qemu-img convert -f qcow2 -O raw debian_squeeze.qcow2 rbd:data/squeeze
```

To boot a virtual machine from that image, execute:

```shell
qemu-system-x86_64 -m 1024 -drive format=raw,file=rbd:test/vm-test
```
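When cephx is enabled, the same id= and conf= options shown in the usage section can be appended to the drive string; a hedged example (the user name here is illustrative):

```shell
qemu-system-x86_64 -m 1024 \
  -drive format=raw,file=rbd:test/vm-test:id=admin:conf=/etc/ceph/ceph.conf
```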
32. Use Ceph RBD through libvirt
The libvirt library is a virtual machine abstraction layer between hypervisors and software applications. With libvirt, developers and system administrators only need to deal with one common management framework, one common API, and one common shell interface (virsh) for the hypervisors it supports, including:
- QEMU/KVM
- XEN
- LXC
- VirtualBox
- and others
Ceph block devices support QEMU/KVM, so you can use Ceph block devices from any software that interacts with libvirt. In this stack, libvirt and QEMU access Ceph block devices through librbd.
To create a virtual machine that uses Ceph block devices, follow the steps below. In the examples, test-gw is used as the storage pool name, client.libvirt as the user name, and a newly created libvirt image as the image name. You can use any names you like, but make sure to substitute your own names in the later steps.
1. Configure Ceph
To configure Ceph for libvirt, perform the following steps
```shell
# 1. Create a storage pool (or use the default). This example uses test-gw as the pool name, with 128 placement groups
ceph osd pool create test-gw 128 128
# Verify that the pool exists
ceph osd lspools

# 2. Create a Ceph user. This example uses client.libvirt, with permissions limited to the test-gw pool
ceph auth get-or-create client.libvirt mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=test-gw'
# Verify that the name exists
ceph auth list
# Note: libvirt will access Ceph with the ID libvirt, not the Ceph name client.libvirt.
# See user management - user and user management - command line interface for a detailed explanation of the difference between ID and name.

# 3. Create an image in the RBD pool with QEMU
qemu-img create -f rbd rbd:test/vm-test-1 10G
# Verify that the image exists
rbd -p test ls
```
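libvirt itself authenticates to Ceph through a libvirt secret that holds the client.libvirt key; a hedged sketch of registering it with virsh (the UUID placeholder and file names are illustrative) is:

```shell
# Define a libvirt secret for the Ceph client.libvirt user
cat > secret.xml <<'EOF'
<secret ephemeral='no' private='no'>
  <usage type='ceph'>
    <name>client.libvirt secret</name>
  </usage>
</secret>
EOF
virsh secret-define --file secret.xml      # prints the UUID of the new secret

# Store the Ceph key in that secret (replace {uuid} with the UUID printed above)
ceph auth get-key client.libvirt | tee client.libvirt.key
virsh secret-set-value --secret {uuid} --base64 $(cat client.libvirt.key)
```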
- Prepare the Virtual Machine Manager

You can use libvirt without a VM manager, but it is easier to create the first domain with virt-manager.
- Install the Virtual Machine Manager

```shell
yum install qemu-kvm libvirt virt-manager libguestfs-tools virt-install.noarch -y
```
- Download an ISO image
- Start the Virtual Machine Manager

```shell
virt-manager
```
33. Ceph file system
The Ceph file system (CephFS) is a POSIX-compatible file system that stores its data in a Ceph storage cluster. CephFS uses the same Ceph storage cluster as Ceph block devices, Ceph object storage (with its S3 and Swift APIs), and the native library (librados).

The Ceph file system requires at least one Ceph metadata server in the Ceph storage cluster.
34. Add / delete metadata server
Adding and removing metadata servers with ceph-deploy is simple: one or more metadata servers can be added or removed with a single command.
- Add a metadata server

After deploying the monitors and OSDs, you can deploy the metadata server:

```shell
ceph-deploy mds create {host-name}[:{daemon-name}] [{host-name}[:{daemon-name}] ...]
# For example
ceph-deploy --overwrite-conf mds create cephadmin ceph01 ceph02 ceph03 ceph04
```
- Create the Ceph file system
A Ceph file system requires at least two RADOS storage pools, one for data and one for metadata. When configuring these storage pools, consider:
- Set a high replica level for the metadata storage pool because any data loss from this storage pool will invalidate the entire file system
- Allocate low latency storage (like SSD) to the metadata storage pool because it will directly affect the operation latency of the client
For storage pool management, refer to the storage pool section. For example, to create two storage pools for a file system with default settings, you can use the following commands:

```shell
ceph osd pool create cephfs_data <pg_num>
ceph osd pool create cephfs_metadata <pg_num>
# For example
ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 128
```
After creating the storage pool, you can use the fs new command to create the file system:
```shell
ceph fs new <fs_name> <metadata> <data>
# For example
ceph fs new cephfs cephfs_metadata cephfs_data
ceph fs ls
```
Once the file system is created, the MDS can enter the active state; for example, in a single-MDS system:

```shell
ceph mds stat
e5: 1/1/1 up {0=a=up:active}
```
After the file system is built and MDS is active, you can mount the file system
- Mount the Ceph file system with the kernel driver

To mount the Ceph file system you can use the mount command if you know the monitor's IP address, or use the mount.ceph helper to resolve the monitor name into an IP address. For example:

```shell
mkdir /mnt/mycephfs
mount -t ceph 192.168.3.189:6789:/ /mnt/mycephfs
```
To mount a Ceph file system with cephx authentication enabled, you must specify a user name and secret:

```shell
sudo mount -t ceph 192.168.3.189:6789:/ /mnt/mycephfs -o name=admin,secret=AQD6vHhhQUDvJRAAX1BL9kwEX0qtjsFDW1wSMA==
```
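To keep the secret off the command line and out of the shell history, the key can also be read from a file via the secretfile option (the file path here is illustrative):

```shell
# Store only the base64 key (not the full keyring) in the file
echo "AQD6vHhhQUDvJRAAX1BL9kwEX0qtjsFDW1wSMA==" > /etc/ceph/admin.secret
sudo mount -t ceph 192.168.3.189:6789:/ /mnt/mycephfs -o name=admin,secretfile=/etc/ceph/admin.secret
```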
- Mount the Ceph file system from user space (FUSE)

Ceph v0.55 and later versions enable cephx authentication by default. Before mounting a Ceph file system from user space (FUSE), ensure that the client host has a copy of the Ceph configuration file and a keyring with capabilities for the Ceph metadata server.

1. On the client host, copy the Ceph configuration file from the monitor host into the /etc/ceph/ directory:

```shell
mkdir -p /etc/ceph
scp {user}@{server-machine}:/etc/ceph/ceph.conf /etc/ceph/ceph.conf
chmod -R 644 /etc/ceph
```
To mount the Ceph file system as a user-space file system, use the ceph-fuse command, for example:

```shell
mkdir /home/gw/cephfs && \
yum install ceph-fuse -y && \
ceph-fuse -m 192.168.3.189:6789 /home/gw/cephfs
```
- View mds status

```shell
ceph mds stat
```