Ceph learning environment construction

Posted by corillo181 on Wed, 09 Mar 2022 01:59:11 +0100

Ceph operation

1. Prepare the machine

  • CentOS7
  • Four virtual machines (4C/4G/50G/50G); this sizing is for my practice environment, so adjust it to your actual situation in production

2. The virtual machine is assigned as follows

host name    role                 ip              NAT
CephAdmin    ceph-deploy+client   192.168.3.189   192.168.122.189
ceph01       mon+osd              192.168.3.190   192.168.122.190
ceph02       mon+osd              192.168.3.191   192.168.122.191
ceph03       mon+osd              192.168.3.192   192.168.122.192

3. Modify yum source

The default yum repositories on CentOS are not always served from a domestic (Chinese) mirror, so installing and updating packages online with yum can be slow. In that case, switch the yum repositories to a domestic mirror site

yum install wget -y

Alibaba cloud

cd /etc/yum.repos.d
mv CentOS-Base.repo CentOS-Base.repo.bak
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
yum makecache

4. CentOS7 kernel upgrade

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org 
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum --disablerepo=\* --enablerepo=elrepo-kernel repolist
yum --disablerepo=\* --enablerepo=elrepo-kernel install  kernel-ml.x86_64  -y
yum remove kernel-tools-libs.x86_64 kernel-tools.x86_64  -y
yum --disablerepo=\* --enablerepo=elrepo-kernel install kernel-ml-tools.x86_64  -y
awk -F \' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg
grub2-editenv list
grub2-set-default 0
reboot
uname -r

5. Install the EPEL repository

yum update -y && yum install epel-release -y

6. Set host name

hostnamectl set-hostname cephadmin
hostnamectl set-hostname ceph01
hostnamectl set-hostname ceph02
hostnamectl set-hostname ceph03

7. Set hosts file

sudo vim /etc/hosts
# The contents are as follows
192.168.3.189 cephadmin
192.168.3.190 ceph01
192.168.3.191 ceph02
192.168.3.192 ceph03

8. Create users and set up password-free login

  • Create user (running on all four machines)

    useradd -d /home/admin -m admin
    echo "123456" | passwd admin --stdin
    echo "admin ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/admin
    sudo chmod 0440 /etc/sudoers.d/admin
    
  • Set up password-free login (run only on the cephadmin node)

    su - admin
    ssh-keygen
    ssh-copy-id admin@ceph01
    ssh-copy-id admin@ceph02
    ssh-copy-id admin@ceph03
    

9. Set the time zone and synchronize time (run on all four machines)

sudo cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
sudo yum install ntp -y
sudo systemctl enable ntpd
sudo systemctl start ntpd
sudo ntpstat

10. Install ceph-deploy and the Ceph packages

  • Configure ceph source

    cat > /etc/yum.repos.d/ceph.repo<<'EOF'
    [Ceph]
    name=Ceph packages for $basearch
    baseurl=https://mirror.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/$basearch
    enabled=1
    gpgcheck=1
    type=rpm-md
    gpgkey=https://mirror.tuna.tsinghua.edu.cn/ceph/keys/release.asc
    priority=1
    [Ceph-noarch]
    name=Ceph noarch packages
    baseurl=https://mirror.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/noarch
    enabled=1
    gpgcheck=1
    type=rpm-md
    gpgkey=https://mirror.tuna.tsinghua.edu.cn/ceph/keys/release.asc
    priority=1
    [ceph-source]
    name=Ceph source packages
    baseurl=https://mirror.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/SRPMS
    enabled=1
    gpgcheck=1
    type=rpm-md
    gpgkey=https://mirror.tuna.tsinghua.edu.cn/ceph/keys/release.asc
    priority=1
    EOF
    
  • Install ceph-deploy (only on cephadmin)

    sudo yum install ceph-deploy -y
    
  • Install EPEL release on all nodes

    sudo yum install epel-release -y
    

11. Initialize the mon nodes (run only on cephadmin)

# Enter admin user
su - admin
mkdir my-cluster
cd my-cluster
  • Create cluster

    ceph-deploy new {initial-monitor-node(s)}
    

    For example:

    ceph-deploy new cephadmin ceph01 ceph02 ceph03
    

    The output looks like this:

    [ceph_deploy.conf][DEBUG ] found configuration file at: /home/admin/.cephdeploy.conf
    [ceph_deploy.cli][INFO  ] Invoked (2.0.1): /bin/ceph-deploy new cephadmin ceph01 ceph02 ceph03
    [ceph_deploy.cli][INFO  ] ceph-deploy options:
    [ceph_deploy.cli][INFO  ]  username                      : None
    [ceph_deploy.cli][INFO  ]  func                          : <function new at 0x7f8a22d452a8>
    [ceph_deploy.cli][INFO  ]  verbose                       : False
    [ceph_deploy.cli][INFO  ]  overwrite_conf                : False
    [ceph_deploy.cli][INFO  ]  quiet                         : False
    [ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f8a22d60ef0>
    [ceph_deploy.cli][INFO  ]  cluster                       : ceph
    [ceph_deploy.cli][INFO  ]  ssh_copykey                   : True
    [ceph_deploy.cli][INFO  ]  mon                           : ['cephadmin', 'ceph01', 'ceph02', 'ceph03']
    [ceph_deploy.cli][INFO  ]  public_network                : None
    [ceph_deploy.cli][INFO  ]  ceph_conf                     : None
    [ceph_deploy.cli][INFO  ]  cluster_network               : None
    [ceph_deploy.cli][INFO  ]  default_release               : False
    [ceph_deploy.cli][INFO  ]  fsid                          : None
    [ceph_deploy.new][DEBUG ] Creating new cluster named ceph
    [ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
    [cephadmin][DEBUG ] connection detected need for sudo
    [cephadmin][DEBUG ] connected to host: cephadmin
    [cephadmin][DEBUG ] detect platform information from remote host
    [cephadmin][DEBUG ] detect machine type
    [cephadmin][DEBUG ] find the location of an executable
    [cephadmin][INFO  ] Running command: sudo /usr/sbin/ip link show
    [cephadmin][INFO  ] Running command: sudo /usr/sbin/ip addr show
    [cephadmin][DEBUG ] IP addresses found: [u'192.168.124.1', u'192.168.3.189', u'192.168.122.189']
    [ceph_deploy.new][DEBUG ] Resolving host cephadmin
    [ceph_deploy.new][DEBUG ] Monitor cephadmin at 192.168.3.189
    [ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
    [ceph01][DEBUG ] connected to host: cephadmin
    [ceph01][INFO  ] Running command: ssh -CT -o BatchMode=yes ceph01
    [ceph01][DEBUG ] connection detected need for sudo
    [ceph01][DEBUG ] connected to host: ceph01
    [ceph01][DEBUG ] detect platform information from remote host
    [ceph01][DEBUG ] detect machine type
    [ceph01][DEBUG ] find the location of an executable
    [ceph01][INFO  ] Running command: sudo /usr/sbin/ip link show
    [ceph01][INFO  ] Running command: sudo /usr/sbin/ip addr show
    [ceph01][DEBUG ] IP addresses found: [u'192.168.122.190', u'192.168.124.1', u'192.168.3.190']
    [ceph_deploy.new][DEBUG ] Resolving host ceph01
    [ceph_deploy.new][DEBUG ] Monitor ceph01 at 192.168.3.190
    [ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
    [ceph02][DEBUG ] connected to host: cephadmin
    [ceph02][INFO  ] Running command: ssh -CT -o BatchMode=yes ceph02
    [ceph02][DEBUG ] connection detected need for sudo
    [ceph02][DEBUG ] connected to host: ceph02
    [ceph02][DEBUG ] detect platform information from remote host
    [ceph02][DEBUG ] detect machine type
    [ceph02][DEBUG ] find the location of an executable
    [ceph02][INFO  ] Running command: sudo /usr/sbin/ip link show
    [ceph02][INFO  ] Running command: sudo /usr/sbin/ip addr show
    [ceph02][DEBUG ] IP addresses found: [u'192.168.122.191', u'192.168.3.191', u'192.168.124.1']
    [ceph_deploy.new][DEBUG ] Resolving host ceph02
    [ceph_deploy.new][DEBUG ] Monitor ceph02 at 192.168.3.191
    [ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
    [ceph03][DEBUG ] connected to host: cephadmin
    [ceph03][INFO  ] Running command: ssh -CT -o BatchMode=yes ceph03
    [ceph03][DEBUG ] connection detected need for sudo
    [ceph03][DEBUG ] connected to host: ceph03
    [ceph03][DEBUG ] detect platform information from remote host
    [ceph03][DEBUG ] detect machine type
    [ceph03][DEBUG ] find the location of an executable
    [ceph03][INFO  ] Running command: sudo /usr/sbin/ip link show
    [ceph03][INFO  ] Running command: sudo /usr/sbin/ip addr show
    [ceph03][DEBUG ] IP addresses found: [u'192.168.3.192', u'192.168.124.1', u'192.168.122.192']
    [ceph_deploy.new][DEBUG ] Resolving host ceph03
    [ceph_deploy.new][DEBUG ] Monitor ceph03 at 192.168.3.192
    [ceph_deploy.new][DEBUG ] Monitor initial members are ['cephadmin', 'ceph01', 'ceph02', 'ceph03']
    [ceph_deploy.new][DEBUG ] Monitor addrs are ['192.168.3.189', '192.168.3.190', '192.168.3.191', '192.168.3.192']
    [ceph_deploy.new][DEBUG ] Creating a random mon key...
    [ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
    [ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...
    
  • Modify ceph.conf

    vim /home/admin/my-cluster/ceph.conf
    # Add the following
    public network = 192.168.3.0/24
    cluster network = 192.168.122.0/24
    osd pool default size = 3
    osd pool default min size = 2
    osd pool default pg num = 128
    osd pool default pgp num = 128
    osd pool default crush rule = 0
    osd crush chooseleaf type = 1
    max open files = 131072
    ms bind ipv6 = false
    
    [mon]
    mon clock drift allowed      = 10
    mon clock drift warn backoff = 30
    mon osd full ratio           = .95
    mon osd nearfull ratio       = .85
    mon osd down out interval    = 600
    mon osd report timeout       = 300
    mon allow pool delete      = true
    
    [osd]
    osd recovery max active      = 3    
    osd max backfills            = 5
    osd max scrubs               = 2
    osd mkfs type = xfs
    osd mkfs options xfs = -f -i size=1024
    osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog
    filestore max sync interval  = 5
    osd op threads               = 2
    

12. Install Ceph software to the specified node

ceph-deploy install --no-adjust-repos cephadmin ceph01 ceph02 ceph03

--no-adjust-repos tells ceph-deploy to use the repositories already configured on each node instead of generating the official upstream repo files

Deploy the initial monitors and obtain the keys

ceph-deploy mon create-initial

After this step, you will see the following keyrings in the current directory:

ls -al /home/admin/my-cluster
drwxrwxr-x 2 admin admin   4096 10 October 27:46 .
drwx------ 7 admin admin    177 10 October 27:36 ..
-rw------- 1 admin admin    113 10 October 27:46 ceph.bootstrap-mds.keyring
-rw------- 1 admin admin    113 10 October 27:46 ceph.bootstrap-mgr.keyring
-rw------- 1 admin admin    113 10 October 27:46 ceph.bootstrap-osd.keyring
-rw------- 1 admin admin    113 10 October 27:46 ceph.bootstrap-rgw.keyring
-rw------- 1 admin admin    151 10 October 27:46 ceph.client.admin.keyring
-rw-rw-r-- 1 admin admin   1107 10 October 27:36 ceph.conf
-rw-rw-r-- 1 admin admin 237600 10 October 27:46 ceph-deploy-ceph.log
-rw------- 1 admin admin     73 10 October 27:20 ceph.mon.keyring

Copy the configuration file and key to each node of the cluster

The configuration file is the generated ceph.conf, and the key is ceph.client.admin.keyring, the default key used when a Ceph client connects to the cluster. Copy both to all nodes with the following command

ceph-deploy admin cephadmin ceph01 ceph02 ceph03

13. Deploy CEPH Mgr

# The manager daemon was added in the Ceph Luminous (L) release. Deploy a manager daemon with the following command
[admin@node1 my-cluster]$ ceph-deploy mgr create cephadmin 

14. Create OSD

Execute the following command on cephadmin

# Usage: ceph-deploy osd create --data {device} {ceph-node}
ceph-deploy osd create --data /dev/sdb cephadmin
ceph-deploy osd create --data /dev/sdb ceph01
ceph-deploy osd create --data /dev/sdb ceph02
ceph-deploy osd create --data /dev/sdb ceph03

15. Check osd status

sudo ceph health
sudo ceph -s

By default, the permissions on the ceph.client.admin.keyring file are 600 and its owner and group are root. If you run the ceph command directly as the admin user on a cluster node, you will get an error that /etc/ceph/ceph.client.admin.keyring cannot be opened because of insufficient permissions

The problem does not occur with sudo ceph, but for the convenience of running ceph directly you can set the permissions to 644. Execute the following command as the admin user on the cluster nodes (shown here on cephadmin)

sudo chmod 644 /etc/ceph/ceph.client.admin.keyring
ceph -s

16. View osds

sudo ceph osd tree

17. Start MGR monitoring module

Install ceph-mgr-dashboard on the mgr nodes (all four machines)

yum install ceph-mgr-dashboard -y
  • Method 1: command line

    ceph mgr module enable dashboard
    
  • Method 2: configuration file

    vim /home/cephadmin/my-cluster/ceph.conf
    # The contents are as follows
    [mon]
    mgr initial modules = dashboard
    
    # Push configuration
    ceph-deploy --overwrite-conf config push cephadmin ceph01 ceph02 ceph03
    # Restart mgr
    systemctl restart ceph-mgr@cephadmin ceph-mgr@ceph01 ceph-mgr@ceph02 ceph-mgr@ceph03
    

18. Web login configuration

By default, all HTTP connections to the dashboard are protected using SSL/TLS

  • Method 1

    # To get the dashboard up and running quickly, generate and install a self-signed certificate with the following built-in command (run with root privileges):
    [root@node1 my-cluster]# ceph dashboard create-self-signed-cert
    
    #To create a user with the administrator role:
    [root@node1 my-cluster]# ceph dashboard set-login-credentials admin Shanghai711
    
    #To view CEPH Mgr services:
    [root@node1 my-cluster]# ceph mgr services
    {
        "dashboard": "https://cephadmin:8443/"
    }
    
  • Method 2

    ceph config-key set mgr/dashboard/server_port 8080 # Set the port to 8080
    ceph config-key set mgr/dashboard/server_addr 192.168.3.189 # Set binding ip
    ceph config set mgr mgr/dashboard/ssl false # ssl is turned off because it is used in the intranet
    # Restart the dashboard
    ceph mgr module disable dashboard
    ceph mgr module enable dashboard
    ceph dashboard set-login-credentials admin Shanghai711 # Set user name and password
    

19. Run CEPH as a service

If you deployed Argonaut or Bobtail with ceph-deploy, Ceph can be run as a service (sysvinit can also be used)

Start all Daemons

To start your Ceph cluster, pass start (or restart) to the ceph service command. Syntax:

sudo service ceph [options] [start|restart] [daemonType|daemonID]

For example:

sudo service ceph -a start
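
On the CentOS 7 hosts built in this guide, the Ceph daemons installed by ceph-deploy are normally managed with systemd units rather than the legacy service script. A minimal sketch, assuming the host names and OSD IDs used above:

# Start or stop every Ceph daemon on the local host
sudo systemctl start ceph.target
# Manage individual daemons by type and instance name
sudo systemctl status ceph-mon@ceph01        # monitor running on host ceph01
sudo systemctl restart ceph-osd@1            # OSD with id 1
sudo systemctl status ceph-mgr@cephadmin     # mgr on cephadmin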

20. Monitoring cluster

After the cluster is running, you can use the ceph tool to monitor it. Typical monitoring includes checking OSD status, monitor status, placement group status and metadata server status.

  • Interactive mode

    To run ceph in interactive mode, do not run ceph with parameters

    ceph
    ceph> health
    ceph> status
    ceph> quorum_status
    ceph> mon_status
    
  • Check cluster health

    After starting the cluster and before reading and writing data, check the health status of the cluster. You can check with the following command

    ceph health
    

    While the cluster is still forming, you may see health warnings such as HEALTH_WARN XXX num placement groups stale; check again a little later. When the cluster is ready, ceph health will report HEALTH_OK, and at that point you can start using the cluster

  • Observe the cluster

    To observe what is happening in the cluster, open a new terminal and enter

    ceph -w
    

    Ceph prints the various events as they happen. For example, a small Ceph cluster with one monitor and two OSDs might print events like these

21. Common usage

# Check cluster status
ceph status
ceph -s
ceph> status
# Check OSD status
ceph osd stat
ceph osd dump
ceph osd tree
# Check monitor status
ceph mon stat
ceph mon dump
ceph quorum_status
# Check MDS status
ceph mds stat
ceph mds dump

22. Create storage pool

When you create a storage pool, Ceph creates the specified number of placement groups. Ceph reports creating while it is creating one or more placement groups; once they are created, the OSDs in each placement group's Acting Set peer with each other. Once peering completes, the placement group state becomes active+clean, which means Ceph clients can write data to it
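
To watch placement groups go from creating to active+clean after creating a pool, the standard status commands below can be used:

# Summary of placement group states across the cluster
ceph pg stat
# Overall cluster status, including placement group state counts
ceph -s
# List any placement groups stuck in a non-clean state
ceph pg dump_stuck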

# List storage pools
ceph osd lspools
# Create storage pool
# (defaults applied when no PG count is given, set in ceph.conf)
osd pool default pg num = 100
osd pool default pgp num = 100
# Create a pool named test
ceph osd pool create test 2
#################
# Choosing a value for pg_num is mandatory because it cannot be calculated automatically. Common values:
#   Fewer than 5 OSDs: set pg_num to 128
#   5 to 10 OSDs:      set pg_num to 512
#   10 to 50 OSDs:     set pg_num to 4096
#   More than 50 OSDs: you need to understand the trade-offs and calculate pg_num yourself
# When calculating pg_num yourself, the pgcalc tool can help
#######################
# Delete storage pool
ceph osd pool delete test test --yes-i-really-really-mean-it
# Rename storage pool
ceph osd pool rename {current-pool-name} {new-pool-name}
# Viewing storage pool statistics
rados df # To view usage statistics for a storage pool
# Take a snapshot of the storage pool
ceph osd pool mksnap {pool-name} {snap-name}
# Delete snapshot of storage pool
ceph osd pool rmsnap {pool-name} {snap-name}
# Adjust storage pool option values
ceph osd pool set {pool-name} {key} {value} # http://docs.ceph.org.cn/rados/operations/pools/

23. Ceph expansion

Expanding the Ceph cluster with a new node (new server)

original

host name    role                 ip              NAT
CephAdmin    ceph-deploy+client   192.168.3.189   192.168.122.189
ceph01       mon+osd              192.168.3.190   192.168.122.190
ceph02       mon+osd              192.168.3.191   192.168.122.191
ceph03       mon+osd              192.168.3.192   192.168.122.192

newly added

host name    role                 ip              NAT
CephAdmin    ceph-deploy+client   192.168.3.189   192.168.122.189
ceph01       mon+osd              192.168.3.190   192.168.122.190
ceph02       mon+osd              192.168.3.191   192.168.122.191
ceph03       mon+osd              192.168.3.192   192.168.122.192
ceph04       mon+osd              192.168.3.193   192.168.122.193

In production, data backfill is generally not started immediately after a new node joins the Ceph cluster, because backfilling would affect cluster performance. We therefore set a few flags first

ceph osd set noin         # Prevent new OSDs from being marked "in"
ceph osd set nobackfill   # Prevent data backfill

During off-peak hours, unset these flags and the cluster will start its rebalancing tasks

ceph osd unset noin         # Clear the noin flag
ceph osd unset nobackfill   # Allow data backfill again
  • Modify the hosts files of all nodes and add a new node 192.168.3.193 ceph04

    vim /etc/hosts
    # The contents are as follows
    192.168.3.189 cephadmin
    192.168.3.190 ceph01
    192.168.3.191 ceph02
    192.168.3.192 ceph03
    192.168.3.193 ceph04
    
  • Set the host name on ceph04

    hostnamectl set-hostname ceph04
    
  • Create user

    useradd -d /home/admin -m admin
    echo "123456" | passwd admin --stdin
    echo "admin ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/admin
    sudo chmod 0440 /etc/sudoers.d/admin
    
  • Set up password-free login (run only on the cephadmin node)

    su - admin
    ssh-copy-id admin@ceph04
    
  • Set the time zone and synchronize time (run on the new machine)

    sudo cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
    sudo yum install ntp -y
    sudo systemctl enable ntpd
    sudo systemctl start ntpd
    sudo ntpstat
    
  • Configure ceph source

    cat > /etc/yum.repos.d/ceph.repo<<'EOF'
    [Ceph]
    name=Ceph packages for $basearch
    baseurl=https://mirror.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/$basearch
    enabled=1
    gpgcheck=1
    type=rpm-md
    gpgkey=https://mirror.tuna.tsinghua.edu.cn/ceph/keys/release.asc
    priority=1
    [Ceph-noarch]
    name=Ceph noarch packages
    baseurl=https://mirror.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/noarch
    enabled=1
    gpgcheck=1
    type=rpm-md
    gpgkey=https://mirror.tuna.tsinghua.edu.cn/ceph/keys/release.asc
    priority=1
    [ceph-source]
    name=Ceph source packages
    baseurl=https://mirror.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/SRPMS
    enabled=1
    gpgcheck=1
    type=rpm-md
    gpgkey=https://mirror.tuna.tsinghua.edu.cn/ceph/keys/release.asc
    priority=1
    EOF
    
  • Install ceph and ceph-radosgw on the ceph04 node

    yum install ceph ceph-radosgw -y
    
  • Modify ceph.conf on the cephadmin node

    vim /home/admin/my-cluster/ceph.conf
    # Modify the following line
    mon_initial_members = cephadmin, ceph01, ceph02, ceph03, ceph04 # Added ceph04
    
  • Add monitor to existing cluster

    ceph-deploy --overwrite-conf mon add ceph04 --address 192.168.3.193
    
  • Extend rgw

    ceph-deploy --overwrite-conf rgw create ceph04
    
  • Extend mgr

    ceph-deploy --overwrite-conf mgr create ceph04
    
  • View ceph.conf

    cat /home/admin/my-cluster/ceph.conf
    # The contents are as follows
    [global]
    fsid = 7218408f-9951-49d7-9acc-857f63369a84
    mon_initial_members = cephadmin, ceph01, ceph02, ceph03, ceph04
    mon_host = 192.168.3.189,192.168.3.190,192.168.3.191,192.168.3.192,192.168.3.193
    auth_cluster_required = cephx
    auth_service_required = cephx
    auth_client_required = cephx
    
    public network = 192.168.3.0/24
    cluster network = 192.168.122.0/24
    
  • From the admin node, copy the configuration file and admin key to the Ceph nodes

    ceph-deploy --overwrite-conf admin ceph01 ceph02 ceph03 ceph04
    
  • Create an OSD, adding the sdb disk of the new node ceph04 to the cluster

    ceph-deploy osd create --data /dev/sdb ceph04
    

24. Ceph file storage system

Ceph file system (Ceph FS) is a POSIX compatible file system that uses Ceph storage clusters to store data. Ceph file system uses the same Ceph storage cluster system as Ceph block devices, Ceph object storage that provides both S3 and swift APIs, or native libraries.


25. Ceph block devices

A block is a sequence of bytes (for example, a 512 byte data block). Block based storage interfaces are the most common methods of storing data. They are based on rotating media, such as hard disk, CD, floppy disk, and even traditional 9-track magnetic tape. The ubiquitous block device interface makes virtual block devices an ideal choice for interacting with mass storage systems such as Ceph

Ceph block devices are thin provisioned, adjustable in size, and stripe data to multiple OSDs in the cluster. Ceph block devices take advantage of many capabilities of RADOS, such as snapshot, replication and consistency. Ceph's RADOS block device (RBD) interacts with OSD using kernel modules or librbd libraries

26. Ceph block storage

Get the Ceph cluster working for you

After deploying the Ceph cluster, how do you actually store data in it? Ceph provides three interfaces for users:

  • rbd, block storage, used in block mode; typically combined with virtualization such as KVM to provide block storage devices to virtual machines
  • Object storage, which exposes an object storage API through radosgw so that users can upload (put) and download (get) object files
  • CephFS file storage, which uses Ceph by mounting CephFS as a file system

Create an RBD block file in the Ceph cluster for users to use. To use Ceph you first need a resource pool. A pool is Ceph's abstraction for data storage, made up of PGs (placement groups) and PGPs. You can specify the number of PGs when creating a pool; the PG count is normally a power of 2. First create a pool as follows

1. Create a pool named test containing 128 PGs/PGPs

    ```shell
    ceph osd pool create test 128 128
    ```

2. You can view pool information such as the cluster's pool list (lspools), pg_num and pgp_num, and the replica count (size)

# View pool list
ceph osd lspools
# View pg and pgp quantities
ceph osd pool get test pg_num
ceph osd pool get test pgp_num
# View the size. The default is three copies
ceph osd pool get test size

3. The pool is now created, and RBD blocks can be created in it with the rbd command, for example a 10G block device

rbd create -p test --image ceph-rbd-demo.img --size 10G

This creates an RBD image named ceph-rbd-demo.img with a size of 10G. You can view the list of RBD images and their details with ls and info

# View RBD image list
rbd -p test ls
# View RBD details; the image contains 2560 objects, each object is 4M in size, and object names begin with rbd_data.10b96b8b4567
rbd -p test info ceph-rbd-demo.img

4. The RBD block has now been created. If it were integrated with a virtualization environment you would create a virtual machine and then write data to the disk. RBD also provides a map tool that maps an RBD block to a local device, which greatly simplifies use. The kernel client used by rbd map does not support the exclusive-lock, object-map, fast-diff and deep-flatten features, so these must be disabled first, otherwise rbd map will report an error

# Turn off the default features
rbd -p test --image ceph-rbd-demo.img feature disable deep-flatten && \
rbd -p test --image ceph-rbd-demo.img feature disable fast-diff && \
rbd -p test --image ceph-rbd-demo.img feature disable object-map && \
rbd -p test --image ceph-rbd-demo.img feature disable exclusive-lock 

# View verification feature information
rbd -p test info ceph-rbd-demo.img

# map the RBD block locally. After mapping, you can see that the RBD block device is mapped to a local / dev/rbd0 device
rbd map -p test --image ceph-rbd-demo.img
ls -l /dev/rbd0

5. The RBD block device is now mapped to the local /dev/rbd0 device, so it can be formatted and used like any other disk

# rbd device list (or rbd showmapped) shows the RBD block device mappings on the current machine
[root@node-1 ~]# ls -l /dev/rbd0

# The device behaves like a local disk, so it can be formatted
[root@node-1 ~]# mkfs.xfs /dev/rbd0
[root@node-1 ~]# blkid /dev/rbd0
# Mount the disk and write a test file
mkdir /mnt/test-rbd
mount /dev/rbd0 /mnt/test-rbd
df -h /mnt/test-rbd
cd /mnt/test-rbd
echo "testfile for ceph rbd" > rbd.log

6. Ceph block devices provide high performance with unlimited scalability. Kernel modules, hypervisors such as KVM/QEMU, and cloud platforms such as OpenStack and CloudStack can integrate with Ceph block devices through libvirt and QEMU. You can run the Ceph RADOS gateway, the CephFS file system, and Ceph block devices from the same cluster at the same time.

  • Common block device commands

    The rbd command can be used to create, list, introspect and delete block device images, as well as clone images, create snapshots, roll back to snapshots and view snapshots. For details of rbd usage, see RBD - manage RADOS block device images

    # Create a block device image to add a block device to a node, you must first create an image in the Ceph cluster
    rbd create --size {megabytes} {pool-name}/{image-name}
    # For example, to create a 1GB image named foo in the storage pool test
    rbd create --size 1024 test/foo
    
    # List block device images to list block devices in the rbd storage pool, you can use the following command
    rbd ls {poolname}
    # For example:
    rbd ls test
    
    # Retrieve image information use the following command to retrieve the information of a specific image and replace {image name} with the image name
    rbd info {pool-name}/{image-name}
    # as
    rbd info test/foo
    
    # Resizing block device images -- Ceph block device images are thin provisioned and only take up physical space once you start writing data. However, each image has a maximum capacity, set with the --size option. To increase (or decrease) the maximum size of a Ceph block device image, execute:
    rbd resize --size 2048 foo                  # to increase
    rbd resize --size 2048 foo --allow-shrink   # to decrease
    
    # To delete a block device image, use the following command to delete the block device and replace {image name} with the image name
    rbd rm {image-name}
    # as
    rbd rm foo
    

27. Block device command

# Create a block device image -- to add a block device to a node, you must first create an image in the Ceph storage cluster, using the following command:
rbd create --size {megabytes} {pool-name}/{image-name}
# For example, to create an image named bar with a size of 1GB in the storage pool swimmingpool, execute
rbd create --size 1024 swimmingpool/bar
# List block device images -- to list the block device images in a storage pool, use the following command:
rbd ls {poolname}
# for example
rbd ls test
# Retrieve image information -- retrieve the information of a specific image with the following command, replacing {image name} with the image name:
rbd info {image-name}
rbd info {pool-name}/{image-name}
# Resizing block device images -- Ceph block device images are thin provisioned and only take up physical space once you start writing data. However, each image has a maximum capacity, set with the --size option. To increase (or decrease) the maximum size of a Ceph block device image, execute
rbd resize --size 2048 foo                  # to increase
rbd resize --size 2048 foo --allow-shrink   # to decrease
# Delete block device image -- you can delete the block device with the following command and replace {image name} with the image name:
rbd rm {pool-name}/{image-name}
rbd rm swimmingpool/bar

28. Kernel module operation

# Get image list -- to mount block device images, list all images first
rbd list {pool-name}
# Map a block device -- use rbd to map an image name to a kernel module. You must specify the image name, the storage pool name, and the user name. rbd loads the RBD kernel module automatically if it is not already loaded
sudo rbd map {pool-name}/{image-name} --id {user-name}
# for example
sudo rbd map rbd/myimage --id admin
# If you enable cephx authentication, you must also provide a key. You can specify the key with a key ring or key file
sudo rbd map rbd/myimage --id admin --keyring /path/to/keyring
sudo rbd map rbd/myimage --id admin --keyfile /path/to/file
# View mapped block devices -- you can use the showmapped option of the rbd command to view the block device image mapped to the kernel module
rbd showmapped
# Unmap a block device -- use the rbd unmap command and specify the path of the mapped block device
sudo rbd unmap /dev/rbd/{poolname}/{imagename}
sudo rbd unmap /dev/rbd/rbd/foo

29. Snapshot of RBD

A snapshot is a read-only copy of an image at a specific point in time. An advanced feature of Ceph block devices is that you can create snapshots of images to preserve their history. Ceph also supports hierarchical snapshots, allowing you to clone images (such as VM images) quickly and easily. Ceph's snapshot function supports rbd commands and a variety of advanced interfaces, including QEMU, libvirt, OpenStack and CloudStack.

To use the RBD snapshot function, you must have a running Ceph cluster.

Note

If the image is still in I/O operation when taking a snapshot, the snapshot may not obtain the accurate or latest data of the image, and the snapshot may have to be cloned into a new mountable image. Therefore, we recommend stopping the I/O operation before taking a snapshot. If the image contains a file system, make sure that the file system is in a consistent state before taking a snapshot. To stop I/O operations, use the fsfreeze command. For details, please refer to the manual page of fsfreeze(8). For virtual machines, QEMU guest agent is used to automatically freeze the file system when taking snapshots.
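
As a hedged illustration of that advice, an XFS filesystem mounted from an RBD image (such as /mnt/test-rbd created earlier) can be frozen around the snapshot; the snapshot name before-change is only an example:

# Quiesce the filesystem, snapshot the backing image, then thaw it
sudo fsfreeze --freeze /mnt/test-rbd
rbd snap create test/ceph-rbd-demo.img@before-change
sudo fsfreeze --unfreeze /mnt/test-rbd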

When cephx authentication is enabled (the default), you must specify a user name or ID and the path to its keyring; see user management for details. You can also set the CEPH_ARGS environment variable to avoid re-entering these parameters

rbd --id {user-ID} --keyring=/path/to/secret [commands]
rbd --name {username} --keyring=/path/to/secret [commands]
# for example
rbd --id admin --keyring=/etc/ceph/ceph.keyring [commands]
rbd --name client.admin --keyring=/etc/ceph/ceph.keyring [commands]
# Create snapshot -- use the rbd command to create a snapshot. Specify the snap create option, storage pool name, and image name
rbd snap create {pool-name}/{image-name}@{snap-name}
# for example
rbd snap create test/test@test1
# List snapshots - lists snapshots of an image. You need to specify the storage pool name and image name
rbd snap ls {pool-name}/{image-name}
# for example
rbd snap ls rbd/foo
# Rollback snapshot -- use the rbd command to rollback to a snapshot, and specify the snap rollback option, storage pool name, image name, and snapshot name
rbd snap rollback {pool-name}/{image-name}@{snap-name}
# for example
rbd snap rollback rbd/foo@snapname

Note

Rolling back an image to a snapshot means overwriting the current version of the image with the data in the snapshot. The larger the image, the longer the process takes. Cloning from a snapshot is faster than rolling back to a snapshot, which is also the preferred method to return to the previous state.

# Delete snapshot -- to delete a snapshot with rbd, specify the snap rm option, storage pool name, image name, and snapshot name
rbd snap rm {pool-name}/{image-name}@{snap-name}
# for example
rbd snap rm rbd/foo@snapname

Note

Ceph OSDs deletes data asynchronously, so the disk space will not be released immediately after deleting the snapshot.

# Clear snapshots -- to delete all snapshots of an image with rbd, specify the snap purge option, storage pool name, and image name
rbd snap purge {pool-name}/{image-name}
# for example
rbd snap purge test/test

30. RBD image

RBD images can be mirrored asynchronously between two Ceph clusters. This capability uses the RBD journaling feature to ensure crash-consistent replicas across clusters. The mirroring function is configured per pool in the peer clusters, and can be set to automatically mirror all images in a storage pool or only a specific subset of images. Mirroring is configured with the rbd command. The rbd-mirror daemon is responsible for pulling image updates from the remote cluster and writing them to the corresponding image in the local cluster
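
The rbd-mirror daemon itself must be installed and running (at least on the cluster receiving the backups) before any image is replicated. A minimal sketch for CentOS 7, assuming the daemon runs with the client.admin identity (a dedicated user is preferable in production):

# Install and start the rbd-mirror daemon on a node of the backup cluster
sudo yum install rbd-mirror -y
sudo systemctl enable ceph-rbd-mirror@admin
sudo systemctl start ceph-rbd-mirror@admin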

# Storage pool configuration -- the following procedures show how to perform the basic management tasks needed to configure mirroring with the rbd command. Mirroring is configured at the storage pool level in the Ceph clusters, and the pool configuration steps must be performed on both peer clusters. For clarity, the steps below assume the two clusters are called "local" and "remote", and that a single host has access to both
# Enable mirroring function -- to enable the mirroring function of a storage pool using rbd, you need to specify the mirror pool enable command, storage pool name and mirroring mode
rbd mirror pool enable {pool-name} {mode}

The mirror mode can be pool or image:

  • Pool: in pool mode, every image in the storage pool that has the journaling feature enabled is mirrored.
  • Image: in image mode, the mirroring function must be explicitly enabled for each image

for example

rbd --cluster local mirror pool enable test pool
rbd --cluster remote mirror pool enable test pool
# Disable mirroring function -- to disable the mirroring function of a storage pool using rbd, you need to specify the mirror pool disable command and the storage pool name:
rbd mirror pool disable {pool-name}
# for example
rbd --cluster local mirror pool disable test
rbd --cluster remote mirror pool disable test-gw
# Add peer cluster -- in order for the rbd mirror daemon to discover its peer cluster, it needs to register with the storage pool. To add a companion Ceph cluster using rbd, you need to specify the mirror pool peer add command, storage pool name and cluster description
rbd mirror pool peer add {pool-name} {client-name}@{cluster-name}
# for example
rbd --cluster local mirror pool peer add test client.remote@remote
rbd --cluster remote mirror pool peer add test client.local@local
# Remove peer cluster -- use rbd to remove the peer Ceph cluster, and specify the mirror pool peer remove command, storage pool name and UUID of the peer (available through rbd mirror pool info command)
rbd mirror pool peer remove {pool-name} {peer-uuid}
# for example
rbd --cluster local mirror pool peer remove image-pool 55672766-c02b-4729-8567-f13a66893445
rbd --cluster remote mirror pool peer remove image-pool 60c0e299-b38f-4234-91f6-eed0a367be08
  • IMAGE configuration

    Unlike pool configuration, image configuration only needs to be performed against a single mirror-peer Ceph cluster

    A mirrored RBD image is designated as either primary or secondary. This is a property of the image, not of the storage pool. An image designated as secondary cannot be modified

    When the mirroring feature is first enabled for an image (either implicitly, because the pool's mirroring mode is pool and the image has the journaling feature enabled, or explicitly via the rbd command), the image is automatically promoted to primary
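
    Switching which side is primary is also done with the rbd command; a hedged sketch using the example image test/image-1 from below:

    # Demote the current primary copy, then promote the peer's copy
    rbd --cluster local mirror image demote test/image-1
    rbd --cluster remote mirror image promote test/image-1
    # If the old primary cluster is unreachable, promotion can be forced
    # rbd --cluster remote mirror image promote --force test/image-1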

Enable image journaling support

The RBD mirroring function uses the RBD journaling feature to ensure crash consistency between image copies. The journaling feature must be enabled before an image can be mirrored to a peer cluster. It can be enabled at creation time by passing the --image-feature exclusive-lock,journaling option to rbd create.
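
For example, an image can be created with both features turned on from the start (the pool and image names below are placeholders):

rbd create --size 1024 --image-feature exclusive-lock,journaling image-pool/image-1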

Alternatively, you can dynamically enable the journaling feature on an existing image. To enable it with rbd, specify the feature enable command, the storage pool name, the image name and the feature name:

rbd feature enable {pool-name}/{image-name} {feature-name}
# for example
rbd --cluster local feature enable image-pool/image-1 journaling
# Enable image mirroring function - if the mirroring function of a storage pool is configured as image mode, you also need to explicitly enable the mirroring function for each image in the storage pool. To enable the mirroring function of a specific image through rbd, specify the mirror image enable command, storage pool name and image name
rbd mirror image enable {pool-name}/{image-name}
# Disable image mirroring function -- disable the mirroring function of a specific image through rbd. Specify the mirror image disable command, storage pool name and image name
rbd mirror image disable {pool-name}/{image-name}
# for example
rbd --cluster local mirror image disable test/image-1

31. QEMU and block devices

One of the most common uses of Ceph block devices is as block device images for virtual machines. For example, a user can create a "gold" image with the operating system and related software installed and configured, take a snapshot of that image, and then clone the snapshot (usually many times); see snapshots for details. Copy-on-write clones of snapshots mean that Ceph can provision block device images for virtual machines quickly, because the client does not have to download an entire image every time it spins up a new virtual machine.
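
A hedged sketch of that gold-image workflow with the rbd command (the image and snapshot names are hypothetical, and the parent image must have the layering feature enabled):

# Snapshot the gold image and protect the snapshot so it can be cloned
rbd snap create test/golden-image@base
rbd snap protect test/golden-image@base
# Clone the protected snapshot once per new virtual machine
rbd clone test/golden-image@base test/vm-clone-1
# List the clones that depend on the snapshot
rbd children test/golden-image@base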

Ceph block devices can integrate with QEMU virtual machines. For background on QEMU, see the QEMU open source processor emulator; its documentation is in the QEMU manual. For installation, see the install guide

  • usage

    QEMU command line requires you to specify storage pool name, image name, and snapshot name.

    QEMU assumes that the Ceph configuration file lives in the default location (/etc/ceph/ceph.conf) and that you are running commands as the default client.admin user, unless you specify another configuration file path or another user. When specifying a user, QEMU only needs the ID part, not the full TYPE:ID; see user management - users for details. Do not prepend the client type (i.e. client.) to the user ID, otherwise authentication will fail. You should also keep the keyring of the admin user, or of the user given with the :id={user} option, in the default path (/etc/ceph) or the local directory, and set appropriate ownership and permissions on the keyring file. The command format is as follows

    qemu-img {command} [options] rbd:{pool-name}/{image-name}[@snapshot-name][:option1=value1][:option2=value2...]
    # For example, you should specify the id and conf options as follows:
    qemu-img {command} [options] rbd:test/ceph-rbd-demo.img:id=glance:conf=/etc/ceph/ceph.conf
    # Tip: if a value contains the characters :, @ or =, escape it by adding a backslash (\) before that character.
    
  • Creating images with QEMU

    You can create block device images with QEMU. You must specify rbd, storage pool name, image name to create, and image size

    qemu-img create -f raw rbd:{pool-name}/{image-name} {size}
    # as
    qemu-img create -f raw rbd:test/vm-test 10G
    # raw is the only sensible format option to use with RBD. Technically you could use other QEMU-supported formats (such as qcow2 or vmdk), but doing so adds overhead, and live migration of a virtual machine with caching enabled (see below) becomes unsafe.
    
  • Changing image size with QEMU

    You can resize the block device through QEMU. You must specify rbd, storage pool name, image name to adjust, and image size

    qemu-img resize rbd:{pool-name}/{image-name} {size}
    # for example
    qemu-img resize rbd:test/vm-test 12G
    
  • Retrieving image information with QEMU

    You can use QEMU to retrieve block device image information. You must specify rbd, storage pool name, and image name

    qemu-img info rbd:{pool-name}/{image-name}
    # for example
    qemu-img info rbd:test/vm-test
    
  • Running QEMU via RBD

    QEMU can pass a block device from the host through to a guest, but since QEMU 0.15 there is no need to map the image to a block device on the host: QEMU can access the image directly as a virtual block device through librbd. This performs better because it avoids an extra context switch and can take advantage of RBD caching

    You can use qemu-img to convert an existing virtual machine image into a Ceph block device image

    qemu-img convert -f qcow2 -O raw debian_squeeze.qcow2 rbd:data/squeeze
    

    To boot the virtual machine from that image, execute

    qemu-system-x86_64 -m 1024 -drive format=raw,file=rbd:test/vm-test
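
    Because RBD caching is mentioned above, here is a hedged example of controlling it from the QEMU command line; since QEMU 1.2 the drive cache mode drives librbd caching, so cache=writeback enables the RBD client cache for the test/vm-test image used above:

    qemu-system-x86_64 -m 1024 -drive format=raw,file=rbd:test/vm-test,cache=writeback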
    

32. Use Ceph RBD through libvirt

The libvirt library is a virtual machine abstraction layer between hypervisors and software applications. With libvirt, developers and system administrators only need to deal with a single management framework, a single API and a single shell interface (virsh) for all of these hypervisors, including:

  • QEMU/KVM
  • XEN
  • LXC
  • VirtualBox
  • etc.

Ceph block devices support QEMU/KVM, so you can use Ceph block devices through software that can interact with libvirt. The following stack diagram explains how libvirt and QEMU use Ceph block devices through librbd

To create a virtual machine that uses Ceph block devices, follow the steps below. In the example we use test as the storage pool name, client.libvirt as the user name and vm-test-1 as the image name. You can name things whatever you want, but make sure to substitute your own names in the subsequent steps.

1. Configure Ceph

To configure Ceph for libvirt, perform the following steps

# 1. Create a storage pool (or use the default). This example uses test-gw as the storage pool name, with 128 placement groups
ceph osd pool create test-gw 128 128
# Verify that the storage pool exists
ceph osd lspools

# 2. Create a Ceph user. This example uses client.libvirt, with permissions restricted to the test-gw pool
ceph auth get-or-create client.libvirt mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=test-gw'
# Verify that the name exists
ceph auth list
# Note: when accessing Ceph, libvirt uses the ID libvirt, not the full name client.libvirt. For a detailed explanation of the difference between ID and name, see user management - user and user management - command line interface.

# 3. Create an image in the RBD storage pool with qemu-img. In this example the image name is vm-test-1 and the storage pool is test
qemu-img create -f rbd rbd:test/vm-test-1 10G

# Verify that the image exists
rbd -p test ls
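
Before a libvirt domain can attach that image as the client.libvirt user, the Ceph key usually has to be registered with libvirt as a secret. A minimal sketch (the secret.xml file name is arbitrary and {uuid} stands for the UUID printed by virsh secret-define):

# Describe a ceph-type libvirt secret for the client.libvirt user
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <usage type='ceph'>
    <name>client.libvirt secret</name>
  </usage>
</secret>
EOF
# Define the secret and note the UUID that virsh prints
sudo virsh secret-define --file secret.xml
# Bind the client.libvirt key to that UUID (replace {uuid})
sudo virsh secret-set-value --secret {uuid} --base64 $(ceph auth get-key client.libvirt)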
  • Prepare Virtual Machine Manager

    You can use libvirt even without a VM manager, but it's easier to create a domain with virt-manager

    • Install Virtual Machine Manager

      yum install qemu-kvm libvirt virt-manager libguestfs-tools virt-install.noarch -y
      
    • Download iso image

    • Start Virtual Machine Manager

      virt-manager
      

33. CEPH file system

Ceph file system (Ceph FS) is a POSIX compatible file system that uses Ceph storage clusters to store data. Ceph file system uses the same Ceph storage cluster system as Ceph block devices, Ceph object storage that provides both S3 and swift APIs, or native libraries

The Ceph file system requires at least one Ceph metadata server (MDS) in the Ceph storage cluster.


34. Add / delete metadata server

Adding and removing metadata servers with ceph-deploy is simple: one or more metadata servers can be added or removed with a single command

  • Add a metadata server

    After deploying the monitor and OSD, you can also deploy the metadata server

ceph-deploy mds create {host-name}[:{daemon-name}] [{host-name}[:{daemon-name}] ...]
# for example
ceph-deploy --overwrite-conf mds create cephadmin ceph01 ceph02 ceph03 ceph04
  • Create CEPH file system

A Ceph file system requires at least two RADOS storage pools, one for data and one for metadata. When configuring these storage pools, consider:

  • Set a high replica level for the metadata storage pool because any data loss from this storage pool will invalidate the entire file system
  • Allocate low latency storage (like SSD) to the metadata storage pool because it will directly affect the operation latency of the client

For storage pool management, refer to Storage pool . For example, to create two storage pools for a file system with default settings, you can use the following command:

ceph osd pool create cephfs_data <pg_num>
ceph osd pool create cephfs_metadata <pg_num>
# for example
ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 128

After creating the storage pool, you can use the fs new command to create the file system:

ceph fs new <fs_name> <metadata> <data>
# for example
ceph fs new cephfs cephfs_metadata cephfs_data
ceph fs ls

After the file system is created, the MDS server can reach the active state, for example, in a single MDS system

ceph mds stat
e5: 1/1/1 up {0=a=up:active}

After the file system is built and MDS is active, you can mount the file system

  • Mount CEPH file system with kernel driver

To mount the Ceph file system you can use the mount command if you know the monitor's IP address, or use the mount.ceph utility to resolve the monitor host name to an IP address automatically. For example

mkdir /mnt/mycephfs
mount -t ceph 192.168.3.189:6789:/ /mnt/mycephfs

To mount a Ceph file system with cephx authentication enabled, you must specify a user name and key

sudo mount -t ceph 192.168.3.189:6789:/ /mnt/mycephfs -o name=admin,secret=AQD6vHhhQUDvJRAAX1BL9kwEX0qtjsFDW1wSMA==
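
To keep the key out of the shell history, mount.ceph also accepts a secretfile option; a small sketch that stores the admin key (the value from ceph.client.admin.keyring) in a root-only file, with /etc/ceph/admin.secret as an assumed path:

# Save the key with restrictive permissions, then reference it at mount time
echo "AQD6vHhhQUDvJRAAX1BL9kwEX0qtjsFDW1wSMA==" | sudo tee /etc/ceph/admin.secret
sudo chmod 600 /etc/ceph/admin.secret
sudo mount -t ceph 192.168.3.189:6789:/ /mnt/mycephfs -o name=admin,secretfile=/etc/ceph/admin.secret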
  • User space mount CEPH file system

Ceph v0.55 and later enable cephx authentication by default. Before mounting a Ceph file system from user space (FUSE), make sure the client host has a copy of the Ceph configuration file and a keyring with capabilities for the Ceph metadata server

1. On the client host, copy the Ceph configuration file from the monitor host to the /etc/ceph/ directory

mkdir -p /etc/ceph
scp {user}@{server-machine}:/etc/ceph/ceph.conf /etc/ceph/ceph.conf
chmod -R 644 /etc/ceph

To mount the Ceph file system as a user space file system, use the ceph-fuse command, for example

mkdir /home/gw/cephfs && \
yum install ceph-fuse -y && \
ceph-fuse -m 192.168.3.189:6789 /home/gw/cephfs

View mds status

ceph mds stat