QEMU+OCFS2: using an OCFS2 file system on SAN storage for virtual machine disk files

Posted by phatgreenbuds on Mon, 13 Dec 2021 12:43:12 +0100

This article introduces the OCFS2 shared cluster file system: what it is, how to configure it, and how to expand it online.

What is the OCFS2 file system?

OCFS2 stands for Oracle Cluster File System Version 2. It is a shared-disk file system developed internally by Oracle, released as open source under the GNU GPL and included in the mainline Linux kernel since 2006.

What is a shared disk file system? Let's clarify by comparing three related concepts:

  • Disk file system

This is the most common kind of file system, built directly on a local disk (block storage). A disk file system organizes the contents of the disk into files and directories, making it convenient for users to use the disk's storage space effectively. Examples of disk file systems include ext4 and xfs.

  • Shared file system

A shared file system is exported by a service program running on a remote server; clients access the file system mounted on that server over the network. Examples are NFS (Network File System) and Samba (CIFS).

  • Shared disk file system

A shared disk file system, also known as a cluster file system, is a file system built on a network-shared disk that is accessed by multiple hosts through a SAN (Storage Area Network). Compared with a disk file system, a shared disk file system not only manages disk space effectively, but also handles concurrent modification when multiple hosts access the file system at the same time. For this reason, a distributed lock mechanism is a common component of shared disk file systems.

From the usage scenarios, the differences between the three file systems are obvious: a disk file system accesses the local disk directly, a shared file system accesses a file system mounted on a server through a shared-file service, and a shared disk file system accesses the shared disk directly.

Therefore, in network-sharing scenarios, accessing SAN storage through a shared disk file system gives direct access to the shared storage device: the access path is short, efficiency is high, and the problem of multiple hosts concurrently accessing shared storage is solved.
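
As a quick illustration of the difference in access paths, here is how each type is typically mounted (the device names and the NFS export below are hypothetical):

# Disk file system: a local block device
$ mount -t ext4 /dev/sdb1 /mnt/local

# Shared file system: files served by a remote host over the network
$ mount -t nfs 192.168.7.100:/export/data /mnt/nfs

# Shared disk file system: a SAN LUN visible to every host (requires the cluster stack)
$ mount -t ocfs2 /dev/dm-1 /mnt/shared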

QEMU uses shared SAN storage via OCFS2

QEMU can use shared SAN storage in many ways. A common scheme is to call the SAN storage's management API when creating a new virtual machine disk: after a volume (LUN) is allocated, the LUN is attached directly to the QEMU virtual machine. The advantage of this scheme is that the QEMU virtual machine accesses the LUN directly, with low overhead and good performance. The disadvantage is that it relies on the storage device's specific API and is therefore tied to that device, which is not general enough.
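
For reference, attaching an allocated LUN directly to a guest looks roughly like this (a minimal sketch; the multipath device path is hypothetical):

# Pass the LUN through to the guest as a raw virtio disk
$ qemu-system-x86_64 -enable-kvm -m 4096 \
    -drive file=/dev/mapper/36488eef100d71ed122ace06c00000001,format=raw,if=virtio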

This article instead uses a large SAN storage volume, formatted with the OCFS2 shared disk file system, as the storage for QEMU virtual disk files, thereby letting QEMU use shared storage.

OCFS2 file system configuration

  • Prepare the environment

This step installs and configures the software.

  • Download and install the OCFS2 tools rpm package (which also depends on net-tools)

$ wget http://public-yum.oracle.com/public-yum-ol7.repo -O /etc/yum.repos.d/public-yum-ol7.repo
$ rpm --import http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol7
$ yum install yum-plugin-downloadonly -y
$ mkdir /tmp/ocfs2 && cd /tmp/ocfs2/
$ yum install --downloadonly --downloaddir=/tmp/ocfs2/ ocfs2-tools net-tools -y
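
The --downloadonly step is useful when the target hosts have no Internet access: the downloaded packages can be copied over and installed locally. A minimal sketch:

# Install the previously downloaded packages on a target host
$ yum localinstall -y /tmp/ocfs2/*.rpm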

See the official documentation for the detailed steps:
https://docs.oracle.com/cd/E5..., Chapter 23 Oracle Cluster File System Version 2

  • Install the Cloudpods kernel, which has the OCFS2 kernel module compiled in

Because OCFS2 is rarely used, the OCFS2 kernel module is not enabled in the kernels of common distributions. We provide a precompiled kernel package with OCFS2 enabled:

$ yum install -y yum-utils
# Add yunion Cloudpods rpm source
$ yum-config-manager --add-repo https://iso.yunion.cn/yumrepo-3.6/yunion.repo
$ yum install -y kernel-3.10.0-1062.4.3.el7.yn20191203

During deployment, also write the configuration file /etc/modules-load.d/ocfs2.conf to ensure that the kernel's OCFS2 module is loaded automatically:

# Load ocfs2.ko at boot
ocfs2

After installing the kernel, reboot for it to take effect. After rebooting, check that the new kernel is running:

$ uname -r
3.10.0-1062.4.3.el7.yn20191203.x86_64
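
You can also confirm that the OCFS2 module was loaded automatically via /etc/modules-load.d/ocfs2.conf:

$ lsmod | grep ocfs2
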
  • OCFS2 configuration file

OCFS2 configuration is simple: you only need to place the same configuration file, which declares the member nodes, on every node that will mount OCFS2.
The following is a sample configuration file:

$ cat /etc/ocfs2/cluster.conf 
cluster:
        node_count = 3            <== Number of cluster nodes
        name = ocfs2              <== Cluster name

node:
        ip_port = 7777
        ip_address = 192.168.7.10
        number = 0                <== Node number
        name = client01           <== Node name
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 192.168.7.11
        number = 1
        name = client02
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 192.168.7.12
        number = 2
        name = client03
        cluster = ocfs2

  • Initialize the configuration of ocfs2

$ o2cb.init configure      # Answer yes to the first prompt; for the cluster name, enter the name from the configuration file above (default: ocfs2)
  • Ensure that the o2cb and ocfs2 services are started and set to start automatically

$ systemctl enable --now o2cb ocfs2
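
After starting, the cluster stack status can be checked (a sketch; the status subcommand of o2cb.init is assumed to be available, as on Oracle Linux):

# Should report the cluster stack loaded and the ocfs2 cluster online
$ o2cb.init status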

At this point, the OCFS2 software installation and configuration are complete. The next step is to format the disk and mount the OCFS2 file system.

  • Mount OCFS2 file system

In this step, the network shared disk is formatted with OCFS2 and mounted on each host.
Before that, you may need to configure multipathing for the SAN storage (the details are omitted here for brevity; a sketch follows below). After that, partition the disk with parted and format the partition as OCFS2. Format it on one machine only; after formatting, the partition is visible to the other machines once they run partprobe. Then mount it on each machine.
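
For reference, a minimal /etc/multipath.conf might look like the following (a sketch only; real settings depend on the storage vendor's recommendations):

defaults {
    user_friendly_names no     # keep WWID-based map names
    find_multipaths     yes    # only create maps for devices with multiple paths
}
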
The following commands are executed on the first node:

# Viewing multipath disks
$ multipath -l

Partition the disk with parted, then format the partition with mkfs.ocfs2:

# Create a partition on the multipath device (dm-0 is the whole disk)
$ parted /dev/dm-0
# Format the new partition (dm-1) as OCFS2
$ mkfs.ocfs2 /dev/dm-1
# Mount the OCFS2 file system
$ mount /dev/dm-1 /data
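
mkfs.ocfs2 also accepts options worth setting explicitly; in particular, the number of node slots should be at least the cluster node count. A sketch (the volume label is hypothetical):

# 3 node slots to match the 3-node cluster.conf above, plus a volume label
$ mkfs.ocfs2 -N 3 -L ocfs2-data /dev/dm-1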

To make the mount persistent, add an entry to /etc/fstab:

# /etc/fstab
/dev/dm-1  /opt/cloud/workspace/disks  ocfs2     _netdev,defaults  0 0

On the other nodes, you only need to execute partprobe to detect the partition change and then mount the partition, as shown below. Modify /etc/fstab on those nodes as well to make the mount persistent.
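
On each of the other nodes, the steps look like this (the mount point is taken from the fstab entry above):

$ partprobe /dev/dm-0                         # re-read the partition table
$ mkdir -p /opt/cloud/workspace/disks
$ mount /dev/dm-1 /opt/cloud/workspace/disks  # or: mount -a, after editing /etc/fstab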

Cloudpods uses OCFS2 file system

In Cloudpods, a file system mounted through OCFS2 can be managed as shared storage of the GPFS type. Use the following steps to register the OCFS2 shared storage with Cloudpods and use it to store virtual disk files for virtual machines.

  • Register OCFS2 block storage

In the [storage - block storage] interface, create a GPFS type shared storage.

After the storage record is created successfully, open the storage's "manage host" menu, choose "associate host" in the list of hosts associated with the storage, and register every host node attached to the storage, so that the Cloudpods platform knows which host directory the shared storage is mounted on.

  • Create virtual machine disks on OCFS2 storage

With the above configuration complete, you can select the new OCFS2 storage as the storage for the virtual disk when creating a new virtual machine.

Expanding the OCFS2 file system online

Before expanding, OCFS2 must remain mounted only on the first node; unmount it on all other nodes. The following operations are performed only on the first node.

First, expand the physical volume on the SAN storage side. This step is performed on the SAN device and is not described in detail here.

Next, for the multipath device, each underlying disk must be rescanned so that the operating system becomes aware of the device's expanded capacity.

# First, execute multipath -ll to view the underlying disk devices of the multipath device
$ multipath -ll
Jun 24 15:09:16 | ignoring extra data starting with '}' on line 16 of /etc/multipath.conf
Jun 24 15:09:16 | sdi: alua not supported
Jun 24 15:09:16 | sdb: alua not supported
Jun 24 15:09:16 | sdc: alua not supported
Jun 24 15:09:16 | sdd: alua not supported
Jun 24 15:09:16 | sde: alua not supported
Jun 24 15:09:16 | sdf: alua not supported
Jun 24 15:09:16 | sdg: alua not supported
Jun 24 15:09:16 | sdh: alua not supported
Jun 24 15:09:16 | sdq: alua not supported
Jun 24 15:09:16 | sdj: alua not supported
Jun 24 15:09:16 | sdm: alua not supported
Jun 24 15:09:16 | sdn: alua not supported
Jun 24 15:09:16 | sdo: alua not supported
Jun 24 15:09:16 | sdp: alua not supported
Jun 24 15:09:16 | sdk: alua not supported
Jun 24 15:09:16 | sdl: alua not supported
36488eef100d71ed122ace06c00000001 dm-0 HUAWEI  ,XSG1
size=15T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=-1 status=active
  |- 1:0:7:1 sdi 8:128 active ready running
  |- 1:0:0:1 sdb 8:16  active ready running
  |- 1:0:1:1 sdc 8:32  active ready running
  |- 1:0:2:1 sdd 8:48  active ready running
  |- 1:0:3:1 sde 8:64  active ready running
  |- 1:0:4:1 sdf 8:80  active ready running
  |- 1:0:5:1 sdg 8:96  active ready running
  |- 1:0:6:1 sdh 8:112 active ready running
  |- 2:0:7:1 sdq 65:0  active ready running
  |- 2:0:3:1 sdj 8:144 active ready running
  |- 2:0:6:1 sdm 8:192 active ready running
  |- 2:0:0:1 sdn 8:208 active ready running
  |- 2:0:2:1 sdo 8:224 active ready running
  |- 2:0:5:1 sdp 8:240 active ready running
  |- 2:0:1:1 sdk 8:160 active ready running
  `- 2:0:4:1 sdl 8:176 active ready running

For each underlying device, trigger a rescan (shown here for sdi):

$ echo 1 > /sys/class/block/sdi/device/rescan
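
To avoid repeating this for all sixteen paths, a small loop works (the sd* names are taken from the multipath -ll output above; adjust to your own paths):

# Rescan every underlying path of the multipath device
for dev in sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm sdn sdo sdp sdq; do
    echo 1 > /sys/class/block/$dev/device/rescan
done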

Then execute the following commands to make the operating system aware of the capacity change of the multipath device:

$ multipathd -k
multipathd> resize map 36488eef100d71ed122ace06c00000001
ok
multipathd> exit
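
The same resize can be issued non-interactively (assuming your multipathd accepts inline commands via -k, as recent versions do):

$ multipathd -k'resize map 36488eef100d71ed122ace06c00000001'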

After the above steps, the operating system has detected the device's capacity change. Now the partition table must be expanded with parted, by deleting the partition and then recreating it. Note that the new partition must start at the same sector as the old one (2048 here) so that the data on it is preserved:

$ parted /dev/dm-0
(parted) unit s
(parted) p
Model: Linux device-mapper (multipath) (dm)
Disk /dev/dm-0: 32212254720s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
 
Number  Start  End           Size          File system  Name   Flags
 1      2048s  10737416191s  10737414144s               disks
 
(parted) rm 1
(parted) mkpart
Partition name?  []?
File system type?  [ext2]?
Start? 2048
End? 100%
device-mapper: create ioctl on 36488eef100d71ed122ace06c00000001p1 part1-mpath-36488eef100d71ed122ace06c00000001 failed: Device or resource busy
(parted) p
Model: Linux device-mapper (multipath) (dm)
Disk /dev/dm-0: 32212254720s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
 
Number  Start  End           Size          File system  Name  Flags
 1      2048s  32212252671s  32212250624s
 
(parted) quit

After expanding the partition table, use tunefs.ocfs2 to grow the file system:

# Grow the OCFS2 file system to fill the expanded partition
$ tunefs.ocfs2 -S /dev/dm-1
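
The new size can then be verified with df (the mount point is the one from the fstab entry above):

$ df -h /opt/cloud/workspace/disks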

After the above steps, the file system expansion is complete. Finally, execute partprobe on the other nodes so that they detect the device's capacity change, then remount the partition.

Topics: cloud computing