GlusterFS version 4.1 selection and deployment

Posted by xnor82 on Tue, 16 Nov 2021 11:09:49 +0100

1 preface

1.1 GlusterFS advantages

1. Metadata-free design: GlusterFS has no centralized or distributed metadata server; it uses an elastic hashing algorithm instead. Any server or client in the cluster can compute a file's location from the hash algorithm, the path and the file name, and then read or write the data directly.

Conclusion:

  • The advantage of the metadata-free design is that it greatly improves scalability, and also improves the performance and reliability of the system.
  • Listing files or directories, however, degrades performance considerably, because a listing has to query the nodes and aggregate the information they return.
  • Given an exact file name, locating the file is very fast.

2. Deployment between servers: GlusterFS cluster servers are peers, and every node holds the cluster configuration, so all information can be queried locally. Updates on any node are announced to the other nodes to keep the configuration consistent. However, as the cluster grows, synchronization becomes less efficient and the probability of inconsistency increases.

3. Client access: a program reads and writes data through the mount point. To users and programs the cluster file system is transparent; they cannot tell whether the file system sits on a local disk or on remote servers.

Read and write operations are handed to the VFS (Virtual File System), which passes the request to the FUSE kernel module. FUSE hands the data to the GlusterFS client through the device /dev/fuse. The GlusterFS client then performs the hash calculation and sends the request or data to the GlusterFS servers over the network.
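
Once a volume has been mounted (see section 3.3), this chain can be observed on the client. The checks below are a minimal sketch of my own, assuming a standard FUSE mount; exact output varies per system.

# Minimal sketch, assuming a GlusterFS volume is already mounted (see 3.3)
lsmod | grep fuse              # the FUSE kernel module is loaded
ls -l /dev/fuse                # the character device used by the GlusterFS client
mount | grep fuse.glusterfs    # the mount appears with type fuse.glusterfs
ps -ef | grep glusterfs        # the client-side glusterfs (FUSE) process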

For more detail on how GlusterFS works, see the following articles:

  • glusterfs architecture and principle
  • Understand GlusterFS from another perspective and analyze the shortcomings of GlusterFS
  • For Chinese material, Dr. Liu Aigui's original GlusterFS resource series is recommended

1.2 version selection

Most articles on the Internet deploy version 3.x, but 3.x has disappeared from Alibaba Cloud's EPEL source for CentOS 7; the lowest version still available is 4.0:

[root@kaifa-supply ~]# yum search  centos-release-gluster
......
centos-release-gluster-legacy.noarch : Disable unmaintained Gluster repositories from the CentOS Storage SIG
centos-release-gluster40.x86_64 : Gluster 4.0 (Short Term Stable) packages from the CentOS Storage SIG repository
centos-release-gluster41.noarch : Gluster 4.1 (Long Term Stable) packages from the CentOS Storage SIG repository
centos-release-gluster5.noarch : Gluster 5 packages from the CentOS Storage SIG repository
centos-release-gluster6.noarch : Gluster 6 packages from the CentOS Storage SIG repository
centos-release-gluster7.noarch : Gluster 7 packages from the CentOS Storage SIG repository

As the output shows, version 4.0 is only a Short Term Stable release, so we choose the newer 4.1 (Long Term Stable) release for deployment.

1.3 volume knowledge

For details of storage types, see: Setting Up Volumes - Gluster Docs

The old versions had seven volume types and the new versions have five. The volume types common to both are:

  • Distributed (distributed volume: files are placed on bricks according to the hash result; no redundancy; files on a brick can be read directly)
  • Replicated (replicated volume: similar to RAID 1; files can be read directly)
  • Distributed Replicated (distributed replicated volume: similar to RAID 10; files can be read directly)

The volume types that differ between versions are:

  • The old versions had stripe (striped volumes), which store files in blocks across bricks, so files cannot be read directly from a single brick
  • plus the combinations built on striping: distributed striped, replicated striped, and distributed replicated striped volumes
  • The new versions drop stripe and introduce dispersed volumes, which are based on EC (erasure coding)
  • plus the combined form, the distributed dispersed volume

However, we do not need to consider all of them, because in practice the distributed replicated volume is the usual choice. Its advantages are as follows:

  • Distributed storage, high efficiency
  • Data is redundant, because it sits on replicated volumes
  • Files can be read directly from the bricks
  • Supported by all versions

Of course, dispersed volumes (erasure-coded volumes, roughly comparable to RAID 5) have been improved continuously from 3.6 through 7.x and have cost Gluster a lot of effort; you can see this article for more information.
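
For reference only (it is not used in this deployment), creating a dispersed volume might look like the sketch below; the volume name and brick paths are illustrative, and the hosts are the ones planned later in section 2.1.

# Illustrative sketch only: a 3-brick dispersed volume with redundancy 1
# (2 data bricks + 1 redundancy brick), able to survive the loss of one brick
gluster volume create gv-ec disperse 3 redundancy 1 \
  gf-node01:/data/brick1/gv-ec \
  gf-node02:/data/brick1/gv-ec \
  gf-node03:/data/brick1/gv-ec
gluster volume start gv-ec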

2 service deployment

Reference: the official rapid deployment (quick start) guide

2.1 service planning

operating system   IP           host name   additional hard disk
centos 7.4         10.0.0.101   gf-node01   sdb: 5G
centos 7.4         10.0.0.102   gf-node02   sdb: 5G
centos 7.4         10.0.0.103   gf-node03   sdb: 5G

2.2 environment preparation

All three servers do the same

# Disable the firewall, SELinux, etc. (not explained here)
# Complete hosts resolution
cat >>/etc/hosts <<EOF
10.0.0.101  gf-node01
10.0.0.102  gf-node02
10.0.0.103  gf-node03
EOF

# Install 4.1 Yum source and programs
yum install -y centos-release-gluster41
yum install -y glusterfs glusterfs-libs glusterfs-server

# Start the service and enable it at boot
systemctl start  glusterd.service
systemctl enable glusterd.service
systemctl status glusterd.service
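
As an optional sanity check (my addition, not part of the original steps), confirm the installed version and that glusterd is listening; by default the management daemon listens on TCP port 24007.

# Optional check: version and listening port of the management daemon
gluster --version
ss -lntp | grep glusterd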

2.3 format and mount the disk

Three brick directories are created in total: brick1 is used to mount sdb, while the other two stay as plain folders on the local (root) file system.

Format disk

# View disk list
[root@gf-node01 ~]# fdisk -l
Disk /dev/sdb: 5368 MB, 5368709120 bytes, 10485760 sectors
Disk /dev/sda: 53.7 GB, 53687091200 bytes, 104857600 sectors

# Format disk directly without partition
mkfs.xfs  -i size=512 /dev/sdb

Mount disk

# Create directory and mount
mkdir -p /data/brick{1..3}
echo '/dev/sdb /data/brick1 xfs defaults 0 0' >> /etc/fstab
mount -a && mount

# View results
[root@gf-node01 ~]# df -h|grep sd
/dev/sda2        48G  1.7G   47G   4% /
/dev/sdb        5.0G   33M  5.0G   1% /data/brick1
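
The formatting and mounting above must be repeated on gf-node02 and gf-node03 as well. The loop below is only a convenience sketch and assumes password-less root SSH between the nodes; otherwise simply run the same commands on each node by hand.

# Convenience sketch only: repeat the disk preparation on the other two nodes
# (assumes root SSH keys are already distributed)
for n in gf-node02 gf-node03; do
  ssh "$n" 'mkfs.xfs -i size=512 /dev/sdb &&
            mkdir -p /data/brick{1..3} &&
            echo "/dev/sdb /data/brick1 xfs defaults 0 0" >> /etc/fstab &&
            mount -a'
done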

2.4 establishing host trust pool

On any one host, run the following commands to build the trusted pool. No account or password is required, because GlusterFS assumes by default that the deployment environment is a secure, trusted network.

# Establish trusted pool
gluster peer probe gf-node02
gluster peer probe gf-node03

# View status
[root@gf-node01 ~]# gluster peer status
......
[root@gf-node01 ~]# gluster pool list
UUID					Hostname 	State
4068e219-5141-43a7-81ba-8294536fb054	gf-node02	Connected 
e3faffeb-4b16-45e2-9ff3-1922791e05eb	gf-node03	Connected 
3e6a4567-eda7-4001-a5d5-afaa7e08ed93	localhost	Connected

Note: once the trusted pool is established, a new server can only be added by probing it from a node that is already in the pool (see the sketch below).
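
For example, adding a hypothetical fourth node gf-node04 (assuming glusterd is installed on it and its name resolves) is done from one of the existing members; detaching works the same way, as long as none of its bricks are in use.

# Sketch only: gf-node04 is a hypothetical extra host
gluster peer probe  gf-node04
gluster peer detach gf-node04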

3 using distributed replicated volumes

Only the distributed replicated volume of GlusterFS is tested here. For other volume types, look them up and test them yourself if necessary.

3.1 distributed replicated volume creation notes

  1. Command: gluster volume create gv1 replica 3 DIR1 DIR2 DIR3 ...
  2. The replica count cannot be less than 3; otherwise creation is refused because of the risk of split-brain, with the prompt "replica 2 volumes are prone to split brain. Use arbiter or replica 3 to avoid this" (an arbiter sketch is shown after this list)
  3. If the number of bricks equals the replica count (3), a plain replicated volume is created; if it is a multiple of the replica count, a distributed replicated volume is created
  4. Each group of 3 bricks, in the order given, forms one replica set, and the replica sets together make up the distributed volume
  5. The replica layout of a distributed replicated volume therefore follows the brick order in the create command; it is not arranged randomly
  6. If the bricks are not all on independent hard disks, the force parameter must be added, otherwise creation fails with the error: volume create: gv1: failed: The brick gf-node01:/data/brick2/gv1 is being created in the root partition. It is recommended that you don't use the system's root partition for storage backend. Or use 'force' at the end of the command if you want to override this behavior
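
The arbiter mentioned in point 2 is a metadata-only third copy that avoids split-brain when you only want to store two full copies of the data. The sketch below is for illustration only and is not part of this deployment; the volume name and brick paths are made up, and force is needed here because the bricks sit on the root partition.

# Sketch only: replica 3 with 1 arbiter = 2 data copies + 1 metadata-only copy
gluster volume create gv-arb replica 3 arbiter 1 \
  gf-node01:/data/brick3/gv-arb \
  gf-node02:/data/brick3/gv-arb \
  gf-node03:/data/brick3/gv-arb \
  force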

3.2 distributed replicated volume creation

# Create the distributed replicated volume
gluster volume create gv1 replica 3 \
  gf-node01:/data/brick1/gv1 \
  gf-node01:/data/brick2/gv1 \
  gf-node02:/data/brick1/gv1 \
  gf-node02:/data/brick2/gv1 \
  gf-node03:/data/brick1/gv1 \
  gf-node03:/data/brick2/gv1 \
  force

# Start the volume
gluster volume start gv1
  
# Viewing the status of a volume
[root@gf-node01 ~]# gluster volume info 
Volume Name: gv1
Type: Distributed-Replicate
Volume ID: e1e004fa-5588-4629-b7ff-048c4e17de91
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: gf-node01:/data/brick1/gv1
Brick2: gf-node01:/data/brick2/gv1
Brick3: gf-node02:/data/brick1/gv1
Brick4: gf-node02:/data/brick2/gv1
Brick5: gf-node03:/data/brick1/gv1
Brick6: gf-node03:/data/brick2/gv1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
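
As a further optional check (my addition, not in the original walkthrough), the status command shows whether every brick process is up and which ports it uses:

# Optional check: all brick processes should show "Online: Y"
gluster volume status gv1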

3.3 use of the distributed replicated volume

# Mount volume
[root@gf-node01 ~]# mount -t glusterfs gf-node01:/gv1 /mnt

# Write data test
[root@gf-node01 ~]# touch /mnt/test{1..9}
[root@gf-node01 ~]# ls /mnt/test{1..9}
/mnt/test1  /mnt/test2  /mnt/test3  /mnt/test4  /mnt/test5  /mnt/test6  /mnt/test7  /mnt/test8  /mnt/test9

# Validation test data
[root@gf-node01 ~]# ls /data/brick*/*
/data/brick1/gv1:
test1  test2  test4  test5  test8  test9
/data/brick2/gv1:
test1  test2  test4  test5  test8  test9

[root@gf-node02 ~]# ls /data/brick*/*
/data/brick1/gv1:
test1  test2  test4  test5  test8  test9
/data/brick2/gv1:

[root@gf-node03 ~]# ls /data/brick*/*
/data/brick1/gv1:
test3  test6  test7
/data/brick2/gv1:
test3  test6  test7

Conclusion: the first three bricks form one replica set and the last three form another, so the order of the bricks when creating the volume is very important.
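
As a final illustration of the metadata-free, hash-based placement described in section 1.1 (my addition, assuming the attr package that provides getfattr is installed), the client can ask GlusterFS directly which bricks hold a given file:

# Query the brick locations of a file through the FUSE mount
getfattr -n trusted.glusterfs.pathinfo /mnt/test1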