1 preface
1.1 glusterfs advantages
1. Metadata-free design GlusterFS has no centralized or distributed metadata server; instead it uses an elastic hashing algorithm. Any server or client in the cluster can locate a file and perform read/write access simply by hashing the path and file name.
Conclusions:
- The metadata-free design not only greatly improves scalability, but also improves the performance and reliability of the system.
- Listing files or directories, however, is much slower, because every node has to be queried and the results aggregated.
- Given a specific file name, though, locating the file is very fast (see the sketch after this list).
2. Server-to-server deployment GlusterFS cluster servers are peers, and each node holds the configuration information of the whole cluster, so everything can be queried locally (illustrated at the end of this section). Configuration updates on any node are announced to the other nodes to keep the information consistent. However, as the cluster grows large, synchronization becomes less efficient and the probability of inconsistency increases.
3. Client access Programs read and write data by accessing the mount point. The cluster file system is transparent to users and programs; they cannot tell whether the file system lives on a local or a remote server.
A read or write operation is handed to the VFS (Virtual File System), which passes the request on to the FUSE kernel module. FUSE forwards the data to the GlusterFS client through the device /dev/fuse. The GlusterFS client then performs the hash calculation and finally sends the request or data to the GlusterFS servers over the network.
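To see the metadata-free lookup in action, a mounted client can be asked which bricks hold a given file through the trusted.glusterfs.pathinfo extended attribute. This is only a minimal sketch: the mount point and file name below are placeholders, and getfattr comes from the attr package, which may need to be installed first.

```bash
# Ask the GlusterFS client where a file lives; the client answers by
# hashing the path itself, no metadata server is consulted
getfattr -n trusted.glusterfs.pathinfo /mnt/somefile.txt
```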
For more details on how GlusterFS works, refer to the following articles:
- GlusterFS architecture and principles
- Understanding GlusterFS from another perspective and analyzing its shortcomings
- For Chinese-language material, Dr. Liu Aigui's original GlusterFS resource series is recommended
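As a small illustration of point 2 above (every node carries the full cluster configuration), glusterd keeps its state as plain files on each server. A minimal sketch, assuming the default working directory /var/lib/glusterd:

```bash
# Peer definitions and volume definitions are stored locally on every node
ls /var/lib/glusterd/peers
ls /var/lib/glusterd/vols
```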
1.2 version selection
Most articles on the Internet deploy version 3.x, but 3.x has already disappeared from Alibaba Cloud's EPEL source for CentOS 7, where the lowest available version is now 4.0:
```
[root@kaifa-supply ~]# yum search centos-release-gluster
......
centos-release-gluster-legacy.noarch : Disable unmaintained Gluster repositories from the CentOS Storage SIG
centos-release-gluster40.x86_64 : Gluster 4.0 (Short Term Stable) packages from the CentOS Storage SIG repository
centos-release-gluster41.noarch : Gluster 4.1 (Long Term Stable) packages from the CentOS Storage SIG repository
centos-release-gluster5.noarch : Gluster 5 packages from the CentOS Storage SIG repository
centos-release-gluster6.noarch : Gluster 6 packages from the CentOS Storage SIG repository
centos-release-gluster7.noarch : Gluster 7 packages from the CentOS Storage SIG repository
```
Moreover, 4.0 is clearly a short-term stable release, so we choose the slightly newer 4.1 (long-term stable) release for deployment.
1.3 volume knowledge
For details of storage types, see: Setting Up Volumes - Gluster Docs
The old versions offered seven volume types, while the new versions offer five. The volume types they have in common are:
- Distributed (files are placed on bricks according to the hash result; there is no redundancy, and files can be read directly from the bricks)
- Replicated (similar to RAID 1; files can be read directly from the bricks)
- Distributed Replicated (similar to RAID 10; files can be read directly from the bricks)
The volume types that differ between versions are:
- The old versions had Striped volumes, which store files in blocks, so files cannot be read directly from the bricks
- as well as the combinations built on striping: Distributed Striped, Striped Replicated, and Distributed Striped Replicated volumes
- The new versions abandon striping and introduce Dispersed volumes, which are based on EC (erasure coding)
- along with the combined Distributed Dispersed volume
In practice we do not need to consider most of these, because we usually use Distributed Replicated volumes. Their advantages are as follows:
- Distributed storage, so access is efficient
- Data is protected by the underlying replicated volumes
- Files can be read directly from the bricks
- Supported by all versions
Of course, Dispersed volumes (erasure-coded, roughly comparable to RAID 5) have been improved continuously from 3.6 through 7.x, and Gluster has put a lot of effort into them; see this article for more information.
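For reference, here is a minimal sketch of what creating some of these volume types looks like. The volume names and brick paths are made up for illustration and are not used later in this deployment; if the bricks sit on the root partition, force would also be required, as noted in section 3.1.

```bash
# Plain distributed volume: files hashed across bricks, no redundancy
gluster volume create gv-dist gf-node01:/bricks/d1 gf-node02:/bricks/d1

# Pure replicated volume: every file stored on all three bricks
gluster volume create gv-rep replica 3 gf-node0{1..3}:/bricks/r1

# Dispersed (erasure-coded) volume: 2 data bricks + 1 redundancy brick
gluster volume create gv-disp disperse 3 redundancy 1 gf-node0{1..3}:/bricks/e1
```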
2 service deployment
Reference: the official Rapid Deployment Guide.
2.1 service planning
Operating system | IP | Hostname | Additional disk |
---|---|---|---|
CentOS 7.4 | 10.0.0.101 | gf-node01 | sdb: 5 GB |
CentOS 7.4 | 10.0.0.102 | gf-node02 | sdb: 5 GB |
CentOS 7.4 | 10.0.0.103 | gf-node03 | sdb: 5 GB |
2.2 environmental preparation
Do the following on all 3 servers
```bash
# Close the firewall, selinux, etc. (not explained here)

# Complete hosts resolution
cat >> /etc/hosts <<EOF
10.0.0.101 gf-node01
10.0.0.102 gf-node02
10.0.0.103 gf-node03
EOF

# Install the 4.1 yum repository and packages
yum install -y centos-release-gluster41
yum install -y glusterfs glusterfs-libs glusterfs-server

# Start the service and enable it at boot
systemctl start glusterd.service
systemctl enable glusterd.service
systemctl status glusterd.service
```
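A quick sanity check on each node before moving on (a minimal check; the exact version string will depend on the package pulled in by the repository):

```bash
# Confirm the installed version and that glusterd is running
gluster --version | head -n 1
systemctl is-active glusterd
```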
2.3 format and mount the disk
Three directories are created in total: brick1 is used to mount sdb, while the other two are used as plain local directories.
Format disk
```bash
# View disk list
[root@gf-node01 ~]# fdisk -l
Disk /dev/sdb: 5368 MB, 5368709120 bytes, 10485760 sectors
Disk /dev/sda: 53.7 GB, 53687091200 bytes, 104857600 sectors

# Format disk directly without partitioning
mkfs.xfs -i size=512 /dev/sdb
```
Mount disk
```bash
# Create directories and mount
mkdir -p /data/brick{1..3}
echo '/dev/sdb /data/brick1 xfs defaults 0 0' >> /etc/fstab
mount -a && mount

# View results
[root@gf-node01 ~]# df -h | grep sd
/dev/sda2        48G  1.7G   47G   4% /
/dev/sdb        5.0G   33M  5.0G   1% /data/brick1
```
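Optionally, before any data is written to the brick, you can verify that the fstab entry will survive a reboot by unmounting and remounting purely from /etc/fstab (a minimal sketch):

```bash
# Remount from /etc/fstab to prove the entry is correct
umount /data/brick1
mount -a
df -h /data/brick1
```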
2.4 establishing host trust pool
Run the following commands on any one host to establish the trusted pool. No account or password is required, because GlusterFS assumes by default that the deployment environment is a secure, trusted network.
```bash
# Establish trusted pool
gluster peer probe gf-node02
gluster peer probe gf-node03

# View status
[root@gf-node01 ~]# gluster peer status
......
[root@gf-node01 ~]# gluster pool list
UUID                                    Hostname        State
4068e219-5141-43a7-81ba-8294536fb054    gf-node02       Connected
e3faffeb-4b16-45e2-9ff3-1922791e05eb    gf-node03       Connected
3e6a4567-eda7-4001-a5d5-afaa7e08ed93    localhost       Connected
```
Note: once the trusted pool is established, a new server can only be added by probing it from a node that is already in the pool (see the sketch below).
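For example, if a hypothetical fourth server gf-node04 were added later (the hostname is only for illustration), the probe would have to be run from one of the existing members, such as gf-node01:

```bash
# Run on an existing pool member, not on the new server itself
gluster peer probe gf-node04
gluster pool list
```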
3 using distributed replication volumes
Only the GlusterFS distributed replicated volume is tested here; for other volume types, look them up and test them yourself if necessary.
3.1 distributed replication volume creation instructions
- Command: gluster volume create gv1 replica 3 DIR1 DIR2 DIR3 ...
- The number of replicas should not be less than 3; otherwise creation is blocked because of the risk of split-brain, with the prompt "replica 2 volumes are prone to split-brain. Use arbiter or replica 3 to avoid this" (see the arbiter sketch after this list)
- If the number of bricks equals the replica count (3), the result is a plain replicated volume; if it is a multiple of the replica count, the result is a distributed replicated volume
- Each group of 3 bricks forms a replica set, and the replica sets are then combined into a distributed volume
- The replica sets of a distributed replicated volume follow the brick order given in the create command; they are not arranged randomly
- If not every brick is on an independent disk, you need to append the force parameter; otherwise creation fails with the error "volume create: gv1: failed: The brick gf-node01:/data/brick2/gv1 is being created in the root partition. It is recommended that you don't use the system's root partition for storage backend. Or use 'force' at the end of the command if you want to override this behavior."
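As a side note on the arbiter option mentioned in the warning above, here is a minimal sketch (the volume name and brick paths are made up): two bricks hold full copies of the data and the third stores only metadata, which avoids split-brain without keeping a third full data copy.

```bash
# replica 3 arbiter 1: bricks 1 and 2 hold data, brick 3 is a metadata-only arbiter
gluster volume create gv-arb replica 3 arbiter 1 \
  gf-node01:/data/brick1/gv-arb \
  gf-node02:/data/brick1/gv-arb \
  gf-node03:/data/brick1/gv-arb
```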
3.2 distributed replication volume creation
```bash
# Create the distributed replicated volume
gluster volume create gv1 replica 3 \
  gf-node01:/data/brick1/gv1 \
  gf-node01:/data/brick2/gv1 \
  gf-node02:/data/brick1/gv1 \
  gf-node02:/data/brick2/gv1 \
  gf-node03:/data/brick1/gv1 \
  gf-node03:/data/brick2/gv1 \
  force

# Start the volume
gluster volume start gv1

# View the status of the volume
[root@gf-node01 ~]# gluster volume info

Volume Name: gv1
Type: Distributed-Replicate
Volume ID: e1e004fa-5588-4629-b7ff-048c4e17de91
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: gf-node01:/data/brick1/gv1
Brick2: gf-node01:/data/brick2/gv1
Brick3: gf-node02:/data/brick1/gv1
Brick4: gf-node02:/data/brick2/gv1
Brick5: gf-node03:/data/brick1/gv1
Brick6: gf-node03:/data/brick2/gv1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
```
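Before mounting, it is also worth confirming that all six brick processes actually came online (a minimal check; the PIDs and ports in the output will differ on your machines):

```bash
# Every brick should report "Online: Y"
gluster volume status gv1
```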
3.3 use of distributed replication volumes
```bash
# Mount the volume
[root@gf-node01 ~]# mount -t glusterfs gf-node01:/gv1 /mnt

# Write test data
[root@gf-node01 ~]# touch /mnt/test{1..9}
[root@gf-node01 ~]# ls /mnt/test{1..9}
/mnt/test1  /mnt/test2  /mnt/test3  /mnt/test4  /mnt/test5  /mnt/test6  /mnt/test7  /mnt/test8  /mnt/test9

# Validate where the test data landed
[root@gf-node01 ~]# ls /data/brick*/*
/data/brick1/gv1:
test1  test2  test4  test5  test8  test9
/data/brick2/gv1:
test1  test2  test4  test5  test8  test9

[root@gf-node02 ~]# ls /data/brick*/*
/data/brick1/gv1:
test1  test2  test4  test5  test8  test9
/data/brick2/gv1:

[root@gf-node03 ~]# ls /data/brick*/*
/data/brick1/gv1:
test3  test6  test7
/data/brick2/gv1:
test3  test6  test7
```
Conclusion: the first three bricks form one replica set and the last three form another, so the brick order given when creating a volume is very important.