Ceph test tool summary

Posted by Bhaal on Thu, 21 Oct 2021 02:28:45 +0200


fio

fio was originally written to save the trouble of writing special test-case programs when testing a specific workload, whether for performance reasons or to find or reproduce a bug. Writing such a test application can be tedious, especially if it has to be done often. Its author therefore needed a tool that could simulate a given I/O workload without writing a custom test case again and again.

Command format

fio [options] [job options] <job files>

Parameter introduction

| Parameter name | Parameter description |
| --- | --- |
| -filename | File name, usually a block device path such as /dev/sdb. To give a job several files, separate them with a colon, e.g. filename=/dev/sda:/dev/sdb. |
| -direct | Whether to use direct (non-buffered) I/O. The default is false; direct=1 enables it. |
| -iodepth | Queue depth. |
| -thread | Create jobs as threads instead of forked processes. |
| -rw/-readwrite | I/O pattern. One of read, write, trim (Linux block devices and SCSI character devices only), randread, randwrite, randtrim (Linux block devices and SCSI character devices only), rw/readwrite, randrw, and trimwrite. For the mixed modes the read/write split is 50/50 by default. |
| -rwmixread | Percentage of a mixed workload that should be reads. Default: 50. |
| -rwmixwrite | Percentage of a mixed workload that should be writes. If both rwmixread and rwmixwrite are given and their values do not add up to 100%, the latter overrides the former. This may interfere with a rate setting if fio is asked to limit reads or writes to a certain rate; in that case the distribution may be skewed. Default: 50. |
| -bs | Block size of a single I/O. For Ceph tests it is commonly set to 4k. |
| -bsrange | Range of block sizes. Values without a unit suffix are in bytes. |
| -size | Amount of data read and written by each thread. |
| -numjobs | Number of clones created for this job. |
| -runtime | Total running time. The default unit is seconds. |
| -ramp_time | If set, fio runs the specified workload for this amount of time before logging any performance numbers. Useful for letting performance settle before logging results, thus minimizing the runtime required for stable results. Note that ramp_time is considered lead-in time for a job, so it increases the total runtime if a timeout or runtime is specified. When the unit is omitted, the value is given in seconds. |
| -name | Job name. It also has the special purpose of marking the start of a new job. With fio -name=job1 -name=job2, two jobs are created: parameters placed before -name=job1 are shared by both jobs, and parameters placed after -name=job2 apply only to job2. |
| -refill_buffers | If this option is given, fio refills the I/O buffers on every submit. This only makes sense if zero_buffers is not specified. It is not set by default, that is, the buffer is filled only at init time and its data is reused when possible; but if any of verify, buffer_compress_percentage or dedupe_percentage is enabled, refill_buffers is enabled automatically as well. Main purpose: refresh the buffer contents to avoid hitting data in the I/O cache. |
| -randrepeat | Seed the random number generator predictably so that the random pattern is repeatable across runs. The default is true. |
| -invalidate | Invalidate the buffer/page cache parts of the file before starting I/O. The default is true. |
| -norandommap | Normally fio covers every block of the file when doing random I/O. If this option is set, fio just picks a new random offset without looking at past I/O history. This means that some blocks may not be read or written, and some blocks may be read or written more than once. This option is mutually exclusive with verify= if and only if multiple block sizes (bsrange=) are in use, because fio only tracks complete rewrites of blocks. |

The following are engine parameters, which must be written after the ioengine that defines them.

| Parameter name | Parameter description |
| --- | --- |
| -clustername | rbd, rados parameter. Ceph cluster name. |
| -rbdname | rbd parameter. RBD image name. |
| -pool | rbd, rados parameter. Storage pool name. Required. |
| -clientname | rbd, rados parameter. The user name (without the 'client.' prefix) used to access the Ceph cluster. If clustername is specified, clientname should be the full type.id string. If no type. prefix is given, fio adds 'client.' by default. |
| -busy_poll | rbd, rados parameter. Poll the store instead of waiting for completion. This usually gives better throughput at the cost of higher (up to 100%) CPU utilization. |

Test case

  1. The engine is libaio, and the rbd device is mapped through the kernel for testing

    #1. Create an image with only the layering feature, since the kernel client may not support the advanced features
    rbd create --size {megabytes} {pool-name}/{image-name} --image-feature layering
    #2. Map the image through the kernel
    rbd map {pool-name}/{image-name}
    #3. Format the block device
    mkfs.ext4 {block_name}
    #4. Mount the device
    mkdir {file_path}
    mount {block_name} {file_path}
    #5. Check that the device is mounted, then run the tests
    #4k random write, iops
    fio -filename=/mnt/rbd-demo/fio.img -direct=1 -iodepth=32 -thread -rw=randwrite -ioengine=libaio -bs=4k -size=200m -numjobs=8 -runtime=60 -group_reporting -name=mytest
    #4k random read, iops
    fio -filename=/mnt/rbd-demo/fio.img -direct=1 -iodepth=32 -thread -rw=randread -ioengine=libaio -bs=4k -size=200m -numjobs=8 -runtime=60 -group_reporting -name=mytest
    #4k random read/write, 70% reads, iops
    fio -filename=/mnt/rbd-demo/fio.img -direct=1 -iodepth=32 -thread -rw=randrw -rwmixread=70 -ioengine=libaio -bs=4k -size=200m -numjobs=8 -runtime=60 -group_reporting -name=mytest
    #1M sequential write, throughput
    fio -filename=/mnt/rbd-demo/fio.img -direct=1 -iodepth=32 -thread -rw=write -ioengine=libaio -bs=1M -size=200m -numjobs=8 -runtime=60 -group_reporting -name=mytest
    #1M sequential read, throughput
    fio -filename=/mnt/rbd-demo/fio.img -direct=1 -iodepth=32 -thread -rw=read -ioengine=libaio -bs=1M -size=200m -numjobs=8 -runtime=60 -group_reporting -name=mytest

  2. The engine is rados

    #4k random write, iops
    fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rados -clientname=admin -pool=radospool -invalidate=0 -rw=randwrite -bs=4k -size=1G -runtime=60 -ramp_time=60 -iodepth=128 -name=mytest
    #4k random read, iops
    fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rados -clientname=admin -pool=radospool -invalidate=0 -rw=randread -bs=4k -size=1G -runtime=60 -ramp_time=60 -iodepth=128 -name=mytest
    #64k sequential write, throughput
    fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rados -clientname=admin -pool=radospool -invalidate=0 -rw=write -bs=64k -size=1G -runtime=60 -ramp_time=60 -iodepth=128 -name=mytest
    #64k sequential read, throughput
    fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rados -clientname=admin -pool=radospool -invalidate=0 -rw=read -bs=64k -size=1G -runtime=60 -ramp_time=60 -iodepth=128 -name=mytest

  3. The engine is rbd

    #4k random write, iops
    fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rbd -clientname=admin -pool=radospool -rbdname=imagename -invalidate=0 -rw=randwrite -bs=4k -size=1G -runtime=60 -ramp_time=60 -iodepth=128 -name=mytest
    #4k random read, iops
    fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rbd -clientname=admin -pool=radospool -rbdname=imagename -invalidate=0 -rw=randread -bs=4k -size=1G -runtime=60 -ramp_time=60 -iodepth=128 -name=mytest
    #64k sequential write, throughput
    fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rbd -clientname=admin -pool=radospool -rbdname=imagename -invalidate=0 -rw=write -bs=64k -size=1G -runtime=60 -ramp_time=60 -iodepth=128 -name=mytest
    #64k sequential read, throughput
    fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rbd -clientname=admin -pool=radospool -rbdname=imagename -invalidate=0 -rw=read -bs=64k -size=1G -runtime=60 -ramp_time=60 -iodepth=128 -name=mytest
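
The five libaio cases in item 1 differ only in -rw, -bs and the optional -rwmixread, so they can be generated from a small table instead of being typed out. The following is a minimal dry-run sketch: it only prints the command lines and does not run fio. The helper name print_case and the variable TEST_FILE are hypothetical; every fio option is copied from the examples above.

```shell
#!/bin/sh
# Dry-run sketch: print one fio command line per libaio test case.
# TEST_FILE and print_case are illustrative names, not part of fio.
TEST_FILE=/mnt/rbd-demo/fio.img

print_case() {
    # $1 = rw mode, $2 = block size, $3 = extra options with a trailing space (may be empty)
    echo "fio -filename=$TEST_FILE -direct=1 -iodepth=32 -thread -rw=$1 $3-ioengine=libaio -bs=$2 -size=200m -numjobs=8 -runtime=60 -group_reporting -name=mytest"
}

print_case randwrite 4k ""                 # 4k random write, iops
print_case randread  4k ""                 # 4k random read, iops
print_case randrw    4k "-rwmixread=70 "   # 4k random read/write, 70% reads
print_case write     1M ""                 # 1M sequential write, throughput
print_case read      1M ""                 # 1M sequential read, throughput
```

Once the printed commands look right, the output can be piped to sh to actually run them one after another.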

Result analysis

[root@node-1 ~]# fio -filename=/mnt/rbd-demo/fio.img -direct=1 -iodepth=32 -thread -rw=randwrite -ioengine=libaio -bs=4k -size=200m -numjobs=8 -runtime=60 -group_reporting -name=mytest 
mytest: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
Starting 8 threads
mytest: Laying out IO file (1 file / 200MiB)
Jobs: 1 (f=1): [_(7),w(1)][91.7%][r=0KiB/s,w=42.9MiB/s][r=0,w=10.0k IOPS][eta 00m:04s]               
mytest: (groupid=0, jobs=8): err= 0: pid=2178: Tue Sep  8 09:39:40 2020
  write: IOPS=9440, BW=36.9MiB/s (38.7MB/s)(1600MiB/43386msec) #Write IOPS and bandwidth (BW) summary
    slat (usec): min=2, max=630832, avg=692.55, stdev=5663.73 #submission latency: time taken to submit the I/O to the kernel
    clat (nsec): min=1476, max=1769.0M, avg=23061496.85, stdev=51507870.00 #completion latency: time from submission until the I/O completes
     lat (usec): min=68, max=1769.0k, avg=23754.76, stdev=52699.12 #total latency (slat + clat), the main reference metric
    clat percentiles (msec): #completion latency percentiles
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    8], 20.00th=[    9],
     | 30.00th=[   10], 40.00th=[   12], 50.00th=[   13], 60.00th=[   16],
     | 70.00th=[   25], 80.00th=[   31], 90.00th=[   32], 95.00th=[   34],
     | 99.00th=[  255], 99.50th=[  405], 99.90th=[  701], 99.95th=[  827],
     | 99.99th=[  995]
   bw (  KiB/s): min=   15, max=47324, per=13.75%, avg=5191.44, stdev=4918.35, samples=603
   iops        : min=    3, max=11831, avg=1297.69, stdev=1229.64, samples=603
  lat (usec)   : 2=0.01%, 4=0.01%, 50=0.01%, 100=0.01%, 250=0.01%
  lat (usec)   : 500=0.03%, 750=0.03%, 1000=0.02%
  lat (msec)   : 2=0.10%, 4=6.43%, 10=23.90%, 20=35.85%, 50=30.69%
  lat (msec)   : 100=0.93%, 250=0.97%, 500=0.70%, 750=0.25%, 1000=0.07%
  cpu          : usr=0.25%, sys=10.03%, ctx=357382, majf=0, minf=12
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,409600,0,0 short=0,0,0,0 dropped=0,0,0,0 #Number of I/Os issued (read, write, trim, sync)
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=36.9MiB/s (38.7MB/s), 36.9MiB/s-36.9MiB/s (38.7MB/s-38.7MB/s), io=1600MiB (1678MB), run=43386-43386msec

Disk stats (read/write):
    dm-0: ios=0/410431, merge=0/0, ticks=0/1271401, in_queue=1271380, util=96.49%, aggrios=0/410272, aggrmerge=0/340, aggrticks=0/1212994, aggrin_queue=1212830, aggrutil=96.47%
  sda: ios=0/410272, merge=0/340, ticks=0/1212994, in_queue=1212830, util=96.47%
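
The headline numbers in this output can be cross-checked against the command line. As a rough sketch (the variable names are illustrative; the values are copied from the run above), bandwidth should equal IOPS times block size, and the number of issued writes should equal numjobs times size divided by block size:

```shell
#!/bin/sh
# Values copied from the fio command and output above.
iops=9440        # "write: IOPS=9440"
bs=4096          # -bs=4k, in bytes
numjobs=8        # -numjobs=8
size_mib=200     # -size=200m per job

# Bandwidth = IOPS * block size, in MiB/s (2^20 bytes) and MB/s (10^6 bytes)
bw_mib=$(LC_ALL=C awk -v i="$iops" -v b="$bs" 'BEGIN { printf "%.1f", i * b / 1048576 }')
bw_mb=$(LC_ALL=C awk -v i="$iops" -v b="$bs" 'BEGIN { printf "%.1f", i * b / 1000000 }')

# Issued I/Os = numjobs * size per job / block size
issued=$(( numjobs * size_mib * 1024 * 1024 / bs ))

echo "BW: ${bw_mib} MiB/s (${bw_mb} MB/s), issued writes: ${issued}"
```

This reproduces BW=36.9MiB/s (38.7MB/s) and the 409600 writes in the "issued rwts" line, which is a quick way to confirm that the job ran to completion rather than being cut short by -runtime.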

rados bench

rados bench works at the RADOS layer and is used to test the performance of a storage pool.

Command format

rados bench -p <pool_name> <seconds> <write|seq|rand> [-b block_size] [-t concurrent_operations] [-k /.../ceph.client.admin.keyring] [-c /.../ceph.conf] [--no-cleanup] [--run-name run_name]

Parameter introduction

| Parameter name | Parameter description |
| --- | --- |
| -p | Pool to test |
| seconds | Test duration in seconds |
| write, seq or rand | Write, sequential read or random read |
| -b | Block size, 4M by default. A number without a unit suffix is interpreted as bytes (the examples below use 4K). |
| -t | Number of concurrent operations, default: 16 |
| -k | Path to ceph.client.admin.keyring |
| -c | Path to ceph.conf |
| --no-cleanup | The data is not deleted after the write test. It can be deleted at the end of testing with rados -p <pool_name> cleanup. |
| --run-name | The default is benchmark_last_metadata. For multi-client tests each client must set its own value, otherwise multi-client reads will fail. |

Test case

#4k write (objects are kept for the read tests)
rados bench -p rbd 60 write -b=4K -t=128 --no-cleanup

#4k sequential read
rados bench -p rbd 60 seq -b=4K -t=128

#4k random read
rados bench -p rbd 60 rand -b=4K -t=128

#Clean up the test data
rados -p rbd cleanup

Result analysis

[root@node-1 ~]# rados bench -p rbd 10 write -b=4K -t=128
hints = 1
Maintaining 128 concurrent writes of 4096 bytes to objects of size 4096 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_node-1_2254
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0     128       128         0         0         0           -           0
    1     128      1308      1180   4.52467   4.60938  0.00464254   0.0860147
    2     128      2136      2008   3.86785   3.23438    0.239057    0.117895
    3     128      2724      2596   3.18055   2.29688    0.192872    0.138857
    4     128      3578      3450   3.21747   3.33594     0.12293    0.150549
    5     128      3994      3866   2.88573     1.625   0.0140784     0.14932
    6     128      4362      4234    2.6529    1.4375   0.0735093    0.183773
    7     128      4830      4702   2.53856   1.82812    0.608796    0.189726
    8     128      5476      5348   2.51127   2.52344    0.113523    0.195028
    9     128      6323      6195   2.58801   3.30859    0.493252    0.189299
Total time run:         10.1872
Total writes made:      6889
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     2.64156 #bandwidth
Stddev Bandwidth:       1.0252
Max bandwidth (MB/sec): 4.60938
Min bandwidth (MB/sec): 1.4375
Average IOPS:           676 #Average IOPS
Stddev IOPS:            262.452
Max IOPS:               1180
Min IOPS:               368
Average Latency(s):     0.189061
Stddev Latency(s):      0.249725
Max latency(s):         1.96704
Min latency(s):         0.00155043
Cleaning up (deleting benchmark objects)
Removed 6889 objects
Clean up completed and total clean up time :4.18956
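
The summary lines can be derived from the totals. A minimal sketch, with values copied from the output above and illustrative variable names: average IOPS is total writes divided by total run time, and dividing the byte rate by 1048576 reproduces the reported bandwidth, which suggests the "MB" here is 2^20 bytes.

```shell
#!/bin/sh
# Values copied from the rados bench summary above.
total_writes=6889     # "Total writes made"
write_size=4096       # "Write size", in bytes
total_time=10.1872    # "Total time run", in seconds

# Average IOPS = writes / seconds (truncated to an integer, as in the report)
avg_iops=$(LC_ALL=C awk -v n="$total_writes" -v t="$total_time" 'BEGIN { printf "%d", n / t }')

# Bandwidth = writes * write size / seconds, expressed in units of 2^20 bytes
bw=$(LC_ALL=C awk -v n="$total_writes" -v s="$write_size" -v t="$total_time" 'BEGIN { printf "%.2f", n * s / t / 1048576 }')

echo "Average IOPS: ${avg_iops}, Bandwidth: ${bw} MB/sec"
```

The result matches Average IOPS 676 and, to two decimals, Bandwidth 2.64; the remaining digits differ slightly because the printed run time is itself rounded.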


mdtest

Installation tutorial

#1. Install dependencies
yum -y install gcc gcc-c++ gcc-gfortran
#2. Create the package directory
mkdir tools
#3. Install openmpi (mpich can be used instead of openmpi, but its usage differs)
cd tools
curl -O https://download.open-mpi.org/release/open-mpi/v1.10/openmpi-1.10.7.tar.gz
tar xf openmpi-1.10.7.tar.gz
cd openmpi-1.10.7
./configure --prefix=/usr/local/openmpi/
make && make install
#4. Add environment variables
vim /root/.bashrc
	#Add the following
	export PATH=$PATH:/usr/local/openmpi/bin/:/usr/local/ior/bin/
	export LD_LIBRARY_PATH=/usr/local/openmpi/lib:${LD_LIBRARY_PATH}
	export MPI_CC=mpicc
source /root/.bashrc
#5. Install IOR
cd ..   #back to tools/
yum -y install git automake
git clone https://github.com/chaos/ior.git
cd ior
./bootstrap   #generates ./configure in a git checkout
./configure --prefix=/usr/local/ior/
make && make install
#6. Install mdtest
cd ..   #back to tools/
mkdir mdtest && cd mdtest
wget https://nchc.dl.sourceforge.net/project/mdtest/mdtest%20latest/mdtest-1.9.3/mdtest-1.9.3.tgz
tar xf mdtest-1.9.3.tgz
make   #builds mdtest with the compiler set in MPI_CC

Command format

mdtest [-b #] [-B] [-c] [-C] [-d testdir] [-D] [-e] [-E] [-f first] [-F]
               [-h] [-i iterations] [-I #] [-l last] [-L] [-n #] [-N #] [-p seconds]
               [-r] [-R[#]] [-s #] [-S] [-t] [-T] [-u] [-v] [-V #] [-w #] [-y]
               [-z #]

Parameter introduction

| Parameter name | Parameter description |
| --- | --- |
| -F | Run the test on files only (no directories) |
| -L | Create files / directories only at the leaf level of the directory tree |
| -z | Depth of the directory tree |
| -b | Branching factor of the directory tree |
| -I (capital i) | Number of items per tree node |
| -n | Total number of files / directories created across the entire tree. Cannot be used with -I |
| -u | Use a unique working directory for each task |
| -d | Directory in which the test runs. Several directories can be tested with "-d fullpath1@fullpath2@fullpath3" |
Test case

Mount the file system

#1. Install ceph-fuse on the client node
yum install ceph-fuse
#2. Create the file storage pools on the server
ceph osd pool create cephfs_data
ceph osd pool create cephfs_metadata
ceph fs new cephfs cephfs_metadata cephfs_data
#3. Copy the server's key files to the client
mkdir /etc/ceph && scp monhost:/etc/ceph/* /etc/ceph/
#4. Mount the file system (use one of the two methods)
mkdir /mnt/mycephfs
mount -t ceph monhost:/ /mnt/mycephfs -o name=foo #Kernel mount, ceph-fuse not required
ceph-fuse --id foo /mnt/mycephfs #User-space mount

Result analysis

[root@node-1 fs1]# mdtest -F -L -z 4 -b 2 -I 10 -u -d /mnt/mycephfs/mdtest1/
-- started at 10/10/2020 14:04:19 --

mdtest-3.4.0+dev was launched with 1 total task(s) on 1 node(s)
Command line used: mdtest '-F' '-L' '-z' '4' '-b' '2' '-I' '10' '-u' '-d' '/mnt/fs1/mdtest1/'
WARNING: Read bytes is 0, thus, a read test will actually just open/close.
Path: /mnt/fs1/mdtest1
FS: 7.0 GiB   Used FS: 0.0%   Inodes: 0.0 Mi   Used Inodes: 100.0%

Nodemap: 1
1 tasks, 160 files

SUMMARY rate: (of 1 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation             :        517.555        517.555        517.555          0.000 #File creation
   File stat                 :       1101.718       1101.718       1101.718          0.000 #File query
   File read                 :       1087.185       1087.185       1087.185          0.000 #File reading
   File removal              :        428.320        428.320        428.320          0.000 #File deletion
   Tree creation             :        404.253        404.253        404.253          0.000 #Tree creation
   Tree removal              :        126.012        126.012        126.012          0.000 #Tree deletion
-- finished at 10/10/2020 14:04:20 --
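
The "1 tasks, 160 files" line appears to follow directly from the options used: with -L, files are created only in the leaf directories; a tree of depth 4 (-z 4) with branching factor 2 (-b 2) has 2^4 = 16 leaves; and -I 10 places 10 items in each. A small sketch of that arithmetic, with illustrative variable names:

```shell
#!/bin/sh
# Options from the mdtest command above.
depth=4     # -z 4
branch=2    # -b 2
items=10    # -I 10

# Number of leaf directories = branch ^ depth (computed with a loop for POSIX sh)
leaves=1
i=0
while [ "$i" -lt "$depth" ]; do
    leaves=$(( leaves * branch ))
    i=$(( i + 1 ))
done

# With -L, only the leaves hold files
files=$(( leaves * items ))
echo "leaf directories: ${leaves}, files: ${files}"
```

With so few files the whole run finishes in about a second, so larger -z, -b or -I values are probably needed for stable metadata rates.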

rbd bench

Command format

rbd bench [--pool <pool>] [--namespace <namespace>] [--image <image>] [--io-size <io-size>] [--io-threads <io-threads>] [--io-total <io-total>] [--io-pattern <io-pattern>] [--rw-mix-read <rw-mix-read>] --io-type <io-type> <image-spec>

<image-spec> has the form [<pool-name>/[<namespace>/]]<image-name>.

Parameter introduction

| Parameter name | Parameter description |
| --- | --- |
| -p/--pool | Pool name |
| --namespace | Namespace name |
| --image | Image name |
| --io-size | I/O size, 4K by default |
| --io-threads | Number of threads, 16 by default |
| --io-total | Total data size, 1G by default |
| --io-pattern | rand or seq, default seq |
| --rw-mix-read | Read ratio, default 50 (50% read / 50% write) |

Test case

#4k sequential write
rbd bench ceph-demo/test.img --io-threads 1 --io-size 4K --io-total 200M --io-pattern seq --io-type write

#4k random write
rbd bench ceph-demo/test.img --io-threads 1 --io-size 4K --io-total 200M --io-pattern rand --io-type write

#4k sequential read
rbd bench ceph-demo/test.img --io-threads 1 --io-size 4K --io-total 200M --io-pattern seq --io-type read

#4k random read
rbd bench ceph-demo/test.img --io-threads 1 --io-size 4K --io-total 200M --io-pattern rand --io-type read

Result analysis

[root@node-1 ~]# rbd bench rbd/test.img --io-size 4K --io-total 200M --io-pattern rand --io-type write --io-threads 1
bench  type write io_size 4096 io_threads 1 bytes 209715200 pattern random
    1      7290   7276.67  29805226.74
    2      9694   4821.01  19746856.87
    3     12183   4062.68  16640721.35
    4     14552   3638.22  14902168.99
    5     16650   3330.17  13640371.91
    6     18427   2229.14  9130560.10
  290     50070     32.67  133797.32
  295     50252     33.50  137204.83
  296     50306     29.18  119520.84
  299     50458     36.72  150385.87
  300     50667     57.88  237068.90
  305     50785     49.04  200882.86
  310     50870     39.70  162620.82
  311     50989     48.05  196806.10
  313     51000     38.25  156661.96
  321     51116     21.16  86689.84
elapsed (duration in seconds):   371  ops (total operations):    51200  ops/sec (IOPS):   137.68  bytes/sec (bandwidth): 563939.23
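
The totals in the last line can be checked from the command line. In this sketch (illustrative variable names, values taken from the run above) the operation count is --io-total divided by --io-size, and the overall IOPS is that count divided by the elapsed seconds:

```shell
#!/bin/sh
# Values from the rbd bench command and its final summary line.
io_total=$(( 200 * 1024 * 1024 ))   # --io-total 200M, in bytes
io_size=4096                        # --io-size 4K, in bytes
elapsed=371                         # elapsed seconds from the summary

ops=$(( io_total / io_size ))
iops=$(LC_ALL=C awk -v o="$ops" -v e="$elapsed" 'BEGIN { printf "%.0f", o / e }')

echo "ops: ${ops}, average IOPS: ${iops}"
```

This gives 51200 operations and roughly 138 IOPS, close to the reported 137.68 (the elapsed time is printed as a whole number of seconds). The per-second lines show a much higher rate at the start of the run, so the final average is the more representative figure for sustained random writes.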

echo 3 > /proc/sys/vm/drop_caches


Use the free command to view the cache

[root@node-1 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           1819         558         995           9         264        1109
Swap:          2047           0        2047

Here the buff/cache column shows the memory used for buffers and cache.

Manually free memory

/proc is a virtual file system. Reading and writing its files is a way of communicating with kernel entities; in other words, modifying a file under /proc adjusts the current kernel behavior. Memory can therefore be freed by writing to /proc/sys/vm/drop_caches. The operation is as follows:

Writing to this file causes the kernel to drop clean caches, dentries and inodes from memory, which makes that memory free.

Because this is a non-destructive operation and dirty objects are not freeable, the user should run sync first.

To release the pagecache, use
echo 1 > /proc/sys/vm/drop_caches

To release dentries and inodes, use
echo 2 > /proc/sys/vm/drop_caches

To release the pagecache, dentries and inodes, use
echo 3 > /proc/sys/vm/drop_caches
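
The sync-then-drop sequence described above can be wrapped in a small function to call between benchmark runs. This is only a sketch: the function name drop_caches and its optional second argument (an alternative target file, handy for trying the function without root) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: flush dirty pages, then drop the selected caches.
# $1 = 1 (pagecache), 2 (dentries and inodes) or 3 (both)
# $2 = file to write to; defaults to the real kernel interface (needs root)
drop_caches() {
    mode="$1"
    target="${2:-/proc/sys/vm/drop_caches}"
    case "$mode" in
        1|2|3) ;;
        *) echo "usage: drop_caches 1|2|3 [target]" >&2; return 1 ;;
    esac
    sync                      # write dirty data back so more cache becomes freeable
    echo "$mode" > "$target"
}
```

For example, running drop_caches 3 as root before each read test keeps earlier runs from being served out of the page cache.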
