fio
fio was originally written to save the trouble of writing special test-case programs whenever a specific workload had to be tested, whether for performance reasons or to find and reproduce errors. Writing such a test application each time can be cumbersome, especially if it has to be done often, so fio was created as a tool that can simulate a given I/O workload without writing a custom test case again and again.
Command format
fio [options] [job options] <job files>
Parameter introduction
Parameter name | Parameter description |
---|---|
-filename | File name, usually a block device path such as /dev/sdb. Use filename=/dev/sda:/dev/sdb to give different files to multiple jobs. |
-direct | Whether to use direct I/O. Disabled by default; direct=1 enables it. |
-iodepth | Queue depth. |
-thread | Create jobs as POSIX threads instead of forked processes. |
-rw/-readwrite | I/O pattern. One of read, write, trim (Linux block devices and SCSI character devices only), randread, randwrite, randtrim (Linux block devices and SCSI character devices only), rw/readwrite, randrw, and trimwrite. For mixed workloads the read/write split is 50/50 by default. |
-rwmixread | Percentage of a mixed workload that should be reads. Default: 50. |
-rwmixwrite | Percentage of a mixed workload that should be writes. If both rwmixread and rwmixwrite are given and their values do not add up to 100%, the latter option overrides the former. This may interfere with a given rate setting if fio is asked to limit reads or writes to a certain rate; if so, the distribution may be skewed. Default: 50. |
-ioengine | I/O engine to use, e.g. libaio, rados, rbd. |
-bs | Block size of a single I/O. For Ceph testing this is commonly set to 4k. |
-bsrange | Range of block sizes to use, e.g. bsrange=1k-4k. |
-size | Amount of data read or written by each job. |
-numjobs | Number of clones created for this job. |
-runtime | Total running time. The default unit is seconds |
-ramp_time | If set, fio will run the specified workload for this amount of time before logging any performance numbers. Useful for letting performance settle before logging results, thus minimizing the runtime required for stable results. Note that the ramp_time is considered lead in time for a job, thus it will increase the total runtime if a special timeout or runtime is specified. When the unit is omitted, the value is given in seconds |
-name | Job name. It also marks the start of a new job: with fio -name=job1 ... -name=job2 ..., two jobs are created; options given before -name=job1 are shared by both jobs, while options after -name=job2 apply only to job2. |
-refill_buffers | If given, fio refills the I/O buffers on every submit (this only makes sense when zero_buffers is not set). By default the buffers are only filled at init time and the data in them is reused when possible, but if verify, buffer_compress_percentage or dedupe_percentage is enabled, refill_buffers is turned on automatically. Main purpose: keep refreshing the buffers so the test does not just hit data in the I/O cache. |
-randrepeat | Seed the random generators in a predictable way so the random I/O pattern is repeatable across runs. Default: true. |
-invalidate | Invalidate the buffer/page cache for the file(s) before starting I/O. Default: true. |
-norandommap | Normally fio covers every block of the file when doing random I/O. If this option is given, fio just picks a new random offset without looking at past I/O history. This means some blocks may not be read or written at all, while others may be read/written many times. This option is mutually exclusive with verify= when multiple block sizes (via bsrange=) are in use, because fio only tracks complete rewrites of blocks. |
**** | The following are engine-specific parameters and must be given after ioengine. |
-clustername | rbd,rados parameter. ceph cluster name. |
-rbdname | rbd parameter, RBD image name. |
-pool | rbd, rados parameter. Storage pool name. Required. |
-clientname | rbd, rados parameter. Specifies the username (without the 'client.' prefix) used to access the Ceph cluster. If clustername is specified, clientname should be the full 'type.id' string; if no 'type.' prefix is given, fio adds 'client.' by default. |
-busy_poll | rbd, rados parameter. Poll store instead of waiting for completion. Usually this provides better throughput at the cost of higher (up to 100%) CPU utilization. |
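The options above can either be passed on the command line, as in the test cases below, or collected into a job file (the <job files> argument in the command format). A minimal sketch of such a job file follows; the file name, target path and values are illustrative only, reusing the examples from later in this section.
#Write a small job file and run it (file name and target path are examples only)
cat > randwrite-4k.fio <<'EOF'
[global]
ioengine=libaio
direct=1
thread
iodepth=32
bs=4k
size=200m
runtime=60
numjobs=8
group_reporting

[randwrite-4k]
filename=/mnt/rbd-demo/fio.img
rw=randwrite
EOF
fio randwrite-4k.fio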
Test case
- The engine is libaio; the rbd image is mapped through the kernel and mounted for testing
#1. Create an image with only the layering feature, so the kernel client does not hit unsupported advanced features
rbd create --size {megabytes} {pool-name}/{image-name} --image-feature layering
#2. Map the image through the kernel
rbd map {pool-name}/{image-name}
#3. Format the block device
mkfs.ext4 {block_name}
#4. Mount the device
mkdir {file_path}
mount {block_name} {file_path}
#5. Check the mount
df
#4k random write, iops
fio -filename=/mnt/rbd-demo/fio.img -direct=1 -iodepth=32 -thread -rw=randwrite -ioengine=libaio -bs=4k -size=200m -numjobs=8 -runtime=60 -group_reporting -name=mytest
#4k random read, iops
fio -filename=/mnt/rbd-demo/fio.img -direct=1 -iodepth=32 -thread -rw=randread -ioengine=libaio -bs=4k -size=200m -numjobs=8 -runtime=60 -group_reporting -name=mytest
#4k mixed random read/write, 70% read, iops
fio -filename=/mnt/rbd-demo/fio.img -direct=1 -iodepth=32 -thread -rw=randrw -rwmixread=70 -ioengine=libaio -bs=4k -size=200m -numjobs=8 -runtime=60 -group_reporting -name=mytest
#1M sequential write, throughput
fio -filename=/mnt/rbd-demo/fio.img -direct=1 -iodepth=32 -thread -rw=write -ioengine=libaio -bs=1M -size=200m -numjobs=8 -runtime=60 -group_reporting -name=mytest
#1M sequential read, throughput
fio -filename=/mnt/rbd-demo/fio.img -direct=1 -iodepth=32 -thread -rw=read -ioengine=libaio -bs=1M -size=200m -numjobs=8 -runtime=60 -group_reporting -name=mytest
- The engine is rados
#4k random write, iops
fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rados -clientname=admin -pool=radospool -invalidate=0 -rw=randwrite -bs=4k -size=1G -runtime=60 -ramp_time=60 -iodepth=128
#4k random read, iops
fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rados -clientname=admin -pool=radospool -invalidate=0 -rw=randread -bs=4k -size=1G -runtime=60 -ramp_time=60 -iodepth=128
#64k sequential write, throughput
fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rados -clientname=admin -pool=radospool -invalidate=0 -rw=write -bs=64k -size=1G -runtime=60 -ramp_time=60 -iodepth=128
#64k sequential read, throughput
fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rados -clientname=admin -pool=radospool -invalidate=0 -rw=read -bs=64k -size=1G -runtime=60 -ramp_time=60 -iodepth=128
- The engine is rbd
#4k random write, iops
fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rbd -clientname=admin -pool=radospool -rbdname=imagename -invalidate=0 -rw=randwrite -bs=4k -size=1G -runtime=60 -ramp_time=60 -iodepth=128
#4k random read, iops
fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rbd -clientname=admin -pool=radospool -rbdname=imagename -invalidate=0 -rw=randread -bs=4k -size=1G -runtime=60 -ramp_time=60 -iodepth=128
#64k sequential write, throughput
fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rbd -clientname=admin -pool=radospool -rbdname=imagename -invalidate=0 -rw=write -bs=64k -size=1G -runtime=60 -ramp_time=60 -iodepth=128
#64k sequential read, throughput
fio -direct=1 -thread -refill_buffers -norandommap -randrepeat=0 -numjobs=1 -ioengine=rbd -clientname=admin -pool=radospool -rbdname=imagename -invalidate=0 -rw=read -bs=64k -size=1G -runtime=60 -ramp_time=60 -iodepth=128
Result analysis
[root@node-1 ~]# fio -filename=/mnt/rbd-demo/fio.img -direct=1 -iodepth=32 -thread -rw=randwrite -ioengine=libaio -bs=4k -size=200m -numjobs=8 -runtime=60 -group_reporting -name=mytest
mytest: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
...
fio-3.7
Starting 8 threads
mytest: Laying out IO file (1 file / 200MiB)
Jobs: 1 (f=1): [_(7),w(1)][91.7%][r=0KiB/s,w=42.9MiB/s][r=0,w=10.0k IOPS][eta 00m:04s]
mytest: (groupid=0, jobs=8): err= 0: pid=2178: Tue Sep 8 09:39:40 2020
  write: IOPS=9440, BW=36.9MiB/s (38.7MB/s)(1600MiB/43386msec)              #IOPS and bandwidth (BW) for writes
    slat (usec): min=2, max=630832, avg=692.55, stdev=5663.73               #submission latency: time to submit the I/O to the kernel
    clat (nsec): min=1476, max=1769.0M, avg=23061496.85, stdev=51507870.00  #completion latency: time for the kernel to finish the I/O
     lat (usec): min=68, max=1769.0k, avg=23754.76, stdev=52699.12          #total latency, the main reference metric
    clat percentiles (msec):                                                #completion latency percentiles
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    8], 20.00th=[    9],
     | 30.00th=[   10], 40.00th=[   12], 50.00th=[   13], 60.00th=[   16],
     | 70.00th=[   25], 80.00th=[   31], 90.00th=[   32], 95.00th=[   34],
     | 99.00th=[  255], 99.50th=[  405], 99.90th=[  701], 99.95th=[  827],
     | 99.99th=[  995]
   bw (  KiB/s): min=   15, max=47324, per=13.75%, avg=5191.44, stdev=4918.35, samples=603
   iops        : min=    3, max=11831, avg=1297.69, stdev=1229.64, samples=603
  lat (usec)   : 2=0.01%, 4=0.01%, 50=0.01%, 100=0.01%, 250=0.01%
  lat (usec)   : 500=0.03%, 750=0.03%, 1000=0.02%
  lat (msec)   : 2=0.10%, 4=6.43%, 10=23.90%, 20=35.85%, 50=30.69%
  lat (msec)   : 100=0.93%, 250=0.97%, 500=0.70%, 750=0.25%, 1000=0.07%
  cpu          : usr=0.25%, sys=10.03%, ctx=357382, majf=0, minf=12
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,409600,0,0 short=0,0,0,0 dropped=0,0,0,0          #number of I/Os issued
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=36.9MiB/s (38.7MB/s), 36.9MiB/s-36.9MiB/s (38.7MB/s-38.7MB/s), io=1600MiB (1678MB), run=43386-43386msec

Disk stats (read/write):
    dm-0: ios=0/410431, merge=0/0, ticks=0/1271401, in_queue=1271380, util=96.49%, aggrios=0/410272, aggrmerge=0/340, aggrticks=0/1212994, aggrin_queue=1212830, aggrutil=96.47%
  sda: ios=0/410272, merge=0/340, ticks=0/1212994, in_queue=1212830, util=96.47%
rados bench
rados bench works at the rados layer and is used to test the performance of a storage pool.
Command format
rados bench -p <pool_name> <seconds> <write|seq|rand> [-b block_size] [-t concurrent_operations] [-k /.../ceph.client.admin.keyring] [-c /.../ceph.conf] [--no-cleanup] [--run-name run_name]
Parameter introduction
Parameter name | Parameter description |
---|---|
-p | Tested pool |
seconds | Test duration in seconds |
write|seq|rand | write / sequential read / random read |
-b | Block size, 4M by default. Given as a number of bytes, optionally with a unit suffix (e.g. 4K). |
-t | Number of concurrent operations, default: 16 |
-k | Specify ceph.client.admin.keyring |
-c | Specify ceph.conf |
--no-cleanup | Do not delete the written data when the benchmark finishes. It can be removed afterwards with rados -p <pool_name> cleanup. |
--run-name | Defaults to benchmark_last_metadata. For multi-client tests each client must set its own value, otherwise reads from multiple clients will fail (see the example after this table). |
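A sketch of how --run-name is typically used in a multi-client test. The pool name testpool and the run names are illustrative, and the cleanup syntax assumes a rados version whose cleanup subcommand accepts --run-name.
#Each client writes its own data set under a distinct run name and keeps it
rados bench -p testpool 60 write -t 16 --no-cleanup --run-name client_a     #on client A
rados bench -p testpool 60 write -t 16 --no-cleanup --run-name client_b     #on client B
#Each client then reads back the data set written under its own run name
rados bench -p testpool 60 seq -t 16 --run-name client_a                    #on client A
rados bench -p testpool 60 rand -t 16 --run-name client_b                   #on client B
#Clean up per run name when finished
rados -p testpool cleanup --run-name client_a
rados -p testpool cleanup --run-name client_b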
Test case
#4k random write
rados bench -p rbd 60 write -b=4K -t=128 --no-cleanup
#4k sequential read
rados bench -p rbd 60 seq -b=4K -t=128
#4k random read
rados bench -p rbd 60 rand -b=4K -t=128
#Cleaning up the data
rados -p rbd cleanup
Result analysis
[root@node-1 ~]# rados bench -p rbd 10 write -b=4K -t=128
hints = 1
Maintaining 128 concurrent writes of 4096 bytes to objects of size 4096 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_node-1_2254
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0     128       128         0         0         0           -           0
    1     128      1308      1180   4.52467   4.60938  0.00464254   0.0860147
    2     128      2136      2008   3.86785   3.23438    0.239057    0.117895
    3     128      2724      2596   3.18055   2.29688    0.192872    0.138857
    4     128      3578      3450   3.21747   3.33594     0.12293    0.150549
    5     128      3994      3866   2.88573     1.625   0.0140784     0.14932
    6     128      4362      4234    2.6529    1.4375   0.0735093    0.183773
    7     128      4830      4702   2.53856   1.82812    0.608796    0.189726
    8     128      5476      5348   2.51127   2.52344    0.113523    0.195028
    9     128      6323      6195   2.58801   3.30859    0.493252    0.189299
Total time run:         10.1872
Total writes made:      6889
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     2.64156      #bandwidth
Stddev Bandwidth:       1.0252
Max bandwidth (MB/sec): 4.60938
Min bandwidth (MB/sec): 1.4375
Average IOPS:           676          #average IOPS
Stddev IOPS:            262.452
Max IOPS:               1180
Min IOPS:               368
Average Latency(s):     0.189061
Stddev Latency(s):      0.249725
Max latency(s):         1.96704
Min latency(s):         0.00155043
Cleaning up (deleting benchmark objects)
Removed 6889 objects
Clean up completed and total clean up time :4.18956
mdtest
Installation tutorial
#1. Install dependencies
yum -y install gcc gcc-c++ gcc-gfortran
#2. Create a directory for the source packages
mkdir tools
#3. Install openmpi (mpich can be used instead of openmpi, but its usage differs)
cd tools
curl -O https://download.open-mpi.org/release/open-mpi/v1.10/openmpi-1.10.7.tar.gz
tar xf openmpi-1.10.7.tar.gz && cd openmpi-1.10.7
./configure --prefix=/usr/local/openmpi/
make && make install
#4. Add environment variables
vim /root/.bashrc
#Add the following lines
export PATH=$PATH:/usr/local/openmpi/bin/:/usr/local/ior/bin/
export LD_LIBRARY_PATH=/usr/local/openmpi/lib:${LD_LIBRARY_PATH}
export MPI_CC=mpicc
source /root/.bashrc
#5. Install IOR
cd tools/
yum -y install git automake
git clone https://github.com/chaos/ior.git
cd ior
./bootstrap
./configure --prefix=/usr/local/ior/
make && make install
#6. Install mdtest
cd tools/
mkdir mdtest && cd mdtest
wget https://nchc.dl.sourceforge.net/project/mdtest/mdtest%20latest/mdtest-1.9.3/mdtest-1.9.3.tgz
tar xf mdtest-1.9.3.tgz
make
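As a quick sanity check that the toolchain built correctly; this sketch assumes the install prefixes above and is run from the directory where mdtest was compiled.
#Confirm the MPI compiler wrapper and launcher are on PATH
which mpicc mpirun
mpirun --version
#Confirm the mdtest binary was built
./mdtest -h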
Command format
mdtest [-b #] [-B] [-c] [-C] [-d testdir] [-D] [-e] [-E] [-f first] [-F] [-h] [-i iterations] [-I #] [-l last] [-L] [-n #] [-N #] [-p seconds] [-r] [-R[#]] [-s #] [-S] [-t] [-T] [-u] [-v] [-V #] [-w #] [-y] [-z #]
Parameter introduction
Parameter name | Parameter description |
---|---|
-F | Create files only |
-L | Create files/directories only at the leaf level of the directory tree |
-z | Directory tree depth |
-b | Directory tree branch |
-I (capital i) | Number of items per directory tree node (see the worked example after this table) |
-n | Total number of files/directories created across the entire tree. Cannot be used together with -I. |
-u | Give each working task its own working directory |
-d | Directory in which the test runs. Multiple directories can be tested: -d fullpath1@fullpath2@fullpath3 |
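As a worked example of how -z, -b and -I combine with -L: a tree of depth 4 (-z 4) with branching factor 2 (-b 2) has 2^4 = 16 leaf directories, and -I 10 places 10 items in each of them, so a single task creates 160 files, matching the "1 tasks, 160 files" line in the result analysis below.
echo $(( 2**4 * 10 ))    #prints 160, the per-task file count for -F -L -z 4 -b 2 -I 10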
Test case
Mount the CephFS file system
#1. Install ceph-fuse on the client node
yum install ceph-fuse
#2. On the server, create the CephFS pools and file system
ceph osd pool create cephfs_data
ceph osd pool create cephfs_metadata
ceph fs new cephfs cephfs_metadata cephfs_data
#3. Copy the configuration and key files from the server to the client
mkdir /etc/ceph && scp monhost:/etc/ceph/* /etc/ceph/
#4. Mount the file system
mkdir /mnt/mycephfs
mount -t ceph monhost:/ /mnt/mycephfs -o name=foo    #kernel mount, does not require ceph-fuse
ceph-fuse --id foo /mnt/mycephfs                     #user-space (FUSE) mount
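The result analysis below uses a single-task run; with the openmpi installed earlier, the same workload can also be driven by several MPI tasks. A sketch, where the task count, the --allow-run-as-root flag (needed when launching as root with Open MPI) and the target path are illustrative:
#Single task, as used in the result analysis below
mdtest -F -L -z 4 -b 2 -I 10 -u -d /mnt/mycephfs/mdtest1/
#Four MPI tasks against the same CephFS mount
mpirun --allow-run-as-root -np 4 mdtest -F -L -z 4 -b 2 -I 10 -u -d /mnt/mycephfs/mdtest1/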
Result analysis
[root@node-1 fs1]# mdtest -F -L -z 4 -b 2 -I 10 -u -d /mnt/mycephfs/mdtest1/
-- started at 10/10/2020 14:04:19 --

mdtest-3.4.0+dev was launched with 1 total task(s) on 1 node(s)
Command line used: mdtest '-F' '-L' '-z' '4' '-b' '2' '-I' '10' '-u' '-d' '/mnt/fs1/mdtest1/'
WARNING: Read bytes is 0, thus, a read test will actually just open/close.
Path: /mnt/fs1/mdtest1
FS: 7.0 GiB   Used FS: 0.0%   Inodes: 0.0 Mi   Used Inodes: 100.0%

Nodemap: 1
1 tasks, 160 files

SUMMARY rate: (of 1 iterations)
   Operation              Max         Min        Mean    Std Dev
   ---------              ---         ---        ----    -------
   File creation   :    517.555     517.555     517.555     0.000    #file creation
   File stat       :   1101.718    1101.718    1101.718     0.000    #file stat (query)
   File read       :   1087.185    1087.185    1087.185     0.000    #file read
   File removal    :    428.320     428.320     428.320     0.000    #file removal
   Tree creation   :    404.253     404.253     404.253     0.000    #tree creation
   Tree removal    :    126.012     126.012     126.012     0.000    #tree removal
-- finished at 10/10/2020 14:04:20 --
rbd bench
Command format
rbd bench [--pool <pool>] [--namespace <namespace>] [--image <image>] [--io-size <io-size>] [--io-threads <io-threads>] [--io-total <io-total>] [--io-pattern <io-pattern>] [--rw-mix-read <rw-mix-read>] --io-type <io-type> <image-spec>
Here <image-spec> has the form [<pool-name>/[<namespace>/]]<image-name>.
Parameter introduction
Parameter name | Parameter description |
---|---|
-p/--pool | Pool name |
--namespace | Namespace name |
--image | Image name |
--io-size | I/O size, default 4K |
--io-threads | Number of threads, default 16 |
--io-total | Total amount of data, default 1G |
--io-pattern | rand or seq, default seq |
--rw-mix-read | Read percentage for readwrite workloads, default 50 (50% read / 50% write); see the mixed read/write example after the test case |
--io-type | read, write, or readwrite (rw) |
Test case
#4k sequential write
rbd bench ceph-demo/test.img --io-threads 1 --io-size 4K --io-total 200M --io-pattern seq --io-type write
#4k random write
rbd bench ceph-demo/test.img --io-threads 1 --io-size 4K --io-total 200M --io-pattern rand --io-type write
#4k sequential read
rbd bench ceph-demo/test.img --io-threads 1 --io-size 4K --io-total 200M --io-pattern seq --io-type read
#4k random read
rbd bench ceph-demo/test.img --io-threads 1 --io-size 4K --io-total 200M --io-pattern rand --io-type read
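The --rw-mix-read option from the table above is not exercised by these commands; a sketch of a mixed workload using the same illustrative image and sizes:
#4k random mixed read/write, 70% read
rbd bench ceph-demo/test.img --io-threads 1 --io-size 4K --io-total 200M --io-pattern rand --io-type readwrite --rw-mix-read 70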
Result analysis
[root@node-1 ~]# rbd bench rbd/test.img --io-size 4K --io-total 200M --io-pattern rand --io-type write --io-threads 1
bench  type write io_size 4096 io_threads 1 bytes 209715200 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1      7290   7276.67  29805226.74
    2      9694   4821.01  19746856.87
    3     12183   4062.68  16640721.35
    4     14552   3638.22  14902168.99
    5     16650   3330.17  13640371.91
    6     18427   2229.14   9130560.10
  ...
  290     50070     32.67    133797.32
  295     50252     33.50    137204.83
  296     50306     29.18    119520.84
  299     50458     36.72    150385.87
  300     50667     57.88    237068.90
  305     50785     49.04    200882.86
  310     50870     39.70    162620.82
  311     50989     48.05    196806.10
  313     51000     38.25    156661.96
  321     51116     21.16     86689.84
elapsed (duration): 371   ops (total operations): 51200   ops/sec (IOPS): 137.68   bytes/sec (bandwidth): 563939.23
echo 3 > /proc/sys/vm/drop_caches
free
Use the free command to view the cache
[root@node-1 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           1819         558         995           9         264        1109
Swap:          2047           0        2047
The buff/cache column shows the cache resources.
Manually free memory
/proc is a virtual file system; reading and writing its files is a way of communicating with the kernel. In other words, you can modify files under /proc to adjust the current kernel's behaviour, and in particular write to /proc/sys/vm/drop_caches to free memory. The operation is as follows:
Because this is a non-destructive operation and dirty objects are not freeable, the user should run sync first. Writing to this file causes the kernel to drop clean caches (dentries and inodes) from memory, leaving that memory free. To free pagecache, use echo 1 > /proc/sys/vm/drop_caches; to free dentries and inodes, use echo 2 > /proc/sys/vm/drop_caches; to free pagecache, dentries and inodes, use echo 3 > /proc/sys/vm/drop_caches.
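Put together, a minimal sketch of the whole procedure; the before/after free output is only there for comparison.
free -m                              #cache usage before
sync                                 #flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches    #drop pagecache, dentries and inodes
free -m                              #buff/cache should now be noticeably smaller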
Reference link
https://blog.csdn.net/wyzxg/article/details/7279986/