Detailed explanation of IO test tools

Posted by MoombaDS on Mon, 24 Jan 2022 17:47:30 +0100

At present, the mainstream third-party IO testing tools include fio, iometer, and Orion, each with its own strengths.

fio is more convenient to use on Linux systems, while iometer is more convenient on Windows. Orion is Oracle's IO test tool; it can simulate the read/write patterns of an Oracle database workload without an Oracle database being installed.

The following shows how to test the IO of SAN storage with the fio tool on a Linux system.

1. Install fio

Download the fio-2.1.10 tar file from the fio official website, extract it, then run ./configure, make, and make install. After that, fio is ready to use.
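A minimal sketch of the install steps, assuming the fio-2.1.10 tarball has already been downloaded from the official site and that gcc and make are available:

tar -zxvf fio-2.1.10.tar.gz        # unpack the source tarball
cd fio-2.1.10
./configure                        # generate the build configuration
make                               # compile
make install                       # install (run as root)
fio --version                      # verify the installation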

2. fio parameter interpretation

You can run fio --help to list every parameter; the HOWTO document on the official website describes them in detail. The following explains several common parameters:

filename=/dev/emcpowerb  Test target; both files on a filesystem and raw devices are supported, e.g. -filename=/dev/sda2 or -filename=/dev/sdb
direct=1                 Bypass the machine's own buffer cache during the test, making the results more realistic
rw=randread              Test random read IO
rw=randwrite             Test random write IO
rw=randrw                Test mixed random read and write IO
rw=read                  Test sequential read IO
rw=write                 Test sequential write IO
rw=rw                    Test mixed sequential read and write IO
bs=4k                    Block size of a single IO is 4k
bsrange=512-2048         Same as above, but specifies a range of block sizes
size=5g                  The test file size is 5g, tested with 4k IOs each time
numjobs=30               Number of test threads is 30
runtime=1000             Test duration is 1000 seconds; if omitted, fio keeps going until the whole 5g file has been processed in 4k IOs
ioengine=psync           Use the psync IO engine; to use the libaio engine, the libaio-devel package must be installed (yum install libaio-devel)
rwmixwrite=30            In mixed read/write mode, writes account for 30%
group_reporting          When displaying results, summarize the per-process information into one report
In addition:
lockmem=1g               Use only 1g of memory for the test
zero_buffers             Initialize the IO buffers with zeros
nrfiles=8                Number of files generated per process
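The same parameters can also be collected into a job file instead of being passed on the command line. A minimal sketch, reusing the example device /dev/emcpowerb from above (the section names are arbitrary; save as randread.fio and run with fio randread.fio):

[global]
ioengine=psync
direct=1
bs=4k
size=5g
numjobs=30
runtime=1000
group_reporting

[randread_emcpowerb]
filename=/dev/emcpowerb
rw=randread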

3. Detailed explanation of fio test scenarios and report generation

Test scenarios:

100% random, 100% read, 4K
fio -filename=/dev/emcpowerb -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=4k -size=1000G -numjobs=50 -runtime=180 -group_reporting -name=rand_100read_4k

100% random, 100% write, 4K
fio -filename=/dev/emcpowerb -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=4k -size=1000G -numjobs=50 -runtime=180 -group_reporting -name=rand_100write_4k

100% sequential, 100% read, 4K
fio -filename=/dev/emcpowerb -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=4k -size=1000G -numjobs=50 -runtime=180 -group_reporting -name=sqe_100read_4k

100% sequential, 100% write, 4K
fio -filename=/dev/emcpowerb -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=4k -size=1000G -numjobs=50 -runtime=180 -group_reporting -name=sqe_100write_4k

100% random, 70% read, 30% write 4K
fio -filename=/dev/emcpowerb -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=4k -size=1000G -numjobs=50 -runtime=180 -group_reporting -name=randrw_70read_4k

Viewing the result report:

[root@rac01-node02]# fio -filename=/dev/sdc4 -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=4k -size=1000G -numjobs=50 -runtime=180 -group_reporting -name=randrw_70read_4k_local
randrw_70read_4k_local: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
fio-2.1.10
Starting 50 threads
Jobs: 21 (f=21): [____m____m_m___m____mmm__mmm__mm_m_mmm_m__m__m_m_m] [3.4% done] [7004KB/2768KB/0KB /s] [1751/692/0 iops] [eta 01h:27m:00s]
randrw_70read_4k_local: (groupid=0, jobs=50): err= 0: pid=13710: Wed May 31 10:23:31 2017
  read : io=1394.2MB, bw=7926.4KB/s, iops=1981, runt=180113msec
    clat (usec): min=39, max=567873, avg=24323.79, stdev=25645.98
     lat (usec): min=39, max=567874, avg=24324.23, stdev=25645.98
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    5], 10.00th=[    6], 20.00th=[    7],
     | 30.00th=[    9], 40.00th=[   12], 50.00th=[   16], 60.00th=[   21],
     | 70.00th=[   27], 80.00th=[   38], 90.00th=[   56], 95.00th=[   75],
     | 99.00th=[  124], 99.50th=[  147], 99.90th=[  208], 99.95th=[  235],
     | 99.99th=[  314]
    bw (KB  /s): min=   15, max=  537, per=2.00%, avg=158.68, stdev=38.08
  write: io=615280KB, bw=3416.8KB/s, iops=854, runt=180113msec
    clat (usec): min=167, max=162537, avg=2054.79, stdev=7665.24
     lat (usec): min=167, max=162537, avg=2055.38, stdev=7665.23
    clat percentiles (usec):
     |  1.00th=[  201],  5.00th=[  227], 10.00th=[  249], 20.00th=[  378],
     | 30.00th=[  548], 40.00th=[  692], 50.00th=[  844], 60.00th=[  996],
     | 70.00th=[ 1160], 80.00th=[ 1304], 90.00th=[ 1720], 95.00th=[ 3856],
     | 99.00th=[40192], 99.50th=[58624], 99.90th=[98816], 99.95th=[123392],
     | 99.99th=[148480]
    bw (KB  /s): min=    6, max=  251, per=2.00%, avg=68.16, stdev=29.18
    lat (usec) : 50=0.01%, 100=0.03%, 250=3.15%, 500=5.00%, 750=5.09%
    lat (usec) : 1000=4.87%
    lat (msec) : 2=9.64%, 4=4.06%, 10=21.42%, 20=18.08%, 50=19.91%
    lat (msec) : 100=7.24%, 250=1.47%, 500=0.03%, 750=0.01%
  cpu          : usr=0.07%, sys=0.21%, ctx=522490, majf=0, minf=7
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=356911/w=153820/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=1394.2MB, aggrb=7926KB/s, minb=7926KB/s, maxb=7926KB/s, mint=180113msec, maxt=180113msec
  WRITE: io=615280KB, aggrb=3416KB/s, minb=3416KB/s, maxb=3416KB/s, mint=180113msec, maxt=180113msec

Disk stats (read/write):
  sdc: ios=356874/153927, merge=0/10, ticks=8668598/310288, in_queue=8978582, util=99.99%

io = total amount of IO transferred (here in MB)
bw = average IO bandwidth
iops = IOPS
runt = thread run time
slat = submission latency
clat = completion latency
lat = total latency (response time)
bw = bandwidth
cpu = CPU utilization
IO depths = distribution of IO queue depths
IO submit = number of IOs submitted per submit call
IO complete = like the submit numbers above, but for completions
IO issued = number of read/write requests issued, and how many of them were short
IO latencies = distribution of IO latencies

For example, the read side above reports io=1394.2MB in runt=180113msec with bs=4k, which works out to roughly 1394.2*1024/4/180.1 ≈ 1981 IOPS and about 7926KB/s, matching the reported iops and bw values.

io = total amount of IO performed by the group
aggrb = aggregate bandwidth of the group
minb = minimum average bandwidth
maxb = maximum average bandwidth
mint = minimum run time of the threads in the group
maxt = maximum run time of the threads in the group

ios = total number of IOs performed by all groups
merge = total number of IO merges
ticks = number of ticks the disk was kept busy
in_queue = total time spent in the queue
util = disk utilization

4. Extension: IO queue depth

At any given moment there are N in-flight IO requests, counting both the requests waiting in the queue and those currently being serviced by the disk; this N is the queue depth.
Increasing the disk's queue depth keeps the disk constantly busy and reduces its idle time.
Increase the queue depth -> raise utilization -> reach peak IOPS and MBPS, while making sure the response time stays within an acceptable range.
There are several ways to increase the queue depth: using asynchronous IO to issue multiple IO requests at once is equivalent to having multiple requests in the queue, and so is having multiple threads each issue synchronous IO requests.
Increasing the application's IO size also helps: a large IO is split into multiple requests once it reaches the lower layers, which again puts multiple requests in the queue.
As the queue depth grows, the time an IO spends waiting in the queue grows too, so the IO response time increases; this trade-off has to be weighed.
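As a sketch of the asynchronous approach, the same random-read scenario can be driven by a single job with the libaio engine and a deeper queue (assuming libaio-devel is installed and fio was built with libaio support; the device path and the iodepth value of 32 are only examples):

fio -filename=/dev/emcpowerb -direct=1 -iodepth 32 -thread -rw=randread -ioengine=libaio -bs=4k -size=1000G -numjobs=1 -runtime=180 -group_reporting -name=rand_100read_4k_qd32

Here one job keeps up to 32 IOs in flight, whereas the psync scenarios above reach a similar queue depth by running 50 synchronous threads.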

Why issue disk I/O in parallel? The main purpose is to improve application performance, and this is particularly important for virtual disks (or LUNs) composed of multiple physical disks.
If only one I/O is submitted at a time, the response time is short but the system throughput is very low.
In contrast, submitting multiple I/Os at once not only shortens the head's travel distance (thanks to the elevator algorithm) but also improves IOPS.
If an elevator could only carry one person at a time, each rider would reach their floor quickly (good response time), but everyone else would wait a long time (long queue).
Therefore, submitting multiple I/Os to the disk system at once balances throughput against overall response time.

Viewing the default queue depth on a Linux system:

[root@qsdb ~]# lsscsi -l
[0:0:0:0]    disk    DGC      VRAID            0533  /dev/sda 
  state=running queue_depth=30 scsi_level=5 type=0 device_blocked=0 timeout=30
[0:0:1:0]    disk    DGC      VRAID            0533  /dev/sdb 
  state=running queue_depth=30 scsi_level=5 type=0 device_blocked=0 timeout=30
[2:0:0:0]    disk    DGC      VRAID            0533  /dev/sdd 
  state=running queue_depth=30 scsi_level=5 type=0 device_blocked=0 timeout=30
[2:0:1:0]    disk    DGC      VRAID            0533  /dev/sde 
  state=running queue_depth=30 scsi_level=5 type=0 device_blocked=0 timeout=30
[4:2:0:0]    disk    IBM      ServeRAID M5210  4.27  /dev/sdc 
  state=running queue_depth=256 scsi_level=6 type=0 device_blocked=0 timeout=90
[9:0:0:0]    cd/dvd  Lenovo   SATA ODD 81Y3677 IB00  /dev/sr0 
  state=running queue_depth=1 scsi_level=6 type=5 device_blocked=0 timeout=30
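Besides lsscsi, the queue depth of a SCSI device can be read and changed through sysfs. A minimal sketch (sdb is only an example; values larger than what the HBA or array supports will be rejected or have no effect):

cat /sys/block/sdb/device/queue_depth         # current queue depth
echo 64 > /sys/block/sdb/device/queue_depth   # change it (as root)
cat /sys/block/sdb/device/queue_depth         # confirm the new value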

Use the dd command with bs=2M for a test:

dd if=/dev/zero of=/dev/sdd bs=2M count=1000 oflag=direct

1000+0 records in
1000+0 records out
2097152000 bytes (2.1 GB) copied, 10.6663 s, 197 MB/s

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s   avgrq-sz   avgqu-sz   await   svctm   %util
sdd      0.00     0.00   0.00  380.60  0.00    389734.40  1024.00  2.39       6.28    2.56    97.42

You can see that once the 2MB IOs reach the lower layers they are split into multiple 512KB IOs (avgrq-sz = 1024 sectors = 512KB). The average queue length is 2.39, the disk utilization is about 97%, and the throughput reaches 197MB/s.
(Why do they become 512KB IOs? Look up the meaning and usage of the kernel parameter max_sectors_kb.) In other words, increasing the queue depth lets you test the disk's peak performance; a quick way to check the split limit is shown below.
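A minimal sketch of checking the limit through sysfs for the example device sdd (max_sectors_kb is the per-request limit, max_hw_sectors_kb the hardware ceiling it cannot exceed):

cat /sys/block/sdd/queue/max_sectors_kb           # often 512 by default, hence the 512KB IOs
cat /sys/block/sdd/queue/max_hw_sectors_kb        # hardware limit
echo 1024 > /sys/block/sdd/queue/max_sectors_kb   # raise the limit (as root, up to the hardware limit)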

5. Detailed explanation of the iostat IO command on Linux

[root@rac01-node01 /]# iostat -xd 3
Linux 3.8.13-16.2.1.el6uek.x86_64 (rac01-node01)     05/27/2017     _x86_64_    (40 CPU)
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.05     0.75    2.50    0.50    76.59    69.83    48.96     0.00    1.17   0.47   0.14
scd0              0.00     0.00    0.02    0.00     0.11     0.00     5.25     0.00   21.37  20.94   0.05
dm-0              0.00     0.00    2.40    1.24    75.88    69.83    40.00     0.01    1.38   0.38   0.14
dm-1              0.00     0.00    0.02    0.00     0.14     0.00     8.00     0.00    0.65   0.39   0.00
sdc               0.00     0.00    0.01    0.00     0.11     0.00    10.20     0.00    0.28   0.28   0.00
sdb               0.00     0.00    0.01    0.00     0.11     0.00    10.20     0.00    0.15   0.15   0.00
sdd               0.00     0.00    0.01    0.00     0.11     0.00    10.20     0.00    0.25   0.25   0.00
sde               0.00     0.00    0.01    0.00     0.11     0.00    10.20     0.00    0.14   0.14   0.00

Output parameter description:

rrqm/s: Number of read requests per second that were merged for this device. When a system call needs to read data, the VFS sends the request to the filesystem; if the filesystem finds that different read requests target the same block of data, it merges them.
wrqm/s: Number of write requests per second that were merged for this device.
rsec/s: Number of sectors read from the device per second.
wsec/s: Number of sectors written to the device per second.
rKB/s: Number of kilobytes read from the device per second.
wKB/s: Number of kilobytes written to the device per second.
avgrq-sz: Average size (in sectors) of the requests issued to the device.
avgqu-sz: Average length of the request queue issued to the device; unsurprisingly, the shorter the queue, the better.
await: Average time (in milliseconds) each IO request takes to be processed, which can be read as the IO response time. In general the IO response time of a system should be below 5ms; above 10ms it is already on the high side. This time includes both queue time and service time, so await is normally greater than svctm; the smaller the gap between them, the shorter the queue time, while a large gap means long queueing and points to a problem in the system.
svctm: Average service time (in milliseconds) per IO operation for the device. If svctm is very close to await, there is almost no IO waiting and disk performance is very good. If await is much higher than svctm, the IO queue wait is too long and applications running on the system will slow down.
%util: Fraction of the measurement interval spent processing IO. For example, if the interval is 1 second and the device spends 0.8 seconds processing IO and 0.2 seconds idle, then %util = 0.8/1 = 80%. This parameter therefore reflects how busy the device is; generally, 100% means the disk device is running close to full capacity (though with multiple disks behind it, even a %util of 100% may not mean the bottleneck has been reached, because of the disks' concurrency).
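To keep the output focused on the device under test, iostat can also be pointed at a single device. A minimal usage sketch (sdd is only an example):

iostat -xd sdd 1        # extended stats for sdd only, refreshed every second
iostat -xkd sdd 3 10    # the same in kB/s, 10 samples at 3-second intervals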
 
Classification: Linux