Understanding CPU Utilization in Linux Operation and Maintenance

Posted by grace5 on Thu, 10 Mar 2022 04:45:59 +0100

Start with the top command

Run the top command in a Linux shell and you will see a line of CPU utilization data like this:

%Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

Here is a reference to the Linux man pages of the top command:

http://man7.org/linux/man-pages/man1/top.1.html

us, user: time running un-niced user processes
sy, system: time running kernel processes
ni, nice: time running niced user processes
id, idle: time spent in the kernel idle handler
wa, IO-wait: time waiting for I/O completion
hi: time spent servicing hardware interrupts
si: time spent servicing software interrupts
st: time stolen from this vm by the hypervisor

/proc/stat

This article briefly introduces the basic method of calculating CPU utilization in Linux.

/proc/stat stores various system statistics. On my machine, at one moment in time, its contents were as follows:

[linjinhe@localhost ~]$ cat /proc/stat 
cpu  117450 5606 72399 476481991 1832 0 2681 0 0 0
cpu0 31054 90 19055 119142729 427 0 1706 0 0 0
cpu1 22476 3859 18548 119155098 382 0 272 0 0 0
cpu2 29208 1397 19750 119100548 462 0 328 0 0 0
cpu3 34711 258 15045 119083615 560 0 374 0 0 0
intr 41826673 113 102 0 0 0 0 0 0 1 134 0 0 186 0 0 0 81 0 0 256375 0 0 0 29 0 602 143 16 442 94859 271462 25609 4618 8846 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 58634924
btime 1540055659
processes 5180062
procs_running 1
procs_blocked 0
softirq 49572367 5 22376247 238452 3163482 257166 0 4492 19385190 0 414733

We only care about the first line, cpu (the second line below is an annotation I added):

cpu  117450 5606   72399   476481991  1832     0    2681   0      0         0
      (us)  (ni)    (sy)     (id)      (wa)   (hi)  (si)  (st) (guest) (guest_nice)

In the previous section, the Linux man pages described CPU utilization in terms of time (time running, time spent, time stolen). The statistics here are, in fact, the time consumed by each item (us, sy, ni, id, wa, hi, si, st) from system startup up to now, measured in jiffies. The number of jiffies per second can be obtained via sysconf(_SC_CLK_TCK); it is typically 100, i.e. 1 jiffy == 0.01 s. (st, guest and guest_nice relate to virtualization / virtual machines. If those values are too high, the virtualization implementation or the host has a problem; they are not the focus of this article.)
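
The tick rate is easy to check programmatically; a minimal sketch using only the standard sysconf call:

#include <stdio.h>
#include <unistd.h>

int main()
{
    // Number of clock ticks (jiffies) per second, typically 100 on Linux.
    printf("jiffies per second: %ld\n", sysconf(_SC_CLK_TCK));
    return 0;
}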

The basic principle of calculating CPU utilization is to sample /proc/stat and compute the deltas. The simplest approach is to sample /proc/stat once per second, for example:

Sample at second N:     cpu_total1 = us1 + ni1 + sy1 + id1 + wa1 + hi1 + si1 + st1 + guest1 + guest_nice1
Sample at second N + 1: cpu_total2 = us2 + ni2 + sy2 + id2 + wa2 + hi2 + si2 + st2 + guest2 + guest_nice2
The us share over that second is then (us2 - us1) / (cpu_total2 - cpu_total1).
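
As a rough illustration of this sampling method, here is a minimal sketch (my own, not from any existing tool) that reads the aggregate cpu line of /proc/stat twice, one second apart, and prints a few of the shares:

#include <assert.h>
#include <stdio.h>
#include <unistd.h>

struct CpuTimes
{
    unsigned long long us, ni, sy, id, wa, hi, si, st, guest, guest_nice;
    unsigned long long Total() const
    {
        return us + ni + sy + id + wa + hi + si + st + guest + guest_nice;
    }
};

// Parse the first (aggregate) "cpu" line of /proc/stat.
bool ReadCpuTimes(CpuTimes* t)
{
    FILE* fp = fopen("/proc/stat", "r");
    if (fp == nullptr)
        return false;
    int n = fscanf(fp, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &t->us, &t->ni, &t->sy, &t->id, &t->wa,
                   &t->hi, &t->si, &t->st, &t->guest, &t->guest_nice);
    fclose(fp);
    return n == 10;
}

int main()
{
    CpuTimes t1, t2;
    assert(ReadCpuTimes(&t1));
    sleep(1);  // sampling interval of one second
    assert(ReadCpuTimes(&t2));
    double total = double(t2.Total() - t1.Total());
    printf("us %.1f%%  sy %.1f%%  id %.1f%%\n",
           100.0 * (t2.us - t1.us) / total,
           100.0 * (t2.sy - t1.sy) / total,
           100.0 * (t2.id - t1.id) / total);
    return 0;
}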

nice

nice - run a program with modified scheduling priority

nice is a command that runs a program with a modified scheduling priority. For details, see its man page:

http://man7.org/linux/man-pages/man1/nice.1.html

In Linux, a process has a nice value, which represents the scheduling priority of the process.

The "nicer" a process (the greater its nice value), the lower its scheduling priority. How should we understand this? Process scheduling is essentially processes competing for the CPU's limited resources; the nicer a process is, the more "humbly" it yields, and so the lower its chance of getting the CPU.

In the CPU utilization figures above, user-mode CPU time is split into niced (ni) and un-niced (us), but there is no essential difference between them. Scenarios that call for the nice command are uncommon (I have personally never needed it).
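
As an aside, a process can also make itself "nicer" programmatically. A minimal sketch using the standard nice(2) and getpriority(2) calls (purely illustrative):

#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>

int main()
{
    // Raise our own nice value by 10, just like launching the program under `nice`.
    nice(10);
    // Query the result; PRIO_PROCESS with who == 0 means the calling process.
    printf("current nice value: %d\n", getpriority(PRIO_PROCESS, 0));
    return 0;
}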

Understanding us

Now that we know what us means, let's write a small program to drive it up.

#include <pthread.h>
#include <stdio.h>
#include <assert.h>
#include <stdint.h>
#include <vector>
#include <string>

// Busy-loop worker: pure user-mode computation, which shows up as us.
void* CpuUsWorker(void* arg)
{
    uint64_t i = 0;
    while (true)
    {
        i++;
    }
    return nullptr;
}

void CpuUs(int n)
{
    std::vector<pthread_t> pthreads(n);
    for (int i = 0; i < n; i++)
    {
        assert(pthread_create(&pthreads[i], nullptr, CpuUsWorker, nullptr) == 0);
    }

    for (const auto& tid : pthreads)
    {
        assert(pthread_join(tid, nullptr) == 0);
    }
}

int main(int argc, char** argv)
{
    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s threads\n", argv[0]);
        return -1;
    }   
    CpuUs(std::stoi(argv[1]));
    return 0;
}

The test machine has four cores. The code is simple: one busy thread fully occupies one core. Here are my test results:

./cpu_us 1
%Cpu(s): 25.0 us,  0.0 sy,  0.0 ni, 75.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
./cpu_us 2
%Cpu(s): 50.0 us,  0.1 sy,  0.0 ni, 49.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
./cpu_us 3
%Cpu(s): 75.1 us,  0.0 sy,  0.0 ni, 24.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

Understanding ni

  1. ni represents the CPU time consumed by niced user-mode processes.
  2. The nice command runs a program with an adjusted nice value; renice adjusts one that is already running.

Here are my test results. It can be seen that ni becomes 25%, which is in line with expectations.

nice ./cpu_us 1
%Cpu(s):  0.1 us,  0.0 sy, 25.0 ni, 74.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

The following is the process information for nice ./cpu_us 1 as shown by top. The NI column is the nice value, which is 10 here.

PID    USER    PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND          
6905 linjinhe  30  10   23024    844    700 S 100.0  0.0   0:03.06 cpu_us

And here is the process information for ./cpu_us 1, whose nice value is 0.

PID   USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND  
6901 linjinhe   20   0   23024    844    700 S 100.0  0.0   0:12.36 cpu_us

Understanding sy

Generally, a high sy means the program is making Linux system calls with substantial overhead. We can write a simple program to verify this, too.

#include <pthread.h>
#include <stdio.h>
#include <assert.h>
#include <vector>
#include <string>

// Worker that exits immediately; creating and reaping it is almost pure
// system-call overhead.
void* NoopWorker(void* arg)
{
    return nullptr;
}

// Create and detach short-lived threads in a loop to generate a flood of
// system calls.
void* CpuSyWorker(void* arg)
{
    while (true)
    {
        pthread_t tid;
        assert(pthread_create(&tid, nullptr, NoopWorker, nullptr) == 0);
        assert(pthread_detach(tid) == 0);
    }
}

void CpuSy(int n)
{
    std::vector<pthread_t> pthreads(n); 
    for (int i = 0; i < n; i++)
    {
        assert(pthread_create(&pthreads[i], nullptr, CpuSyWorker, nullptr) == 0);
    }
    for (const auto& tid : pthreads)
    {
        assert(pthread_join(tid, nullptr) == 0);
    }
}

int main(int argc, char** argv)
{
    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s threads\n", argv[0]);
        return -1;
    }

    CpuSy(std::stoi(argv[1]));
    return 0;
}

Test results:

./cpu_sy 1
%Cpu(s):  8.8 us, 59.3 sy,  0.0 ni, 31.3 id,  0.0 wa,  0.0 hi,  0.6 si,  0.0 st

A flood of system calls makes sy soar; pthread_create, in particular, is quite expensive.
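
Thread creation is only one way to burn kernel time; any tight loop of system calls will do. A minimal sketch (not from the original test) that issues a trivial raw system call in a loop via the Linux-specific syscall(2) wrapper:

#include <sys/syscall.h>
#include <unistd.h>

int main()
{
    // Each iteration traps into the kernel, so nearly all CPU time lands in sy.
    while (true)
    {
        syscall(SYS_getpid);  // a trivial system call, issued directly
    }
    return 0;
}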

Understanding wa

wa is tricky; even the relevant Linux man pages say it is unreliable:

http://man7.org/linux/man-pages/man5/proc.5.html

So do not conclude that the system's I/O has a problem just because wa is high.

1. The CPU will not wait for I/O to complete; iowait is the time that a task is waiting for I/O to complete. When a CPU goes into the idle state for outstanding task I/O, another task will be scheduled on this CPU.
2. On a multi-core CPU, the task waiting for I/O to complete is not running on any CPU, so the iowait of each CPU is difficult to calculate.
3. The value in this field may decrease in certain conditions.

Here is my understanding:

1. Consider a single-core system. The CPU does not truly "wait" for I/O; while the I/O is outstanding the CPU is actually idle. If other runnable processes exist, the CPU runs them, and that time is not counted as iowait. Only if no other process is runnable does the CPU sit "waiting" for the I/O to complete, and that "waiting" time is counted as iowait.

2. On a multi-core system, which CPU should the iowait be attributed to? The waiting task is not running on any CPU, so this is a genuine problem.

3. A high wa does not indicate that the system's I/O has a problem. If the whole system runs only light tasks that do I/O constantly, wa can be very high while the disk I/O is nowhere near its limit.

4. A low wa does not mean the system's I/O is fine. Suppose the machine runs heavy I/O tasks that saturate the disk bandwidth while compute tasks keep the CPU full; wa will then be very low even though the I/O pressure is severe.

The following program pushes wa up with a stream of small synchronous writes:

#include <pthread.h>
#include <stdio.h>
#include <assert.h>
#include <vector>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Worker that issues a stream of tiny synchronous writes; fsync after each
// write blocks the thread on disk I/O, which drives up wa.
void* CpuWaWorker(void* arg)
{
    std::string filename = "test_" + std::to_string(pthread_self());
    int fd = open(filename.c_str(), O_CREAT | O_WRONLY, 0644);  // O_CREAT requires a mode
    assert(fd >= 0);
    while (true)
    {
        assert(write(fd, filename.c_str(), filename.size()) > 0);
        assert(write(fd, "\n", 1) > 0);
        assert(fsync(fd) == 0);
    }
    return nullptr;
}

void CpuWa(int n)
{
    std::vector<pthread_t> pthreads(n);
    for (int i = 0; i < n; i++)
    {
        assert(pthread_create(&pthreads[i], nullptr, CpuWaWorker, nullptr) == 0);
    }

    for (const auto& tid : pthreads)
    {
        assert(pthread_join(tid, nullptr) == 0);
    }
}

int main(int argc, char** argv)
{
    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s threads\n", argv[0]);
        return -1;
    }   
    CpuWa(std::stoi(argv[1]));
    return 0;
}
Test results:

./cpu_wa 10
%Cpu(s):  0.3 us,  6.3 sy,  0.0 ni, 50.0 id, 41.1 wa,  0.0 hi,  2.3 si,  0.0 st

In the example above, I use multiple threads issuing a constant stream of small I/Os to push wa up, yet the I/O bandwidth they consume is tiny. My test machine has an SSD, and the actual I/O pressure here is low.

Take another example:

./cpu_wa 10
./cpu_us 3
%Cpu(s): 75.3 us,  3.5 sy,  0.0 ni,  8.2 id, 10.3 wa,  0.0 hi,  2.7 si,  0.0 st

Notice that even though ./cpu_wa 10 is still running, wa actually went down because ./cpu_us 3 is running at the same time! See point 4 above.

Understanding si and hi

System calls can trigger soft interrupts, so si also changes while some of the examples above are running, for example:

./cpu_wa 10
%Cpu(s):  0.3 us,  6.3 sy,  0.0 ni, 50.0 id, 41.1 wa,  0.0 hi,  2.3 si,  0.0 st

After the network card receives a packet, the driver notifies the CPU, and the bulk of packet processing then runs in a soft interrupt (the per-CPU counters in /proc/softirqs, e.g. NET_RX, show this). Here we use the iperf network performance testing tool to run a few experiments.

$ iperf -s -i 1  # Server

$ iperf -c 192.168.1.4 -i 1 -t 60 # Client; open several terminals and run multiple clients to make the change in si more obvious
%Cpu(s):  1.7 us, 74.1 sy,  0.0 ni,  8.0 id,  0.0 wa,  0.0 hi, 16.2 si,  0.0 st

As for hardware interrupts (hi), I have not found a good way to test them for now, and they should rarely be an issue in practice.

Understanding st

st is related to virtualization. Here is my understanding.

With virtualization, a physical machine with 32 CPU cores can create dozens of single-core virtual machines, totaling a hundred or more virtual cores. In public cloud scenarios this is referred to as "overselling".

Most of the time, a large share of the physical server's resources sits idle, and "overselling" has no noticeable impact. But when many virtual machines come under CPU pressure at once, the physical machine's resources are plainly insufficient, and the virtual machines end up competing with and waiting on one another.

st measures the CPU time "stolen" by the hypervisor to run other virtual machines. The higher the value, the fiercer the resource contention on this physical server.

Understanding id

id is simply idle CPU time. From the application layer's perspective, I don't think it is hard to understand.