Performance comparison of mutexes and condition variables in C++ concurrent programming

Posted by plasmagames on Sun, 01 Sep 2019 19:06:08 +0200

Introduction

This post uses the simplest producer-consumer model to compare the performance of a plain mutex against a mutex plus condition variable, by running each program and observing the CPU usage of the process.

The producer-consumer model in this example has one producer and five consumers.
The producer thread puts data into a queue, and the five consumer threads take data out of it. Before taking data, a consumer needs to check whether the queue has any. The queue is a global queue shared between threads, so it must be protected with a mutex: while the producer is putting data into the queue, the consumers cannot take data out, and vice versa.

Mutex-only implementation code

#include <iostream> // std::cout
#include <deque>    // std::deque
#include <thread>   // std::thread
#include <chrono>   // std::chrono
#include <mutex>    // std::mutex
#include <atomic>   // std::atomic


// Global Queue
std::deque<int> g_deque;

// Global Lock
std::mutex g_mutex;

// Producer running flag (shared between threads, so it is atomic)
std::atomic<bool> producer_is_running( true );

// Producer Thread Function
void Producer()
{
    // Number of items in stock
    int count = 8;
    
    do
    {
        // Scoped lock: locks the mutex on construction, protects the code inside the braces,
        // and unlocks automatically when the scope ends; it can also be unlocked manually to keep the critical section small
        std::unique_lock<std::mutex> locker( g_mutex );
        // Push one item into the queue
        g_deque.push_front( count );
        // Unlock early so that the mutex only protects the shared queue
        locker.unlock();

        std::cout << "Producer    : I have a stock now :" << count << std::endl;
            
        // Slow down producer production and sleep for 1 second
        std::this_thread::sleep_for( std::chrono::seconds( 1 ) );

        // Inventory Decrease
        count--;
    } while( count > 0 );
    
    // Mark that the producer has finished
    producer_is_running = false;

    std::cout << "Producer    : I'm out of stock. Closing up shop!"  << std::endl;
}

// Consumer Thread Function
void Consumer(int id)
{
    int data = 0;

    do
    {
        std::unique_lock<std::mutex> locker( g_mutex );
        if( !g_deque.empty() )
        {
            data = g_deque.back();
            g_deque.pop_back();
            locker.unlock();

            std::cout << "Consumer[" << id << "] : The number of my rush arrival is :" << data << std::endl;
        }
        else
        {
            locker.unlock();
        }
    } while( producer_is_running );
    
    std::cout << "Consumer[" << id << "] : The seller has no proofing. What a pity, come back next time!"  << std::endl;
}

int main(void)
{
    std::cout << "1 producer start ..." << std::endl;
    std::thread producer( Producer );

    std::cout << "5 consumer start ..." << std::endl;
    std::thread consumer[ 5 ];
    for(int i = 0; i < 5; i++)
    {
        consumer[i] = std::thread(Consumer, i + 1);
    }

    producer.join();

    for(int i = 0; i < 5; i++)
    {
        consumer[i].join();
    }

    std::cout << "All threads joined." << std::endl;

    return 0;
}

Mutex-only implementation run result

[root@lincoding condition]# g++ -std=c++0x -pthread -D_GLIBCXX_USE_NANOSLEEP main.cpp -o  main
[root@lincoding condition]# ./main
1 producer start ...
5 consumer start ...
Producer    : I now have stock: 8
Consumer[1] : The number I grabbed is: 8
Consumer[1] : The number I grabbed is: 7
Producer    : I now have stock: 7
Producer    : I now have stock: 6
Consumer[3] : The number I grabbed is: 6
Producer    : I now have stock: 5
Consumer[1] : The number I grabbed is: 5
Producer    : I now have stock: 4
Consumer[2] : The number I grabbed is: 4
Producer    : I now have stock: 3
Consumer[5] : The number I grabbed is: 3
Producer    : I now have stock: 2
Consumer[2] : The number I grabbed is: 2
Producer    : I now have stock: 1
Consumer[1] : The number I grabbed is: 1
Producer    : I'm out of stock. Closing up shop!Consumer[
5] : The seller has closed up shop. What a pity, come back next time!
Consumer[2] : The seller has closed up shop. What a pity, come back next time!
Consumer[3] : The seller has closed up shop. What a pity, come back next time!
Consumer[4] : The seller has closed up shop. What a pity, come back next time!
Consumer[1] : The seller has closed up shop. What a pity, come back next time!
All threads joined.

As you can see, a mutex alone can accomplish the task, but it has a performance problem.

  • Producer is the producer thread; it sleeps for one second in each iteration, so the production process is slow.
  • Consumer is the consumer thread; it runs a while loop that only exits once the producer is no longer running. On every pass through the loop it locks the mutex, checks whether the queue is non-empty, takes one item from the queue if there is any, and then unlocks. So while the producer is sleeping for its one second, the consumer threads spin through this loop doing a lot of useless work, which drives CPU usage very high!

The test environment is a machine with a 4-core CPU:

[root@lincoding ~]# grep 'model name' /proc/cpuinfo | wc -l
4

Looking at CPU usage with the top command shows that the pure-mutex version is very expensive: the main process is at 357.5% CPU, with 54.5% sy (system/kernel) and 18.2% us (user) CPU time.

[root@lincoding ~]# top
top - 19:13:41 up 36 min,  3 users,  load average: 0.06, 0.05, 0.01
Tasks: 179 total,   1 running, 178 sleeping,   0 stopped,   0 zombie
Cpu(s): 18.2%us, 54.5%sy,  0.0%ni, 27.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1004412k total,   313492k used,   690920k free,    41424k buffers
Swap:  2031608k total,        0k used,  2031608k free,    79968k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                       
 35346 root      20   0  137m 3288 1024 S 357.5  0.3   0:05.92 main                                                                                                                          
     1 root      20   0 19232 1492 1224 S  0.0  0.1   0:02.16 init                                                                                                                           
     2 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kthreadd                                                                                                                       
     3 root      RT   0     0    0    0 S  0.0  0.0   0:00.68 migration/0  

One workaround is to add an extra delay on the consumer side: when a consumer fails to get data, it takes a 500-millisecond break, which reduces the CPU cost of spinning on the mutex.

// Consumer Thread Function
void Consumer(int id)
{
    int data = 0;

    do
    {
        std::unique_lock<std::mutex> locker( g_mutex );
        if( !g_deque.empty() )
        {
            data = g_deque.back();
            g_deque.pop_back();
            locker.unlock();

            std::cout << "Consumer[" << id << "] : The number of my rush arrival is :" << data << std::endl;
        }
        else
        {
            locker.unlock();
            // Consumers take a 500-millisecond break when they don't get data
            std::this_thread::sleep_for( std::chrono::milliseconds( 500 ) );
        }
    } while( producer_is_running );
    
    std::cout << "Consumer[" << id << "] : The seller has no proofing. What a pity, come back next time!"  << std::endl;
}

As the following output shows, the CPU usage drops dramatically:

[root@lincoding ~]# ps aux | grep -v grep  |grep main
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      61296  0.0  0.1 141068  1244 pts/1    Sl+  19:40   0:00 ./main

Condition variable + mutex implementation code

But this raises a question: how long should the consumers be delayed?

  • If the producer produces very quickly, a fixed 500-millisecond delay means the consumers lag behind and data piles up in the queue.
  • If the producer produces slowly, the consumers still wake up every 500 milliseconds only to find an empty queue, doing useless work and wasting CPU.

This is where the condition variable std::condition_variable comes in. In the producer-consumer model, after the producer has produced an item it calls notify_one() to wake up a consumer thread that is blocked in wait(), so that the consumer can take the item out of the queue.

#include <iostream> // std::cout
#include <deque>    // std::deque
#include <thread>   // std::thread
#include <chrono>   // std::chrono
#include <mutex>    // std::mutex
#include <atomic>   // std::atomic

#include <condition_variable> // std::condition_variable


// Global Queue
std::deque<int> g_deque;

// Global Lock
std::mutex g_mutex;

// Global condition variable
std::condition_variable g_cond;

// Producer running flag (shared between threads, so it is atomic)
std::atomic<bool> producer_is_running( true );

// Producer Thread Function
void Producer()
{
    // Number of items in stock
    int count = 8;
    
    do
    {
        // Scoped lock: locks the mutex on construction, protects the code inside the braces,
        // and unlocks automatically when the scope ends; it can also be unlocked manually to keep the critical section small
        std::unique_lock<std::mutex> locker( g_mutex );
        // Push one item into the queue
        g_deque.push_front( count );
        // Unlock early so that the mutex only protects the shared queue
        locker.unlock();

        std::cout << "Producer    : I have a stock now :" << count << std::endl;
        
        // Wake up one waiting consumer thread
        g_cond.notify_one();
        
        // Sleep for 1 second
        std::this_thread::sleep_for( std::chrono::seconds( 1 ) );

        // Inventory Decrease
        count--;
    } while( count > 0 );
    
    // Mark that the producer has finished
    producer_is_running = false;
    
    // Wake up all consumer threads
    g_cond.notify_all();
    
    std::cout << "Producer    : I'm out of stock. I'm proofing!"  << std::endl;
}

// Consumer Thread Function
void Consumer(int id)
{
    // Item number purchased
    int data = 0;

    do
    {
        // Scoped lock: locks the mutex on construction, protects the code inside the braces,
        // and unlocks automatically when the scope ends; it can also be unlocked manually to keep the critical section small
        std::unique_lock<std::mutex> locker( g_mutex );
        
        // wait() first unlocks the mutex and puts the thread to sleep; when the thread is woken up,
        // wait() re-locks the mutex before returning, so the queue operations below are still protected.
        // A unique_lock must be used here rather than a lock_guard, because wait() needs to unlock and
        // re-lock the mutex, and lock_guard does not expose lock()/unlock().
        g_cond.wait(locker); 
        
        // Queue is not empty
        if( !g_deque.empty() )
        {
            // Read the last item in the queue
            data = g_deque.back();
            
            // Remove it from the queue
            g_deque.pop_back();
            
            // Unlock early so that the mutex only protects the shared queue
            locker.unlock();

            std::cout << "Consumer[" << id << "] : The number I grabbed is: " << data << std::endl;
        }
        // Queue empty
        else
        {
            locker.unlock();
        }
    
    } while( producer_is_running );
    
    std::cout << "Consumer[" << id << "] : The seller has no proofing. What a pity, come back next time!"  << std::endl;
}

int main(void)
{
    std::cout << "1 producer start ..." << std::endl;
    std::thread producer( Producer );

    std::cout << "5 consumer start ..." << std::endl;
    std::thread consumer[ 5 ];
    for(int i = 0; i < 5; i++)
    {
        consumer[i] = std::thread(Consumer, i + 1);
    }

    producer.join();

    for(int i = 0; i < 5; i++)
    {
        consumer[i].join();
    }

    std::cout << "All threads joined." << std::endl;

    return 0;
}
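
A note on wait() without a predicate: the call g_cond.wait(locker) above relies on the if( !g_deque.empty() ) check after waking up to cope with spurious wakeups, and a consumer that reaches wait() just after the producer's final notify_all() can block there forever. std::condition_variable also provides an overload of wait() that takes a predicate and re-checks it under the lock each time the thread wakes up. The following is only a sketch of how the consumer could use that overload; the function name ConsumerWithPredicate is made up for illustration, and it assumes the same g_deque, g_mutex, g_cond and producer_is_running globals as the listing above.

// Alternative consumer thread function using the predicate overload of wait()
// (illustrative sketch, not part of the original program; reuses the globals above)
void ConsumerWithPredicate(int id)
{
    int data = 0;

    do
    {
        std::unique_lock<std::mutex> locker( g_mutex );

        // The predicate is re-evaluated under the lock every time the thread wakes up, so spurious
        // wakeups are handled, and a notification sent just before this call is not lost: if the
        // predicate is already true, wait() returns immediately without blocking.
        g_cond.wait( locker, [] { return !g_deque.empty() || !producer_is_running; } );

        if( !g_deque.empty() )
        {
            data = g_deque.back();
            g_deque.pop_back();
            locker.unlock();

            std::cout << "Consumer[" << id << "] : The number I grabbed is: " << data << std::endl;
        }
        else
        {
            locker.unlock();
        }
    } while( producer_is_running );

    std::cout << "Consumer[" << id << "] : The seller has closed up shop. What a pity, come back next time!"  << std::endl;
}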

Condition variable + mutex run result

[root@lincoding condition]# g++ -std=c++0x -pthread -D_GLIBCXX_USE_NANOSLEEP main.cpp -o  main
[root@lincoding condition]# 
[root@lincoding condition]# ./main 
1 producer start ...
5 consumer start ...
Producer    : I now have stock: 8
Consumer[4] : The number I grabbed is: 8
Producer    : I now have stock: 7
Consumer[2] : The number I grabbed is: 7
Producer    : I now have stock: 6
Consumer[3] : The number I grabbed is: 6
Producer    : I now have stock: 5
Consumer[5] : The number I grabbed is: 5
Producer    : I now have stock: 4
Consumer[1] : The number I grabbed is: 4
Producer    : I now have stock: 3
Consumer[4] : The number I grabbed is: 3
Producer    : I now have stock: 2
Consumer[2] : The number I grabbed is: 2
Producer    : I now have stock: 1
Consumer[3] : The number I grabbed is: 1
Producer    : I'm out of stock. Closing up shop!
Consumer[5] : The seller has closed up shop. What a pity, come back next time!
Consumer[1] : The seller has closed up shop. What a pity, come back next time!
Consumer[4] : The seller has closed up shop. What a pity, come back next time!
Consumer[2] : The seller has closed up shop. What a pity, come back next time!
Consumer[3] : The seller has closed up shop. What a pity, come back next time!
All threads joined.

The CPU overhead is very low:

[root@lincoding ~]# ps aux | grep -v grep  |grep main
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      73838  0.0  0.1 141068  1256 pts/1    Sl+  19:54   0:00 ./main

Summary

In a scenario where the producer's production speed is uncertain, a mutex alone is enough to protect the shared data, but it forces the consumers to poll, which incurs a significant CPU overhead. With a mutex plus a condition variable, the producer wakes up a consumer thread each time it produces an item, so the consumers avoid doing useless work and the performance cost disappears.
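
As a closing aside, this mutex + condition variable pattern is commonly wrapped in a small thread-safe queue class so that all of the locking and waiting logic lives in one place. The sketch below is only an illustration of that idea; the class name BlockingQueue and its interface are not from the original program.

#include <deque>              // std::deque
#include <mutex>              // std::mutex, std::lock_guard, std::unique_lock
#include <condition_variable> // std::condition_variable

// Minimal sketch of a blocking queue that encapsulates the mutex + condition variable pattern
template <typename T>
class BlockingQueue
{
public:
    // Producer side: push one item and wake one waiting consumer
    void Push(const T& value)
    {
        {
            std::lock_guard<std::mutex> locker( mutex_ );
            queue_.push_front( value );
        }
        cond_.notify_one();
    }

    // Consumer side: block until an item is available, then pop and return it
    T Pop()
    {
        std::unique_lock<std::mutex> locker( mutex_ );
        cond_.wait( locker, [this] { return !queue_.empty(); } );
        T value = queue_.back();
        queue_.pop_back();
        return value;
    }

private:
    std::deque<T> queue_;
    std::mutex mutex_;
    std::condition_variable cond_;
};

With a wrapper like this, the producer simply calls Push() and each consumer calls Pop(); no caller has to deal with the mutex or the condition variable directly.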

Topics: C++ REST