[OpenMP learning notes] Interacting with the runtime environment

Posted by Svoboda on Wed, 02 Mar 2022 10:58:47 +0100

Internal Control Variables

The OpenMP standard defines a set of internal control variables (ICVs). They affect the behavior of a program at runtime, but they cannot be read or modified directly; we access them through OpenMP functions, environment variables, or clauses. The ICVs covered here are:

  • nthreads-var: stores the number of threads requested for parallel regions
  • dyn-var: controls whether the runtime may dynamically adjust the number of threads used for parallel regions
  • nest-var: controls whether nested parallelism is allowed
  • run-sched-var: stores the schedule used by loop regions that specify the schedule(runtime) clause
  • def-sched-var: stores the default schedule for loop regions that have no schedule clause

nthreads-var

The number of threads can be set in three ways:

OMP_NUM_THREADS: Setting the OMP_NUM_THREADS environment variable on the command line provides the initial value of nthreads-var.

omp_set_num_threads: Inside the program, the omp_set_num_threads function sets the number of threads for subsequent parallel regions. The call has the form omp_set_num_threads(integer).

num_threads: Finally, the num_threads clause on a parallel construct sets the number of threads for that one region.

The priority of these three methods increases in that order: the clause overrides the function call, which overrides the environment variable. During execution, the following functions report thread counts and thread IDs:

  • omp_get_max_threads: returns the maximum number of threads that a parallel region without a num_threads clause could use. It can be called from either serial or parallel code
  • omp_get_num_threads: returns the number of threads in the current team. Outside a parallel region it returns 1
  • omp_get_thread_num: returns the ID of the calling thread within its team, numbered from 0

The following is a usage example

#include <stdio.h>
#include <omp.h>

void test_numthread() {
    printf("max thread nums is %d\n", omp_get_max_threads());
    printf("omp_get_num_threads: out parallel region is %d\n", omp_get_num_threads());
    
    omp_set_num_threads(2);
    printf("after omp_set_num_threads: max thread nums is %d\n", omp_get_max_threads());
#pragma omp parallel 
    {
        #pragma omp master
        {
            printf("omp_get_num_threads: in parallel region is %d\n\n", omp_get_num_threads());
        }
        printf("1: thread %d is running\n", omp_get_thread_num());
    }
    printf("\n");
#pragma omp parallel num_threads(3)
    {
        printf("2: thread %d is running\n", omp_get_thread_num());
    }
}

The following is the result of the program:

max thread nums is 4
omp_get_num_threads: out parallel region is 1
after omp_set_num_threads: max thread nums is 2
omp_get_num_threads: in parallel region is 2

1: thread 0 is running
1: thread 1 is running

2: thread 0 is running
2: thread 1 is running
2: thread 2 is running

dyn-var

dyn-var controls whether the runtime may dynamically adjust the number of threads while the program runs. It can be set in two ways:

OMP_DYNAMIC: Setting the OMP_DYNAMIC environment variable to true allows dynamic adjustment; setting it to false disallows it.

omp_set_dynamic: In code, omp_set_dynamic(1) allows dynamic adjustment and omp_set_dynamic(0) disallows it. Any other nonzero argument behaves the same as 1, that is, true.

omp_get_dynamic returns the current setting, 0 or 1. The following is an example:

void test_dynamic() {
    printf("dynamic state is %d\n", omp_get_dynamic());

    omp_set_num_threads(6);

    #pragma omp parallel 
    {
        printf("thread %d is running\n", omp_get_thread_num());
    }

    omp_set_dynamic(1);

    printf("\n");
    printf("dynamic state is %d\n", omp_get_dynamic());
    #pragma omp parallel 
    {
        printf("thread %d is running\n", omp_get_thread_num());
    }
}

Here are the output results:

dynamic state is 0
thread 3 is running
thread 4 is running
thread 0 is running
thread 5 is running
thread 1 is running
thread 2 is running

dynamic state is 1
thread 3 is running
thread 1 is running
thread 2 is running
thread 0 is running

With dynamic adjustment enabled, the second parallel region prints only four lines, that is, only four threads execute it. Generally speaking, dynamic adjustment chooses the number of threads according to available system resources; in most cases the team has as many threads as there are CPUs, although the exact choice is up to the implementation. In addition, the number of threads chosen never exceeds the maximum allowed by the current environment. In the code above, if omp_set_num_threads(6) is changed to omp_set_num_threads(2), at most two threads are created under dynamic adjustment.

nest-var

nest-var controls whether nested parallelism is allowed. It can be set in two ways:

OMP_NESTED: Set the OMP_NESTED environment variable; true allows nested parallelism and false disallows it.

omp_set_nested: In code, omp_set_nested(1), or any other nonzero argument, allows nested parallelism; omp_set_nested(0) disallows it.

omp_get_nested reports whether nested parallelism is allowed and returns 0 or 1. The following is an example:

void test_nested() {
    int tid;
    printf("nested state is %d\n", omp_get_nested());

    #pragma omp parallel num_threads(2) private(tid)
    {
        tid = omp_get_thread_num();
        printf("In outer parallel region: thread %d is running\n", tid);
        
        #pragma omp parallel num_threads(2) firstprivate(tid)
        {
            printf("In nested parallel region: thread %d is running and outer thread is %d\n", omp_get_thread_num(), tid);
        }
    }

    omp_set_nested(1);
    printf("\n");
    printf("nested state is %d\n", omp_get_nested());

    #pragma omp parallel num_threads(2) private(tid)
    {
        tid = omp_get_thread_num();
        printf("In outer parallel region: thread %d is running\n", tid);
        
        #pragma omp parallel num_threads(2)
        {
            printf("In nested parallel region: thread %d is running and outer thread is %d\n", omp_get_thread_num(), tid);
        }
    }
}

The following is the result of the program:

nested state is 0
In outer parallel region: thread 0 is running
In nested parallel region: thread 0 is running and outer thread is 0
In outer parallel region: thread 1 is running
In nested parallel region: thread 0 is running and outer thread is 1

nested state is 1
In outer parallel region: thread 1 is running
In outer parallel region: thread 0 is running
In nested parallel region: thread 0 is running and outer thread is 0
In nested parallel region: thread 0 is running and outer thread is 1
In nested parallel region: thread 1 is running and outer thread is 1
In nested parallel region: thread 1 is running and outer thread is 0

When nested parallelism is disabled, a parallel region created inside another parallel region is executed by a single thread. When it is enabled, the inner parallel region gets its own team, with new threads created to execute it.

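run-sched-var and def-sched-var

The OMP_SCHEDULE environment variable sets the schedule type, and optionally the chunk size, used by loops that specify schedule(runtime). Strictly speaking, it initializes run-sched-var. def-sched-var, the default schedule for loops without a schedule clause, is implementation defined and has no corresponding environment variable.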

Other functions

omp_get_num_procs: Returns the number of processors available to the program. This is a global value.

omp_in_parallel: Reports whether the call is made inside an active parallel region; returns 0 or 1.