pthread_cancel: deadlocks caused by cancelling a thread, and how to solve them

Posted by james_holden on Sun, 30 Jan 2022 01:47:42 +0100

POSIX threads can terminate in two ways: normally and abnormally. A thread that calls pthread_exit() or returns from its thread function exits normally, which is predictable. Abnormal termination means the thread exits because another thread intervened (for example with pthread_cancel) or because of its own runtime error (for example accessing an illegal address); this kind of exit is unpredictable. Either way, the resources held by the thread have to be released. Leaving runtime errors aside, the question that must be solved is how to make sure that the resources a thread holds, locks in particular, are released when the thread terminates.

The most common case involves a mutex: a thread locks a critical resource in order to access it and is cancelled by another thread while it still holds the lock. If the thread has cancellation enabled and either uses asynchronous cancellation or reaches a cancellation point before it unlocks the mutex, the critical resource stays locked forever. Cancellation requests from outside are unpredictable, so a mechanism that simplifies releasing resources is needed.

The following POSIX thread functions are cancellation points:
            pthread_join
            pthread_cond_wait
            pthread_cond_timedwait
            pthread_testcancel
            sem_wait
            sigwait
The following system functions are also cancellation points:
            accept
            fcntl
            open
            read
            write
            lseek
            close
            send
            sendmsg
            sendto
            connect
            recv
            recvfrom
            recvmsg
            system
            tcdrain
            fsync
            msync
            pause
            wait
            waitpid
            nanosleep
The list is not exhaustive: sleep() and usleep(), which the test program uses, are cancellation points too.

When another thread calls pthread_cancel, the target thread (with cancellation enabled and the default deferred cancel type) terminates the next time it reaches one of these functions.

The default test code is as follows:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/syscall.h>


pthread_mutex_t mutexA;
int thStop = 0;
int is_safemode = 0;
int is_safe_exit = 0;
int is_setcancle = 0;
int is_notify = 0;

void *thread_function1(void *arg)
{
  pthread_t threadId = 0;
  long int pid = getpid();
  long int lwpId = syscall(SYS_gettid);
  threadId  = (pthread_t)(pthread_self());
  printf("thread[0x%lx][%ld][%ld] in function1\n",threadId,lwpId,pid);

  while(1)
  {
    printf("function1 owner:%ld waiting lock owner:%d ...\n",lwpId,mutexA.__data.__owner);
    pthread_mutex_lock(&mutexA);
    printf("function1 mutex:owner::%d;count::%d;lock:%d\n",
             mutexA.__data.__owner,mutexA.__data.__count,mutexA.__data.__lock);
    printf("I am thread[0x%lx][%ld] function1\n",threadId,lwpId);
    sleep(1);
    pthread_mutex_unlock(&mutexA);
    sleep(1);
  }
}

void clean_function2_res(void *arg)
{
  int lwpid = (int)*((int *)arg);
  if(!is_notify)
  {
   return;
  }
  printf("clean function2 res lwpid:%d\n",lwpid);
  if(mutexA.__data.__owner == lwpid)
  {
    pthread_mutex_unlock(&mutexA);
    printf("clean function2 res lock\n");
   }
}

void *thread_function2(void *arg)
{
  int oldstate = 0;
  int waitCount = 0;
  pthread_t threadId = 0;
  long int pid = getpid();
  int lwpId = syscall(SYS_gettid);
  threadId  = (pthread_t)(pthread_self());
  printf("thread[0x%lx][%d][%ld] in function2\n",threadId,lwpId,pid);
  pthread_cleanup_push(clean_function2_res,(void *)&lwpId);

  while(1)
  {
    printf("function2 owner:%d waiting lock owner:%d ...\n",lwpId,mutexA.__data.__owner);
    pthread_mutex_lock(&mutexA);
    printf("function2 mutex:owner::%d;count::%d;lock:%d\n",
             mutexA.__data.__owner,mutexA.__data.__count,mutexA.__data.__lock);
    if(thStop)
    {
      while(1)
      {
        if((is_safemode) && (is_safe_exit))
        {
          break;
        }
        printf("waiting thread[0x%lx] cancel...\n",threadId);
        usleep(500000);
        if(is_setcancle)
        {
         waitCount ++;
         pthread_setcancelstate(PTHREAD_CANCEL_DISABLE,&oldstate);
         printf("pthread cancel oldstate:%d;[%d]:[%d]\n",oldstate,PTHREAD_CANCEL_DISABLE,PTHREAD_CANCEL_ENABLE);
         if(waitCount > 10)
         {
          printf("it will into cancel pthread point\n");
          pthread_mutex_unlock(&mutexA);
          sleep(1);
          pthread_setcancelstate(PTHREAD_CANCEL_ENABLE,NULL);
          //printf("waiting cancel point sleep\n");
          //usleep(500000);
          printf("waiting cancel testcancel point\n");
          pthread_testcancel();
          printf("test cancel point\n");
          while(1)
          {
            printf("waiting cancel pthread...\n");
            usleep(500000);
          }
         }
        }
      }
    }
    else
    {
     printf("I am thread[0x%lx][%d] function2\n",threadId,lwpId);
     sleep(1);
    }
    pthread_mutex_unlock(&mutexA);
    sleep(1);
    if((is_safemode) && (is_safe_exit))
    {
     break;
    }
  }


  if(is_safemode)
  {
   printf("exit pthread by safe mode\n");
   pthread_exit(NULL);
  }

 pthread_cleanup_pop(0);

 return NULL;  /* a void * function must return a value */
}

int main(int avgc,char **pp_argv)
{
  pthread_t mthid = -1;
  unsigned int count = 0;
  void *ret = NULL;  /* pthread_join stores a void * here; an int would be overrun on 64-bit */
  int mode = 0;

  if(avgc >= 2)
   {
    mode = atoi(pp_argv[1]);
   }

   switch(mode)
   {
     case 1:
     is_notify = 1;
     break;
     case 2:
     is_safemode = 1;
     break;
     case 3:
     is_setcancle = 1;
     break;
     case 0:
     default:
     break;
   }

  printf("notify clean mode:%d\n",is_notify);
  printf("safe mode:%d\n",is_safemode);
  printf("set cancle mode:%d\n",is_setcancle);


  is_safe_exit = 0;
  thStop = 0;
  pthread_mutex_init(&mutexA, NULL);

  pthread_create(&mthid,NULL,thread_function1,NULL);
  printf("create thread:0x%lx\n",mthid);

  pthread_create(&mthid,NULL,thread_function2,NULL);
  printf("create thread:0x%lx\n",mthid);

  do{
    sleep(1);
    count ++;
    printf("main thread count:%d...\n",count);
   }while(count < 10);

  thStop = 1;
  sleep(3);

  if(is_safemode)
  {
    is_safe_exit = 1;
  }
  else
 {
  pthread_cancel(mthid);
 }

  pthread_join(mthid,(void *)&ret);

  while(1)
  {
   printf("main thread function...\n");
   sleep(1);
  }

  pthread_mutex_destroy(&mutexA);

}

Compilation: gcc -g mylock.c -lpthread -o mylock

Reproducing the problem: ./mylock 0 # forces the deadlock scenario

The main thread sets thStop = 1, which makes thread_function2 take the mutex and keep holding it, and then calls pthread_cancel(mthid) to terminate thread_function2. Because thread_function2 exits without checking or releasing the mutex, thread_function1 can never acquire it: the program deadlocks and stops making progress.

Solution 1: register thread cleanup callback

void pthread_cleanup_push(void (*routine) (void *), void *arg)
void pthread_cleanup_pop(int execute)

pthread_cleanup_push() and pthread_cleanup_pop() manage cleanup handlers on a last-in, first-out stack. Calling pthread_cleanup_push() pushes the handler void routine(void *arg) onto the cleanup stack, and repeated calls build a chain of handlers on that stack. If the thread terminates in the code between the pthread_cleanup_push() call and the matching pthread_cleanup_pop(), whether by calling pthread_exit() or by being cancelled with pthread_cancel, the handlers registered with pthread_cleanup_push() are executed; leaving the thread function with return does not execute them. pthread_cleanup_pop(execute) removes the most recently pushed handler and also runs it if execute is nonzero.

To see the result, run: ./mylock 1

Solution 2: let the thread exit safely. Other threads should not end the thread with pthread_cancel but should notify it instead. After receiving the message or flag, the thread releases its resources and exits on its own.

To see the result, run: ./mylock 2

Solution 3: disable the thread's response to pthread_cancel while it is using the shared resource.

pthread_setcancelstate() sets how the calling thread responds to cancellation requests. state takes one of two values: PTHREAD_CANCEL_ENABLE (the default), with which the thread acts on a cancellation request, and PTHREAD_CANCEL_DISABLE, with which the thread keeps running and the request stays pending until cancellation is enabled again. If oldstate is not NULL, the previous cancel state is stored there so that it can be restored later.

pthread_setcancelstate(PTHREAD_CANCEL_DISABLE,&oldstate);

/***free resource executes the code safely***/

pthread_setcancelstate(PTHREAD_CANCEL_ENABLE,NULL);

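After re-enabling cancellation, add an explicit cancellation point with pthread_testcancel() so that a pending request takes effect once the resource has been released.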

To see the result, run: ./mylock 3

Topics: Linux Multithreading Embedded system