Linux performance optimization: gprof usage

Posted by scorpioy on Sat, 29 Jan 2022 08:20:52 +0100

  • gprof analyzes how much time is spent in function calls. It can be used to find the most time-consuming functions so that you know where to optimize the program.
  • The -pg option must also be passed to gcc at link time, so that a gmon.out file is generated when the program runs; gprof analyzes this file.
  • By default, gprof does not support multithreaded programs or code in shared libraries.
  1. Compile with gcc -pg.
  2. Run the program; gmon.out is generated when the program exits.
  3. View the output with gprof -b ./prog gmon.out.

To produce a gmon.out file, the program must be compiled and linked with the -pg option (adding -g as well is common, since it enables line-level information).
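To try the steps above, gprof needs a program that does some measurable work. The two functions below are a hypothetical workload (the names slow_prefix_total and fast_prefix_total are made up for illustration). Both compute the same sum of prefix sums, one in O(n^2) time and one in O(n), so the slow one should stand out in the flat profile.

```c
#include <stddef.h>

/* O(n^2): recomputes every prefix sum from scratch, so it should dominate the profile. */
long slow_prefix_total(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j <= i; j++)
            total += v[j];
    return total;
}

/* O(n): same result, computed incrementally with a running prefix sum. */
long fast_prefix_total(const int *v, size_t n)
{
    long total = 0;
    long running = 0;
    for (size_t i = 0; i < n; i++) {
        running += v[i];
        total += running;
    }
    return total;
}
```

Assuming these functions are saved in demo.c together with a main() that calls both on a large array (a few tens of thousands of elements), the workflow would be gcc -pg -g demo.c -o demo, then ./demo, then gprof -b ./demo gmon.out. slow_prefix_total is expected to take most of the % time column; the exact numbers depend on the machine.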

 

1. Introduction

Improving an application's performance is time-consuming, labor-intensive work, and it is usually not obvious which functions consume most of the execution time. The GNU toolchain provides a profiling tool for this, the GNU profiler (gprof). gprof helps locate performance bottlenecks in programs on the Linux platform: it reports how much time is spent in each function and how many times each function is called, and it shows the call relationships between functions.

 

gprof user manual: http://sourceware.org/binutils/docs-2.17/gprof/index.html

 

2. Features

gprof is one of the GNU binutils tools and is installed by default on most Linux systems.

1. It can display a "flat profile": the number of calls to each function and the processor time consumed by each function.

2. It can display a "call graph": the calling relationships between functions and how much time is spent in each function and in the functions it calls.

3. It can display "annotated source code": a copy of the program's source code in which each line is marked with the number of times it was executed.

 

3. Principle

When a program is compiled and linked with the -pg option, gcc inserts a call to a function named mcount (or "_mcount", or "__mcount", depending on the compiler and operating system) into every function of your application; that is, every function in your application calls mcount. mcount maintains a call graph in memory, working out the addresses of the child function (the callee) and the parent function (the caller) from the call stack. For each caller/callee pair, the call graph records how many times the call was made. The time spent in each function is measured separately, by sampling the program counter at regular intervals while the program runs.
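The bookkeeping mcount performs can be illustrated with a small sketch. This is not glibc's mcount: it uses GCC's separate -finstrument-functions mechanism, whose hooks __cyg_profile_func_enter and __cyg_profile_func_exit receive the callee address and the call site, and it counts each (caller, callee) arc in a small fixed table. MAX_ARCS, record_arc and arc_count are hypothetical names; the real profiling runtime keeps this data in its own, more efficient tables.

```c
#include <stddef.h>

#define MAX_ARCS 256  /* hypothetical fixed capacity for this sketch */

/* One call-graph arc: call site in the caller -> entry address of the callee. */
struct arc {
    void *from;
    void *self;
    long count;
};

static struct arc arcs[MAX_ARCS];
static size_t n_arcs;

/* Helpers are excluded from instrumentation, otherwise the entry hook would recurse. */
__attribute__((no_instrument_function))
void record_arc(void *from, void *self)
{
    for (size_t i = 0; i < n_arcs; i++) {
        if (arcs[i].from == from && arcs[i].self == self) {
            arcs[i].count++;           /* known arc: bump its call count */
            return;
        }
    }
    if (n_arcs < MAX_ARCS) {           /* new arc (silently dropped when the table is full) */
        arcs[n_arcs].from = from;
        arcs[n_arcs].self = self;
        arcs[n_arcs].count = 1;
        n_arcs++;
    }
}

__attribute__((no_instrument_function))
long arc_count(void *from, void *self)
{
    for (size_t i = 0; i < n_arcs; i++)
        if (arcs[i].from == from && arcs[i].self == self)
            return arcs[i].count;
    return 0;
}

/* Called on every function entry in code built with gcc -finstrument-functions. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    record_arc(call_site, this_fn);
}

/* Called on every function exit; nothing to record for call counts. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
}
```

Counting arcs rather than plain per-function totals is what lets a call graph say not only how often a function ran but who called it how often.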

 

4. Usage workflow

1. Add the -pg option when compiling and linking; this is usually done in the makefile.

2. Run the compiled binary. The arguments and the way it is invoked are the same as before.

3. A gmon.out file is generated in the program's working directory. If a gmon.out file already exists, it is overwritten.

4. End the process normally; the profile data is written to gmon.out when the process exits.

5. Analyze gmon.out with the gprof tool.

 

5. Command-line options

• -b: do not print the detailed explanation of each field in the report.

• -p: print only the flat profile (the time consumed by each function).

• -q: print only the call graph.

• -e Name: do not print the call graph entry of function Name and its children (unless they have other parents that are not excluded). Multiple -e options can be given; each -e names only one function.

• -E Name: like -e, but additionally excludes the time spent in function Name and its children from the total-time and percentage calculations.

• -f Name: print only the call graph of function Name and its children. Multiple -f options can be given; each -f names only one function.

• -F Name: like -f, but only the time of the printed routines is used when computing total time and percentages. Multiple -F options can be given; each -F names only one function. -F overrides -E.

• -z: also display routines that were never used (zero call count and zero accumulated time).

 

General usage: gprof -b binary_program gmon.out > report.txt

 

6. Report description

Fields of the flat profile generated by gprof:

• % time: the percentage of the program's total running time spent in this function.
• cumulative seconds: a running sum of the seconds spent in this function and in all functions listed above it in the table (only functions that gprof can monitor are counted).
• self seconds: the time spent in the function itself, summed over all of its calls.
• calls: the number of times the function was called.
• self s/call: the average time per call spent in the function itself, excluding the functions it calls.
• total s/call: the average time per call, including the time spent in the functions it calls.
• name: the function name.

gprof scales the unit of the two per-call columns to the data, so their headers may appear as ms/call, us/call or Ts/call.
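The per-call columns follow directly from the other columns. The sketch below shows that arithmetic using a hypothetical flat_row struct filled with made-up numbers (gprof computes the real values from gmon.out). % time is self seconds over total time. self s/call divides self seconds by calls. total s/call also adds the time spent in callees, which gprof takes from the call graph.

```c
#include <stddef.h>

/* Hypothetical raw numbers for one flat-profile row. */
struct flat_row {
    double self_seconds;      /* time sampled inside the function body */
    double children_seconds;  /* time charged to it from its callees (call graph) */
    long   calls;             /* call count recorded by mcount */
};

/* % time: share of the total time spent in the function itself. */
double percent_time(const struct flat_row *r, double total_seconds)
{
    return total_seconds > 0.0 ? 100.0 * r->self_seconds / total_seconds : 0.0;
}

/* self s/call: average time per call, excluding callees. */
double self_per_call(const struct flat_row *r)
{
    return r->calls > 0 ? r->self_seconds / (double)r->calls : 0.0;
}

/* total s/call: average time per call, including callees. */
double total_per_call(const struct flat_row *r)
{
    return r->calls > 0
        ? (r->self_seconds + r->children_seconds) / (double)r->calls
        : 0.0;
}

/* cumulative seconds: running sum of self seconds down the sorted table. */
void cumulative_seconds(const double *self_seconds, double *cumulative, size_t n)
{
    double running = 0.0;
    for (size_t i = 0; i < n; i++) {
        running += self_seconds[i];
        cumulative[i] = running;
    }
}
```

For example, a function with 2.0 self seconds, 1.0 children seconds and 4 calls, in a run of 8.0 seconds, shows 25 % time, 0.5 self s/call and 0.75 total s/call.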

 

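Fields of the call graph:

• index: the index number of the entry.
• % time: the percentage of the total time spent in this function and its children.
• self: the time spent in the function itself.
• children: the time spent in the functions it calls (its children).
• called: the number of times the function was called.
• name: the function name.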
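The children column rests on an assumption worth knowing: gprof does not measure how long each individual call took, so it splits a callee's total time (self plus children) among its callers in proportion to the number of calls each caller made. A sketch of that rule and of the call-graph % time; the function names are hypothetical.

```c
/* Share of a callee's total time (self + children) charged to one caller,
   proportional to that caller's share of the callee's calls. */
double child_time_for_caller(double callee_self, double callee_children,
                             long calls_from_caller, long callee_total_calls)
{
    if (callee_total_calls <= 0)
        return 0.0;
    return (callee_self + callee_children) * (double)calls_from_caller
           / (double)callee_total_calls;
}

/* Call-graph % time: the function plus its children, relative to the whole run. */
double call_graph_percent(double self, double children, double total_seconds)
{
    return total_seconds > 0.0 ? 100.0 * (self + children) / total_seconds : 0.0;
}
```

One consequence: if the calls made by one caller are much more expensive than those made by others, gprof still charges that caller only its proportional share.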

 

Note:

The cumulative execution time includes only the functions that gprof can monitor. Time spent in kernel mode, and in third-party library functions that were not compiled with -pg (such as sleep()), cannot be measured by gprof.

For the complete list of gprof options, see man gprof.
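The sleep() case can be checked directly. gprof's timing comes from periodic samples driven by CPU time (the ITIMER_PROF timer discussed in the precautions below), and a sleeping process uses almost no CPU time, so it collects almost no samples. The sketch measures CPU time with the standard clock() function around sleep() and around a busy loop; the function names are hypothetical.

```c
#include <time.h>
#include <unistd.h>

/* CPU seconds consumed while sleeping for `secs` seconds.
   Expected to be close to 0: the process is blocked, not running. */
double cpu_seconds_during_sleep(unsigned int secs)
{
    clock_t start = clock();
    sleep(secs);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* CPU seconds consumed by a busy loop: time that CPU-time sampling can see. */
double cpu_seconds_during_busy_loop(long iterations)
{
    clock_t start = clock();
    volatile long sink = 0;   /* volatile keeps the loop from being optimized away */
    for (long i = 0; i < iterations; i++)
        sink ^= i;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Under gprof, a function that mostly calls sleep() would therefore show little self time even if the program took a long wall-clock time to finish.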

 

7. Shared library support

Profiling support is added by the compiler, so to obtain profiling information from shared libraries, those libraries must also be compiled with -pg. Some systems provide a version of the C library compiled with profiling support enabled (libc_p.a).

If you need to analyze system functions (such as those in the libc library), link with -lc_p instead of -lc, so that the program links against libc_p.so or libc_p.a. This matters because only then can the execution time of the underlying C library functions (for example memcpy(), memset(), sprintf()) be measured.

gcc example1.c -pg -lc_p -o example1

Use ldd ./example1 | grep libc to check whether the program is linked against libc.so or libc_p.so.

 

8. User time and kernel time

The biggest drawback of gprof is that it can only analyze the time the application spends in user space; it cannot measure the time the program spends in kernel space. In general, a running application spends time executing both user code and "system code", such as the sleep() system call handled by the kernel.

One way to see how an application's running time breaks down is to run the program under the time command. It reports the real (wall-clock) time, the user-space CPU time, and the kernel-space CPU time of the application.

For example, time ./program produces output like:

```
real    2m30.295s
user    0m0.000s
sys     0m0.004s
```
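The same user/system split that time prints can also be read from inside a program with the POSIX getrusage() call, for example to log it just before exiting. A minimal sketch; read_cpu_times is a hypothetical helper name.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Fills in the process's user and system CPU time in seconds.
   Returns 0 on success, -1 if getrusage() fails. */
int read_cpu_times(double *user_seconds, double *sys_seconds)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *user_seconds = (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
    *sys_seconds  = (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
    return 0;
}
```

If the user time reported here is small compared with the wall-clock time, gprof will have little to show, which is the situation in the output above.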

 

9. Precautions

1. With g++, the -pg option must be used in both the compile and the link steps.

2. Only a statically linked libc can be used; otherwise the profiling code may be called before the shared libraries (*.so) are initialized, causing a "segmentation fault". The workaround is to add -static-libgcc or -static when compiling.

3. If you link the program directly with ld instead of g++, add the startup file /lib/gcrt0.o (or gcrt1.o), for example: ld -o myprog /lib/gcrt0.o myprog.o utils.o -lc_p.

4. To measure the execution time of third-party library functions, the third-party library must also be compiled with the -pg option.

5. gprof can only analyze the user time consumed by the application.

6. The program cannot run in daemon mode; otherwise the timing data cannot be collected (only the call counts can).

7. It is a good idea to first do a trial run of the program under gprof to determine whether gprof can produce useful information for it.

8. If gprof does not fit your analysis needs, other tools overcome some of its shortcomings, including OProfile and Sysprof.

9. gprof is most useful for CPU-bound programs whose code runs mostly in user space. It is of little help for programs that spend most of their time in kernel space, or that run slowly because of external factors (such as an overloaded I/O subsystem in the operating system).

10. gprof does not support multithreaded applications: under multithreading, only the main thread's performance data is collected. The reason is that gprof relies on the ITIMER_PROF signal, and in a multithreaded program only the main thread responds to it. There is, however, a simple workaround: http://sam.zoy.org/writings/programming/gprof.html

11. gprof writes its data (gmon.out) only after the program ends and exits normally.

a) Reason: the data is written by a function registered with atexit(). An abnormal exit does not run atexit() handlers, so no gmon.out file is produced.

b) A normal exit means returning from main or calling the exit() library function.
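Because gmon.out is only written on a normal exit, a long-running program that is usually stopped with a signal produces no profile. One common workaround, sketched here under the assumption that the program has a main loop it can poll from, is to catch the stop signal, set a flag, and return from main normally. install_stop_handler and should_stop are hypothetical names; only the standard C signal() and a volatile sig_atomic_t flag are used, since a signal handler can safely do little more than set such a flag.

```c
#include <signal.h>

/* Set asynchronously by the handler, polled by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

static void on_stop_signal(int sig)
{
    (void)sig;
    stop_requested = 1;   /* only set a flag: exit() is not async-signal-safe */
}

/* Install the handler for a signal such as SIGINT or SIGTERM; returns 0 on success. */
int install_stop_handler(int sig)
{
    return signal(sig, on_stop_signal) == SIG_ERR ? -1 : 0;
}

/* Polled by the main loop to decide when to leave main normally. */
int should_stop(void)
{
    return stop_requested != 0;
}
```

The main loop would then look like while (!should_stop()) { do_work(); } followed by return 0;, where do_work stands in for the program's real work. Returning from main runs the atexit() handlers, so gmon.out is written.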

 

 

10. Multithreaded applications

gprof does not support multithreaded applications; under multithreading, only the main thread's performance data is collected. The reason is that gprof relies on the ITIMER_PROF signal, and in a multithreaded program only the main thread responds to it.

How can all threads be profiled? The key is to make every thread respond to the ITIMER_PROF signal. This can be done with a stub library that overrides the pthread_create function: each new thread copies the creating thread's ITIMER_PROF setting before running its start routine.

gprof-helper.c:

```c
#define _GNU_SOURCE
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <pthread.h>

static void *wrapper_routine(void *);

/* Original pthread function */
static int (*pthread_create_orig)(pthread_t *__restrict,
                                  const pthread_attr_t *__restrict,
                                  void *(*)(void *),
                                  void *__restrict) = NULL;

/* Library initialization function, run when the library is loaded */
void wooinit(void) __attribute__((constructor));

void wooinit(void)
{
    /* Look up the real pthread_create that follows this library */
    pthread_create_orig = dlsym(RTLD_NEXT, "pthread_create");
    fprintf(stderr, "pthreads: using profiling hooks for gprof\n");
    if (pthread_create_orig == NULL)
    {
        char *error = dlerror();
        if (error == NULL)
        {
            error = "pthread_create is NULL";
        }
        fprintf(stderr, "%s\n", error);
        exit(EXIT_FAILURE);
    }
}

/* Our data structure passed to the wrapper */
typedef struct wrapper_s
{
    void *(*start_routine)(void *);
    void *arg;
    pthread_mutex_t lock;
    pthread_cond_t wait;
    int released;              /* set once the new thread has copied its data */
    struct itimerval itimer;
} wrapper_t;

/* The wrapper function in charge of setting the itimer value */
static void *wrapper_routine(void *data)
{
    wrapper_t *w = data;

    /* Put user data in thread-local variables */
    void *(*start_routine)(void *) = w->start_routine;
    void *arg = w->arg;

    /* Set the profile timer value in this thread */
    setitimer(ITIMER_PROF, &w->itimer, NULL);

    /* Tell the calling thread that we don't need its data anymore */
    pthread_mutex_lock(&w->lock);
    w->released = 1;
    pthread_cond_signal(&w->wait);
    pthread_mutex_unlock(&w->lock);

    /* Call the real function */
    return start_routine(arg);
}

/* Our wrapper function for the real pthread_create() */
int pthread_create(pthread_t *__restrict thread,
                   const pthread_attr_t *__restrict attr,
                   void *(*start_routine)(void *),
                   void *__restrict arg)
{
    wrapper_t wrapper_data;
    int i_return;

    /* Initialize the wrapper structure with the caller's profiling timer */
    wrapper_data.start_routine = start_routine;
    wrapper_data.arg = arg;
    wrapper_data.released = 0;
    getitimer(ITIMER_PROF, &wrapper_data.itimer);
    pthread_cond_init(&wrapper_data.wait, NULL);
    pthread_mutex_init(&wrapper_data.lock, NULL);
    pthread_mutex_lock(&wrapper_data.lock);

    /* The real pthread_create call */
    i_return = pthread_create_orig(thread, attr, &wrapper_routine, &wrapper_data);

    /* If the thread was successfully spawned, wait for the data to be
     * released; the loop guards against spurious wakeups */
    if (i_return == 0)
    {
        while (!wrapper_data.released)
            pthread_cond_wait(&wrapper_data.wait, &wrapper_data.lock);
    }

    pthread_mutex_unlock(&wrapper_data.lock);
    pthread_mutex_destroy(&wrapper_data.lock);
    pthread_cond_destroy(&wrapper_data.wait);

    return i_return;
}
```

Then compile it into a shared library:

gcc -shared -fPIC gprof-helper.c -o gprof-helper.so -lpthread -ldl

 

Usage example:

a.c:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <string.h>

void fun1(void);
void fun2(void);
void *fun(void *argv);

int main(void)
{
    int i;
    pthread_t thread[100];

    for (i = 0; i < 100; i++)
    {
        if (pthread_create(&thread[i], NULL, fun, NULL) != 0)
        {
            fprintf(stderr, "pthread_create failed for thread %d\n", i);
            return EXIT_FAILURE;
        }
        printf("thread =%d\n", i);
    }

    /* Wait for every thread, so all work is done before main returns
       and gmon.out is written */
    for (i = 0; i < 100; i++)
        pthread_join(thread[i], NULL);

    printf("dsfsd\n");
    return 0;
}

void *fun(void *argv)
{
    (void)argv;
    fun1();
    fun2();
    return NULL;
}

void fun1(void)
{
    int i = 0;
    while (i < 100)
    {
        i++;
        printf("fun1\n");
    }
}

void fun2(void)
{
    int i = 0;
    while (i < 50)
    {
        i++;
        printf("fun2\n");
    }
}
```

 

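gcc -pg a.c ./gprof-helper.so -lpthread

(Giving the library with a path, as ./gprof-helper.so, lets the dynamic loader find it in the current directory at run time.)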

 

Run the program:

./a.out

Analyze gmon.out:

gprof -b a.out gmon.out

 

Topics: Linux performance