Cracking a probabilistic STL core dump in seconds

Posted by jsbrown on Sat, 25 Dec 2021 03:12:01 +0100

I How it started

A while ago, code written by a colleague started dumping core after it went into production. He spent several hours investigating and still could not find the cause.

I enjoy tracking down bugs, so I asked him about it. He said the problem was at a sort call on a vector, and he couldn't think of a solution.

From experience, I guessed that the compare function was implemented incorrectly, so I asked him to make a small change. Sure enough, the crash went away.

II The core dump program

The original scenario is fairly complex. To make it easier to explain, I have simplified it so we can study the core dump together:

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

bool compare(int a, int b)
{
    return a >= b;
}

int main(int argc, char *argv[])
{
    vector<int> vec;

    for (int i = 0; i < 17; i++)
    {
        int x = 0;
        vec.push_back(x);
    }

    sort(vec.begin(), vec.end(), compare);

    return 0;
}

Compile and run it:

ubuntu@VM-0-15-ubuntu:~/taoge/cpp$ g++ -g test.cpp
ubuntu@VM-0-15-ubuntu:~/taoge/cpp$ ./a.out
Segmentation fault (core dumped)
ubuntu@VM-0-15-ubuntu:~/taoge/cpp$

As you can see, the program dumps core. What now? Debug it.

III Debugging the program

After a core dump, you can analyze the core file directly with gdb a.out core (a.out is not run again in this case), or you can start gdb a.out and run the program again inside gdb:

ubuntu@VM-0-15-ubuntu:~/taoge/cpp$ gdb a.out
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...done.
(gdb) r
Starting program: /home/ubuntu/taoge/cpp/a.out

Program received signal SIGSEGV, Segmentation fault.
0x000000000040219f in __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(int, int)>::operator()<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > > (
    this=0x7fffffffe1a0, __it1=<error reading variable: Cannot access memory at address 0x638000>, __it2=0)
    at /usr/include/c++/5/bits/predefined_ops.h:123
123             { return bool(_M_comp(*__it1, *__it2)); }
(gdb) bt
#0  0x000000000040219f in __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(int, int)>::operator()<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > > (
    this=0x7fffffffe1a0, __it1=<error reading variable: Cannot access memory at address 0x638000>, __it2=0)
    at /usr/include/c++/5/bits/predefined_ops.h:123
#1  0x000000000040207c in std::__unguarded_partition<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(int, int)> > (__first=<error reading variable: Cannot access memory at address 0x638000>,
    __last=0, __pivot=0, __comp=...) at /usr/include/c++/5/bits/stl_algo.h:1897
#2  0x0000000000401a6d in std::__unguarded_partition_pivot<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(int, int)> > (__first=0, __last=0, __comp=...)
    at /usr/include/c++/5/bits/stl_algo.h:1918
#3  0x00000000004016d0 in std::__introsort_loop<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, long, __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(int, int)> > (__first=0, __last=0, __depth_limit=13, __comp=...)
    at /usr/include/c++/5/bits/stl_algo.h:1948
#4  0x00000000004012c9 in std::__sort<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(int, int)> > (__first=0, __last=0, __comp=...) at /usr/include/c++/5/bits/stl_algo.h:1963
#5  0x0000000000400de0 in std::sort<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, bool (*)(int, int)> (__first=0, __last=0, __comp=0x400aa6 <compare(int, int)>) at /usr/include/c++/5/bits/stl_algo.h:4729
#6  0x0000000000400b41 in main (argc=1, argv=0x7fffffffe448) at test.cpp:21
(gdb)

This shows the call stack at the time of the crash. Next, switch to frame 1 and take a look:

(gdb) f 1
#1  0x000000000040207c in std::__unguarded_partition<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(int, int)> > (__first=<error reading variable: Cannot access memory at address 0x638000>,
    __last=0, __pivot=0, __comp=...) at /usr/include/c++/5/bits/stl_algo.h:1897
1897              while (__comp(__first, __pivot))
(gdb) i args
__first = <error reading variable __first (Cannot access memory at address 0x638000)>
__last = 0
__pivot = 0
__comp = {_M_comp = 0x400aa6 <compare(int, int)>}
(gdb) p *--__first
$6 = (int &) @0x637ffc: 0
(gdb)

So what does this tell us, and what next? Look at the source code.

IV Source code analysis

From the gdb output: __first has gone out of bounds, while --__first is still just inside readable memory and its value is 0. So an out-of-bounds memory access caused the core dump. The crash happens inside __unguarded_partition, whose code is:

/// This is a helper function...
template<typename _RandomAccessIterator, typename _Tp, typename _Compare>
  _RandomAccessIterator
  __unguarded_partition(_RandomAccessIterator __first,
                        _RandomAccessIterator __last,
                        _Tp __pivot, _Compare __comp)
  {
    while (true)
      {
        while (__comp(*__first, __pivot))
          ++__first;
        --__last;
        while (__comp(__pivot, *__last))
          --__last;
        if (!(__first < __last))
          return __first;
        std::iter_swap(__first, __last);
        ++__first;
      }
  }

Suppose every element to be sorted is 0 and our custom compare returns true for equal values. Then the condition of the following loop is always true, so the loop keeps advancing __first:

while (__comp(*__first, __pivot))
  ++__first;

This continues until __first goes out of bounds and *__first becomes an out-of-bounds access. Recall that during gdb debugging, *--__first evaluated to 0, which shows that __first had only just stepped past the last readable address.

This compare only runs out of bounds under specific data conditions, which is why the core dump looks probabilistic. If the values are not all equal, a core dump does not necessarily occur.

In addition, if the 17 in the program above is changed to 16, the program no longer dumps core. Why? Let's look at the source code:

template<typename _RandomAccessIterator>
void __final_insertion_sort(_RandomAccessIterator __first,
                            _RandomAccessIterator __last)
{
    if (__last - __first > _S_threshold)
    {
        __insertion_sort(__first, __first + _S_threshold);
        __unguarded_insertion_sort(__first + _S_threshold, __last);
    }
    else
        __insertion_sort(__first, __last);
}

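And the value of _S_threshold is exactly 16, so a range of at most 16 elements is handled by __insertion_sort alone and never reaches the partitioning code: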

enum { _S_threshold = 16 };

V Fix and verification

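From the debugging and code analysis above, we can draw an important conclusion: compare must return false when the two values are equal. std::sort requires the comparison to be a strict weak ordering, which means compare(a, a) must be false.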

Clearly, the compare function above returns true when a and b are equal, and that is the bug.

Here is the fixed program:

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

bool compare(int a, int b)
{
    return a > b;
}

int main(int argc, char *argv[])
{
    vector<int> vec;

    for (int i = 0; i < 17; i++)
    {
        int x = 0;
        vec.push_back(x);
    }

    sort(vec.begin(), vec.end(), compare);

    return 0;
}

After compiling and running it, everything is fine:

ubuntu@VM-0-15-ubuntu:~/taoge/cpp$ g++ -g test.cpp
ubuntu@VM-0-15-ubuntu:~/taoge/cpp$ ./a.out
ubuntu@VM-0-15-ubuntu:~/taoge/cpp$

VI Last words

Tracking down and fixing bugs is an important skill in real development work, and it demands both a clear line of reasoning and experience. In later posts, I will introduce several methods for debugging core dumps.

It's Saturday, the happiest day for working folks, so I won't say much more. I hope you are happy and everything goes well, and I hope this article is helpful to you. See you next time.