A Tour of the STL: vector

Posted by Davo on Sat, 05 Mar 2022 11:14:05 +0100

std::vector source code analysis

This article looks at the design of the STL from the perspective of its source code. The implementation studied is libstdc++ (GCC 4.8.5).

Because we only care about the implementation of vector, and that implementation lives almost entirely in header files, we can obtain a relatively clean copy of the source as follows:

// main.cpp
#include <vector>
int main() {
  std::vector<int> v;
  v.emplace_back(1);
}

g++ -E main.cpp -std=c++11 > vector.cpp

Open vector.cpp in VS Code and use the regular expression "#.*\n" to delete every line that starts with #. This filters out all the preprocessor line markers. The resulting file does not depend on any external headers, so jumping between definitions is painless.

allocator

An allocator has to implement at least the following:

allocate: allocates memory
deallocate: releases memory

The smallest unit an allocator hands out is one object, so it also has to report the maximum number of objects that can be allocated:

max_size: maximum number of objects that can be allocated

These are the most basic functions needed to implement an allocator. On top of them, object construction and destruction are added, so that users of an allocator, such as the STL containers, do not have to deal with the memory-level details of constructing and destroying objects themselves.

construct: object construction (has to be a template to be generic)
destroy: object destruction

To sum up, the members an allocator has to provide are:

allocate: allocates memory
deallocate: releases memory
construct: object construction (has to be a template to be generic)
destroy: object destruction
max_size: maximum number of objects that can be allocated

std::allocator

The standard library's allocator is relatively simple. Allocation and deallocation are implemented with ::operator new and ::operator delete:

pointer allocate(size_type __n, const void * = 0) {
  if (__n > this->max_size())
    std::__throw_bad_alloc();
  return static_cast<_Tp *>(::operator new(__n * sizeof(_Tp)));
}

void deallocate(pointer __p, size_type) { ::operator delete(__p); }

The maximum allocation is bounded only by the (virtual) address space of the process:

// sizeof(size_t) = process address width
size_type max_size() const throw() { return size_t(-1) / sizeof(_Tp); }

Objects are constructed with placement new and destroyed with an explicit destructor call:

void construct(pointer __p, const _Tp &__val) {
  ::new ((void *)__p) _Tp(__val);
}

void destroy(pointer __p) { __p->~_Tp(); }

std::vector

A general-purpose sequence container that supports user-defined memory allocators.

Basic implementation

libstdc++ defines vector as follows:

template <typename _Tp, typename _Alloc = std::allocator<_Tp>>
class vector : protected _Vector_base<_Tp, _Alloc> {};

There are two template parameters: the element type and the allocator type. The allocator type has a default and does not have to be specified.

vector derives from _Vector_base using protected inheritance. This inheritance is not there for the empty base class optimization (EBO); it mainly keeps the responsibilities of the two classes separate.

Looking at the implementation of _Vector_base, it contains an impl:

template <typename _Tp, typename _Alloc> struct _Vector_base {
  typedef
      typename __gnu_cxx::__alloc_traits<_Alloc>::template rebind<_Tp>::other
          _Tp_alloc_type;
  typedef typename __gnu_cxx::__alloc_traits<_Tp_alloc_type>::pointer pointer;

  struct _Vector_impl : public _Tp_alloc_type {
    pointer _M_start;
    pointer _M_finish;
    pointer _M_end_of_storage;
  };

public:
  _Vector_impl _M_impl;
};

_Vector_base provides vector's memory operations, including allocating and releasing memory. _Vector_impl publicly inherits from _Tp_alloc_type (by default std::allocator<_Tp>), so in terms of C++ semantics _Vector_impl can itself be regarded as an allocator (an is-a relationship).

_Vector_impl

The implementation of _Vector_impl is relatively simple. Three core member variables form the underlying representation of vector:

_M_start: start address of the element storage, the address returned by data()
_M_finish: end address of the constructed elements, which determines size()
_M_end_of_storage: end address of the allocated storage, which determines capacity()

struct _Vector_impl : public _Tp_alloc_type {
  pointer _M_start;
  pointer _M_finish;
  pointer _M_end_of_storage;

  _Vector_impl()
      : _Tp_alloc_type(), _M_start(0), _M_finish(0), _M_end_of_storage(0) {}

  _Vector_impl(_Tp_alloc_type const &__a)
      : _Tp_alloc_type(__a), _M_start(0), _M_finish(0),
        _M_end_of_storage(0) {}

  void _M_swap_data(_Vector_impl &__x) {
    std::swap(_M_start, __x._M_start);
    std::swap(_M_finish, __x._M_finish);
    std::swap(_M_end_of_storage, __x._M_end_of_storage);
  }
};

_Vector_base

_Vector_impl provides the underlying storage representation. _Vector_base initializes that representation, hides the memory management details, and exposes an allocate/release interface to the layer above.

// Only one constructor is shown here
_Vector_base(size_t __n) : _M_impl() { _M_create_storage(__n); }

void _M_create_storage(size_t __n) {
  this->_M_impl._M_start = this->_M_allocate(__n);
  this->_M_impl._M_finish = this->_M_impl._M_start;
  this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
}

// Free memory
~_Vector_base() {
  _M_deallocate(this->_M_impl._M_start,
                this->_M_impl._M_end_of_storage - this->_M_impl._M_start);
}

pointer _M_allocate(size_t __n) {
  return __n != 0 ? _M_impl.allocate(__n) : 0;
}

void _M_deallocate(pointer __p, size_t __n) {
  if (__p)
    _M_impl.deallocate(__p, __n);
}

Constructor

Take three of the constructors as examples. Note that the latter two copy every element, so their cost is proportional to size().
L174 the constructor taking only an allocator does nothing except basic initialization
L209 constructs a container with the contents of the initializer_list
L214 constructs a container with the contents of the range [first, last)

174  explicit vector(const allocator_type &__a) : _Base(__a) {}

209  vector(initializer_list<value_type> __l,
210         const allocator_type &__a = allocator_type())
211      : _Base(__a) {
212    _M_range_initialize(__l.begin(), __l.end(), random_access_iterator_tag());
213  }

214  template <typename _InputIterator,
215            typename = std::_RequireInputIter<_InputIterator>>
216  vector(_InputIterator __first, _InputIterator __last,
217         const allocator_type &__a = allocator_type())
218      : _Base(__a) {
219    _M_initialize_dispatch(__first, __last, __false_type());
220  }

Methods

To understand the underlying implementation of std::vector, we go straight to the methods it provides. The most basic ones are adding, removing, modifying and querying elements, plus the size-related methods.

Size-related

size() returns the number of elements and is implemented as

size_type size() const {
  return size_type(this->_M_impl._M_finish - this->_M_impl._M_start);
}

capacity() returns the number of elements the allocated storage can hold and is implemented as

size_type capacity() const {
  return size_type(this->_M_impl._M_end_of_storage - this->_M_impl._M_start);
}

push_back

push_back is the most frequently used method. Once its implementation is understood, the growth strategy of vector as a whole becomes clear.

void push_back(const value_type &__x) {
  if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) {
    _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish, __x);
    ++this->_M_impl._M_finish;
  } else
    _M_emplace_back_aux(__x);
}

void push_back(value_type &&__x) { emplace_back(std::move(__x)); }

template <typename _Tp, typename _Alloc>
template <typename... _Args>
void vector<_Tp, _Alloc>::emplace_back(_Args && ...__args) {
  if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) {
    _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
                             std::forward<_Args>(__args)...);
    ++this->_M_impl._M_finish;
  } else
    _M_emplace_back_aux(std::forward<_Args>(__args)...);
}

The rvalue overload of push_back() simply forwards to emplace_back() (C++11), and both follow the same logic:
When size() < capacity(), the element is copy/move-constructed directly in the slot after the last element, and _M_finish is advanced by one.
When size() == capacity(), a new block of memory has to be allocated before the new element can be inserted, and the existing elements have to be moved into it. The implementation is as follows, with exception handling and non-essential branches omitted.

11  template <typename _Tp, typename _Alloc>
12  template <typename... _Args>
13  void vector<_Tp, _Alloc>::_M_emplace_back_aux(_Args && ...__args) {
14    const size_type __len =
15        _M_check_len(size_type(1), "vector::_M_emplace_back_aux");
16    pointer __new_start(this->_M_allocate(__len));
17    pointer __new_finish(__new_start);
19    _Alloc_traits::construct(this->_M_impl, __new_start + size(),
20                             std::forward<_Args>(__args)...);
21    __new_finish = 0;
22    __new_finish = std::__uninitialized_move_if_noexcept_a(
23        this->_M_impl._M_start, this->_M_impl._M_finish, __new_start,
24        _M_get_Tp_allocator());
25    ++__new_finish;
26    std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
27                  _M_get_Tp_allocator());
28    _M_deallocate(this->_M_impl._M_start,
29                  this->_M_impl._M_end_of_storage - this->_M_impl._M_start);
30    this->_M_impl._M_start = __new_start;
31    this->_M_impl._M_finish = __new_finish;
32    this->_M_impl._M_end_of_storage = __new_start + __len;
33  }

_M_check_len checks whether there is still room to grow and returns the new capacity. It is implemented as follows:

size_type _M_check_len(size_type __n, const char *__s) const {
  if (max_size() - size() < __n)
    __throw_length_error((__s));
  const size_type __len = size() + std::max(size(), __n);
  return (__len < size() || __len > max_size()) ? max_size() : __len;
}

As you can see, after the first push_back, size() == capacity() == 1; after the second, the capacity is 2; from then on the capacity doubles on every expansion, up to a maximum of size_t(-1)/sizeof(T).

The steps of _M_emplace_back_aux, by listing line number:

L14 computes the size of the storage to allocate
L16 allocates a new block of memory
L19 constructs the new element
L22 copies/moves the old elements into the new memory
L26 destroys the old elements
L28 releases the old storage
L30-L32 update the three pointers of the underlying representation

From this we can see that vector is backed by a contiguous array, whose storage can live on the stack (with a custom allocator) or on the heap (the default).
When the capacity has to grow, the growth factor is 2, capped at a maximum size; integer overflow is handled as well.
As for constructors, each insertion invokes the copy (or move) constructor once for the new element.

insert

Inserts an element into the container at the specified position.

The implementation of insert does not differ much from that of push_back, except that the (size() - pos) elements after the insertion point additionally have to be copied/moved.

resize

Changes the number of elements stored in the container.

Here we only look at the overload that default-constructs the new elements.

298  void resize(size_type __new_size) {
299    if (__new_size > size())
300      _M_default_append(__new_size - size());
301    else if (__new_size < size())
302      _M_erase_at_end(this->_M_impl._M_start + __new_size);
303  }

525  void _M_erase_at_end(pointer __pos) {
526    std::_Destroy(__pos, this->_M_impl._M_finish, _M_get_Tp_allocator());
527    this->_M_impl._M_finish = __pos;
528  }

408  void vector<_Tp, _Alloc>::_M_default_append(size_type __n) {
409    if (__n != 0) {
410      if (size_type(this->_M_impl._M_end_of_storage -
411                    this->_M_impl._M_finish) >= __n) {
412        std::__uninitialized_default_n_a(this->_M_impl._M_finish, __n,
413                                         _M_get_Tp_allocator());
414        this->_M_impl._M_finish += __n;
415      } else {
416        const size_type __len = _M_check_len(__n, "vector::_M_default_append");
417        const size_type __old_size = this->size();
418        pointer __new_start(this->_M_allocate(__len));
419        pointer __new_finish(__new_start);
420        try {
421          __new_finish = std::__uninitialized_move_if_noexcept_a(
422              this->_M_impl._M_start, this->_M_impl._M_finish, __new_start,
423              _M_get_Tp_allocator());
424          std::__uninitialized_default_n_a(__new_finish, __n,
425                                           _M_get_Tp_allocator());
426          __new_finish += __n;
427        } catch (...) {
428          std::_Destroy(__new_start, __new_finish, _M_get_Tp_allocator());
429          _M_deallocate(__new_start, __len);
430          throw;
431        }
432        std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
433                      _M_get_Tp_allocator());
434        _M_deallocate(this->_M_impl._M_start,
435                      this->_M_impl._M_end_of_storage - this->_M_impl._M_start);
436        this->_M_impl._M_start = __new_start;
437        this->_M_impl._M_finish = __new_finish;
438        this->_M_impl._M_end_of_storage = __new_start + __len;
439      }
440    }
441  }

resize has three cases:
When the new size equals the current size, nothing is done.
When the new size is smaller than the current size, handling is simple: the trailing elements are destroyed and _M_finish is updated. The memory itself is not released.
When the new size is larger than the current size:

If the new size is less than or equal to the capacity, the additional elements are default-constructed directly at the tail.
If the new size is larger than the capacity, then, as with push_back, new memory has to be allocated first and the existing elements copied/moved into it, after which the additional elements are default-constructed as in the previous case.
L416-L423 allocate new memory and copy/move the elements
L424 default-constructs the additional elements at the tail

clear

Removes all elements from the container, after which size() == 0. The capacity is unchanged.

The implementation is relatively simple:

void clear() noexcept { _M_erase_at_end(this->_M_impl._M_start); }

void _M_erase_at_end(pointer __pos) {
  std::_Destroy(__pos, this->_M_impl._M_finish, _M_get_Tp_allocator());
  this->_M_impl._M_finish = __pos;
}

reserve

Reserves storage by increasing the capacity of the vector to a value greater than or equal to new_cap.
The implementation is also relatively simple: when new_cap is greater than the current capacity, new memory is allocated, the elements are copied/moved into it, and finally the underlying pointers are updated.

template <typename _Tp, typename _Alloc>
void vector<_Tp, _Alloc>::reserve(size_type __n) {
  if (__n > this->max_size())
    __throw_length_error(("vector::reserve"));
  if (this->capacity() < __n) {
    const size_type __old_size = size();
    pointer __tmp = _M_allocate_and_copy(
        __n, std::__make_move_if_noexcept_iterator(this->_M_impl._M_start),
        std::__make_move_if_noexcept_iterator(this->_M_impl._M_finish));
    std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
                  _M_get_Tp_allocator());
    _M_deallocate(this->_M_impl._M_start,
                  this->_M_impl._M_end_of_storage - this->_M_impl._M_start);
    this->_M_impl._M_start = __tmp;
    this->_M_impl._M_finish = __tmp + __old_size;
    this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
  }
}

shrink_to_fit

Requests the removal of unused capacity.

void shrink_to_fit() { _M_shrink_to_fit(); }

template <typename _Tp, typename _Alloc>
bool vector<_Tp, _Alloc>::_M_shrink_to_fit() {
  if (capacity() == size())
    return false;
  return std::__shrink_to_fit_aux<vector>::_S_do_it(*this);
}

template <typename _Tp> struct __shrink_to_fit_aux<_Tp, true> {
  // Exception handling omitted
  static bool _S_do_it(_Tp &__c) {
    _Tp(__make_move_if_noexcept_iterator(__c.begin()),
        __make_move_if_noexcept_iterator(__c.end()), __c.get_allocator())
        .swap(__c);
    return true;
  }
};

With this many templates the code is hard to follow, so here is the same idea expressed another way:

std::vector<int> v;
v.push_back(1); // size()=1 capacity()=1
v.push_back(1); // size()=2 capacity()=2
v.push_back(1); // size()=3 capacity()=4

std::vector<int>(v.begin(), v.end()).swap(v); // size()=3 capacity()=3

Time complexity analysis

Complexity	Method	Explanation
\(O(1)\)	size()	Pointer subtraction
\(O(1)\)	capacity()	Pointer subtraction
\(O(1)\)	push_back()	Amortized; at most 3 constructions per call on average
\(O(n)\)	insert()	Requires copying/moving size()-pos elements
\(O(n)\)	clear()	size() destructor calls
\(O(n)\)	reserve()	size() copies/moves when the capacity grows
\(O(n)\)	shrink_to_fit()	Construction requires size() copies/moves; swap() is constant

push_back complexity proof

Assume libstdc++, where the growth factor of vector is 2. We analyze the cost of executing n push_back operations on an initially empty vector.

Let \(c_i\) be the number of copy constructions required by the \(i\)-th operation. There are two cases:

size() < capacity(): \(c_i=1\)
size() == capacity(): the vector is expanded, \(c_i=i\)

So the cost of each operation is:

\[c_i=\left\{ \begin{aligned} i, & \quad \text{if } i-1 \text{ is exactly a power of 2} \\ 1, & \quad \text{otherwise} \end{aligned} \right. \]

The total number of copy-constructor calls for n push_back operations is:

\[\sum_{i=1}^n c_i \le n + \sum_{j=0}^{\lfloor \lg n \rfloor} 2^j \le n+2n = 3n \]

The total for n push_back operations is bounded above by 3n, so the amortized cost of a single operation is at most 3, and the complexity is \(O(1)\).
