A Detailed Look at the C++ STL: A Mock Implementation of vector

Posted by PJ droopy pants on Sat, 01 Jan 2022 16:47:35 +0100

Overview of vector function interfaces

namespace cl
{
	//Mock implementation of vector
	template<class T>
	class vector
	{
	public:
		typedef T* iterator;
		typedef const T* const_iterator;

		//Default member functions
		vector();                                           //Constructor
		vector(size_t n, const T& val);                     //Constructor
		template<class InputIterator>                      
		vector(InputIterator first, InputIterator last);    //Constructor
		vector(const vector<T>& v);                         //copy constructor 
		vector<T>& operator=(const vector<T>& v);           //Assignment operator overloaded function
		~vector();                                          //Destructor

		//Iterator-related functions
		iterator begin();
		iterator end();
		const_iterator begin()const;
		const_iterator end()const;

		//Capacity- and size-related functions
		size_t size()const;
		size_t capacity()const;
		void reserve(size_t n);
		void resize(size_t n, const T& val = T());
		bool empty()const;

		//Modify container content related functions
		void push_back(const T& x);
		void pop_back();
		void insert(iterator pos, const T& x);
		iterator erase(iterator pos);
		void swap(vector<T>& v);

		//Accessing container related functions
		T& operator[](size_t i);
		const T& operator[](size_t i)const;

	private:
		iterator _start;        //Points to the beginning of the storage
		iterator _finish;       //Points one past the last valid element
		iterator _endofstorage; //Points to the end of the storage
	};
}

Note: in order to prevent naming conflicts with the vector in the standard library, the simulation implementation needs to be placed in its own namespace.

Introduction to member variables in vector

There are three member variables in the vector: _start, _finish, and _endofstorage.

_start points to the beginning of the container, _finish points one past the last valid element in the container, and _endofstorage points to the end of the entire allocated storage.

Default member functions

Constructor 1

vector first supports a parameterless constructor. For this parameterless constructor, we can directly set the three member variables of the construction object to null pointers.

//Constructor 1
vector()
	:_start(nullptr)
	, _finish(nullptr)
	, _endofstorage(nullptr)
{}

Constructor 2

Secondly, vector also supports constructing an object from an iterator range. Because the range may come from another kind of container, the iterator type the function receives is not fixed, so we write this constructor as a function template and insert the elements of the range into the container one by one in the function body.

//Constructor 2
template<class InputIterator> //template function
vector(InputIterator first, InputIterator last)
	:_start(nullptr)
	, _finish(nullptr)
	, _endofstorage(nullptr)
{
	//Insert the elements of the half-open range [first, last) into the container one by one
	while (first != last)
	{
		push_back(*first);
		first++;
	}
}
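
A short usage sketch of range construction follows. It uses std::vector as a stand-in, on the assumption that the mock constructor behaves the same way; from_list and from_array are hypothetical helper names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Builds a vector from the iterator range of a different container type.
std::vector<int> from_list(const std::list<int>& src) {
    return std::vector<int>(src.begin(), src.end());
}

// Raw pointers are iterators too, so an array range works the same way.
std::vector<char> from_array(const char* first, std::size_t count) {
    return std::vector<char>(first, first + count);
}
```

Both calls deduce a different InputIterator type (a list iterator and a const char*), which is exactly why the constructor has to be a template.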

Constructor 3

In addition, vector supports constructing a container that holds n elements with the value val. For this constructor, we can first call the reserve function to set the container capacity to n, and then call push_back to insert n elements with the value val.

//Constructor 3
vector(size_t n, const T& val)
	:_start(nullptr)
	, _finish(nullptr)
	, _endofstorage(nullptr)
{
	reserve(n); //Call the reserve function to set the container capacity to n
	for (size_t i = 0; i < n; i++) //Insert n data with value val into the container
	{
		push_back(val);
	}
}

Note:
1) The constructor knows it needs space for n elements, so it is best to call reserve to allocate the space once. This avoids repeated reallocation inside push_back and the resulting loss of efficiency.
2) The constructor also needs two additional overloads.

vector(long n, const T& val)
	:_start(nullptr)
	, _finish(nullptr)
	, _endofstorage(nullptr)
{
	reserve(n); //Call the reserve function to set the container capacity to n
	for (long i = 0; i < n; i++) //Insert n data with value val into the container
	{
		push_back(val);
	}
}
vector(int n, const T& val)
	:_start(nullptr)
	, _finish(nullptr)
	, _endofstorage(nullptr)
{
	reserve(n); //Call the reserve function to set the container capacity to n
	for (int i = 0; i < n; i++) //Insert n data with value val into the container
	{
		push_back(val);
	}
}

The only difference between these two overloads and constructor 3 is the type of parameter n, but they are necessary. Without them, the compiler would prefer constructor 2 for the following code, because both arguments are int and the template matches them exactly, whereas constructor 3 requires converting the first argument to size_t.

vector<int> v(5, 7); //Call constructor 3???

Compilation then fails, because constructor 2 dereferences its parameters first and last, and an int cannot be dereferenced.
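
The overload choice can be reproduced in isolation. The following is a minimal sketch, not part of the vector class: Probe and ProbeFixed are hypothetical types whose constructors only record which overload was selected (2 for the iterator-range template, 3 for the count-and-value form), and nothing is dereferenced.

```cpp
#include <cassert>
#include <cstddef>

// Only the size_t overload plus the template, like the original pair.
struct Probe {
    int chosen;
    Probe(std::size_t, const int&) : chosen(3) {}
    template <class InputIterator>
    Probe(InputIterator, InputIterator) : chosen(2) {}
};

// Same as Probe, with an extra overload that takes n as int.
struct ProbeFixed {
    int chosen;
    ProbeFixed(std::size_t, const int&) : chosen(3) {}
    ProbeFixed(int, const int&) : chosen(3) {}
    template <class InputIterator>
    ProbeFixed(InputIterator, InputIterator) : chosen(2) {}
};
```

With Probe(5, 7) the template is an exact match and wins. With ProbeFixed(5, 7) the int overload matches equally well, and a non-template function is preferred over a template when the conversions tie.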

Copy constructor

The copy constructor of vector involves a deep copy. Here are two ways to write it:
Writing method 1: traditional writing method
The traditional way to write the copy constructor is the one that comes to mind first: allocate a space of the same size as the source container, copy the source container's data one by one, and finally update the values of _finish and _endofstorage.

//Traditional writing
vector(const vector<T>& v)
	:_start(nullptr)
	, _finish(nullptr)
	, _endofstorage(nullptr)
{
	_start = new T[v.capacity()]; //Open up a space the same size as the container v
	for (size_t i = 0; i < v.size(); i++) //Copy the data in the container v one by one
	{
		_start[i] = v[i];
	}
	_finish = _start + v.size(); //Tail of valid data of container
	_endofstorage = _start + v.capacity(); //End of entire container
}

Note: memcpy cannot be used to copy the elements here. When the vector stores a built-in type or a user-defined type that does not need a deep copy, memcpy works fine. When it stores a user-defined type that does need a deep copy, such as a string class, memcpy breaks.

Each string stored in the vector holds a pointer to its own character buffer.

If we copy-construct with memcpy, the member variables of each string in the new vector get the same values as those of the corresponding string in the source vector, so each pair of corresponding strings in the two vectors points to the same character buffer.

This is obviously not the result we want, so how does the code above avoid the problem?

The code appears to use a plain "=" to copy the elements one by one, but for class types this calls the element type's overloaded assignment operator. The string class's assignment operator performs a deep copy, so every string in the new vector ends up with its own buffer.

To sum up: if the element type stored in the vector is a built-in type (int) or a shallow copy custom type (Date), it is no problem to use the memcpy function for copy construction. However, if the element type stored in the vector is a deep copy custom type (string), using the memcpy function will not achieve the desired effect.
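
The difference can be observed without triggering undefined behavior. In this rough sketch, RawHandle is a hypothetical trivially copyable struct standing in for a string's internal pointer member, since calling memcpy on a real std::string object is not valid.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Stand-in for the pointer member inside a string class.
struct RawHandle { char* data; };

// memcpy copies the pointer value, so both handles share one buffer.
bool memcpy_shares_buffer() {
    char text[] = "hello";
    RawHandle src{text};
    RawHandle dst{nullptr};
    std::memcpy(&dst, &src, sizeof(RawHandle));
    return dst.data == src.data;
}

// Assignment on std::string copies the characters into a separate buffer.
bool assignment_makes_own_copy() {
    std::string src(64, 'x');
    std::string dst;
    dst = src;
    dst[0] = 'y'; // modifying the copy must not affect the source
    return src[0] == 'x' && dst.data() != src.data();
}
```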

Writing method 2: modern writing method
The modern way to write the copy constructor is also fairly simple. Use a range-based for loop (or any other traversal) to walk through container v and insert each element of v into the new container along the way.

//Modern writing
vector(const vector<T>& v)
	:_start(nullptr)
	, _finish(nullptr)
	, _endofstorage(nullptr)
{
	reserve(v.capacity()); //Call the reserve function to set the container capacity to be the same as v
	for (auto e : v) //Insert the data in the container v one by one
	{
		push_back(e);
	}
}

Note: while traversing container v with the range-based for, the variable e is a copy of each element, and e is then appended to the container under construction. Even if v stores strings, copying into e automatically calls string's copy constructor (a deep copy), so the problems seen with memcpy are avoided.

Assignment operator overload

Of course, overloading the assignment operator of vector also involves a deep copy. Again there are two ways to write it:
Writing method 1: traditional writing method
First, check whether the object is being assigned to itself; if so, nothing needs to be done. Otherwise, release the original space, allocate a space of the same size as container v, copy the data in container v one by one, and finally update the values of _finish and _endofstorage.

//Traditional writing
vector<T>& operator=(const vector<T>& v)
{
	if (this != &v) //Prevent yourself from assigning values to yourself
	{
		delete[] _start; //Free up the original space
		_start = new T[v.capacity()]; //Open up a space the same size as the container v
		for (size_t i = 0; i < v.size(); i++) //Copy the data in the container v one by one
		{
			_start[i] = v[i];
		}
		_finish = _start + v.size(); //Tail of valid data of container
		_endofstorage = _start + v.capacity(); //End of entire container
	}
	return *this; //Continuous assignment is supported
}

Note: similar to the traditional writing of copy constructor, memcpy function cannot be used for copy.

Writing method 2: modern writing method
The modern way of writing the assignment operator is very concise. The right-hand operand is received by value rather than by reference, so the copy constructor of vector is invoked implicitly to build the parameter v. We then swap this copy with the left-hand object. At that point the assignment is complete, and v, which now holds the old contents, is destructed automatically when the function returns.

//Modern writing
vector<T>& operator=(vector<T> v) //Passing by value makes the compiler copy-construct v from the right-hand operand
{
	swap(v); //Swap the two objects
	return *this; //Continuous assignment is supported
}

Note: the modern way of writing the assignment operator is still a deep copy; the deep copy is simply performed by vector's copy constructor. The assignment operator itself only swaps the deep-copied object with the left-hand object.
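
The same copy-and-swap pattern can be tried on a much smaller class. This is a minimal sketch under stated assumptions: IntBuffer is a hypothetical fixed-size buffer invented for the illustration, and it uses std::swap from <utility> for the member exchange.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

class IntBuffer {
public:
    IntBuffer(std::size_t n, int val)
        : _data(new int[n]), _size(n) {
        for (std::size_t i = 0; i < _size; ++i) _data[i] = val;
    }
    // Deep copy: allocate a separate array and copy every element.
    IntBuffer(const IntBuffer& other)
        : _data(new int[other._size]), _size(other._size) {
        for (std::size_t i = 0; i < _size; ++i) _data[i] = other._data[i];
    }
    // 'other' is already a deep copy made by the copy constructor.
    IntBuffer& operator=(IntBuffer other) {
        swap(other);   // take over the copy's array
        return *this;  // 'other' now owns the old array and frees it on exit
    }
    ~IntBuffer() { delete[] _data; }

    void swap(IntBuffer& other) {
        std::swap(_data, other._data);
        std::swap(_size, other._size);
    }
    std::size_t size() const { return _size; }
    int& operator[](std::size_t i) { assert(i < _size); return _data[i]; }
    const int* data() const { return _data; }

private:
    int* _data;
    std::size_t _size;
};
```

Because the parameter is a separate copy, no explicit self-assignment check is needed: assigning an object to itself just swaps it with an identical copy.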

Destructor

When destructing a container, first check whether it owns any storage. If _start is null, there is nothing to release. Otherwise, release the space that stores the container's data and then set each member variable to a null pointer. (delete[] on a null pointer is a no-op, so the check is a readability choice rather than a requirement.)

//Destructor
~vector()
{
	if (_start) //Avoid releasing null pointers
	{
		delete[] _start; //Free up space for containers to store data
		_start = nullptr; //Set _start to null
		_finish = nullptr; //Set _finish to null
		_endofstorage = nullptr; //Set _endofstorage to null
	}
}

Iterator-related functions

The iterator in the vector is actually a pointer to the data type stored in the container.

typedef T* iterator;
typedef const T* const_iterator;

begin and end

The begin function in the vector returns the address of the first element in the container, and the end function returns the address one past the last valid element.

iterator begin()
{
	return _start; //Returns the first address of the container
}
iterator end()
{
	return _finish; //Returns the address one past the last valid element
}

We also need to overload a pair of begin and end functions applicable to const objects, so that the iterator obtained when const objects call begin and end functions can only read data, not modify it.

const_iterator begin()const
{
	return _start; //Returns the first address of the container
}
const_iterator end()const
{
	return _finish; //Returns the address one past the last valid element
}

At this point, let's take a look at the code of vector using iterators. In fact, it uses pointers to traverse the container.

vector<int> v(5, 3);
vector<int>::iterator it = v.begin();
while (it != v.end())
{
	cout << *it << " ";
	it++;
}
cout << endl;

Now that we have implemented the iterator, we can actually use the range for to traverse the container, because the compiler will automatically replace the range for with the iterator at compile time.

vector<int> v(5, 3);
//Traverse the range for
for (auto e : v)
{
	cout << e << " ";
}
cout << endl;
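
Roughly speaking, the compiler rewrites a range-based for into an ordinary loop over begin() and end(). The sketch below puts the two forms side by side; IntRange is a hypothetical view over an int array, and sum_expanded is a hand-written approximation of the expansion rather than the exact code a compiler generates.

```cpp
#include <cassert>

// Hypothetical view over an int array with pointer iterators.
struct IntRange {
    int* _start;
    int* _finish;
    int* begin() const { return _start; }
    int* end() const { return _finish; }
};

// Range-based for: only begin() and end() are required of the type.
int sum_range_for(const IntRange& r) {
    int total = 0;
    for (auto e : r) total += e;
    return total;
}

// Approximately what the loop above expands to.
int sum_expanded(const IntRange& r) {
    int total = 0;
    for (int* it = r.begin(), *last = r.end(); it != last; ++it) {
        int e = *it; // e is a copy of the current element
        total += e;
    }
    return total;
}
```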

Capacity- and size-related functions

size and capacity

From where the three member pointers point, we can easily get the number of valid elements and the maximum capacity of the current container.

Subtracting two pointers yields the number of elements of the corresponding type between them, so size is _finish - _start and capacity is _endofstorage - _start.

size_t size()const
{
	return _finish - _start; //Returns the number of valid data in the container
}
size_t capacity()const
{
	return _endofstorage - _start; //Returns the maximum capacity of the current container
}

reserve

reserve rule:
  1. When n is greater than the current capacity of the object, expand the capacity to n or more.
  2. Do nothing when n is less than or equal to the current capacity of the object.

The implementation idea of the reserve function is also very simple. First check whether the given n is greater than the current capacity of the container (otherwise no operation is required). If it is, allocate a space that can hold n elements, copy the valid data from the original container into it, release the space where the original container stored its data, and hand the newly allocated space over to the container. Finally, update the container's member variables.

void reserve(size_t n)
{
	if (n > capacity()) //Determine whether operation is required
	{
		size_t sz = size(); //Record the number of valid data in the current container
		T* tmp = new T[n]; //Open up a space that can hold n data
		if (_start) //Determine whether it is an empty container
		{
			for (size_t i = 0; i < sz; i++) //Copy the data in the container to tmp one by one
			{
				tmp[i] = _start[i];
			}
			delete[] _start; //Free up the space where the container itself stores data
		}
		_start = tmp; //Hand the space held by tmp over to _start
		_finish = _start + sz; //Tail of valid data of container
		_endofstorage = _start + n; //End of entire container
	}
}

There are two points to note in the implementation of the reserve function:
1) Before reallocating, the number of valid elements in the current container must be recorded.
At the end we need to update _finish, which equals _start plus the number of valid elements. Once _start has been redirected to the new space, calling size() would compute _finish - _start from pointers into two different allocations, and the result would be meaningless.

2) The memcpy function cannot be used to copy the data in the container.
You may think that when the vector stores strings, memcpy is harmless: although each string in the new space points to the same character buffer as the corresponding string in the original space, the original space is about to be released, so only one container ends up maintaining those buffers.
But don't forget that when the original space is released, every string stored in it has its destructor called and frees the buffer it points to. Each string copied by memcpy therefore points to a buffer that has already been freed, and accessing the container is an illegal memory access.

Therefore, we still use a for loop to assign the strings one by one, because this calls string's overloaded assignment operator and performs a deep copy of each string.
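
The two reserve rules and the deep-copy requirement can be checked against std::vector, which the mock implementation is modeled on. A small sketch, assuming the mock reserve is meant to behave the same way; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Growing the capacity keeps the size and every stored string intact.
bool reserve_preserves_elements() {
    std::vector<std::string> v;
    v.push_back("alpha");
    v.push_back("beta");
    v.reserve(100);
    return v.capacity() >= 100 && v.size() == 2 &&
           v[0] == "alpha" && v[1] == "beta";
}

// Asking for less than the current capacity changes nothing.
bool reserve_never_shrinks() {
    std::vector<int> v;
    v.reserve(100);
    std::size_t before = v.capacity();
    v.reserve(10);
    return v.capacity() == before;
}
```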

resize

resize rule:
  1. When n is greater than the current size, expand the size to n and set the added elements to val. If val is not given, it defaults to the value produced by the default constructor of the element type.
  2. When n is smaller than the current size, reduce the size to n.

Following these rules, on entering the function we first check whether n is less than the current size. If so, we reduce the size to n simply by moving _finish. Otherwise, we check whether the capacity needs to grow and then set the added elements to val.

void resize(size_t n, const T& val = T())
{
	if (n < size()) //When n is less than the current size
	{
		_finish = _start + n; //Reduce size to n
	}
	else //When n is greater than or equal to the current size
	{
		if (n > capacity()) //Judge whether to increase capacity
		{
			reserve(n);
		}
		while (_finish < _start + n) //Expand size to n
		{
			*_finish = val;
			_finish++;
		}
	}
}

Note: in C++, built-in types also support the T() syntax. T() value-initializes the object, which yields zero for arithmetic types and a null pointer for pointer types, just as it calls the default constructor for class types. That is why the default argument of resize's parameter val is written as T().
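
What T() produces for different element types can be checked directly. In this small sketch, default_value and grow_then_shrink are hypothetical helpers, and std::vector stands in for the mock class on the assumption that resize follows the same rules.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns a value-initialized T, the same expression resize uses as its default.
template <class T>
T default_value() { return T(); }

// Grows with the default value, then shrinks again.
std::vector<int> grow_then_shrink() {
    std::vector<int> v(2, 9); // two elements, both 9
    v.resize(4);              // the two added elements are int(), i.e. 0
    v.resize(3);              // drops the last element
    return v;
}
```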

empty

The empty function simply compares the _start and _finish pointers. If they point to the same position, the container is empty.

bool empty()const
{
	return _start == _finish;
}

Modify container content related functions

push_back

To append an element, first check whether the container is full. If it is, increase the capacity first. Then write the element at the position _finish points to and increment _finish.

//Append an element
void push_back(const T& x)
{
	if (_finish == _endofstorage) //Judge whether to increase capacity
	{
		size_t newcapacity = capacity() == 0 ? 4 : 2 * capacity(); //Double capacity
		reserve(newcapacity); //increase capacity
	}
	*_finish = x; //Append the element
	_finish++; //Move _finish back
}
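
To see why doubling keeps reallocation rare, the growth rule can be simulated without any real storage. This is a rough sketch: next_capacity and count_reallocations are hypothetical helper names, and the starting capacity of 4 is the one used by push_back above.

```cpp
#include <cassert>
#include <cstddef>

// Same growth rule as push_back: start at 4, then double.
std::size_t next_capacity(std::size_t cap) {
    return cap == 0 ? 4 : 2 * cap;
}

// Counts how many times the capacity grows while appending n elements.
std::size_t count_reallocations(std::size_t n) {
    std::size_t cap = 0, size = 0, reallocs = 0;
    while (size < n) {
        if (size == cap) { // container is full, so it must grow first
            cap = next_capacity(cap);
            ++reallocs;
        }
        ++size;
    }
    return reallocs;
}
```

Appending 1000 elements grows the capacity only 9 times (4, 8, ..., 1024), so the number of reallocations is logarithmic in the number of elements.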

pop_back

Before removing the last element, check whether the container is empty. If it is, the assertion fires. If it is not, decrementing _finish is enough.

//Remove the last element
void pop_back()
{
	assert(!empty()); //Assertion if container is empty
	_finish--; //Move _finish forward
}

insert

The insert function inserts an element at the position of the given iterator pos. Before inserting, first check whether the capacity needs to grow, then move the element at pos and all elements after it back by one position to free up pos, and finally write the element at pos.

//Insert an element at pos
void insert(iterator pos, const T& x)
{
	if (_finish == _endofstorage) //Judge whether to increase capacity
	{
		size_t len = pos - _start; //Record the offset between pos and _start
		size_t newcapacity = capacity() == 0 ? 4 : 2 * capacity(); //Double capacity
		reserve(newcapacity); //increase capacity
		pos = _start + len; //Use len to locate pos in the container after expansion
	}
	//Move the element at pos and everything after it back by one position to free up pos
	iterator end = _finish;
	while (end > pos) //The element at pos itself must also be moved
	{
		*end = *(end - 1);
		end--;
	}
	*pos = x; //Write the element at pos
	_finish++; //One more element, so _finish moves back
}

Note: if the capacity needs to grow, record the offset between pos and _start beforehand and use it to locate pos in the new storage afterwards. Otherwise pos would still point into the original space, which has been released.
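
The shifting step is easy to get wrong by one position, so it can help to try it on a plain array. A minimal sketch, assuming the caller guarantees room for one more element; shift_insert is a hypothetical helper, not a member of the class.

```cpp
#include <cassert>

// Inserts x at pos in [start, finish), assuming *finish is writable.
void shift_insert(int* pos, int* finish, int x) {
    int* end = finish;
    while (end > pos) {    // stops once the old *pos has been moved
        *end = *(end - 1); // move each element back by one position
        --end;
    }
    *pos = x;              // pos is now free for the new element
}
```

Inserting at finish itself runs the loop zero times, which makes the function behave like an append.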

erase

The erase function deletes the element at the position of the given iterator pos. Before deleting, check whether the container is empty, and assert if it is. To delete, move the elements after pos forward by one position so that they overwrite the element at pos.

//Delete the element at pos
iterator erase(iterator pos)
{
	assert(!empty()); //Assertion if container is empty
	//Move the elements after pos forward by one position to overwrite the element at pos
	iterator it = pos + 1;
	while (it != _finish)
	{
		*(it - 1) = *it;
		it++;
	}
	_finish--; //One fewer element, so _finish moves forward
	return pos;
}
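
Because erase returns an iterator to the element that now occupies the erased position, a loop can keep going without skipping anything. A usage sketch, with std::vector as a stand-in for the mock class; remove_even is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Removes every even number, continuing from the iterator erase returns.
std::vector<int> remove_even(std::vector<int> v) {
    std::vector<int>::iterator it = v.begin();
    while (it != v.end()) {
        if (*it % 2 == 0)
            it = v.erase(it); // already points at the next element
        else
            ++it;             // advance only when nothing was erased
    }
    return v;
}
```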

swap

The swap function is used to exchange the data of two containers. We can directly call the swap function in the library to exchange the member variables of the two containers.

//Exchange data between two containers
void swap(vector<T>& v)
{
	//Exchange member variables in the container
	::swap(_start, v._start);
	::swap(_finish, v._finish);
	::swap(_endofstorage, v._endofstorage);
}

Note: inside this member function, an unqualified call to swap would find the member function being defined first (the proximity principle), so we put the scope qualifier "::" in front of it to make the compiler look in the global scope. This relies on the standard library's swap being visible there, for example through using namespace std; writing std::swap works regardless.

Accessing container related functions

operator[ ]

vector also lets us access elements with the "subscript + []" syntax. The implementation simply returns the element at the corresponding position.

T& operator[](size_t i)
{
	assert(i < size()); //Detect the legitimacy of Subscripts

	return _start[i]; //Return corresponding data
}
const T& operator[](size_t i)const
{
	assert(i < size()); //Detect the legitimacy of Subscripts

	return _start[i]; //Return corresponding data
}

Note: operator[] also needs an overload for const containers, because elements obtained from a const container through "subscript + []" may only be read, not modified.
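
The effect of the const overload can be seen on a tiny class. In this sketch, Triple is a hypothetical three-element container invented for the illustration; it follows the same pattern of a checked subscript in a const and a non-const version.

```cpp
#include <cassert>
#include <cstddef>

class Triple {
public:
    Triple() : _v{0, 0, 0} {}
    // Non-const objects get a modifiable reference.
    int& operator[](std::size_t i) { assert(i < 3); return _v[i]; }
    // Const objects get a read-only reference.
    const int& operator[](std::size_t i) const { assert(i < 3); return _v[i]; }
private:
    int _v[3];
};

// Takes a const reference, so only the const overload is usable here.
int read_first(const Triple& t) { return t[0]; }
```

Without the const overload, read_first would not compile, because a non-const member function cannot be called on a const object.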

Topics: C++