[C++] Hash_map implementation principle and source code [hash table] [hash mapping] [universal data support] [STL]

Posted by driver on Wed, 09 Feb 2022 01:52:19 +0100

Hash_map is a nonlinear data structure
Before reading, make sure you really understand what a pointer is
and that you can implement a linked list from its principle alone, without looking at any source code
If you are unsure, the following articles are recommended:
[C/C++] Principle of pointers, their application and understanding (including function pointers and multi-level pointers)
[C++] What a reference is and the relationship between references and pointers (understanding memory and applying pointers)
[C++] LinkList: principle and implementation of a bidirectional acyclic linked list (basic)

If you don't want to read a long explanation, you can scroll straight to the bottom and copy the source code
The source code has comments and examples, so it can be used by copying and pasting. If you can, please leave a free like to support it
This is a labor of love, and it runs on your support

I won't go over the basics, such as time complexity. Anyone who comes looking for an article on Hash_map already knows what it is
Now let's officially start
Hash_map is known in the STL as the standard container unordered_map
Put simply, it is an array of linked list pointers
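
For comparison, here is a minimal sketch of the STL container in use. std::unordered_map is the real standard-library type; the helper name CountWords is hypothetical and exists only for this illustration.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: counts how often each word appears.
// Every operator[] call hashes the key and goes to a single bucket
// instead of scanning all stored elements.
std::unordered_map<std::string, int> CountWords(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, int> counts;
    for (const std::string& word : words)
        ++counts[word];     // operator[] inserts the key with value 0 the first time it is seen
    return counts;
}
```

Calling CountWords({"a", "b", "a"}) gives a map in which "a" maps to 2 and "b" maps to 1.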

It can find data quickly because, even among hundreds, thousands or tens of thousands of items, it reaches the one you want in only a few loop iterations
That is because it does not really traverse the items one by one
Instead, it searches only in one specific linked list

If this concept is new to you
let's picture it in our mind first

LinkList<int> *list[3];
LinkList[0]    LinkList[1]    LinkList[2]
data1          data4          data6
data2          data5
data3

Six data items have been added

If you use an array or a single linked list to find data6, you have to traverse from the start
You have to loop six times to find it
You might say: I can traverse from the tail, so that I find it on the first try
Then the question is, what if you want to find data1?
The number of loops is just as large
Because you can't know in advance which part of the data structure your data is in
It may be at the front, in the middle, or at the tail
Worst of all is when the data you're looking for doesn't exist: the computer can't know that, so it traverses the whole structure
The efficiency is particularly low

With Hash_map, a lookup for data6 goes directly to LinkList[2]
If LinkList[2] does not contain it, that proves the data does not exist anywhere in the data structure

The crucial part is the array subscript
Hash_map knows that if this data exists at all, it can only be in one particular array element
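
A minimal sketch of that idea, using std::forward_list in place of the LinkList type. The names Buckets, Insert and Find are hypothetical, and the rule "bucket index = value % 3" is an assumption made only for this sketch, for non-negative values.

```cpp
#include <array>
#include <cstddef>
#include <forward_list>

// Assumption for this sketch: three buckets, bucket index = value % 3.
constexpr std::size_t kBucketCount = 3;
using Buckets = std::array<std::forward_list<int>, kBucketCount>;

// Stores the value in the one bucket selected by its index.
void Insert(Buckets& buckets, int value)
{
    buckets[static_cast<std::size_t>(value) % kBucketCount].push_front(value);
}

// Searches only the bucket the value must be in.
// Returns the number of nodes visited; found reports whether the value exists.
std::size_t Find(const Buckets& buckets, int value, bool& found)
{
    std::size_t visited = 0;
    found = false;
    for (int stored : buckets[static_cast<std::size_t>(value) % kBucketCount])
    {
        ++visited;
        if (stored == value)
        {
            found = true;
            break;
        }
    }
    return visited;
}
```

After inserting 1 to 6, every bucket holds two values, so Find visits at most two nodes whether or not the value exists, where a single list of six values could need six visits.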

How does it achieve this?
We need to introduce one special thing:
hash_value, the hash value
The hash value is an unsigned integer
Hash_map uses the hash value to decide which linked list stores the data

For Hash_map to have a hash value for every piece of data
we need to introduce another concept:
the Key
The Key is both the source of the hash value and the key used to find the data

We put the original key into the node together with the data you need to store
The hash value is obtained by converting the key

using hash = unsigned int;
hash HashFunction(keyType key)
    {
        uint modle{ 53 }; // Prime for 32 < mapSize < 64: M53

        hash_value = ((hash)key % modle) % mapSize;    // The final % mapSize keeps the index inside the array; without it the index could go out of bounds
        return hash_value;
    }

This is the hash function that converts the key into a hash value
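
To see the numbers, here is a standalone sketch of the same computation for integer keys. Passing the array size as a parameter named arraySize is an assumption of this sketch; in the class it is the mapSize template argument.

```cpp
#include <cstddef>

using hash = unsigned int;

// Standalone sketch of the article's hash function for int keys.
// arraySize is a hypothetical parameter standing in for mapSize.
hash HashFunction(int key, std::size_t arraySize)
{
    const hash modle{ 53 };     // prime chosen for 32 < arraySize < 64
    // First reduce by the prime, then fold the result into the array bounds.
    return static_cast<hash>((static_cast<hash>(key) % modle) % arraySize);
}
```

With arraySize 50, key 15 goes to index 15, key 99 goes to index 46 (99 % 53 = 46), and key 52 goes to index 2 (52 % 53 = 52, then 52 % 50 = 2). The last case is where the final % arraySize prevents an out-of-bounds index.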

Now that we have the concept of a hash value, there is one term we need to know:
hash collision
A hash collision is when two different keys are converted into the same hash value

If you don't store the original key in the hash node
you can't tell the two apart, and new data will overwrite different, existing data
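
A minimal sketch of why the key has to live in the node. Node, Put and Length are hypothetical names for this illustration, and the chain stands for a single array element; keys 3 and 56 collide when the prime is 53, because 56 % 53 = 3.

```cpp
#include <memory>

// Hypothetical minimal node: the key is stored next to the data.
struct Node
{
    int key;
    int data;
    std::unique_ptr<Node> next;
};

// Inserts into one chain: overwrite on an equal key, append otherwise.
void Put(std::unique_ptr<Node>& head, int key, int data)
{
    std::unique_ptr<Node>* link = &head;
    while (*link)
    {
        if ((*link)->key == key)    // same key: this is an update
        {
            (*link)->data = data;
            return;
        }
        link = &(*link)->next;      // different key: a collision, keep walking
    }
    *link = std::unique_ptr<Node>(new Node{ key, data, nullptr });
}

// Counts the nodes in the chain.
int Length(const std::unique_ptr<Node>& head)
{
    int count = 0;
    for (const Node* node = head.get(); node; node = node->next.get())
        ++count;
    return count;
}
```

Putting key 3 and then key 56 leaves two nodes in the chain, while putting key 3 a second time only changes the data of the first node. Without the stored key, the two cases could not be told apart.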

There are many ways to reduce hash collisions
I personally take the key modulo (%) a prime number picked for the size range of the array
[HashCollision] [hash conflict] [HashValue]: best hash prime
The linked article lists precomputed primes

As long as your array size falls within a given range, you can use the corresponding prime to reduce hash collisions
If you think this method doesn't suit you, you can look online for other methods
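
If you want to check how a choice of prime and array size behaves, a small measuring helper can count the longest chain. This is a sketch under assumptions: LongestChain is a hypothetical name, the keys are the sequential integers 0 to keyCount - 1, and arraySize must be greater than 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: length of the longest chain produced by keys
// 0 .. keyCount-1 when index = (key % modle) % arraySize.
std::size_t LongestChain(unsigned keyCount, unsigned modle, std::size_t arraySize)
{
    std::vector<std::size_t> chain(arraySize, 0);   // nodes per array element
    for (unsigned key = 0; key < keyCount; ++key)
        ++chain[(key % modle) % arraySize];
    return *std::max_element(chain.begin(), chain.end());
}
```

For 100 sequential keys with prime 53 and array size 50, the longest chain has 3 nodes. A modulus much smaller than the array, such as 8, uses only 8 of the 50 elements and the longest chain grows to 13.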

We have the hash value, and we also have the linked lists
Now you just need to let the program calculate the hash value by itself and then write the data into the corresponding linked list

The principle of Hash_map is very simple

The key to the implementation is hash_value

You can understand Hash_map further through the following code

code

// ReSharper disable CppClangTidyClangDiagnosticInvalidSourceEncoding
// Disable string spell checking for the ReSharper plug-in

#include <cstddef>
#include <cstring>
#include <iostream>

#ifndef NULL
#define NULL 0x0
#endif

using uint = unsigned int;
using hash = unsigned int;
using BOOL = short;

char* StringCopy(char* dest, const char* source)
{
    if (!(dest != nullptr && source != nullptr))
    {
        return nullptr;
    }
    char* tmp = dest;
    while ((*dest++ = *source++) != '\0') {}
    return tmp;
}

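class Student
{
    char* name;     // Owned copy of the name, released in the destructor
    uint age;

    static char* CloneName(const char* source)  // Allocate a buffer and copy the C string into it
    {
        char* copy = new char[strlen(source) + 1];
        StringCopy(copy, source);
        return copy;
    }
public:
    Student(uint age_ = 18, const char* name_ = "Name:Unknown") :name(CloneName(name_)), age(age_)
    {

    }
    Student(const Student& copyObj) // Deep copy constructor: the object owns an external buffer
        :name(CloneName(copyObj.name)), age(copyObj.age)
    {

    }
    Student& operator=(const Student& rightObj) // Deep copy assignment: used when Hash_push overwrites data
    {
        if (this != &rightObj)
        {
            char* copy = CloneName(rightObj.name);
            delete[] this->name;
            this->name = copy;
            this->age = rightObj.age;
        }
        return *this;
    }
    ~Student()
    {
        delete[] this->name;
    }
    bool operator==(const Student& rightObj) const  // Operator overloading: equal when both age and name match
    {
        return this->age == rightObj.age && strcmp(this->name, rightObj.name) == 0;
    }
};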

template <typename keyType, typename dataType>
class Hash_node                                 // It is similar to the node implementation of linked list
{
    keyType hashkey;                            // Original key: lets Hash_map tell apart nodes that share a hash_value, so it can overwrite on an equal key and chain on a different one
    dataType data;                              // Data area
    Hash_node* nPoint;                          // Pointer to the next node in the chain
public:
    Hash_node(keyType hashKey_ = keyType{}, dataType data_ = dataType{}, Hash_node* nPoint_ = nullptr)
        :hashkey(hashKey_), data(data_), nPoint(nPoint_)
    {

    }

    keyType& GetHashKey()
    {
        return hashkey;
    }

    dataType& GetData()
    {
        return this->data;
    }

    Hash_node*& GetNextPoint()      // Returns a reference to nPoint itself
    {                           // A plain Hash_node* return type would hand back only a copy of the pointer value
        return this->nPoint;
    }

    Hash_node* GetObject()
    {
        return this;
    }

};

// Template parameter 1: key type; template parameter 2: data type; template parameter 3: map size
template <typename keyType, typename dataType, size_t mapSize>
class Hash_map
{
    hash hash_value;
    Hash_node<keyType, dataType>* map[mapSize];

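    hash HashFunction(keyType key)
    {
        uint modle{ 53 }; // Prime for 32 < mapSize < 64: M53
        					// https://blog.csdn.net/qq_42468226/article/details/117166210 Optimal number

        // Casting through size_t lets pointer keys compile on 64-bit targets; a pointer key is hashed by its address, not by what it points to
        // The final % mapSize keeps the index inside the array; without it the index could go out of bounds
        hash_value = ((hash)(size_t)key % modle) % mapSize;
        return hash_value;
    }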

public:
    Hash_map(keyType key = keyType{}, const dataType& data = dataType{}) :hash_value(HashFunction(key)), map{ nullptr }
    {
        this->Hash_push(key, data);     // The constructor arguments become the first entry
    }

    // Overwrite the data when the key already exists; otherwise append a new node to the chain for that hash_value
    bool Hash_push(keyType key, const dataType& data)
    {
        this->HashFunction(key);    // Updates hash_value
        Hash_node<keyType, dataType>** hnpp = &this->map[hash_value];
        while (*hnpp)   // A non-null link means this chain already holds data, so check every node for the key before appending
        {
            // Same key: overwrite the data. Different key: move on to the next node
            if ((*hnpp)->GetHashKey() == key)
            {
                (*hnpp)->GetData() = data;
                return true;
            }
            hnpp = &(*hnpp)->GetNextPoint();
        }

        // hnpp now points at the null link at the end of the chain
        *hnpp = new Hash_node<keyType, dataType>{ key,data };
        return true;
    }

    // Returns 0: key not found; 1: key found but data differs; 2: key and data both match
    BOOL Map_check(keyType key, const dataType& data = dataType{})
    {
        this->HashFunction(key);    // Updates hash_value
        Hash_node<keyType, dataType>** hnpp = &this->map[hash_value];
        while (*hnpp)   // Walk only the chain selected by hash_value
        {
            if ((*hnpp)->GetHashKey() == key)
            {
                if ((*hnpp)->GetData() == data)
                    return 2; // Key matches, data matches
                return 1;   // Key matches, data does not
            }
            hnpp = &(*hnpp)->GetNextPoint();
        }

        return 0;  // Key not found
    }

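    Hash_map(const Hash_map&) = delete;             // Copying would make two maps own the same nodes
    Hash_map& operator=(const Hash_map&) = delete;

    ~Hash_map()     // Release every node of every chain
    {
        for (size_t index{ 0 }; index < mapSize; index++)
        {
            Hash_node<keyType, dataType>* node = this->map[index];
            while (node)
            {
                Hash_node<keyType, dataType>* next = node->GetNextPoint();
                delete node;
                node = next;
            }
        }
    }

};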

int main(void)
{
    std::cout << "\t\tHash_map testing program" << std::endl;

    // Template parameter 1: key type; template parameter 2: data type; template parameter 3: map size
    Hash_map<int, int, 50>hash_map{};   // The default constructor pushes the first entry (key 0, data 0) into map[0]

    hash_map.Hash_push(15, 99);     // Push the second entry
    hash_map.Hash_push(15, 100);    // Same key: overwrite the second entry's data

    switch (hash_map.Map_check(15,100))    // Example of using Map_check in a switch
    {
    case 0:
        std::cout << "[KEY]:Unmatched" << std::endl;
        break;
    case 1:
        std::cout << "[KEY]:matching" << std::endl;
        break;
    case 2:
        std::cout << "[DATA]:matching" << std::endl;
        break;
    default:
        std::cout << "Hello,World!" << std::endl;
        break;
    }

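    if (hash_map.Map_check(15))     // Example of using Map_check in an if: nonzero means the key exists
        hash_map.Hash_push(15, 999);    // Overwrite the second entry's data

    for (uint count{ 0 }; count < 100; count++) // Quickly push 100 entries
        hash_map.Hash_push(count, count);

    hash_map.Map_check(99, 99); // Lookup speed example: at most 3 loop iterations, because no chain here holds more than 3 nodes

    // Generic type support
    // const char* keys are hashed and compared by address, so a lookup succeeds only with the very same pointer (identical string literals are usually, but not always, merged by the compiler)
    Hash_map<const char*, Student, 30> studentMap{ "Student number:1001",Student{18,"Zhang San"} };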

    studentMap.Hash_push("Student number:1002", Student{ 18,"Li Si" });
    studentMap.Hash_push("Student number:1003", Student{ 19,"Wang Er Ma Zi" });

    if (studentMap.Map_check("Student number:1003"))
        studentMap.Hash_push("Student number:1003",Student{5,"Hello,World"});

    return 0;
}
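
The listing above hashes a const char* key by its address. If you want string keys to be found by their content, one option is to hash the characters instead. The following is only a sketch and not part of the code above: HashString and BucketOf are hypothetical names, and the algorithm swapped in is 32-bit FNV-1a, a well-known string hash, in place of the prime modulus. Key comparison would then also need strcmp instead of ==.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: 32-bit FNV-1a over the characters of a C string.
std::uint32_t HashString(const char* text)
{
    std::uint32_t value = 2166136261u;              // FNV offset basis
    for (; *text != '\0'; ++text)
    {
        value ^= static_cast<unsigned char>(*text); // mix in one byte
        value *= 16777619u;                         // FNV prime
    }
    return value;
}

// Hypothetical helper: folds the hash into the array bounds.
std::size_t BucketOf(const char* text, std::size_t arraySize)
{
    return HashString(text) % arraySize;
}
```

Two separate buffers holding the same text now land in the same array element, which the address-based version cannot guarantee.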

Topics: C++ data structure linked list Hash table