Hash table -- Python implementation

Posted by vanzkee on Sat, 08 Jan 2022 17:39:26 +0100

Hash table (basic concept)

A hash table (also known as a hash map) is a data collection that stores its data items in a way that makes them especially fast to search for and locate later.

Each storage location in a hash table is called a slot, and each slot can hold one data item. Each slot has a unique name (usually its index, starting from 0).

The function that maps a data item to the name of the slot where it is stored is called a hash function.
A common hash method is the remainder method: divide the data item by the size of the hash table and use the remainder as the slot number.
However, this has an obvious problem: two different data items may be mapped to the same slot, which is called a collision (conflict).
Perfect hash function: given a set of data items, if a hash function maps each data item to a different slot, it is called a perfect hash function.
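The remainder method and the perfect-hash test can be checked in a few lines. This is a minimal sketch: the helper names remainder_hash and is_perfect and the sample keys are illustrative assumptions, and a table size of 11 matches the implementation later in the post.

```python
def remainder_hash(item, table_size):
    # remainder method: the slot number is item mod table size
    return item % table_size

def is_perfect(items, table_size):
    # the hash is perfect for this set iff no two items share a slot
    slots = [remainder_hash(item, table_size) for item in items]
    return len(set(slots)) == len(slots)

items = [54, 26, 93, 17, 77, 31]
print([remainder_hash(item, 11) for item in items])  # [10, 4, 5, 6, 0, 9]
print(is_perfect(items, 11))          # True
print(is_perfect(items + [44], 11))   # False: 44 % 11 == 77 % 11 == 0
```

Adding a single key is enough to break perfection, which is why a general-purpose hash table needs a collision resolution strategy.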
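Applications of perfect hash functions (in practice, hash functions that approximate them):
Data consistency verification, which relies on four properties: compressibility (any input yields a fixed-length digest), computability (the digest is easy to compute), modification resistance (changing even one bit of the data changes the digest) and collision resistance (it is hard to find two different inputs with the same digest).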
Hash functions:
MD5, SHA
Python's hash function library is hashlib, which provides hash functions from the MD5 and SHA series.
It includes md5, sha1, sha224, sha256, sha384 and sha512, among others.

>>> import hashlib
>>> m = hashlib.md5('hello'.encode('utf-8')).hexdigest()
>>> m
'5d41402abc4b2a76b9719d911017c592'
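Data consistency verification with hashlib can be sketched as follows, assuming the data arrives in chunks (as when reading a large file). digest_of_chunks is a hypothetical helper; update() and hexdigest() are standard hashlib methods.

```python
import hashlib

def digest_of_chunks(chunks):
    """SHA-256 hex digest of a sequence of byte strings, fed incrementally."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)  # same result as hashing the concatenated bytes
    return h.hexdigest()

original = digest_of_chunks([b"hello ", b"world"])
print(original == hashlib.sha256(b"hello world").hexdigest())  # True
print(len(original))  # 64: a fixed-length digest regardless of input size
tampered = digest_of_chunks([b"hello ", b"World"])
print(original == tampered)  # False: one changed letter changes the digest
```

The receiver recomputes the digest and compares it with the published one; any mismatch means the data was altered.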

The coolest application of hash functions: blockchain technology.
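The core idea that makes a blockchain tamper-evident is that each block records the hash of the block before it. The toy hash chain below is a simplified sketch under that assumption only: there is no proof-of-work, no signatures and no network, and block_hash, make_chain and is_valid are hypothetical helpers.

```python
import hashlib

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(data, prev_hash):
    # prev_hash is always 64 hex characters, so plain concatenation is unambiguous
    return hashlib.sha256((prev_hash + data).encode("utf-8")).hexdigest()

def make_chain(records):
    chain = []
    prev = GENESIS_PREV
    for data in records:
        h = block_hash(data, prev)
        chain.append({"data": data, "prev_hash": prev, "hash": h})
        prev = h
    return chain

def is_valid(chain):
    prev = GENESIS_PREV
    for block in chain:
        # each block must point at its predecessor and match its own stored hash
        if block["prev_hash"] != prev:
            return False
        if block_hash(block["data"], block["prev_hash"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True

chain = make_chain(["Alice pays Bob 5", "Bob pays Carol 2"])
print(is_valid(chain))  # True
chain[0]["data"] = "Alice pays Bob 500"
print(is_valid(chain))  # False
```

Recomputing the tampered block's stored hash does not help either, because the next block's prev_hash would then no longer match.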

Collision resolution: chaining and linear probing

Chaining

Arrays are easy to index into but slow to insert into and delete from; linked lists are slow to index into but easy to insert into and delete from.
Chaining combines the two: the hash table itself is an array, and each element of the array holds a pointer to the head node of a linked list. A list may be empty or may contain several nodes.

Where a key-value pair is stored depends only on its key: the key's hash value gives the array index, and the pair is added to the linked list at that index. Looking up an element works the same way: hash the key to find the right linked list, then search that list for the key and return its value.
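A minimal chaining sketch follows. ChainedHashMap is a hypothetical name; as assumptions, Python lists stand in for the linked lists, and the bucket index comes from the built-in hash() modulo the table size (for small non-negative integers hash(n) == n in CPython, so this matches the remainder method). It also supports the map operations listed below.

```python
class ChainedHashMap:
    """Separate chaining: each slot holds a bucket of (key, value) pairs."""

    def __init__(self, size=11):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # one (initially empty) chain per slot
        self.count = 0

    def _bucket(self, key):
        # the key's hash picks the array index; the chain lives there
        return self.buckets[hash(key) % self.size]

    def add(self, key, val):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, val)  # existing key: replace the value
                return
        bucket.append((key, val))       # new key: append to the chain
        self.count += 1

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def __len__(self):
        return self.count

    def __contains__(self, key):
        return any(k == key for k, _ in self._bucket(key))

    def __delitem__(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self.count -= 1
                return
        raise KeyError(key)

    def __getitem__(self, key):
        return self.get(key)

    def __setitem__(self, key, val):
        self.add(key, val)

animals = ChainedHashMap()
animals[77] = "bird"
animals[44] = "goat"       # 77 % 11 == 44 % 11 == 0: same chain, no probing
print(animals.buckets[0])  # [(77, 'bird'), (44, 'goat')]
print(len(animals), 44 in animals)  # 2 True
```

Colliding keys simply share a chain, so the table never "fills up"; long chains only make lookups slower.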

Compare this with Python's dictionary, a data type that stores key-value pairs. This kind of key-value association is called a mapping (Map).

  • Keys are unique
  • A data value can be uniquely determined by a key

Operations supported by the hash table:

  • add(key, val): add the key-value pair to the map. If the key already exists, val replaces the old value;
  • get(key): given a key, return the associated value, or None if the key does not exist;
  • len(): return the number of key-value pairs in the map;
  • del: delete a key-value pair with a statement of the form del hash_map[key];
  • in: a statement of the form key in hash_map returns a Boolean indicating whether the key exists in the map.
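For comparison, the same five operations map directly onto Python's built-in dict, as this short sketch shows (the keys and values are illustrative):

```python
d = {}
d[54] = "cat"     # add(key, val)
d[54] = "tiger"   # existing key: the old value is replaced
print(d.get(54))  # tiger
print(d.get(99))  # None: get returns None for a missing key
print(len(d))     # 1
print(54 in d)    # True
del d[54]         # del
print(54 in d)    # False
```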

Linear probing

Open addressing resolves a collision by looking for another empty slot for the colliding data item.
Searching forward one slot at a time (wrapping around at the end of the table) is the open-addressing technique called linear probing.
One disadvantage of linear probing is clustering: colliding items tend to pile up in runs of adjacent slots. One way to reduce clustering is to extend linear probing from checking slots one by one to skipping ahead several slots at a time.
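The effect of the skip size on clustering can be seen by inserting the same keys with skip 1 (plain linear probing) and skip 3. This is a sketch: place is a hypothetical helper and the key list is illustrative; the skip should share no factor with the table size so that every slot can be reached.

```python
def place(keys, size=11, skip=1):
    """Insert integer keys into a table of `size` slots using the remainder
    method, resolving collisions by probing `skip` slots forward."""
    table = [None] * size
    for key in keys:
        pos = key % size
        probes = 0
        while table[pos] is not None:
            probes += 1
            if probes == size:
                raise RuntimeError("no empty slot found")
            pos = (pos + skip) % size  # wrap around at the end of the table
        table[pos] = key
    return table

keys = [54, 26, 93, 17, 77, 31, 44, 55, 20]
print(place(keys, skip=1))
# [77, 44, 55, 20, 26, 93, 17, None, None, 31, 54]
print(place(keys, skip=3))
# [77, 55, None, 44, 26, 93, 17, 20, None, 31, 54]
```

With skip 1, slots 0 through 6 form one contiguous cluster; with skip 3 the colliding keys 44, 55 and 20 are spread across the table.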

Implement Hash_Map

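Because of the collision resolution algorithm, the table size should be a prime number: with a prime size, probing with any fixed skip visits every slot before it returns to the starting slot.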
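A quick check of the prime-size claim, assuming the rehash rule is (pos + skip) % size; probe_cycle is a hypothetical helper that lists the slots visited before the probe sequence returns to its start.

```python
import math

def probe_cycle(start, skip, size):
    """Slots visited by rehash(pos) = (pos + skip) % size until back at start."""
    visited = [start]
    pos = (start + skip) % size
    while pos != start:
        visited.append(pos)
        pos = (pos + skip) % size
    return visited

print(probe_cycle(0, 3, 11))  # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8]
print(probe_cycle(0, 3, 12))  # [0, 3, 6, 9]
# in general, the number of reachable slots is size // gcd(skip, size)
print(12 // math.gcd(3, 12))  # 4
```

With size 12 and skip 3, only 4 of the 12 slots are ever probed, so an insert could fail while the table still has empty slots; with the prime size 11, every skip from 1 to 10 reaches all slots.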

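class Hash_Table(object):
    def __init__(self):
        self.size = 11                   # prime table size
        self.slots = [None] * self.size  # keys
        self.val = [None] * self.size    # values, parallel to self.slots

    def hash_function(self, key):
        # remainder method (integer keys)
        return key % self.size

    def rehash(self, oldhash):
        # linear probing: move to the next slot, wrapping around
        return (oldhash + 1) % self.size

    def add(self, key, val):
        hash_value = self.hash_function(key)
        if self.slots[hash_value] is None:
            # empty slot: store the new pair here
            self.slots[hash_value] = key
            self.val[hash_value] = val
        elif self.slots[hash_value] == key:
            # key already present: replace the old value
            self.val[hash_value] = val
        else:
            # collision: probe until we find an empty slot or the same key
            nextslot = self.rehash(hash_value)
            while self.slots[nextslot] is not None and self.slots[nextslot] != key:
                nextslot = self.rehash(nextslot)
                if nextslot == hash_value:
                    # back at the start: every slot holds another key
                    raise RuntimeError("hash table is full")
            if self.slots[nextslot] is None:
                self.slots[nextslot] = key
                self.val[nextslot] = val
            else:
                # same key found further along: replace its value
                self.val[nextslot] = val

    def get(self, key):
        startslot = self.hash_function(key)
        val = None
        stop = False
        found = False
        position = startslot
        # follow the same probe sequence as add(); an empty slot means the key is absent
        while self.slots[position] is not None and not found and not stop:
            if self.slots[position] == key:
                found = True
                val = self.val[position]
            else:
                position = self.rehash(position)
                if position == startslot:
                    stop = True  # searched the whole table
        return val

    def __getitem__(self, key):
        # enables h[key]
        return self.get(key)

    def __setitem__(self, key, val):
        # enables h[key] = val
        self.add(key, val)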

# Usage (assumes the class above is saved in a module named ADT_CLASS.py)
from ADT_CLASS import Hash_Table

h = Hash_Table()
h[54] = 'cat'
h[26] = 'dog'
h[93] = 'lion'
h[26] = 'pig'
print(h.slots)
print(h.val)
print(h[26])
Output:
[None, None, None, None, 26, 93, None, None, None, None, 54]
[None, None, None, None, 'pig', 'lion', None, None, None, None, 'cat']
pig
