Hash table (basic concept)
A hash table (also known as a hash map) is a collection of data in which items are stored in a way that makes later lookup especially fast.
Each storage location in the hash table is called a slot and can hold one data item. Each slot has a unique name, typically an integer slot number.
The function that converts a data item into the name of the slot where it is stored is called a hash function.
A common hashing method is "find remainder": divide the data item by the size of the hash table and use the remainder as the slot number.
However, there is an obvious problem. Two different data items may be mapped to the same slot, which is called a collision (conflict).
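The remainder method and the collision problem can be sketched as follows. This is a minimal illustration that assumes integer data items and a table of 11 slots; the name `remainder_hash` and the sample items are made up for the example.

```python
def remainder_hash(item, table_size):
    """Map an integer item to a slot number with the remainder method."""
    return item % table_size

table_size = 11
items = [54, 26, 93, 17, 77, 31]

# Slot number chosen for each item
slots = {item: remainder_hash(item, table_size) for item in items}
print(slots)  # {54: 10, 26: 4, 93: 5, 17: 6, 77: 0, 31: 9}

# 44 also leaves remainder 0 when divided by 11, so it collides with 77
print(remainder_hash(44, table_size) == remainder_hash(77, table_size))  # True
```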
Perfect hash function: given a set of data items, if a hash function maps each data item to a different slot, it is called a "perfect hash function".
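Whether a hash function is perfect depends on the set of items it is applied to. The check below is a small sketch of that definition; `is_perfect` and `mod11` are hypothetical helper names, and the items are assumed to be integers.

```python
def is_perfect(hash_fn, items):
    """Return True if hash_fn sends every item to a different slot."""
    slot_numbers = [hash_fn(item) for item in items]
    return len(set(slot_numbers)) == len(slot_numbers)

def mod11(item):
    # Remainder hashing for a table with 11 slots
    return item % 11

print(is_perfect(mod11, [54, 26, 93, 17, 77, 31]))  # True: all slots differ
print(is_perfect(mod11, [54, 26, 93, 17, 77, 44]))  # False: 77 and 44 share slot 0
```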
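Applications of (approximately) perfect hash functions:
Data consistency verification. This relies on four properties: compression (input of any length gives a fixed-length digest), ease of computation, tamper resistance (a small change to the input changes the digest) and collision resistance.
Hash functions commonly used for this:
MD5, SHA
Python's hash function library is hashlib, which provides the MD5 and SHA families.
It includes six classic hash functions: md5, sha1, sha224, sha256, sha384 and sha512.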
>>> import hashlib
>>> m = hashlib.md5('hello'.encode('utf-8')).hexdigest()
>>> m
'5d41402abc4b2a76b9719d911017c592'
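Data consistency verification with hashlib might look like the sketch below. The helper name `digest_of` and the sample byte strings are assumptions for illustration; `hashlib.new`, `update` and `hexdigest` are standard hashlib calls.

```python
import hashlib

def digest_of(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of data using a hashlib algorithm name."""
    h = hashlib.new(algorithm)
    h.update(data)
    return h.hexdigest()

original = b"hello"
tampered = b"hellp"

# The same input always gives the same digest; a one-byte change does not
print(digest_of(original) == digest_of(original))
print(digest_of(original) == digest_of(tampered))

# Feeding the data in pieces yields the same digest as hashing it at once
h = hashlib.md5()
h.update(b"hel")
h.update(b"lo")
print(h.hexdigest() == hashlib.md5(b"hello").hexdigest())
```

Comparing a stored digest with a freshly computed one is enough to detect that the data changed, without comparing the data itself.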
The coolest application of hash functions:
Blockchain technology
Collision resolution: chaining and linear probing
Chaining
Arrays are easy to address by index, but inserting and deleting elements is expensive. Linked lists are the opposite: addressing is slow, but inserting and deleting data is easy.
With chaining, the top level of the hash table is an array. Each element of the array holds a pointer to the head node of a linked list. The linked list may be empty or may link several nodes.
How a key-value pair is stored depends on the key. The hash value of the key gives the array subscript, and the key-value pair is added to the linked list at that subscript. To look up an element, the hash value of the key again selects the linked list, which is then searched for the value that matches the key.
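The chaining scheme described above can be sketched as follows. This is an illustrative version only: the class name `ChainedHashTable` is hypothetical, and each chain is a plain Python list standing in for a linked list.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative sketch)."""

    def __init__(self, size=11):
        self.size = size
        # One chain per array element; a Python list stands in for a linked list
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Array subscript derived from the hash value of the key
        return hash(key) % self.size

    def add(self, key, val):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, val)  # key exists: replace the old value
                return
        bucket.append((key, val))

    def get(self, key):
        # Search only the chain selected by the key's hash value
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

table = ChainedHashTable()
table.add(77, 'bird')
table.add(44, 'goat')   # 77 and 44 both land in bucket 0
table.add(44, 'cow')    # replaces 'goat'
print(table.buckets[0])             # [(77, 'bird'), (44, 'cow')]
print(table.get(44), table.get(5))  # cow None
```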
This can be compared with dictionaries in Python. A dictionary is a data type that stores key-value pairs. This way of associating keys with values is called a mapping (map).
- Keys are unique
- A data value can be uniquely determined by a key
Operations supported by a hash table:
- add(key, val): add the key-value association to the map. If the key already exists, val replaces the old associated value;
- get(key): given a key, return the associated data value. If the key does not exist, return None;
- len(): return the number of key-value pairs in the mapping;
- del: delete the key-value association with a statement of the form del hash_map[key];
- in: a statement of the form key in hash_map returns a Boolean value indicating whether the key exists in the mapping.
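Python's built-in dict already supports all of these operations, so it is a convenient way to see the expected behavior. The variable name `hash_map` follows the list above; the sample keys and values are made up.

```python
hash_map = {}

hash_map['cat'] = 54        # add(key, val)
hash_map['cat'] = 55        # adding an existing key replaces the old value
hash_map['dog'] = 26

print(hash_map.get('cat'))  # get(key) -> 55
print(hash_map.get('cow'))  # missing key -> None
print(len(hash_map))        # number of key-value pairs -> 2

del hash_map['dog']         # delete the association for 'dog'
print('dog' in hash_map)    # -> False
print('cat' in hash_map)    # -> True
```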
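Linear probing
Open addressing resolves a collision by finding an open (empty) slot for the conflicting data item.
Searching the following slots one by one until an empty slot is found is the open addressing technique called linear probing.
One disadvantage of linear probing is its tendency toward clustering: items that collide pile up in neighboring slots and push later items away from their home slots. One way to reduce clustering is to extend linear probing from stepping one slot at a time to skipping several slots at a time.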
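The difference between probing slot by slot and probing with a skip can be seen in a small sketch. The function `probe_insert`, the skip value 3 and the sample keys are assumptions chosen for illustration, with remainder hashing on a table of 11 slots.

```python
def probe_insert(slots, key, skip=1):
    """Insert key into slots with open addressing; return the slot used.

    skip=1 is linear probing; a larger skip jumps over several slots.
    """
    size = len(slots)
    position = key % size
    for _ in range(size):
        if slots[position] is None:
            slots[position] = key
            return position
        position = (position + skip) % size
    raise RuntimeError('no empty slot found')

# 77, 44 and 55 all hash to slot 0; 12 hashes to slot 1
keys = [77, 44, 55, 12]

linear = [None] * 11
print([probe_insert(linear, k, skip=1) for k in keys])   # [0, 1, 2, 3]

jumping = [None] * 11
print([probe_insert(jumping, k, skip=3) for k in keys])  # [0, 3, 6, 1]
```

With a skip of 1, the three colliding keys fill slots 0 to 2 and force 12 out of its own slot. With a skip of 3, the colliding keys are spread out and 12 stays in slot 1.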
Implement Hash_Table
Because of how the collision resolution algorithm probes for slots, the size of the hash table should be a prime number.
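One way to see why a prime size helps: when probing with a skip, a prime table size lets the probe sequence reach every slot. The helper `slots_visited` below is a hypothetical name used only for this sketch.

```python
def slots_visited(size, skip, start=0):
    """Return the set of slots a probe sequence with the given skip can reach."""
    visited = set()
    position = start
    while position not in visited:
        visited.add(position)
        position = (position + skip) % size
    return visited

print(len(slots_visited(size=10, skip=2)))  # 5: only the even slots
print(len(slots_visited(size=11, skip=2)))  # 11: every slot

# With 11 slots, every skip from 1 to 10 reaches the whole table
print(all(len(slots_visited(11, s)) == 11 for s in range(1, 11)))  # True
```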
class Hash_Table(object):
    def __init__(self):
        self.size = 11                   # prime table size
        self.slots = [None] * self.size  # keys
        self.val = [None] * self.size    # values, parallel to slots

    def hash_function(self, key):
        # "Find remainder" hashing; keys are assumed to be integers
        return key % self.size

    def rehash(self, oldhash):
        # Linear probing: move to the next slot, wrapping around
        return (oldhash + 1) % self.size

    def add(self, key, val):
        hash_value = self.hash_function(key)
        if self.slots[hash_value] is None:
            self.slots[hash_value] = key
            self.val[hash_value] = val
        elif self.slots[hash_value] == key:
            self.val[hash_value] = val  # key exists: replace the old value
        else:
            nextslot = self.rehash(hash_value)
            while self.slots[nextslot] is not None and self.slots[nextslot] != key:
                nextslot = self.rehash(nextslot)
                if nextslot == hash_value:
                    raise RuntimeError('hash table is full')
            if self.slots[nextslot] is None:
                self.slots[nextslot] = key
                self.val[nextslot] = val
            else:
                self.val[nextslot] = val  # key found while probing: replace

    def get(self, key):
        startslot = self.hash_function(key)
        position = startslot
        # Probe until the key is found, an empty slot is reached,
        # or the search wraps around to the starting slot
        while self.slots[position] is not None:
            if self.slots[position] == key:
                return self.val[position]
            position = self.rehash(position)
            if position == startslot:
                break
        return None

    def __getitem__(self, key):
        return self.get(key)

    def __setitem__(self, key, val):
        self.add(key, val)
from ADT_CLASS import Hash_Table

h = Hash_Table()
h[54] = 'cat'
h[26] = 'dog'
h[93] = 'lion'
h[26] = 'pig'
print(h.slots)
print(h.val)
print(h[26])
Output:
[None, None, None, None, 26, 93, None, None, None, None, 54]
[None, None, None, None, 'pig', 'lion', None, None, None, None, 'cat']
pig