C++ STL and generic programming -- a universal hash function

Posted by welsh_sponger on Tue, 08 Feb 2022 01:21:45 +0100

1, Three general forms of a hash function

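#include <functional>
#include <string>
#include <unordered_set>
using namespace std;

class Customer{ // User-defined type of the elements stored in the container
public:
	string fname;
	string lname;
	long no;
	Customer(const string& _fname, const string& _lname, long _no) // constructor
		:fname(_fname), lname(_lname), no(_no) {}
	bool operator==(const Customer& rhs) const{ // unordered_set also needs to compare keys for equality
		return fname == rhs.fname && lname == rhs.lname && no == rhs.no;
	}
};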

// Version 1
class CustomerHash{ // A function-object (functor) class: it overloads operator()
public:
	std::size_t operator() (const Customer& c) const{
		return ...; // the hash computation is shown in section 2
	}
};
unordered_set<Customer, CustomerHash> custset; // The first template argument is the element type; the second is just the name
											   // of the hash functor class (the container creates the function object internally).

// Version 2
size_t customer_hash_func(const Customer& c){
	return ...; // the hash computation is shown in section 2
}
unordered_set<Customer, size_t(*)(const Customer&)> custset(20, customer_hash_func);
// The first template argument is again the element type, but the second is now a function-pointer type.
// So when constructing the object you must also pass an initial bucket count (20 here) and the address of the hash function (its name).

// Version 3 (an explicit specialization of the standard struct hash<T>{...}; it must be declared inside namespace std)
namespace std{
template<>
struct hash<Customer>{
	size_t operator() (const Customer& c) const noexcept{
		return ...; // the hash computation is shown in section 2
	}
};
}
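To see the three forms side by side, here is a minimal compilable sketch. The hash body is a hypothetical placeholder (it hashes only no) standing in for the elided computation, and placeholder_hash, fill_and_count and count_v1/v2/v3 are helper names introduced only for this illustration. Each form also relies on Customer providing operator==, because unordered_set compares keys as well as hashing them.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

struct Customer {
    std::string fname;
    std::string lname;
    long no;
    // Required by the default key comparison std::equal_to<Customer>.
    bool operator==(const Customer& rhs) const {
        return fname == rhs.fname && lname == rhs.lname && no == rhs.no;
    }
};

// Hypothetical placeholder body: hashes only the customer number.
inline std::size_t placeholder_hash(const Customer& c) {
    return std::hash<long>()(c.no);
}

// Version 1: functor class passed as the second template argument.
struct CustomerHash {
    std::size_t operator()(const Customer& c) const { return placeholder_hash(c); }
};

// Version 2: plain function; its pointer type is the template argument.
std::size_t customer_hash_func(const Customer& c) { return placeholder_hash(c); }

// Version 3: explicit specialization of std::hash, declared in namespace std.
namespace std {
template <>
struct hash<Customer> {
    size_t operator()(const Customer& c) const noexcept { return placeholder_hash(c); }
};
}  // namespace std

// Inserts two distinct customers plus one duplicate and returns the size.
template <typename Set>
std::size_t fill_and_count(Set s) {
    s.insert(Customer{"Ann", "Lee", 1L});
    s.insert(Customer{"Bob", "Lee", 2L});
    s.insert(Customer{"Ann", "Lee", 1L});  // equal to the first, so rejected
    return s.size();
}

std::size_t count_v1() {
    return fill_and_count(std::unordered_set<Customer, CustomerHash>{});
}
std::size_t count_v2() {
    // Bucket count 20 and the function address are passed to the constructor.
    return fill_and_count(
        std::unordered_set<Customer, std::size_t (*)(const Customer&)>(20, customer_hash_func));
}
std::size_t count_v3() {
    // No second template argument: std::hash<Customer> is picked up by default.
    return fill_and_count(std::unordered_set<Customer>{});
}
```

Each count_vN() returns 2, since the third insert compares equal to the first and is rejected.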

2, Implementing the hash function

// a naive approach
class CustomerHash{
public:
	std::size_t operator() (const Customer& c) const{
		return std::hash<std::string>()(c.fname) + 
			   std::hash<std::string>()(c.lname) +
			   std::hash<long>()(c.no);
	}
};

The code above simply breaks the element into its member data of basic types, calls the corresponding hash<type> (the HashFcn the library already provides for basic types) on each one to get its hash code, and adds the results together. Adding hash codes this way, however, easily produces more collisions: for example, two customers whose first and last names are swapped always get the same hash code, because addition is commutative. Collisions lead to long chains in the same bucket and slow lookups.
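A minimal sketch of that collision, assuming a plain struct Customer; NaiveCustomerHash and swapped_names_collide are hypothetical names. Because the member hashes are simply added, swapping two members of the same type never changes the sum.

```cpp
#include <cstddef>
#include <functional>
#include <string>

struct Customer {
    std::string fname;
    std::string lname;
    long no;
};

// The naive approach: hash each member and add the results.
struct NaiveCustomerHash {
    std::size_t operator()(const Customer& c) const {
        return std::hash<std::string>()(c.fname) +
               std::hash<std::string>()(c.lname) +
               std::hash<long>()(c.no);
    }
};

// Two different customers whose first and last names are swapped.
bool swapped_names_collide() {
    NaiveCustomerHash h;
    Customer a{"Ann", "Lee", 1L};
    Customer b{"Lee", "Ann", 1L};
    return h(a) == h(b);  // always true: the same three terms are added
}
```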

Therefore, consider computing the hash code with variadic templates:

#include <cstddef>
#include <functional>
template<typename T>
inline void hash_combine(std::size_t& seed, const T& val){ // 4. Mix one value's hash code into seed
	seed ^= std::hash<T>()(val) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}
template<typename T>
inline void hash_val(std::size_t& seed, const T& val){ // 3. Base case: called when only one value is left
	hash_combine(seed, val);
}
template<typename T, typename... Types>
inline void hash_val(std::size_t& seed, const T& val, const Types&... args){ // 2. Recursive case: peel off one value per call
	hash_combine(seed, val);   // The incoming pack is split into val and the rest (args'); call 4 to update seed
	hash_val(seed, args...);   // Recurse on the remaining, one-shorter pack
}
template<typename... Types>
inline std::size_t hash_val(const Types&... args){ // 1. Entry point: called first
	std::size_t seed = 0;
	hash_val(seed, args...);
	return seed;  // After every value has been mixed in, seed is the hash code of the element
}

class CustomerHash{
public:
	std::size_t operator() (const Customer& c) const{
		return hash_val(c.fname, c.lname, c.no); // Enters overload 1 first
	}
};
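If C++17 is available, the recursion through overloads 2 and 3 can be replaced with a fold expression. This is an alternative technique not used in the post, and hash_val_fold is a hypothetical name. The comma fold calls hash_combine once per argument, left to right, starting from seed = 0, so it should produce the same seed as the recursive hash_val for the same arguments.

```cpp
#include <cstddef>
#include <functional>

// Same mixing step as hash_combine above.
template <typename T>
inline void hash_combine(std::size_t& seed, const T& val) {
    seed ^= std::hash<T>()(val) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// C++17: a unary fold over the comma operator replaces the recursive overloads.
template <typename... Types>
inline std::size_t hash_val_fold(const Types&... args) {
    std::size_t seed = 0;
    (hash_combine(seed, args), ...);  // expands to hash_combine(seed, a1), hash_combine(seed, a2), ...
    return seed;
}
```

Inside CustomerHash this would be called as hash_val_fold(c.fname, c.lname, c.no).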

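Note: here, the ... in typename..., const Types&... and args... should be read as an operator: it declares a template parameter pack, declares a function parameter pack, and expands a pack into a comma-separated list, respectively.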
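A small sketch of those three roles of ...; count_args and forward_count are hypothetical names, and sizeof...(args), which reports how many elements a pack holds, is an extra piece of syntax not used in the post.

```cpp
#include <cstddef>

// (a) typename... Types declares a template parameter pack.
// (b) const Types&... args declares a function parameter pack.
template <typename... Types>
std::size_t count_args(const Types&... args) {
    return sizeof...(args);  // number of elements in the pack
}

template <typename... Types>
std::size_t forward_count(const Types&... args) {
    // (c) args... expands the pack back into an argument list.
    return count_args(args...);
}
```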

As you can see, hash_val() has three overloads. Which one is called depends on the number and types of the arguments passed in.

hash_val(c.fname, c.lname, c.no);
Three arguments are passed in, and the first one is not a size_t, so only 1 can be called. Inside 1, seed is declared and initialized to 0, and then hash_val() is called with seed and args... (the pack has not been split yet). This time the first argument is a size_t lvalue and more than one value follows it, so 2 is called.

Entering 2 through hash_val(seed, args...):
2 first splits the incoming pack args... into val and a new, shorter pack args'. It then calls hash_combine(seed, val), i.e. 4, with val being the first value, c.fname. Step 4 calls the HashFcn of that basic type, hash<string>()(c.fname), to get its hash code, adds the constant 0x9e3779b9 (derived from the golden ratio: 2^32 divided by the golden ratio is about 2654435769, which is 0x9e3779b9), and adds seed shifted left by 6 bits and seed shifted right by 2 bits. That sum is folded into seed with ^=, giving the new seed; in practice this mixing scatters the bits of seed thoroughly. Because seed is passed by reference, the change made in step 4 is visible to every caller up the chain.
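To isolate the arithmetic of step 4, the sketch below applies the same formula to hash codes that are already computed, instead of calling std::hash<T>; mix and combine2 are hypothetical names. Unlike plain addition, the result depends on the order in which values are combined, which is what removes the swapped-names collision of the naive approach.

```cpp
#include <cstddef>

// Step 4's formula applied to a precomputed hash code h.
constexpr std::size_t mix(std::size_t seed, std::size_t h) {
    return seed ^ (h + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

// Mixes h1 and then h2 into a seed that starts at 0, as hash_val does for two members.
constexpr std::size_t combine2(std::size_t h1, std::size_t h2) {
    return mix(mix(0, h1), h2);
}
```

For example, mix(0, 0) is exactly 0x9e3779b9, while combine2(1, 2) and combine2(2, 1) differ even though 1 + 2 == 2 + 1.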
After hash_combine(seed, val) returns to 2, the updated seed and the remaining pack, now one value shorter, are passed to hash_val(seed, args...). At this point the pack still holds two values, c.lname and c.no, so 2 is called again.

In this call, args... is split again into val and args'... (val is c.lname and args' holds only c.no), and 4 is called to update seed once more. Back in 2, hash_val(seed, args...) is called again, but now the pack holds a single value, c.no, so 3, hash_val(size_t& seed, const T& val), is called. 3 also calls 4, producing the final seed. Finally, control returns to 1, which returns seed as the hash code of the element.
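The call sequence traced above can be checked by instrumenting the three overloads. The global trace vector and the trace_three_fields helper are additions for illustration only; apart from the push_back calls, the overloads match the ones in this section. For three values the expected order is 1, 2, 2, 3.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Records which overload runs, using the post's numbering (1, 2, 3).
std::vector<int> trace;

template <typename T>
inline void hash_combine(std::size_t& seed, const T& val) {  // 4
    seed ^= std::hash<T>()(val) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

template <typename T>
inline void hash_val(std::size_t& seed, const T& val) {  // 3: one value left
    trace.push_back(3);
    hash_combine(seed, val);
}

template <typename T, typename... Types>
inline void hash_val(std::size_t& seed, const T& val, const Types&... args) {  // 2
    trace.push_back(2);
    hash_combine(seed, val);
    hash_val(seed, args...);
}

template <typename... Types>
inline std::size_t hash_val(const Types&... args) {  // 1: entry point
    trace.push_back(1);
    std::size_t seed = 0;
    hash_val(seed, args...);
    return seed;
}

// Hashes three sample fields and returns the recorded call order.
std::vector<int> trace_three_fields() {
    trace.clear();
    std::string fname = "Ann", lname = "Lee";
    long no = 1;
    hash_val(fname, lname, no);
    return trace;
}
```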

  • Usage example:

This example uses the Version 1 form of HashFcn, i.e. unordered_set<Customer, CustomerHash> custset.

A functor object hh is created from the CustomerHash class above (which is built on the variadic hash_val templates) and used to get the hash code of each customer; each hash code % 11 then gives the number of the bucket the element would be placed in.
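A sketch of this experiment, assuming the Customer, hash_combine and hash_val definitions from this post (repeated so the block compiles on its own); bucket_index and print_buckets are hypothetical helpers, and 11 is simply the bucket count used in the text. The printed hash codes depend on the standard library's std::hash, so they differ between implementations.

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct Customer {
    std::string fname;
    std::string lname;
    long no;
};

template <typename T>
inline void hash_combine(std::size_t& seed, const T& val) {
    seed ^= std::hash<T>()(val) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
template <typename T>
inline void hash_val(std::size_t& seed, const T& val) {
    hash_combine(seed, val);
}
template <typename T, typename... Types>
inline void hash_val(std::size_t& seed, const T& val, const Types&... args) {
    hash_combine(seed, val);
    hash_val(seed, args...);
}
template <typename... Types>
inline std::size_t hash_val(const Types&... args) {
    std::size_t seed = 0;
    hash_val(seed, args...);
    return seed;
}

class CustomerHash {
public:
    std::size_t operator()(const Customer& c) const {
        return hash_val(c.fname, c.lname, c.no);
    }
};

// Bucket number for a table of 11 buckets.
std::size_t bucket_index(const Customer& c) {
    CustomerHash hh;  // the functor object
    return hh(c) % 11;
}

// Prints each customer's hash code and the bucket it would land in.
void print_buckets(const std::vector<Customer>& customers) {
    CustomerHash hh;
    for (const Customer& c : customers) {
        std::size_t code = hh(c);
        std::cout << c.fname << ' ' << c.lname << ' ' << c.no
                  << "  hash=" << code << "  bucket=" << code % 11 << '\n';
    }
}
```

A call such as print_buckets({{"Ann", "Lee", 1L}, {"Bob", "Lee", 2L}}) prints one line per customer. A real unordered_set chooses its own bucket count, which custset.bucket_count() reports.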

3, Implementing the hash function as a specialization of struct hash

As mentioned earlier, G2.9 does not provide the specialization struct hash<string>{...} by default, so you have to write it yourself, whereas G4.9 already provides struct hash<string>{...}.

Just as the library provides specializations for the basic types:

template<> struct hash<int>{...}
template<> struct hash<char>{...}
template<> struct hash<long>{...}
...
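These ready-made specializations are function-object types, so they are used by constructing an object and calling it, exactly as hash_combine does with std::hash<T>()(val). A minimal sketch; the wrapper function names are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Each std::hash<T> is a functor: construct it with (), then call it with a value.
std::size_t hash_int(int v) { return std::hash<int>()(v); }
std::size_t hash_char(char v) { return std::hash<char>()(v); }
std::size_t hash_long(long v) { return std::hash<long>()(v); }
std::size_t hash_string(const std::string& s) { return std::hash<std::string>()(s); }
```

Equal values always produce equal hash codes, so, for example, hash_string("abc") equals the hash of a string built as "ab" + "c".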

Likewise, you can write a specialization for your own class, for example MyString:

namespace std{ template<> struct hash<MyString>{...}; } // user specializations of std::hash must be declared in namespace std
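Applied to the Customer class, the same idea might look like the sketch below: std::hash<Customer> is specialized in namespace std and reuses hash_val from section 2 (repeated so the block compiles on its own), after which unordered_set<Customer> needs no second template argument. distinct_customers is a hypothetical helper.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

struct Customer {
    std::string fname;
    std::string lname;
    long no;
    bool operator==(const Customer& rhs) const {
        return fname == rhs.fname && lname == rhs.lname && no == rhs.no;
    }
};

// hash_combine and the three hash_val overloads from section 2.
template <typename T>
inline void hash_combine(std::size_t& seed, const T& val) {
    seed ^= std::hash<T>()(val) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
template <typename T>
inline void hash_val(std::size_t& seed, const T& val) {
    hash_combine(seed, val);
}
template <typename T, typename... Types>
inline void hash_val(std::size_t& seed, const T& val, const Types&... args) {
    hash_combine(seed, val);
    hash_val(seed, args...);
}
template <typename... Types>
inline std::size_t hash_val(const Types&... args) {
    std::size_t seed = 0;
    hash_val(seed, args...);
    return seed;
}

// Explicit (full) specialization of std::hash for Customer, declared in namespace std.
namespace std {
template <>
struct hash<Customer> {
    size_t operator()(const Customer& c) const noexcept {
        return hash_val(c.fname, c.lname, c.no);
    }
};
}  // namespace std

// The container now finds std::hash<Customer> by default.
std::size_t distinct_customers() {
    std::unordered_set<Customer> custset;
    custset.insert(Customer{"Ann", "Lee", 1L});
    custset.insert(Customer{"Bob", "Lee", 2L});
    custset.insert(Customer{"Ann", "Lee", 1L});  // equal to the first, so rejected
    return custset.size();
}
```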