[data structure] Huffman tree & Huffman coding

Posted by glennn3 on Sat, 27 Nov 2021 04:36:33 +0100

Huffman tree

definition:

Given N weights as N leaf nodes, a binary tree is constructed. If the weighted path length of the tree reaches the minimum, such a binary tree is called the optimal binary tree, also known as Huffman tree. Huffman tree is the tree with the shortest weighted path length, and the node with larger weight is closer to the root.

understand:

Take chestnuts for example: if you have a wardrobe with ABCDEF, there are 6 clothes in total, but the frequency of wearing each clothes is different, 54% (ask is like wearing), 10%, 6%, 3%, 7%, 20%
So now the question is, how can you put your clothes in order to avoid making a mess of the wardrobe every time you look for clothes?
The answer must be to put the one with high frequency in your nearest place
you 're right!! Huffman tree is like this

Let's see how to build a Huffman tree to complete this operation

Just remember one sentence: find the two smallest vertices to merge

Then keep repeating, keep repeating~~

Obviously~~~
You can find
Each of its leaves is clothes, and the root of the tree with the highest frequency is the nearest
This forms a Huffman tree


Here's the point

Huffman coding

We know that there are many coding methods, such as ASCII coding, binary coding and so on
A Huffman code is introduced below
Or the clothes on it
We stipulate that the right subtree is 1 and the left subtree is 0 (that is, looking right is 1 and looking left is 0)
You can analyze it according to the figure

For the above clothes, we can use ASCII coding or binary coding
Of course, Huffman coding can also be used
Let's analyze which is the most efficient (that is, coding with the shortest code length)
A S C I I : 8 b i t ASCII: 8bit ASCII: 8bit
two enter system : 3 b i t Binary: 3bit Binary: 3bit
Huffman coding: because of the indefinite length of Huffman coding, we require the average code length ACL
A C L = 0.54 ∗ 1 + 0.20 ∗ 2 + 0.10 ∗ 3 + 0.07 ∗ 4 + 0.09 ∗ 5 = 1.97 ACL=0.54*1+0.20*2+0.10*3+0.07*4+0.09*5 = 1.97 ACL=0.54∗1+0.20∗2+0.10∗3+0.07∗4+0.09∗5=1.97

obviously!! Huffman is smaller than the other two coding lengths

Next, we use code to realize the construction of Huffman tree

Better understanding combined with tables~~~
This form has also become a codec table. It can be said that if you use Huffman coding to send messages to your friends, the internet police will never find it no matter how powerful it is

nodecharacterprobabilityparent nodeleft noderight nodecode
1A5411001
2B10900010
3C670001111
4D370001110
5E78000110
6F20100000
79843
816957
9261028
10461169
11100101

Huffman tree node structure

struct HTNode
{
	char ch;   //Corresponding letters, such as a
	int weight;//The number of occurrences of letters, such as 1000 occurrences of a
	int parent;
	int left;
	int right;
	char* code;
};

Initialize tree structure

void InitHTTable(int AsciiCount[], HTNode* &HT)
{
	for(int i = 0; i <= 128; i ++) AsciiCount[i] = 0;//Initialize array to 0
	
	AsciiCount[65] = 54;
	AsciiCount[66] = 10;
	AsciiCount[67] = 6;
	AsciiCount[68] = 3;
	AsciiCount[69] = 7;
	AsciiCount[70] = 20;
	
	HT=new HTNode[2 * 6];//Generate dynamic space [1...2n-1]
	int	CurrentASCII=0;//Current ASCII code
	HT[0].weight=0;
	for(int i = 1; i <= 6; i ++)
	{
		while(AsciiCount[CurrentASCII] == 0)//There may be a break in ASCII. Find the character whose count is not zero
			CurrentASCII++;
			
		HT[i].ch = CurrentASCII;//j is the ASCII code of the character whose count is not zero
		HT[i].weight = AsciiCount[CurrentASCII];//This is the number of occurrences of the character j as Huffman's weight
		HT[i].parent = 0;//Initialize parents
		HT[i].left = 0;//Initialize left child
		HT[i].right = 0;//Initialize right child
		
		CurrentASCII++;//Next ASCII code
	}
}

Find the two smallest probabilities

void selectMin(HTNode* &HT,int end,int &min1,int &min2)
{//In the array HT[1...end], find the two with the smallest weight
	int i;
	min1 = 0;
	min2 = 0;
	for(i = 1; i <= end; i++)
	{
		if(HT[i].parent != 0)//If this node already has parents, it means that it is already in the tree
			continue;
		else
		{
			if(min1 == 0)
				min1 = i;//Note the minimum value
			else if(HT[i].weight < HT[min1].weight)//Smaller value found
				min1 = i;			
		}
	}

	for(i = 1; i <= end; i++)
	{ 
		if(HT[i].parent != 0 || i == min1)
			continue;
		else
		{
			if(min2 == 0)
				min2 = i;
			else if(HT[i].weight < HT[min2].weight)
				min2 = i;			
		}
	}

	if(min1 > min2)
	{//Make sure min1 is minimum
		int t = min1;
		min1 = min2;
		min2 = t;
	}

}

Build a tree

void createHT(HTNode* &HT)
{//Create Huffman tree and get Huffman code
	int i = 0;
	for(i = 0; i <= 11; i++)
	{//Initialize all Huffman tables
		HT[i].parent=0;
		HT[i].left=0;
		HT[i].right=0;
	}

	for(i = 7; i <= 11; i++)
	{//Form Huffman tree
		int min1, min2;
		selectMin(HT, i - 1, min1, min2);//Select two minimum values from the i-1 items in front of the Huffman table, min1 is the minimum, and the left subtree of the constructed Huffman tree is small
		HT[min1].parent = i;
		HT[min2].parent = i;
		HT[i].weight = HT[min1].weight + HT[min2].weight;//Two small probabilities synthesize a node
		HT[i].left = min1;
		HT[i].right = min2;
	}

	//Huffman node generating leaves
	int code_length = 0;

	for(i = 1; i <= 6; i++)
	{
		char code[128];//Save Huffman code
		int j = i, k = 0;
		while(1)
		{
			int parentPosition = HT[j].parent;
			if(parentPosition == 0)
			{//This is the root node
				code[k] = '\0';
				HT[i].code = new char[k + 1];
				for(int x = 0; x < k; x++)
					HT[i].code[x] = code[k - 1 - x];
				HT[i].code[k] = '\0';
				
				printf("%c : %d : %s\n",HT[i].ch, HT[i].weight, HT[i].code);// Output character probability and coding value 
				break;
			}

			if(HT[parentPosition].left == j)
				code[k++] = '0';
			else
				code[k++] = '1';

			j = parentPosition;
		}
	}
}

Sprinkle flowers at the end 🌼🌻🌼🌻

Topics: Algorithm data structure Dynamic Programming