[XJTUSE] data structure learning - 3.3 Huffman tree

Posted by amorphous on Fri, 04 Feb 2022 12:40:55 +0100

3.3 Huffman tree

Basic concepts

Path length: the number of branches (edges) on the path between two nodes

External path length of a tree: the sum of the path lengths from the root node to each leaf node

Internal path length of a tree: the sum of the path lengths from the root node to each non-leaf node

Weighted path length (WPL) of a tree: the sum, over all leaf nodes, of each leaf's weight multiplied by its path length to the root

Huffman tree definition: a binary tree whose weighted path length is the smallest among all binary trees with the same leaf weights

For example: find the weighted path length of the following binary tree

☑️ In a Huffman tree, leaf nodes with large weights sit at small depths, so they add as little as possible to the total weighted path length. Leaf nodes with small weights are pushed to the deeper part of the tree
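The effect of leaf depth on the weighted path length can be checked with a small sketch. The `Node` type and the weights 7, 5, 2, 4 are made up for this illustration and are not part of the article's code; the same four leaves are arranged in two shapes and the weighted path length is computed for each.

```java
class WplSketch {
    // Hypothetical minimal node type for this example (not the article's HuffTreeNode).
    static class Node {
        final double weight; // only meaningful for leaves
        final Node left, right;

        Node(double weight) { // leaf
            this.weight = weight;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) { // internal node
            this.weight = 0;
            this.left = left;
            this.right = right;
        }
    }

    // Sum of weight * depth over all leaves; depth counts branches from the root.
    static double wpl(Node node, int depth) {
        if (node == null) return 0;
        if (node.left == null && node.right == null) return node.weight * depth;
        return wpl(node.left, depth + 1) + wpl(node.right, depth + 1);
    }

    public static void main(String[] args) {
        // All four leaves at depth 2.
        Node balanced = new Node(
                new Node(new Node(7), new Node(5)),
                new Node(new Node(2), new Node(4)));
        // Heaviest leaf at depth 1, lightest leaves at depth 3.
        Node skewed = new Node(
                new Node(7),
                new Node(new Node(5),
                        new Node(new Node(2), new Node(4))));
        System.out.println(wpl(balanced, 0)); // prints 36.0
        System.out.println(wpl(skewed, 0));   // prints 35.0
    }
}
```

Moving the heaviest leaf closer to the root lowers the total even though the light leaves end up deeper.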

Construction algorithm

❓ How is a Huffman tree constructed?

1️⃣ Given n weights $\{w_1, w_2, \ldots, w_n\}$, construct a set of n binary trees $F=\{T_1, T_2, \ldots, T_n\}$, where each binary tree $T_i$ contains only a root node with weight $w_i$, whose left and right subtrees are empty;

2️⃣ In F, select the two binary trees whose root nodes have the smallest weights and use them as the left and right subtrees of a new binary tree. The weight of the new root node is the sum of the weights of the roots of its left and right subtrees;

3️⃣ Delete those two trees from F and add the newly generated tree to F;

4️⃣ Repeat steps 2️⃣ and 3️⃣ until only one tree is left in F. That tree is the Huffman tree
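The four steps can be sketched compactly with a min-priority queue standing in for the set F. This is only an illustrative variant, not the implementation given later in this article: the class name `HuffmanBuildSketch`, its `Node` type and the sample weights are assumptions made for the example.

```java
import java.util.PriorityQueue;

class HuffmanBuildSketch {
    // Hypothetical node type for this example.
    static class Node {
        final double weight;
        final Node left, right;

        Node(double weight, Node left, Node right) {
            this.weight = weight;
            this.left = left;
            this.right = right;
        }
    }

    static Node build(double[] weights) {
        if (weights.length == 0) return null;
        // The queue plays the role of F and always yields the lightest root first.
        PriorityQueue<Node> forest =
                new PriorityQueue<Node>((x, y) -> Double.compare(x.weight, y.weight));
        for (double w : weights) {
            forest.add(new Node(w, null, null)); // step 1: n single-node trees
        }
        while (forest.size() > 1) {              // step 4: repeat until one tree is left
            Node a = forest.poll();              // step 2: the two lightest roots
            Node b = forest.poll();
            forest.add(new Node(a.weight + b.weight, a, b)); // step 3: replace them by the merged tree
        }
        return forest.poll();
    }

    // Weighted path length: sum of weight * depth over the leaves.
    static double wpl(Node n, int depth) {
        if (n == null) return 0;
        if (n.left == null && n.right == null) return n.weight * depth;
        return wpl(n.left, depth + 1) + wpl(n.right, depth + 1);
    }

    public static void main(String[] args) {
        Node root = build(new double[]{1, 1, 1, 1, 1, 1, 4, 5});
        System.out.println(root.weight);  // prints 15.0
        System.out.println(wpl(root, 0)); // prints 40.0
    }
}
```

When several roots have equal weight the queue may merge them in a different order than another implementation would, so the tree shape can differ, but the weighted path length of the result is the same.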

Huffman coding

Prefix code

Codes generated from a Huffman tree have the prefix property: no code in the set is a prefix of any other code

This property guarantees that an encoded bit string can be decoded in only one way
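A minimal decoding sketch shows why the prefix property removes ambiguity: bits are read one at a time, and the first time the bits read so far match a code, that match is the only possible one. The three-symbol code table below is hypothetical and chosen only for the example.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixDecodeSketch {
    // Decode a string of '0'/'1' characters with a prefix-free code table.
    static String decode(String bits, Map<String, Character> table) {
        StringBuilder out = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            current.append(bit);
            Character symbol = table.get(current.toString());
            if (symbol != null) {
                // No code is a prefix of another, so this match cannot be
                // the beginning of a longer code.
                out.append(symbol);
                current.setLength(0);
            }
        }
        if (current.length() != 0) {
            throw new IllegalArgumentException("trailing bits do not form a code");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Character> table = new HashMap<>();
        table.put("0", 'a');  // hypothetical codes: a=0, b=10, c=11
        table.put("10", 'b');
        table.put("11", 'c');
        System.out.println(decode("010110", table)); // prints abca
    }
}
```

With a table that is not prefix-free, for example a=0 and b=01, this greedy loop would emit a as soon as it saw the first 0 and could never produce b.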

Character encoding

Using the Huffman tree, characters are given variable-length codes according to their frequencies (the more frequent a character, the shorter its code), which shortens the encoded length of the whole file

Example text: This is isinglass

■ the frequency of t is 1, the frequency of h is 1, the frequency of i is 4, and the frequency of s is 5
■ the frequency of n is 1, the frequency of g is 1, the frequency of a is 1, and the frequency of l is 1

With a fixed-length encoding, the eight distinct letters need 3 bits each (2^3 = 8)
Length = 15 * 3 = 45 bits
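The counts and the fixed-length total can be reproduced with a short sketch. The class and method names are made up for this example; spaces are skipped and letters are lower-cased, which is an assumption about how the example text is meant to be counted.

```java
import java.util.Map;
import java.util.TreeMap;

class FrequencySketch {
    // Count how often each letter occurs, ignoring case and non-letters.
    static Map<Character, Integer> countLetters(String text) {
        Map<Character, Integer> freq = new TreeMap<>();
        for (char c : text.toLowerCase().toCharArray()) {
            if (Character.isLetter(c)) {
                freq.merge(c, 1, Integer::sum);
            }
        }
        return freq;
    }

    // Total bits when every distinct letter gets a code of the same length.
    static int fixedLengthBits(Map<Character, Integer> freq) {
        int bitsPerSymbol = 0;
        while ((1 << bitsPerSymbol) < freq.size()) {
            bitsPerSymbol++; // smallest b with 2^b >= number of distinct letters
        }
        int total = 0;
        for (int f : freq.values()) {
            total += f;
        }
        return bitsPerSymbol * total;
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = countLetters("This is isinglass");
        System.out.println(freq);                  // prints {a=1, g=1, h=1, i=4, l=1, n=1, s=5, t=1}
        System.out.println(fixedLengthBits(freq)); // prints 45
    }
}
```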

Create a Huffman tree according to the frequency of the letters above

graphic

Code implementation (Java)

class Letter {
    char element;//letter
    double weight;//Frequency of letters

    public Letter(char element, double weight) {
        this.element = element;
        this.weight = weight;
    }

    public char getElement() {
        return element;
    }

    public void setElement(char element) {
        this.element = element;
    }

    public double getWeight() {
        return weight;
    }

    public void setWeight(double weight) {
        this.weight = weight;
    }
}

class HuffTreeNode {
    Letter letter;
    HuffTreeNode left;//Left child node
    HuffTreeNode right;//Right child node

    public Letter getLetter() {
        return letter;
    }

    public void setLetter(Letter letter) {
        this.letter = letter;
    }

    public HuffTreeNode getLeft() {
        return left;
    }

    public void setLeft(HuffTreeNode left) {
        this.left = left;
    }

    public HuffTreeNode getRight() {
        return right;
    }

    public void setRight(HuffTreeNode right) {
        this.right = right;
    }
}

public class HuffmanTree {
    //Simple bubble sort by ascending weight, stopping early once a pass makes no swap
    private void sort(HuffTreeNode[] nodes) {
        for (int i = 0; i < nodes.length - 1; i++) {
            boolean swapped = false;//Reset on every pass
            for (int j = 0; j < nodes.length - 1 - i; j++) {
                if (nodes[j].letter.weight > nodes[j + 1].letter.weight) {
                    HuffTreeNode temp = nodes[j];
                    nodes[j] = nodes[j + 1];
                    nodes[j + 1] = temp;
                    swapped = true;//This pass found a pair out of order
                }
            }
            if (!swapped)
                return;//Already sorted
        }
    }

    /**
     * Generate a Huffman tree from letters and their frequencies
     * @param letters letters with their frequencies; must contain at least one entry
     * @return root node of the Huffman tree
     */
    public HuffTreeNode generateHuffTree(Letter[] letters) {
        //Step 1: one single-node tree per letter
        HuffTreeNode[] nodes = new HuffTreeNode[letters.length];
        for (int i = 0; i < letters.length; i++) {
            nodes[i] = new HuffTreeNode();
            nodes[i].letter = letters[i];
        }
        while (nodes.length > 1) {
            //Step 2: after sorting, the two lightest roots are at the front
            sort(nodes);
            HuffTreeNode node1 = nodes[0];
            HuffTreeNode node2 = nodes[1];
            HuffTreeNode newTree = new HuffTreeNode();
            //Internal nodes carry the placeholder character '0' and the summed weight
            Letter temp = new Letter('0', node1.getLetter().getWeight() + node2.getLetter().getWeight());
            newTree.setLetter(temp);
            newTree.setLeft(node1);
            newTree.setRight(node2);
            //Step 3: drop the two merged trees and append the new one, so the array shrinks by one
            HuffTreeNode[] nodes2 = new HuffTreeNode[nodes.length - 1];
            for (int i = 2; i < nodes.length; i++) {
                nodes2[i - 2] = nodes[i];
            }
            nodes2[nodes2.length - 1] = newTree;
            nodes = nodes2;
        }
        return nodes[0];
    }

    /**
     * Postorder traversal that prints the Huffman code of every leaf
     * @param root root node of the (sub)tree
     * @param code code accumulated on the path from the tree root: "0" for a left branch, "1" for a right branch
     */
    public void print(HuffTreeNode root, String code) {
        if (root != null) {
            print(root.getLeft(), code + "0");
            print(root.getRight(), code + "1");
            if (root.getLeft() == null && root.getRight() == null) {
                String m = root.getLetter().getElement() + " frequency:" + root.getLetter().getWeight() + " Huffman code: " + code;
                System.out.println(m);
            }
        }
    }

    public static void main(String[] args) {
        Letter a = new Letter('a', 1);
        Letter g = new Letter('g', 1);
        Letter h = new Letter('h', 1);
        Letter l = new Letter('l', 1);
        Letter n = new Letter('n', 1);
        Letter t = new Letter('t', 1);
        Letter i = new Letter('i', 4);
        Letter s = new Letter('s', 5);
        Letter[] test = {a, g, h, l, n, t, i, s};
        HuffmanTree huffmanTree = new HuffmanTree();
        huffmanTree.print(huffmanTree.generateHuffTree(test),"");
    }
}

Output:

n frequency:1.0 Huffman code: 000
t frequency:1.0 Huffman code: 001
i frequency:4.0 Huffman code: 01
a frequency:1.0 Huffman code: 1000
g frequency:1.0 Huffman code: 1001
h frequency:1.0 Huffman code: 1010
l frequency:1.0 Huffman code: 1011
s frequency:5.0 Huffman code: 11
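To compare with the 45 bits of the fixed-length encoding, the example text can be encoded with the code table printed above. The sketch below copies that table into a map by hand; the class name and the `encode` helper are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class EncodeSketch {
    // Concatenate the code of every character in the text.
    static String encode(String text, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            String code = codes.get(c);
            if (code == null) {
                throw new IllegalArgumentException("no code for '" + c + "'");
            }
            bits.append(code);
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        // Code table taken from the program output above.
        Map<Character, String> codes = new HashMap<>();
        codes.put('n', "000");
        codes.put('t', "001");
        codes.put('i', "01");
        codes.put('a', "1000");
        codes.put('g', "1001");
        codes.put('h', "1010");
        codes.put('l', "1011");
        codes.put('s', "11");
        String bits = encode("thisisisinglass", codes);
        System.out.println(bits.length()); // prints 40
    }
}
```

The Huffman encoding needs 40 bits for the 15 letters, 5 fewer than the fixed-length encoding.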

Non-equal-probability random numbers

Generate random numbers according to given probabilities
For example, given the six numbers 1, 2, 3, 4, 5 and 6, write a random generator that produces them with the probabilities 0.15, 0.20, 0.10, 0.30, 0.12 and 0.13 respectively

1️⃣ Solution 1: use the random number function in the Java API (Math.random()) to generate a number in [0, 1), split [0, 1) into consecutive intervals whose widths equal the probabilities, and return the number whose interval contains the generated value

2️⃣ Solution 2: arrange the interval comparisons as a Huffman tree built from the probabilities, so that the more likely numbers are reached after fewer comparisons

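    //Decision tree shaped like the Huffman tree of the six probabilities
    public static int randomGenerate() {
        double temp = Math.random();//Uniform in [0, 1)
        int result = 0;
        if (temp < 0.42) {
            if (temp < 0.22) {
                if (temp < 0.10) {
                    result = 3;//[0, 0.10), probability 0.10
                } else {
                    result = 5;//[0.10, 0.22), probability 0.12
                }
            } else {
                result = 2;//[0.22, 0.42), probability 0.20
            }
        } else {
            if (temp < 0.72) {
                result = 4;//[0.42, 0.72), probability 0.30
            } else {
                if (temp < 0.85) {
                    result = 6;//[0.72, 0.85), probability 0.13
                } else {
                    result = 1;//[0.85, 1), probability 0.15
                }
            }
        }
        return result;
    }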
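For comparison, Solution 1 can be sketched as a plain walk over the cumulative intervals in the order 1 to 6. The class name and the `pick` method are made up for this example; `pick` takes the uniform sample as a parameter so that the mapping can be checked with fixed inputs.

```java
class LinearScanSketch {
    // Probabilities of the numbers 1..6, in that order.
    static final double[] PROB = {0.15, 0.20, 0.10, 0.30, 0.12, 0.13};

    // Map a uniform sample u in [0, 1) to a number 1..6 by walking the
    // cumulative interval boundaries from left to right.
    static int pick(double u) {
        double cumulative = 0;
        for (int i = 0; i < PROB.length - 1; i++) {
            cumulative += PROB[i];
            if (u < cumulative) {
                return i + 1;
            }
        }
        return PROB.length; // the last interval needs no comparison of its own
    }

    public static int randomGenerate() {
        return pick(Math.random());
    }

    public static void main(String[] args) {
        System.out.println(pick(0.05)); // prints 1
        System.out.println(pick(0.50)); // prints 4
        System.out.println(pick(0.99)); // prints 6
    }
}
```

With these probabilities the linear walk makes 1·0.15 + 2·0.20 + 3·0.10 + 4·0.30 + 5·0.12 + 5·0.13 = 3.30 comparisons on average, while the tree-shaped version above makes 3·0.10 + 3·0.12 + 2·0.20 + 2·0.30 + 3·0.13 + 3·0.15 = 2.50.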
