How to Alphabetically Arrange Nodes in a Huffman Tree

252 Views Asked by At

Suppose there is a string of characters helloworld. The frequency of these characters are d1, e1, h1, l3, o2, r1, w1. So, when inserting them into the huffman tree, the root node should contain d1, but where should e1 and h1 go? In other words, how does one determine whether e and d are less than/greater than h?

1

There are 1 best solutions below

0
On BEST ANSWER

Many different Huffman trees are possible with those data, the root will not correspond to any of the letters: they will all be leaves. Ignoring the letter names and looking only at the frequencies, initially you have:

           1  1  1  1  1  2  3

Now you combine two with the lowest frequence; this could be any two of the frequency-$1$ letters. Do this again, and you’ll have the following stage:

                2           2  
               / \         / \  
     1        1   1       1   1        2       3

Now the two nodes with lowest cumulative frequency are the $1$ and any of the three $2$ nodes. Thus, even ignoring which letter is which, we can proceed to two different forests at the next stage. One is this:

           3  
          / \  
         1   2         2  
            / \       / \  
           1   1     1   1        2          3

The other is this:

                4  
               / \  
              /   \  
             2     2  
            / \   / \  
   1       1   1 1   1         2          3

In the first of these two outcomes the next step will be to combine the two $2$-nodes; the three top-level nodes will then have cumulative frequencies $3,3$, and $4$, so you’ll combine the two $3$-nodes to make a $6$-node, and then combine that with the $4$-node. The result:

                            10  
                           /  \  
                          /    \  
                         /      \  
                        /        \  
                       6          4  
                      / \        / \  
                     /   \      /   \  
                    3     3    2     2  
                   / \        / \  
                  1   2      1   1  
                     / \  
                    1   1

In the second you must first combine the $1$ and $2$ leaves to make a $3$-node. The two top-level $3$-nodes must then be combined to make a $6$-node, and that will then be combined with the $4$-node to complete the tree:

                                 10  
                                /  \  
                               /    \  
                              /      \  
                             6        4  
                            / \      / \  
                           3   3    /   \  
                          / \      2     2  
                         1   2    / \   / \  
                                 1   1 1   1

Thus, there are two fundamentally different tree structures that can arise, and each of them can be labelled in several different ways. Choice of tree and (compatible) labelling will determine the specific Huffman code generated, but all will be equally efficient.