A Practical Introduction to Data Structures and Algorithm Analysis
Sec. 5.6 Huffman Coding Trees

A set of codes is said to meet the prefix property if no code in the set is the prefix of another. The prefix property guarantees that there will be no ambiguity in how a bit string is decoded: once we reach the last bit of a code during the decoding process, we know which letter it is the code for. Huffman codes certainly have the prefix property because any prefix for a code would correspond to an internal node, while all codes correspond to leaf nodes. For example, the code for M is '11111'. Taking five right branches in the Huffman tree of Figure 5.26 brings us to the leaf node containing M. We can be sure that no letter can have code '111' because this corresponds to an internal node of the tree, and the tree-building process places letters only at the leaf nodes. (A minimal decoding sketch appears at the end of this section.)

How efficient is Huffman coding? In theory, it is an optimal coding method whenever the true frequencies are known, and the frequency of a letter is independent of the context of that letter in the message. In practice, the frequencies of letters do change depending on context. For example, while E is the most commonly used letter of the alphabet in English documents, T is more common as the first letter of a word. This is why most commercial compression utilities do not use Huffman coding as their primary coding method, but instead use techniques that take advantage of the context for the letters.

Another factor that affects the compression efficiency of Huffman coding is the relative frequencies of the letters. Some frequency patterns will save no space as compared to fixed-length codes; others can result in great compression. In general, Huffman coding does better when there is large variation in the frequencies of letters. In the particular case of the frequencies shown in Figure 5.31, we can determine the expected savings from Huffman coding if the actual frequencies of a coded message match the expected frequencies.

Example 5.11 Because the sum of the frequencies in Figure 5.31 is 306 and E has frequency 120, we expect it to appear 120 times in a message containing 306 letters. An actual message might or might not meet this expectation. Letters D, L, and U have code lengths of three, and together are expected to appear 121 times in 306 letters. Letter C has a code length of four, and is expected to appear 32 times in 306 letters. Letter M has a code length of five, and is expected to appear 24 times in 306 letters. Finally, letters K and Z have code lengths of six, and together are expected to appear only 9 times in 306 letters. The average expected cost per character is simply the sum of the cost for each character (c_i) times the probability of that character occurring (p_i).
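To finish the arithmetic of the example, the following sketch computes the average expected cost per character. The grouped counts and code lengths come directly from Example 5.11; E's code length of one is an assumption read off the tree of Figure 5.26, since the example does not state it explicitly.

#include <cstdio>

int main() {
    // Expected counts per 306 letters and code lengths, grouped as in
    // Example 5.11: E; D+L+U; C; M; K+Z. E's code length of one is an
    // assumption from the tree of Figure 5.26, not stated in the text.
    int count[] = {120, 121, 32, 24, 9};
    int len[]   = {  1,   3,  4,  5, 6};
    int total = 0, bits = 0;
    for (int i = 0; i < 5; i++) {
        total += count[i];           // 306 letters in all
        bits  += count[i] * len[i];  // total bits to code them
    }
    // Prints 785/306 = 2.57, versus 3 bits per letter for a
    // fixed-length code over eight letters.
    printf("%d/%d = %.2f bits per letter\n", bits, total,
           (double)bits / total);
    return 0;
}

Under these assumptions the expected cost works out to 785/306, or about 2.57 bits per letter, a saving of roughly 0.43 bits per letter over a three-bit fixed-length code.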
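Returning to the decoding process described at the start of this section, the sketch below (a hypothetical node type, not the book's tree class) makes the prefix-property argument concrete. The walk goes left on '0' and right on '1', and because the tree-building process places letters only at leaf nodes, reaching a leaf always marks the last bit of exactly one code.

#include <iostream>
#include <string>

struct Node {
    char letter;         // meaningful only at a leaf
    Node *left, *right;  // both null at a leaf
    Node(char c) : letter(c), left(nullptr), right(nullptr) {}
    Node(Node* l, Node* r) : letter('\0'), left(l), right(r) {}
    bool isLeaf() const { return left == nullptr && right == nullptr; }
};

// Decode a bit string against a prefix-code tree, restarting from the
// root after each decoded letter. No lookahead is ever needed, which
// is exactly what the prefix property guarantees.
std::string decode(const Node* root, const std::string& bits) {
    std::string out;
    const Node* cur = root;
    for (char b : bits) {
        cur = (b == '0') ? cur->left : cur->right;
        if (cur->isLeaf()) {
            out += cur->letter;
            cur = root;
        }
    }
    return out;
}

int main() {
    // Illustrative three-letter tree (codes E="0", D="10", M="11"),
    // not the full tree of Figure 5.26; nodes are leaked for brevity.
    Node* root = new Node(new Node('E'),
                          new Node(new Node('D'), new Node('M')));
    std::cout << decode(root, "010110") << "\n";  // prints EDME
    return 0;
}

Feeding the walk a bit string that ends partway down the tree (such as '111' in the tree of Figure 5.26) would leave it at an internal node, which is precisely the case the text rules out for complete codes.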
