12.07.2015 Views

A Practical Introduction to Data Structures and Algorithm Analysis



its occurring ($p_i$), or

$$c_1 p_1 + c_2 p_2 + \cdots + c_n p_n.$$

This can be reorganized as

$$\frac{c_1 f_1 + c_2 f_2 + \cdots + c_n f_n}{f_T}$$

where $f_i$ is the (relative) frequency of letter $i$ and $f_T$ is the total for all letter frequencies. For this set of frequencies, the expected cost per letter is

$$[(1 \times 120) + (3 \times 121) + (4 \times 32) + (5 \times 24) + (6 \times 9)]/306 = 785/306 \approx 2.57.$$

A fixed-length code for these eight characters would require $\log 8 = 3$ bits per letter as opposed to about 2.57 bits per letter for Huffman coding. Thus, Huffman coding is expected to save about 14% for this set of letters.

Huffman coding for all ASCII symbols should do better than this. The letters of Figure 5.31 are atypical in that there are too many common letters compared to the number of rare letters. Huffman coding for all 26 letters would yield an expected cost of 4.29 bits per letter. The equivalent fixed-length code would require about five bits. This is somewhat unfair to fixed-length coding because there is actually room for 32 codes in five bits, but only 26 letters. More generally, Huffman coding of a typical text file will save around 40% over ASCII coding if we charge ASCII coding at eight bits per character. Huffman coding for a binary file (such as a compiled executable) would have a very different set of distribution frequencies and so would have a different space savings. Most commercial compression programs use two or three coding schemes to adjust to different types of files.

In the preceding example, “DEED” was coded in 8 bits, a saving of 33% over the twelve bits required from a fixed-length coding. However, “MUCK” requires 18 bits, more space than required by the corresponding fixed-length coding. The problem is that “MUCK” is composed of letters that are not expected to occur often.
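The arithmetic above can be checked with a short script. The following is a minimal Python sketch, not code from the book. The (length, frequency) pairs are copied from the expected-cost calculation. The per-letter code lengths in `CODE_LENGTH` are an assumption: Figure 5.31 is not reproduced on this page, so they are chosen only to agree with the 8-bit and 18-bit totals quoted for “DEED” and “MUCK”. All function and variable names are hypothetical.

```python
# (code length in bits, combined frequency of the letters with that length),
# taken from the expected-cost calculation in the text.
LENGTH_FREQ = [(1, 120), (3, 121), (4, 32), (5, 24), (6, 9)]

# ASSUMPTION: per-letter Huffman code lengths. The figure defining them is
# not shown here; these values are consistent with the bit counts the text
# quotes for "DEED" (8 bits) and "MUCK" (18 bits).
CODE_LENGTH = {"E": 1, "D": 3, "U": 3, "C": 4, "M": 5, "K": 6}


def expected_bits_per_letter(length_freq):
    """Return (c_1 f_1 + ... + c_n f_n) / f_T for (c_i, f_i) pairs."""
    total_freq = sum(f for _, f in length_freq)
    weighted_cost = sum(c * f for c, f in length_freq)
    return weighted_cost / total_freq


def fixed_length_bits(message, alphabet_size):
    """Bits needed when every letter gets ceil(log2(alphabet_size)) bits."""
    bits_per_letter = (alphabet_size - 1).bit_length()
    return len(message) * bits_per_letter


def encoded_bits(message, code_length):
    """Bits needed when each letter uses its own variable-length code."""
    return sum(code_length[letter] for letter in message)


if __name__ == "__main__":
    huffman = expected_bits_per_letter(LENGTH_FREQ)
    fixed = (8 - 1).bit_length()  # 3 bits for an eight-letter alphabet
    print(f"expected cost: {huffman:.2f} bits per letter")
    print(f"expected saving: {1 - huffman / fixed:.0%}")
    for word in ("DEED", "MUCK"):
        print(word, encoded_bits(word, CODE_LENGTH), "bits versus",
              fixed_length_bits(word, 8), "bits fixed-length")
```

Under these assumptions the script reproduces the figures in the text: roughly 2.57 bits per letter, a saving of about 14%, 8 bits against 12 for “DEED”, and 18 bits against 12 for “MUCK”.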
If the message does not match the expected frequencies of the letters, then the length of the encoding will not be as expected either.

5.7 Further Reading

See Shaffer and Brown [SB93] for an example of a tree implementation where an internal node pointer field stores the value of its child instead of a pointer to its child when the child is a leaf node.
