04.02.2018 Views

Algorithms

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

838 CHAPTER 5 ■ Strings<br />

One reason for the popularity of Huffman compression is that it is effective for<br />

various types of files, not just natural language text. We have been careful to code the<br />

method so that it can work properly for any 8-bit value in each 8-bit character. In<br />

other words, we can apply it to any bytestream whatsoever. Several examples, for file<br />

types that we have considered earlier in this section, are shown in the figure at the bottom<br />

of this page. These examples show that Huffman compression is competitive with<br />

both fixed-length encoding and run-length encoding, even though those methods are<br />

designed to perform well for certain types of files. Understanding the reason Huffman<br />

encoding performs well in these domains is instructive. In the case of genomic<br />

data, Huffman compression essentially discovers a 2-bit code, as the four letters appear<br />

with approximately equal frequency so that the Huffman trie is balanced, with each<br />

character assigned a 2-bit code. In the case of run-length encoding, 00000000 and<br />

11111111 are likely to be the most frequently occurring characters, so they are likely<br />

to be encoded with 2 or 3 bits, leading to substantial compression.<br />

virus (50000 bits)<br />

% java Genome - < genomeVirus.txt | java PictureDump 512 25<br />

12536 bits<br />

% java Huffman - < genomeVirus.txt | java PictureDump 512 25<br />

12576 bits Huffman compression needs just 40 more bits than custom 2-bit code<br />

bitmap (1536 bits)<br />

% java RunLength - < q32x48.bin | java BinaryDump 0<br />

1144 bits<br />

% java Huffman - < q32x48.bin | java BinaryDump 0<br />

816 bits Huffman compression uses 29% fewer bits than customized method<br />

higher-resolution bitmap (6144 bits)<br />

% java RunLength - < q64x96.bin | java BinaryDump 0<br />

2296 bits<br />

% java Huffman - < q64x96.bin | java BinaryDump 0<br />

2032 bits gap narrows to 11% for higher resolution<br />

Compressing and expanding genomic data and bitmaps with Huffman encoding

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!