15.04.2018 Views

programming-for-dummies

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Lossless Data Compression Algorithms 441<br />

LZ78 algorithm creates a dictionary of codes that consist of the same size,<br />

the LZW algorithm creates a dictionary of codes of different sizes.<br />

When compressing text, the LZW algorithm starts out by creating a dictionary<br />

of individual letters. Assuming all uppercase letters, A would be represented<br />

by 1, B by 2, and so on. However, substituting a number <strong>for</strong> a single<br />

character isn’t likely to save much space, so the LZW algorithm continues<br />

examining the text <strong>for</strong> repetitive multiple-character strings to store as a<br />

number, such as AB, ABC, ABCD, and so on.<br />

Like most compression algorithms, the LZW algorithm works best on data<br />

that contains redundant in<strong>for</strong>mation, like this:<br />

IAMSAMSAMIAM#<br />

First, the LZW algorithm creates a dictionary of single characters represented<br />

by numbers. I gets stored as 9, A as 1, M as 13, and S as 19.<br />

When the LZW algorithm finds the second letter A, it doesn’t encode the letter<br />

A all over again because it’s done that once already. Instead, the algorithm<br />

encodes the next two characters, which happen to be AM, and assigns this twocharacter<br />

combination to the next available number, which is 27. (Numbers 1<br />

through 26 are assigned to the individual letters of the alphabet.)<br />

When the algorithm sees the letter S again, it encodes the next two-character<br />

string, SA, as the number 28. Then it finds the letter M again, so it encodes<br />

the next two-character string, MI, as the number 29. Finally, it sees the letter<br />

A again, so it checks the next two-character string, which is AM. Because the<br />

algorithms already encoded AM be<strong>for</strong>e (as the number 27), the algorithm<br />

expands to encode the three-character string, AM#, as the number 30, as<br />

shown in Figure 4-4.<br />

At the beginning of data, the LZW algorithm isn’t very efficient because it’s<br />

slowly creating its dictionary. When the algorithm’s dictionary grows with<br />

larger amounts of redundant data, it can replace these large chunks of redundant<br />

data with small number codes.<br />

Book IV<br />

Chapter 4<br />

The LZW algorithm is used to compress graphic images stored in the Graphic<br />

Interchange Format (GIF). Originally, this algorithm was patented in 1985 and<br />

the patent holder, Unisys, demanded royalties from software companies that<br />

sold programs that could create GIF files. This patent problem caused computer<br />

scientists to create and promote an alternate graphics <strong>for</strong>mat —<br />

Portable Network Graphics (PNG). However, the PNG <strong>for</strong>mat never replaced<br />

the GIF file <strong>for</strong>mat, especially after the LZW patent expired on June 20, 2003.<br />

Data Compression<br />

Algorithms

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!