12.07.2015 Views

Data Compression: The Complete Reference



3.3 LZ77 (Sliding Window)

... ⌈log₂ S⌉, where S is the length of the search buffer. In practice, the search buffer may be a few thousand bytes long, so the offset size is typically 10–12 bits. The size of the “length” field is similarly ⌈log₂ (L − 1)⌉, where L is the length of the look-ahead buffer (see below for the −1). In practice, the look-ahead buffer is only a few tens of bytes long, so the size of the “length” field is just a few bits. The size of the “symbol” field is typically 8 bits, but in general, it is ⌈log₂ A⌉, where A is the alphabet size. The total size of the 1-symbol token (0, 0, ...) may typically be 11 + 5 + 8 = 24 bits, much longer than the raw 8-bit size of the (single) symbol it encodes.

Here is an example showing why the “length” field may be longer than the size of the look-ahead buffer:

...Mr.␣alf␣eastman␣easily␣grows␣alf|alfa␣in␣his␣garden...

The first symbol a in the look-ahead buffer matches the 5 a’s in the search buffer. It seems that the two extreme a’s match with a length of 3 and the encoder should select the last (leftmost) of them and create the token (28,3,“a”). In fact, it creates the token (3,4,“␣”). The four-symbol string alfa in the look-ahead buffer is matched to the last three symbols alf in the search buffer and the first symbol a in the look-ahead buffer. The reason for this is that the decoder can handle such a token naturally, without any modifications. It starts at position 3 of its search buffer and copies the next four symbols, one by one, extending its buffer to the right. The first three symbols are copies of the old buffer contents, and the fourth one is a copy of the first of those three.

The next example is even more convincing (and only somewhat contrived):

...alf␣eastman␣easily␣yells␣A|AAAAAAAAAAAAAAAH...
The encoder creates the token (1,9,“A”), matching the first nine copies of A in the look-ahead buffer and including the tenth A. This is why, in principle, the length of a match can be up to the size of the look-ahead buffer minus 1.

The decoder is much simpler than the encoder (LZ77 is thus an asymmetric compression method). It has to maintain a buffer, equal in size to the encoder’s window. The decoder inputs a token, finds the match in its buffer, writes the match and the third token field on the output stream, and shifts the matched string and the third field into the buffer. This implies that LZ77, or any of its variants, is useful in cases where a file is compressed once (or just a few times) and is decompressed often. A rarely used archive of compressed files is a good example.

At first it seems that this method does not make any assumptions about the input data. Specifically, it does not pay attention to any symbol frequencies. A little thinking, however, shows that because of the nature of the sliding window, the LZ77 method always compares the look-ahead buffer to the recently input text in the search buffer and never to text that was input long ago (and has thus been flushed out of the search buffer). The method thus implicitly assumes that patterns in the input data occur close together. Data that satisfies this assumption will compress well.

The basic LZ77 method was improved in several ways by researchers and programmers during the 1980s and 1990s. One way to improve it is to use variable-size “offset” and “length” fields in the tokens. Another way is to increase the sizes of both buffers. Increasing the size of the search buffer makes it possible to find better matches, but the tradeoff is an increased search time. A large search buffer thus requires a more
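
The decoder described in this section, including its handling of a match that runs past the end of the search buffer, can be illustrated with a minimal Python sketch. This is not code from the book: the names `lz77_decode` and `initial` are hypothetical, the offset is assumed to count backward from the end of the buffer (so offset 3 points at the last alf), and for simplicity the buffer is allowed to grow without bound instead of being a fixed-size window. An ordinary space stands in for the ␣ symbol.

```python
def lz77_decode(tokens, initial=""):
    """Decode a list of (offset, length, symbol) tokens.

    Assumptions (not from the source text): `offset` counts back from
    the end of the already-decoded text, and the buffer is unbounded
    rather than a fixed-size sliding window.
    """
    out = list(initial)  # decoded text so far, standing in for the search buffer
    for offset, length, symbol in tokens:
        start = len(out) - offset
        # Copy one symbol at a time. When length > offset, later
        # iterations read symbols appended earlier in this same loop,
        # which is what lets a match overlap the look-ahead buffer.
        for i in range(length):
            out.append(out[start + i])
        out.append(symbol)  # third token field: the symbol following the match
    return "".join(out)


if __name__ == "__main__":
    # Token (3,4," ") from the first example: alf is copied, then its first a again.
    print(repr(lz77_decode([(3, 4, " ")], "grows alf")))
    # Token (1,9,"A") from the second example: one A is replicated nine times.
    print(repr(lz77_decode([(1, 9, "A")], "yells A")))
```

Copying symbol by symbol, not as one slice, is the design choice that matters here: a slice taken before the loop would contain only the three old symbols alf and could not supply the fourth.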
