31.01.2014 Views

Version 5.0 The LEDA User Manual

9.3 Prediction by Partial Matching (PPMIICoder)

1. Definition

The PPMIICoder is based on the compression scheme “Prediction by Partial Matching with Information Inheritance” by D. Shkarin [81].

This coder works as follows: Suppose we have processed the first $n-1$ symbols $x_1 \ldots x_{n-1}$ of the stream. Before reading the next symbol $x_n$ we try to guess it, i.e. for every symbol $s$ we estimate the probability $p(s)$ of the event “$x_n = s$”. This probability distribution determines how the next symbol is encoded: the higher $p(s)$, the fewer bits are used for encoding $s$. If our estimate is good, which means that $p(x_n)$ is high, then we obtain a good compression rate.
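The relation between probability and code length can be made concrete with the standard information-theoretic ideal of $-\log_2 p(s)$ bits per symbol. The manual does not state this formula, and the function name `ideal_code_length_bits` is hypothetical; the sketch below only illustrates why a higher $p(s)$ means fewer bits.

```cpp
#include <cassert>
#include <cmath>

// Ideal code length in bits for a symbol that the model predicted with
// probability p. This is the information-theoretic bound, used here as an
// assumption; it is not taken from the LEDA sources.
double ideal_code_length_bits(double p) {
    assert(p > 0.0 && p <= 1.0);  // probabilities outside (0, 1] make no sense here
    return -std::log2(p);
}
```

Under this assumption, a symbol predicted with probability 0.5 costs 1 bit and one predicted with probability 0.25 costs 2 bits.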

In order to predict the probability distribution for the $n$th symbol, the PPM approach considers the preceding $k$ symbols $x_{n-k} \ldots x_{n-1}$. We call these symbols the context of $x_n$ and $k$ the order of the model. (For $k = 0$ we obtain the order-0 model from the previous section.) For example, if the current context is “req”, then we should predict the letter “u” as the next symbol with high probability.
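A minimal sketch of an order-$k$ context model, assuming plain frequency counting, may help to illustrate the idea. The class `ContextModel` and its members are hypothetical and are not part of LEDA. Real PPM variants additionally fall back to shorter contexts when a context has not been seen, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical order-k frequency model (illustration only, not LEDA's model).
class ContextModel {
public:
    explicit ContextModel(std::size_t order) : k_(order) {}

    // Count every (context, next symbol) pair that occurs in the text.
    void train(const std::string& text) {
        for (std::size_t n = k_; n < text.size(); ++n)
            ++counts_[text.substr(n - k_, k_)][text[n]];
    }

    // Relative frequency of symbol s after the given context.
    // Returns 0 if the context (or the symbol in it) was never seen.
    double probability(const std::string& context, char s) const {
        assert(context.size() == k_);
        auto ctx = counts_.find(context);
        if (ctx == counts_.end()) return 0.0;
        std::size_t total = 0;
        for (const auto& entry : ctx->second) total += entry.second;
        auto hit = ctx->second.find(s);
        if (hit == ctx->second.end()) return 0.0;
        return static_cast<double>(hit->second) / static_cast<double>(total);
    }

private:
    std::size_t k_;  // order of the model
    std::map<std::string, std::map<char, std::size_t>> counts_;
};
```

Trained with order 3 on a text such as "request requires a frequent query", this model assigns probability 1 to “u” after the context “req”, because every occurrence of “req” in that text is followed by “u”.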

PPMII is a variant of PPM which usually achieves very accurate estimations.<br />

The PPMIICoder combines very good compression rates with acceptable speed. (Shkarin [81] reports that his coder outperforms ZIP and BZIP2 with respect to compression rates and speed.) The only disadvantage of this coder is that it needs a fair amount of main memory to store the model. However, the user can set an upper bound on the memory usage and can specify which model restoration method the coder applies when it runs out of memory:

• mr_restart (default):
The model is deleted completely and rebuilt from scratch. This method is fast.

• mr_cut_off:
Parts of the model are freed to gain memory. This method is optimal for so-called quasistationary sources. It usually gives better compression but it is slower.

• mr_freeze:
The model is not extended any more. This method is optimal for so-called stationary sources. (We want to point out that data streams arising in practical applications usually do not behave like a stationary source.)
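The three restoration methods can be contrasted with a toy model that stores at most a fixed number of entries. Everything in this sketch is an assumption made for illustration: the names `Restoration` and `BoundedModel` are hypothetical, the model is a plain list of contexts, and "cutting off" is taken to mean dropping the oldest half of the entries, since the manual does not say which parts are freed. It does not reflect how the PPMIICoder is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical counterpart of the three restoration methods described above.
enum class Restoration { restart, cut_off, freeze };

// Toy model with a hard limit on the number of stored entries.
class BoundedModel {
public:
    BoundedModel(std::size_t max_entries, Restoration method)
        : limit_(max_entries), method_(method) {
        assert(max_entries > 0);
    }

    // Try to extend the model by one entry, applying the chosen
    // restoration method when the limit has been reached.
    void add(const std::string& context) {
        if (frozen_) return;  // a frozen model is never extended again
        if (entries_.size() == limit_) {
            switch (method_) {
            case Restoration::restart:
                entries_.clear();  // delete everything, rebuild from scratch
                break;
            case Restoration::cut_off: {
                // Assumption: free the oldest half of the entries.
                auto drop = static_cast<std::ptrdiff_t>((entries_.size() + 1) / 2);
                entries_.erase(entries_.begin(), entries_.begin() + drop);
                break;
            }
            case Restoration::freeze:
                frozen_ = true;  // keep the model as it is
                return;
            }
        }
        entries_.push_back(context);
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::size_t limit_;
    Restoration method_;
    bool frozen_ = false;
    std::deque<std::string> entries_;
};
```

With a limit of 4 entries, adding a fifth entry leaves 1 entry under restart, 3 entries under cut-off, and 4 entries under freeze, which mirrors the trade-off between discarding, trimming, and keeping the learned statistics.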

#include <LEDA/coding/PPMII.h>

2. Types<br />

PPMIICoder::mr_method { mr_restart, mr_cut_off, mr_freeze }

the different model restoration modes.
