Data Compression: The Complete Reference

2. Statistical Methods

greater than that of the expanded symbol.

$$
\sum_{i=1}^{n} 2^{-L_i} = 2^{-L_1} + 2^{-L_2} + \sum_{i=3}^{n} 2^{-\log_2 n}
= 2^{-\log_2 n + a} + 2^{-\log_2 n - e} + \sum_{i=1}^{n} 2^{-\log_2 n} - 2 \times 2^{-\log_2 n}
= \frac{2^a}{n} + \frac{2^{-e}}{n} + 1 - \frac{2}{n}.
$$

The Kraft-McMillan inequality requires that

$$
\frac{2^a}{n} + \frac{2^{-e}}{n} + 1 - \frac{2}{n} \le 1,
\quad\text{or}\quad
\frac{2^a}{n} + \frac{2^{-e}}{n} - \frac{2}{n} \le 0,
$$

or $2^{-e} \le 2 - 2^a$, implying $-e \le \log_2(2 - 2^a)$, or $e \ge -\log_2(2 - 2^a)$.

The inequality above implies $a \le 1$ (otherwise, $2 - 2^a$ is negative), but $a$ is also positive (since we assumed compression of symbol 1). The possible range of values of $a$ is thus $(0, 1]$, and in this range $e > a$, proving the statement above. (For $0 < a < 1$, the bound $-\log_2(2 - 2^a) > a$ is equivalent to $2^a + 2^{-a} > 2$, which holds for every $a \ne 0$. It is also easy to see that $a = 1 \rightarrow e \ge -\log_2 0 = \infty$, and $a = 0.1 \rightarrow e \ge -\log_2(2 - 2^{0.1}) \approx 0.10745$.)

It can be shown that this is just a special case of a general result: if you have an alphabet of $n$ symbols and you compress some of them by a certain factor, then the others must be expanded by a greater factor.

2.6 The Counting Argument

This section has been removed because of space considerations and is available on the book's web site.

2.7 Shannon-Fano Coding

Shannon-Fano coding was the first method developed for finding good variable-size codes. We start with a set of $n$ symbols with known probabilities (or frequencies) of occurrence. The symbols are first arranged in descending order of their probabilities. The set of symbols is then divided into two subsets that have the same (or almost the same) total probabilities. All symbols in one subset get assigned codes that start with a 0, while the codes of the symbols in the other subset start with a 1. Each subset is then recursively divided into two, and the second bit of all the codes is determined in a similar way. When a subset contains just two symbols, their codes are distinguished by adding one more bit to each. The process continues until no more subsets remain.
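The bound from the Kraft-McMillan argument at the start of this section can be checked numerically. The Python sketch below evaluates the smallest permissible expansion $e = -\log_2(2 - 2^a)$ for a given compression $a$. It also confirms that, at that value, the Kraft-McMillan sum equals 1 up to rounding. The function names `min_expansion` and `kraft_sum` are hypothetical, chosen for illustration. As in the derivation, the sketch assumes that every code starts at $\log_2 n$ bits and that only symbols 1 and 2 change length.

```python
import math


def min_expansion(a: float) -> float:
    """Smallest expansion e (in bits) of symbol 2 that the Kraft-McMillan
    inequality allows when symbol 1 is shortened by a bits.

    Assumes every code started at log2(n) bits and only symbols 1 and 2
    change length, so the bound is e >= -log2(2 - 2**a).
    """
    if not 0 < a <= 1:
        raise ValueError("a must lie in (0, 1]")
    slack = 2 - 2 ** a
    if slack == 0:  # a == 1: no finite expansion satisfies the inequality
        return math.inf
    return -math.log2(slack)


def kraft_sum(n: int, a: float, e: float) -> float:
    """Left-hand side of the inequality: 2^a/n + 2^-e/n + 1 - 2/n."""
    return 2 ** a / n + 2 ** -e / n + 1 - 2 / n


if __name__ == "__main__":
    for a in (0.1, 0.5, 0.9, 1.0):
        e = min_expansion(a)
        print(f"a = {a}: e >= {e:.5f}, e > a: {e > a}")
```

For $a = 0.1$ the sketch reports $e \ge 0.10745$, the value quoted in the text. For any $n \ge 2$, `kraft_sum(n, a, min_expansion(a))` evaluates to 1 up to floating-point rounding, because $2^{-e} = 2 - 2^a$ exactly cancels the other terms.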
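The recursive splitting described above can be sketched in Python as follows. This is an illustrative sketch, not the book's code. The function name `shannon_fano` is hypothetical, and the input may be probabilities or raw frequencies. The sketch also assumes that a tie between two equally balanced cuts is broken in favor of the earlier cut, a detail the description leaves open.

```python
def shannon_fano(weights: list[float]) -> list[str]:
    """Assign Shannon-Fano codewords to symbols with the given probabilities
    (or frequencies). Returns one codeword per input weight, in input order."""
    # Arrange symbol indices in descending order of weight (stable for ties).
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    codes = {i: "" for i in order}

    def split(group: list[int]) -> None:
        if len(group) < 2:  # a single symbol needs no further bits
            return
        total = sum(weights[i] for i in group)
        # Choose the cut that makes the two subsets' totals as equal as
        # possible; on a tie, keep the earlier cut (an assumption).
        best_cut, best_diff, running = 1, float("inf"), 0
        for cut in range(1, len(group)):
            running += weights[group[cut - 1]]
            diff = abs(total - 2 * running)  # |left total - right total|
            if diff < best_diff:
                best_cut, best_diff = cut, diff
        left, right = group[:best_cut], group[best_cut:]
        for i in left:
            codes[i] += "0"  # first subset: next bit is 0
        for i in right:
            codes[i] += "1"  # second subset: next bit is 1
        split(left)
        split(right)

    split(order)
    return [codes[i] for i in range(len(weights))]
```

As a usage example, the hypothetical frequencies `[15, 7, 6, 6, 5]` give `shannon_fano([15, 7, 6, 6, 5]) == ['00', '01', '10', '110', '111']`. The first cut separates the two most frequent symbols (total 22) from the other three (total 17). Every split produces two nonempty subsets, so the resulting codes always form a complete prefix code. A one-symbol alphabet receives the empty codeword.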
Table 2.12 illustrates the Shannon-Fano code for a seven-symbol alphabet. Notice that the symbols themselves are not shown, only their probabilities. The first step splits the set of seven symbols into two subsets, the first one with two symbols and a total probability of 0.45, the second one with the remaining five symbols
