12.07.2015 Views

Data Compression: The Complete Reference


2.18 PPM

Figure 2.73 shows how such a trie is constructed for the string “zxzyzxxyzx” assuming N = 2. A quick glance shows that the tree grows in width but not in depth. Its depth remains N + 1 = 3 regardless of how much input data has been read. Its width grows as more and more symbols are input, but not at a constant rate. Sometimes, no new nodes are added, such as in case 10, when the last x is read. At other times, up to three nodes are added, such as in cases 3 and 4, when the second z and the first y are added.

Level 1 of the trie (just below the root) contains one node for each symbol read so far. These are the order-1 contexts. Level 2 contains all the order-2 contexts, and so on. Every context can be found by starting at the root and sliding down to one of the leaves. In case 3, e.g., the two contexts are xz (symbol z preceded by the order-1 context x) and zxz (symbol z preceded by the order-2 context zx). In case 10, there are seven contexts ranging from xxy and xyz on the left to zxz and zyz on the right.

The numbers in the nodes are context counts. The “z,4” on the right branch of case 10 implies that z has been seen 4 times. The “x,3” and “y,1” below it mean that these four occurrences were followed by x three times and by y once. The circled nodes show the different orders of the context of the last symbol added to the trie. In case 3, e.g., the second z has just been read and added to the trie. It was added twice, below the x of the left branch and the x of the right branch (the latter is indicated by the arrow). Also, the count of the original z has been incremented to 2. This shows that the new z follows the two contexts x (of order 1) and zx (order 2).

It should now be easy for the reader to follow the ten steps of constructing the tree and to understand intuitively how nodes are added and counts updated.
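The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the book's algorithm: the names `Node`, `build_context_trie`, and `depth` are hypothetical, children are kept in a dictionary instead of linked lists, and every step simply walks down from the root for each context length. It counts, for each input symbol, how many nodes were newly created.

```python
class Node:
    """One trie node: a count plus children keyed by symbol (assumed layout)."""
    def __init__(self):
        self.count = 0
        self.children = {}


def build_context_trie(text, max_order=2):
    """Build the context trie for `text`; return (root, new nodes per step)."""
    root = Node()
    new_per_step = []
    for i in range(len(text)):
        created = 0
        # Handle the strings of length 1 .. max_order + 1 that end at position i,
        # i.e. the new symbol alone and the new symbol preceded by each context.
        for length in range(1, max_order + 2):
            start = i - length + 1
            if start < 0:
                break  # not enough symbols read yet for this level
            node = root
            for ch in text[start:i + 1]:
                child = node.children.get(ch)
                if child is None:
                    child = Node()
                    node.children[ch] = child
                    created += 1
                node = child
            node.count += 1  # one node per level is added or incremented
        new_per_step.append(created)
    return root, new_per_step


def depth(node):
    """Number of levels below `node`."""
    if not node.children:
        return 0
    return 1 + max(depth(child) for child in node.children.values())


if __name__ == "__main__":
    root, new_per_step = build_context_trie("zxzyzxxyzx", 2)
    z = root.children["z"]
    print(z.count, z.children["x"].count, z.children["y"].count)  # 4 3 1
    print(depth(root))       # 3
    print(new_per_step[-1])  # 0
```

For the string “zxzyzxxyzx” with N = 2, the sketch reproduces the counts quoted above for case 10 (“z,4” followed by “x,3” and “y,1”), a depth of 3, and no new nodes when the last x is read.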
Notice that three nodes (or, in general, N + 1 nodes, one at each level of the trie) are involved in each step (except the first few steps when the trie hasn’t reached its final height yet). Some of the three are new nodes added to the trie; the others have their counts incremented.

The next point that should be discussed is how the algorithm decides which nodes to update and which to add. To simplify the algorithm, one more pointer is added to each node, pointing backward to the node representing the next shorter context. A pointer that points backward in a tree is called a vine pointer.

Figure 2.74 shows the first ten steps in the construction of the PPM trie for the 14-symbol string “assanissimassa”. Each of the ten steps shows the new vine pointers (dashed lines in the figure) constructed by the trie updating algorithm while that step was handled. Notice that old vine pointers are not deleted; they are just not shown in later diagrams. In general, a vine pointer points from a node X on level n to a node with the same symbol X on level n − 1. All nodes on level 1 point to the root.

A node in the PPM trie thus contains the following fields:

1. The code (ASCII or other) of the symbol.

2. The count.

3. A down pointer, pointing to the leftmost child of the node. In Figure 2.74, Case 10, for example, the leftmost child of the root is “a,2”. That of “a,2” is “n,1” and that of “s,4” is “a,1”.

4. A right pointer, pointing to the next sibling of the node. The root has no right sibling. The next sibling of node “a,2” is “i,2” and that of “i,2” is “m,1”.

5. A vine pointer. These are shown as dashed arrows in Figure 2.74.
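The five fields listed above map directly onto a small record type. The following Python sketch is a hedged illustration, not the book's update procedure: the names `Node`, `find_or_add_child`, `build_ppm_trie`, and `find` are hypothetical, the order N = 2 is an assumption (the text does not state it for Figure 2.74), and siblings are kept in alphabetical order only because that matches the examples quoted for Case 10. It uses the vine pointers to reach the shorter contexts instead of restarting from the root.

```python
class Node:
    """The five fields listed above: symbol, count, down, right, vine."""
    def __init__(self, symbol=None):
        self.symbol = symbol
        self.count = 0
        self.down = None    # leftmost child
        self.right = None   # next sibling
        self.vine = None    # node for the next shorter context


def find_or_add_child(parent, symbol):
    """Return the child holding `symbol`, inserting it in sorted order if absent."""
    prev, cur = None, parent.down
    while cur is not None and cur.symbol < symbol:
        prev, cur = cur, cur.right
    if cur is not None and cur.symbol == symbol:
        return cur
    node = Node(symbol)
    node.right = cur
    if prev is None:
        parent.down = node
    else:
        prev.right = node
    return node


def build_ppm_trie(text, max_order=2):
    """Build the trie, following vine pointers from longer to shorter contexts."""
    root = Node()
    last, last_depth = root, 0  # deepest node touched by the previous step
    for symbol in text:
        # A context may hold at most max_order symbols, so step back one
        # level if the previous step ended on the bottom level.
        if last_depth > max_order:
            ctx, ctx_depth = last.vine, last_depth - 1
        else:
            ctx, ctx_depth = last, last_depth
        deepest = longer = None
        while ctx is not None:           # root.vine is None and ends the loop
            child = find_or_add_child(ctx, symbol)
            child.count += 1
            if longer is None:
                deepest = child
            else:
                longer.vine = child      # same symbol, one level up
            longer = child
            ctx = ctx.vine
        longer.vine = root               # level-1 nodes point to the root
        last, last_depth = deepest, ctx_depth + 1
    return root


def find(root, path):
    """Follow down/right pointers along `path`; return the node or None."""
    node = root
    for symbol in path:
        cur = node.down
        while cur is not None and cur.symbol != symbol:
            cur = cur.right
        if cur is None:
            return None
        node = cur
    return node


if __name__ == "__main__":
    root = build_ppm_trie("assanissim", 2)  # first ten symbols of the string
    a = root.down
    print(a.symbol, a.count, a.down.symbol, a.right.symbol)  # a 2 n i
```

Under these assumptions, the first ten symbols of “assanissimassa” give a root whose leftmost child is “a,2”, followed by the siblings “i,2” and “m,1”, in agreement with the examples in items 3 and 4.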
