Algorithms and Data Structures
N. Wirth. Algorithms and Data Structures. Oberon version 173<br />
Instead of using the probabilities $p_i$ and $q_j$, we will subsequently use such frequency counts and denote<br />
them by<br />
$a_i$ = number of times the search argument $x$ equals $k_i$<br />
$b_j$ = number of times the search argument $x$ lies between $k_j$ and $k_{j+1}$<br />
By convention, $b_0$ is the number of times that $x$ is less than $k_1$, and $b_n$ is the frequency of $x$ being greater<br />
than $k_n$ (see Fig. 4.37). We will subsequently use $P$ to denote the accumulated weighted path length instead<br />
of the average path length:<br />
$P = \sum_{i=1}^{n} a_i h_i + \sum_{j=0}^{n} b_j h'_j$<br />
Thus, apart from avoiding the computation of the probabilities from measured frequency counts, we gain<br />
the further advantage of being able to use integers instead of fractions in our search for the optimal tree.<br />
Considering the fact that the number of possible configurations of n nodes grows exponentially with n,<br />
the task of finding the optimum seems rather hopeless for large n. Optimal trees, however, have one<br />
significant property that helps to find them: all their subtrees are optimal too. For instance, if the tree in Fig.<br />
4.37 is optimal, then the subtree with keys $k_3$ and $k_4$ is also optimal as shown. This property suggests an<br />
algorithm that systematically finds larger and larger trees, starting with individual nodes as smallest possible<br />
subtrees. The tree thus grows from the leaves to the root, which is, since we are used to drawing trees<br />
upside-down, the bottom-up direction [4-6].<br />
The equation that is the key to this algorithm is derived as follows: Let $P$ be the weighted path length of a<br />
tree, and let $P_L$ and $P_R$ be those of the left and right subtrees of its root. Clearly, $P$ is the sum of $P_L$, $P_R$,<br />
and the number of times a search travels on the leg to the root, which is simply the total number $W$ of<br />
search trials. We call $W$ the weight of the tree. Its average path length is then $P/W$:<br />
$P = P_L + W + P_R$<br />
$W = \sum_{i=1}^{n} a_i + \sum_{j=0}^{n} b_j$<br />
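The relation between the path length of a tree and those of its subtrees can be traced with a short sketch. The tuple representation of trees, the function name `weight_and_path_length`, and the sample counts are assumptions made for illustration only. An external node (an empty subtree) is taken to contribute its count once, which matches a root placed at level 1.

```python
# A tree is either None (an external node) or a tuple (left, k, right),
# where k is the index of the key k_k stored at the root of that subtree.

def weight_and_path_length(tree, lo, hi, a, b):
    """Return (W, P) for a subtree holding the keys k_{lo+1} .. k_{hi}.

    a[i] is the number of searches that hit key k_i (a[0] is unused padding);
    b[j] is the number of searches that fall between k_j and k_{j+1}.
    """
    if tree is None:
        # External node: lo == hi, and only the gap count b[lo] is reached.
        return b[lo], b[lo]
    left, k, right = tree
    w_left, p_left = weight_and_path_length(left, lo, k - 1, a, b)
    w_right, p_right = weight_and_path_length(right, k, hi, a, b)
    w = w_left + a[k] + w_right
    # P = P_L + W + P_R: every search in this subtree passes its root once.
    return w, p_left + w + p_right


# Hypothetical counts for n = 3 keys, with k_2 at the root.
a = [0, 3, 1, 2]
b = [1, 1, 1, 1]
tree = ((None, 1, None), 2, (None, 3, None))
print(weight_and_path_length(tree, 0, 3, a, b))  # (10, 23)
```

Checking by levels gives the same value: 1·1 for the root, (3 + 2)·2 for its two children, and 4·3 for the four external nodes, that is, 1 + 10 + 12 = 23.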
These considerations show the need for a notation for the weights and the path lengths of any subtree<br />
consisting of a number of adjacent keys. Let $T_{ij}$ be the optimal subtree consisting of nodes with keys $k_{i+1}$,<br />
$k_{i+2}$, ... , $k_j$. Then let $w_{ij}$ denote the weight and let $p_{ij}$ denote the path length of $T_{ij}$. Clearly $P = p_{0,n}$ and<br />
$W = w_{0,n}$. These quantities are defined by the following recurrence relations:<br />
$w_{ii} = b_i \quad (0 \le i \le n)$<br />
$w_{ij} = w_{i,j-1} + a_j + b_j \quad (0 \le i < j \le n)$<br />
$p_{ii} = w_{ii} \quad (0 \le i \le n)$<br />
$p_{ij} = w_{ij} + \min_{i < k \le j} (p_{i,k-1} + p_{kj}) \quad (0 \le i < j \le n)$<br />
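The recurrence relations can be evaluated directly by filling two tables in order of increasing subtree width. The following is a minimal sketch under stated assumptions, not the book's program: the function name `optimal_path_length`, the width variable `h`, and the sample counts are introduced here for illustration. It tries every candidate root for every subtree.

```python
def optimal_path_length(a, b):
    """Evaluate the recurrences for w_ij and p_ij by trying every root.

    a[1..n] are the key counts (a[0] is unused padding), b[0..n] the gap counts.
    Returns (p_0n, w_0n): the minimal weighted path length and the total weight.
    """
    n = len(a) - 1
    w = [[0] * (n + 1) for _ in range(n + 1)]
    p = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = b[i]          # w_ii = b_i
        p[i][i] = w[i][i]       # p_ii = w_ii
    for h in range(1, n + 1):   # h = j - i, the number of keys in the subtree
        for i in range(n - h + 1):
            j = i + h
            w[i][j] = w[i][j - 1] + a[j] + b[j]
            p[i][j] = w[i][j] + min(
                p[i][k - 1] + p[k][j] for k in range(i + 1, j + 1)
            )
    return p[0][n], w[0][n]


print(optimal_path_length([0, 3, 1, 2], [1, 1, 1, 1]))  # (23, 10)
```

Processing subtrees by increasing width guarantees that `p[i][k - 1]` and `p[k][j]` are already known when `p[i][j]` is computed, since both refer to narrower subtrees.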
The last equation follows immediately from the definitions of $P$ and of optimality. Since there are<br />
approximately $n^2/2$ values $p_{ij}$, and because its definition calls for a choice among all cases such that<br />
$0 < j-i \le n$, the minimization operation will involve approximately $n^3/6$ operations. Knuth pointed out that a<br />
factor n can be saved by the following consideration, which alone makes this algorithm usable for practical<br />
purposes.<br />
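The estimate of roughly $n^3/6$ operations for the unrestricted minimization can be checked by counting candidate roots per subtree width. The symbol $h = j - i$ is introduced here only for this count. For a fixed $h$ there are $n - h + 1$ pairs $(i, j)$, and each requires $h$ evaluations:

```latex
\sum_{h=1}^{n} (n-h+1)\,h
  = (n+1)\sum_{h=1}^{n} h \;-\; \sum_{h=1}^{n} h^2
  = \frac{n(n+1)^2}{2} - \frac{n(n+1)(2n+1)}{6}
  = \frac{n(n+1)(n+2)}{6}
  \approx \frac{n^3}{6}
```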
Let $r_{ij}$ be a value of $k$ which achieves the minimum for $p_{ij}$. It is possible to limit the search for $r_{ij}$ to a much<br />
smaller interval, i.e., to reduce the number of the $j-i$ evaluation steps. The key is the observation that if we<br />
have found the root $r_{ij}$ of the optimal subtree $T_{ij}$, then neither extending the tree by adding a node at the<br />
right, nor shrinking the tree by removing its leftmost node can ever cause the optimal root to move to the<br />
left. This is expressed by the relation<br />
$r_{i,j-1} \le r_{ij} \le r_{i+1,j}$<br />
which limits the search for possible solutions for $r_{ij}$ to the range $r_{i,j-1} \ldots r_{i+1,j}$. This results in a total<br />
number of elementary steps in the order of $n^2$.
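The restricted search can be sketched as follows. This is an illustrative sketch, not the book's program: the names `optimal_search_tree` and `build_tree`, the tuple representation of trees, and the sample counts are assumptions. The counts are assumed to be non-negative.

```python
def optimal_search_tree(a, b):
    """Compute p_0n and the root table r, searching only r[i][j-1] .. r[i+1][j].

    a[1..n] are the key counts (a[0] is unused padding), b[0..n] the gap counts.
    """
    n = len(a) - 1
    w = [[0] * (n + 1) for _ in range(n + 1)]
    p = [[0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = p[i][i] = b[i]
    for h in range(1, n + 1):            # h = j - i, the subtree width
        for i in range(n - h + 1):
            j = i + h
            w[i][j] = w[i][j - 1] + a[j] + b[j]
            if h == 1:
                lo = hi = j              # a single key is its own root
            else:
                lo, hi = r[i][j - 1], r[i + 1][j]
            best_k = lo
            best = p[i][lo - 1] + p[lo][j]
            for k in range(lo + 1, hi + 1):
                cand = p[i][k - 1] + p[k][j]
                if cand < best:
                    best, best_k = cand, k
            p[i][j] = w[i][j] + best
            r[i][j] = best_k
    return p[0][n], r


def build_tree(r, i, j):
    """Rebuild the tree over k_{i+1} .. k_j as nested (left, k, right) tuples."""
    if i == j:
        return None                      # external node
    k = r[i][j]
    return (build_tree(r, i, k - 1), k, build_tree(r, k, j))


cost, r = optimal_search_tree([0, 3, 1, 2], [1, 1, 1, 1])
print(cost)                  # 23
print(build_tree(r, 0, 3))   # (None, 1, ((None, 2, None), 3, None))
```

For a fixed width, the inspected ranges overlap only at their end points, so the range lengths telescope to fewer than $2n$ steps per width. This is where the total of order $n^2$ comes from.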