25.11.2014 Views

Algorithms and Data Structures


N. Wirth. Algorithms and Data Structures. Oberon version, p. 173

Instead of using the probabilities $p_i$ and $q_j$, we will subsequently use such frequency counts and denote them by

$a_i$ = number of times the search argument $x$ equals $k_i$

$b_j$ = number of times the search argument $x$ lies between $k_j$ and $k_{j+1}$

By convention, $b_0$ is the number of times that $x$ is less than $k_1$, and $b_n$ is the number of times that $x$ is greater than $k_n$ (see Fig. 4.37). We will subsequently use $P$ to denote the accumulated weighted path length instead of the average path length:

$$P = \sum_{i=1}^{n} a_i h_i + \sum_{j=0}^{n} b_j h'_j$$

Thus, apart from avoiding the computation of the probabilities from measured frequency counts, we gain the further advantage of being able to use integers instead of fractions in our search for the optimal tree.

Considering the fact that the number of possible configurations of $n$ nodes grows exponentially with $n$, the task of finding the optimum seems rather hopeless for large $n$. Optimal trees, however, have one significant property that helps to find them: all their subtrees are optimal too. For instance, if the tree in Fig. 4.37 is optimal, then the subtree with keys $k_3$ and $k_4$ is also optimal as shown. This property suggests an algorithm that systematically finds larger and larger trees, starting with individual nodes as the smallest possible subtrees. The tree thus grows from the leaves to the root, which is, since we are used to drawing trees upside-down, the bottom-up direction [4-6].

The equation that is the key to this algorithm is derived as follows: Let $P$ be the weighted path length of a tree, and let $P_L$ and $P_R$ be those of the left and right subtrees of its root. Clearly, $P$ is the sum of $P_L$, $P_R$, and the number of times a search travels on the leg to the root, which is simply the total number $W$ of search trials. We call $W$ the weight of the tree. Its average path length is then $P/W$:

$$P = P_L + W + P_R$$

$$W = \sum_{i=1}^{n} a_i + \sum_{j=0}^{n} b_j$$
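The decomposition above can be checked on a small example. The following is a minimal sketch, not code from the book: the tuple representation of a tree, the function names, and the sample counts are assumptions made for illustration. It computes $P$ once through $P = P_L + W + P_R$ and once through the level-based sum, with the root at level 1 and an empty subtree standing for one special node.

```python
# Hypothetical representation (not from the source): a tree is None (empty)
# or a tuple (left, k, right), where k is the index of the root key k_k.
# a[1..n] are the key counts (a[0] is unused padding); b[0..n] are the gap counts.

def weight_and_path(tree, lo, hi, a, b):
    """Return (W, P) for a subtree holding the keys k_{lo+1} .. k_{hi}."""
    if tree is None:
        # An empty subtree is a single special node, reached b[lo] times.
        return b[lo], b[lo]
    left, k, right = tree
    w_left, p_left = weight_and_path(left, lo, k - 1, a, b)
    w_right, p_right = weight_and_path(right, k, hi, a, b)
    w = w_left + a[k] + w_right
    return w, p_left + w + p_right          # P = P_L + W + P_R


def path_length_by_levels(tree, lo, hi, a, b, level=1):
    """Return P as the sum of count * level over all nodes and special nodes."""
    if tree is None:
        return b[lo] * level
    left, k, right = tree
    return (a[k] * level
            + path_length_by_levels(left, lo, k - 1, a, b, level + 1)
            + path_length_by_levels(right, k, hi, a, b, level + 1))


# Sample data (assumed): three keys, k_2 at the root.
a = [0, 4, 2, 1]
b = [1, 1, 1, 1]
tree = ((None, 1, None), 2, (None, 3, None))

print(weight_and_path(tree, 0, 3, a, b))        # (11, 24)
print(path_length_by_levels(tree, 0, 3, a, b))  # 24
```

Both routes give the same $P$, and only integer arithmetic is involved.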

These considerations show the need for a denotation of the weights and the path lengths of any subtree consisting of a number of adjacent keys. Let $T_{ij}$ be the optimal subtree consisting of nodes with keys $k_{i+1}, k_{i+2}, \ldots, k_j$. Then let $w_{ij}$ denote the weight and let $p_{ij}$ denote the path length of $T_{ij}$. Clearly $P = p_{0,n}$ and $W = w_{0,n}$. These quantities are defined by the following recurrence relations:

$$w_{ii} = b_i \quad (0 \le i \le n)$$

$$w_{ij} = w_{i,j-1} + a_j + b_j \quad (0 \le i < j \le n)$$

$$p_{ii} = w_{ii} \quad (0 \le i \le n)$$

$$p_{ij} = w_{ij} + \min_{i < k \le j} \left( p_{i,k-1} + p_{kj} \right) \quad (0 \le i < j \le n)$$
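The recurrence relations translate almost directly into a table-filling procedure. The sketch below is an illustration under assumed names (`optimal_tree_cubic`, the list layout with an unused `a[0]`), not the book's program. It fills the tables in order of increasing $j - i$ and tries every candidate root $k$ for each pair.

```python
def optimal_tree_cubic(a, b):
    """Fill the tables w, p and r from the recurrence relations.

    a[1..n] are the key counts (a[0] is unused padding), b[0..n] the gap
    counts. r[i][j] records one root index k that achieves the minimum
    for p[i][j]. All names here are illustrative assumptions.
    """
    n = len(b) - 1
    w = [[0] * (n + 1) for _ in range(n + 1)]
    p = [[0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]

    for i in range(n + 1):
        w[i][i] = p[i][i] = b[i]                 # w_ii = b_i, p_ii = w_ii
        for j in range(i + 1, n + 1):
            w[i][j] = w[i][j - 1] + a[j] + b[j]  # w_ij = w_i,j-1 + a_j + b_j

    for size in range(1, n + 1):                 # size = j - i
        for i in range(n - size + 1):
            j = i + size
            # Try every root k with i < k <= j; smaller tables are already filled.
            k = min(range(i + 1, j + 1),
                    key=lambda c: p[i][c - 1] + p[c][j])
            r[i][j] = k
            p[i][j] = w[i][j] + p[i][k - 1] + p[k][j]
    return p, w, r


p, w, r = optimal_tree_cubic([0, 4, 2, 1], [1, 1, 1, 1])
print(p[0][3], w[0][3])   # 24 11
```

For these sample counts both $k_1$ and $k_2$ are optimal roots; `min` keeps the first one it meets.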

The last equation follows immediately from the definitions of $P$ and of optimality. Since there are approximately $n^2/2$ values $p_{ij}$, and because the definition of each calls for a choice among $j-i$ candidates, where $0 < j-i \le n$, the minimization operation will involve approximately $n^3/6$ operations. Knuth pointed out that a factor $n$ can be saved by the following consideration, which alone makes this algorithm usable for practical purposes.
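Both estimates can be checked by direct counting, assuming that the evaluation of one candidate root counts as one operation. There are $n - d + 1$ pairs $(i, j)$ with $j - i = d$, and each of them offers $d$ candidates:

$$\sum_{d=1}^{n} (n - d + 1) = \frac{n(n+1)}{2} \approx \frac{n^2}{2}$$

$$\sum_{d=1}^{n} (n - d + 1)\, d = (n+1)\,\frac{n(n+1)}{2} - \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(n+2)}{6} \approx \frac{n^3}{6}$$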

Let $r_{ij}$ be a value of $k$ which achieves the minimum for $p_{ij}$. It is possible to limit the search for $r_{ij}$ to a much smaller interval, i.e., to reduce the number of the $j-i$ evaluation steps. The key is the observation that if we have found the root $r_{ij}$ of the optimal subtree $T_{ij}$, then neither extending the tree by adding a node at the right, nor shrinking the tree by removing its leftmost node can ever cause the optimal root to move to the left. This is expressed by the relation

$$r_{i,j-1} \le r_{ij} \le r_{i+1,j}$$

which limits the search for possible solutions for $r_{ij}$ to the range $r_{i,j-1} \ldots r_{i+1,j}$. This results in a total number of elementary steps in the order of $n^2$.
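The restricted search range can be sketched as follows. This is an illustrative version under assumptions, not the book's program: the function name and list layout are hypothetical, and the counts are assumed to be non-negative. Trees with a single key have only one possible root, so the range restriction is applied from two keys onward.

```python
def optimal_tree_knuth(a, b):
    """Table filling with the root search limited to r[i][j-1] .. r[i+1][j].

    a[1..n] are the key counts (a[0] is unused padding), b[0..n] the gap
    counts, all assumed non-negative. Returns the tables (p, r).
    """
    n = len(b) - 1
    w = [[0] * (n + 1) for _ in range(n + 1)]
    p = [[0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]

    for i in range(n + 1):
        w[i][i] = p[i][i] = b[i]
        for j in range(i + 1, n + 1):
            w[i][j] = w[i][j - 1] + a[j] + b[j]

    for i in range(n):                       # trees with exactly one key
        j = i + 1
        r[i][j] = j
        p[i][j] = w[i][j] + p[i][i] + p[j][j]

    for size in range(2, n + 1):             # size = j - i
        for i in range(n - size + 1):
            j = i + size
            lo, hi = r[i][j - 1], r[i + 1][j]    # bounds on the root index
            k = min(range(lo, hi + 1),
                    key=lambda c: p[i][c - 1] + p[c][j])
            r[i][j] = k
            p[i][j] = w[i][j] + p[i][k - 1] + p[k][j]
    return p, r


p, r = optimal_tree_knuth([0, 4, 2, 1], [1, 1, 1, 1])
print(p[0][3])   # 24
```

For a fixed size $d = j - i$, the interval lengths $r_{i+1,j} - r_{i,j-1} + 1$ telescope when summed over $i$, so each size costs at most about $2n$ candidate evaluations, which is where the order of $n^2$ comes from.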
