Algorithms and Data Structures


N. Wirth. Algorithms and Data Structures. Oberon version, p. 172

[Fig. 4.36. The search trees with 3 nodes, (a) to (e)]

The weighted path lengths of trees (a) to (e) are computed according to their definition as

$$P(a) = 11/7, \quad P(b) = 12/7, \quad P(c) = 12/7, \quad P(d) = 15/7, \quad P(e) = 17/7$$

Hence, in this example, it is not the perfectly balanced tree (c) but the degenerate tree (a) that turns out to be optimal.

The example of the compiler scanner immediately suggests that this problem should be viewed under a slightly more general condition: words occurring in the source text are not always keywords; as a matter of fact, their being keywords is rather the exception. Finding that a given word k is not a key in the search tree can be considered as an access to a hypothetical "special node" inserted between the next lower and next higher key (see Fig. 4.19) with an associated external path length. If the probability $q_j$ of a search argument x lying between the two keys $k_j$ and $k_{j+1}$ is also known, this information may considerably change the structure of the optimal search tree. Hence, we generalize the problem by also considering unsuccessful searches. The overall average weighted path length is now

$$P = \sum_{i=1}^{n} p_i h_i + \sum_{j=0}^{n} q_j h'_j$$

where

$$\sum_{i=1}^{n} p_i + \sum_{j=0}^{n} q_j = 1$$

and where $h_i$ is the level of the (internal) node i and $h'_j$ is the level of the external node j. The average weighted path length may be called the cost of the search tree, since it represents a measure for the expected amount of effort to be spent for searching. The search tree that requires the minimal cost among all trees with a given set of keys $k_i$ and probabilities $p_i$ and $q_j$ is called the optimal tree.

[Fig. 4.37. Search tree with associated access frequencies]

For finding the optimal tree, there is no need to require that the p's and q's sum up to 1.
In fact, these probabilities are commonly determined by experiments in which the accesses to nodes are counted.
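As a small check of the cost definition, the following sketch evaluates P for the five trees of Fig. 4.36. The key probabilities 1/7, 2/7, 4/7 and the five tree shapes are not stated in this excerpt; they are assumptions, chosen because they reproduce the quoted values. The function name `weighted_path_length` and the tuple encoding of trees are likewise illustrative only.

```python
from fractions import Fraction

def weighted_path_length(tree, p, q):
    """Cost of a search tree: sum of p[i]*h_i plus sum of q[j]*h'_j.

    tree is None for an external node, otherwise (left, i, right) with i
    the 1-based key index. External nodes are numbered 0..n in in-order.
    The root is at level 1.
    """
    next_external = 0

    def walk(node, level):
        nonlocal next_external
        if node is None:
            j = next_external
            next_external += 1
            return q[j] * level
        left, i, right = node
        # Evaluated left to right, so external nodes are met in in-order.
        return walk(left, level + 1) + p[i] * level + walk(right, level + 1)

    return walk(tree, 1)

def leaf(i):
    """Internal node i whose two children are external nodes."""
    return (None, i, None)

# Assumed shapes of trees (a) to (e); roots are 3, 3, 2, 1, 1.
trees = {
    "a": ((leaf(1), 2, None), 3, None),
    "b": ((None, 1, leaf(2)), 3, None),
    "c": (leaf(1), 2, leaf(3)),
    "d": (None, 1, (leaf(2), 3, None)),
    "e": (None, 1, (None, 2, leaf(3))),
}

# Assumed probabilities of keys 1..3 (index 0 unused); no unsuccessful searches.
p = [None, Fraction(1, 7), Fraction(2, 7), Fraction(4, 7)]
q = [0, 0, 0, 0]

costs = {name: weighted_path_length(t, p, q) for name, t in trees.items()}
for name, c in costs.items():
    print(name, c)   # a 11/7, b 12/7, c 12/7, d 15/7, e 17/7

# The same trees weighted by raw counts 1, 2, 4 instead of probabilities.
count_costs = {name: weighted_path_length(t, [None, 1, 2, 4], q)
               for name, t in trees.items()}
```

Because the cost is linear in the weights, replacing the probabilities by the counts 1, 2, 4 multiplies every cost by 7 and leaves the ranking of the trees unchanged, which is why the weights need not sum to 1.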
Instead of using the probabilities $p_i$ and $q_j$, we will subsequently use such frequency counts and denote them by

$a_i$ = number of times the search argument x equals $k_i$
$b_j$ = number of times the search argument x lies between $k_j$ and $k_{j+1}$

By convention, $b_0$ is the number of times that x is less than $k_1$, and $b_n$ is the frequency of x being greater than $k_n$ (see Fig. 4.37). We will subsequently use P to denote the accumulated weighted path length instead of the average path length:

$$P = \sum_{i=1}^{n} a_i h_i + \sum_{j=0}^{n} b_j h'_j$$

Thus, apart from avoiding the computation of the probabilities from measured frequency counts, we gain the further advantage of being able to use integers instead of fractions in our search for the optimal tree.

Considering the fact that the number of possible configurations of n nodes grows exponentially with n, the task of finding the optimum seems rather hopeless for large n. Optimal trees, however, have one significant property that helps to find them: all their subtrees are optimal too. For instance, if the tree in Fig. 4.37 is optimal, then the subtree with keys $k_3$ and $k_4$ is also optimal as shown. This property suggests an algorithm that systematically finds larger and larger trees, starting with individual nodes as smallest possible subtrees. The tree thus grows from the leaves to the root, which is, since we are used to drawing trees upside-down, the bottom-up direction [4-6].

The equation that is the key to this algorithm is derived as follows: Let P be the weighted path length of a tree, and let $P_L$ and $P_R$ be those of the left and right subtrees of its root. Clearly, P is the sum of $P_L$ and $P_R$, and the number of times a search travels on the leg to the root, which is simply the total number W of search trials. We call W the weight of the tree.
Its average path length is then P/W:

$$P = P_L + W + P_R$$

$$W = \sum_{i=1}^{n} a_i + \sum_{j=0}^{n} b_j$$

These considerations show the need for a denotation of the weights and the path lengths of any subtree consisting of a number of adjacent keys. Let $T_{ij}$ be the optimal subtree consisting of nodes with keys $k_{i+1}, k_{i+2}, \ldots, k_j$. Then let $w_{ij}$ denote the weight and let $p_{ij}$ denote the path length of $T_{ij}$. Clearly $P = p_{0,n}$ and $W = w_{0,n}$. These quantities are defined by the following recurrence relations:

$$\begin{aligned}
w_{ii} &= b_i && (0 \le i \le n) \\
w_{ij} &= w_{i,j-1} + a_j + b_j && (0 \le i < j \le n) \\
p_{ii} &= w_{ii} && (0 \le i \le n) \\
p_{ij} &= w_{ij} + \min_{i < k \le j} \left( p_{i,k-1} + p_{kj} \right) && (0 \le i < j \le n)
\end{aligned}$$

The last equation follows immediately from the definitions of P and of optimality. Since there are approximately $n^2/2$ values $p_{ij}$, and because their definition calls for a choice among all cases such that $0 < j - i \le n$, the minimization operation will involve approximately $n^3/6$ operations. Knuth pointed out that a factor n can be saved by the following consideration, which alone makes this algorithm usable for practical purposes.

Let $r_{ij}$ be a value of k which achieves the minimum for $p_{ij}$. It is possible to limit the search for $r_{ij}$ to a much smaller interval, i.e., to reduce the number of the $j - i$ evaluation steps. The key is the observation that if we have found the root $r_{ij}$ of the optimal subtree $T_{ij}$, then neither extending the tree by adding a node at the right, nor shrinking the tree by removing its leftmost node can ever cause the optimal root to move to the left. This is expressed by the relation

$$r_{i,j-1} \le r_{ij} \le r_{i+1,j}$$

which limits the search for possible solutions for $r_{ij}$ to the range $r_{i,j-1} \ldots r_{i+1,j}$. This results in a total number of elementary steps in the order of $n^2$.
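The recurrences and the bound on $r_{ij}$ can be sketched in Python as follows. This is a minimal illustration and not the book's program: the names `optimal_tree` and `build`, the `knuth` switch, and the sample counts are assumptions introduced here. The counts `a[1..n]` and `b[0..n]` are assumed to be non-negative integers, with `a[0]` unused.

```python
def optimal_tree(a, b, knuth=True):
    """Tables p, w, r for the optimal search tree.

    a[i] (1 <= i <= n): count of searches equal to key i; a[0] is unused.
    b[j] (0 <= j <= n): count of searches between key j and key j+1.
    With knuth=True the root search is limited to r[i][j-1] .. r[i+1][j].
    """
    n = len(b) - 1
    w = [[0] * (n + 1) for _ in range(n + 1)]
    p = [[0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = b[i]                      # w_ii = b_i
        p[i][i] = w[i][i]                   # p_ii = w_ii
        for j in range(i + 1, n + 1):
            w[i][j] = w[i][j - 1] + a[j] + b[j]
    for h in range(1, n + 1):               # h = j - i, the number of keys
        for i in range(n - h + 1):
            j = i + h
            if knuth and h > 1:
                lo, hi = r[i][j - 1], r[i + 1][j]
            else:
                lo, hi = i + 1, j           # try every key as root
            best_k = lo
            best = p[i][lo - 1] + p[lo][j]
            for k in range(lo + 1, hi + 1):
                cost = p[i][k - 1] + p[k][j]
                if cost < best:             # keep the leftmost minimum
                    best, best_k = cost, k
            p[i][j] = w[i][j] + best
            r[i][j] = best_k
    return p, w, r

def build(r, i, j):
    """Tree T_ij as nested tuples (left, key index, right); None is external."""
    if i == j:
        return None
    k = r[i][j]
    return (build(r, i, k - 1), k, build(r, k, j))

# Hypothetical counts for n = 4 keys.
a = [0, 3, 1, 4, 2]
b = [1, 2, 1, 3, 1]
p, w, r = optimal_tree(a, b)
n = len(b) - 1
print("P =", p[0][n], "W =", w[0][n])
print(build(r, 0, n))
```

Calling `optimal_tree(a, b, knuth=False)` evaluates every candidate root and so follows the recurrence literally; comparing `p[0][n]` from both variants is a simple way to check that the restricted range does not miss the minimum.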