Algorithms and Data Structures
N. Wirth. Algorithms and Data Structures. Oberon version. 171
    ELSE (*delete p^*)
      q := p;
      IF q.right = NIL THEN p := q.left; h := TRUE
      ELSIF q.left = NIL THEN p := q.right; h := TRUE
      ELSE
        del(q.left, h);
        IF h THEN balanceL(p, h) END
      END
    END
  END delete
Fortunately, deletion of an element in a balanced tree can also be performed with, in the worst case, O(log n) operations. An essential difference between the behaviour of the insertion and deletion procedures must not be overlooked, however. Whereas insertion of a single key may result in at most one rotation (of two or three nodes), deletion may require a rotation at every node along the search path. Consider, for instance, deletion of the rightmost node of a Fibonacci tree. In this case the deletion of any single node leads to a reduction of the height of the tree; in addition, deletion of its rightmost node requires the maximum number of rotations. This therefore represents the worst choice of node in the worst case of a balanced tree, a rather unlucky combination of circumstances. How probable are rotations in general, then? The surprising result of empirical tests is that whereas one rotation is invoked for approximately every two insertions, only one is required for approximately every five deletions. Deletion in balanced trees is therefore about as easy, or as complicated, as insertion.
4.6 Optimal Search Trees<br />
So far our consideration of organizing search trees has been based on the assumption that the frequency of access is equal for all nodes, that is, that all keys are equally likely to occur as a search argument. This is probably the best assumption if one has no idea of access distribution. However, there are cases (they are the exception rather than the rule) in which information about the probabilities of access to individual keys is available. These cases usually have the characteristic that the keys always remain the same, i.e., the search tree is subjected neither to insertion nor to deletion, but retains a constant structure. A typical example is the scanner of a compiler, which determines for each word (identifier) whether or not it is a keyword (reserved word). Statistical measurements over hundreds of compiled programs may in this case yield accurate information on the relative frequencies of occurrence, and thereby of access, of individual keys.
Assume that in a search tree the probability with which node i is accessed is

Pr{x = k_i} = p_i,   (Si: 1 ≤ i ≤ n : p_i) = 1
We now wish to organize the search tree in such a way that the total number of search steps, counted over sufficiently many trials, becomes minimal. For this purpose the definition of path length is modified by (1) attributing a certain weight to each node and by (2) assuming the root to be at level 1 (instead of 0), because it accounts for the first comparison along the search path. Nodes that are frequently accessed become heavy nodes; those that are rarely visited become light nodes. The (internal) weighted path length is then the sum of all paths from the root to each node weighted by that node's probability of access.
P = (Si: 1 ≤ i ≤ n : p_i * h_i)

h_i is the level of node i. The goal is now to minimize the weighted path length for a given probability distribution. As an example, consider the set of keys 1, 2, 3, with probabilities of access p_1 = 1/7, p_2 = 2/7 and p_3 = 4/7. These three keys can be arranged in five different ways as search trees (see Fig. 4.36).
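The five arrangements and their weighted path lengths can be checked by exhaustive enumeration. The following Python sketch is my own illustration (the helpers all_trees and weighted_path_length are not from the book; the probabilities are those of the example):

```python
from fractions import Fraction

def all_trees(keys):
    # every binary search tree over the sorted key list,
    # encoded as (key, left, right) tuples; None is the empty tree
    if not keys:
        return [None]
    trees = []
    for i, k in enumerate(keys):
        for left in all_trees(keys[:i]):
            for right in all_trees(keys[i + 1:]):
                trees.append((k, left, right))
    return trees

def weighted_path_length(tree, p, level=1):
    # P = (Si: 1 <= i <= n : p_i * h_i), with the root at level 1
    if tree is None:
        return 0
    key, left, right = tree
    return (p[key] * level
            + weighted_path_length(left, p, level + 1)
            + weighted_path_length(right, p, level + 1))

p = {1: Fraction(1, 7), 2: Fraction(2, 7), 3: Fraction(4, 7)}
trees = all_trees([1, 2, 3])      # exactly the five arrangements
best = min(trees, key=lambda t: weighted_path_length(t, p))
# minimum P = 11/7, attained by the degenerate tree rooted at key 3
```

Note that for this distribution the optimal tree is not the perfectly balanced one (root 2, P = 12/7) but the degenerate tree with the heavy key 3 at the root, P = 11/7.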