25.11.2014 Views

Algorithms and Data Structures


N. Wirth. Algorithms and Data Structures. Oberon version, p. 177

a_2 = 2    key_2 = Ernst    b_2 = 1
a_3 = 4    key_3 = Peter    b_3 = 1

The results of procedure Find are shown in Fig. 4.39 and demonstrate that the structures obtained for the
three cases may differ significantly. The total weight is 40, the path length of the balanced tree is 78, and
that of the optimal tree is 66.

[Figure: three trees over the keys Albert, Ernst, and Peter, with panels labelled "balanced tree", "optimal tree", and "not considering key misses"]

Fig. 4.39. The 3 trees generated by the Optimal Tree procedure

It is evident from this algorithm that the effort to determine the optimal structure is of the order of n²;
also, the amount of required storage is of the order of n². This is unacceptable if n is very large. Algorithms
with greater efficiency are therefore highly desirable. One of them is the algorithm developed by Hu and
Tucker [4-5], which requires only O(n) storage and O(n*log(n)) computations. However, it considers only
the case in which the key frequencies are zero, i.e., where only the unsuccessful search trials are registered.
Another algorithm, also requiring O(n) storage elements and O(n*log(n)) computations, was described by
Walker and Gotlieb [4-7]. Instead of trying to find the optimum, this algorithm merely promises to yield a
nearly optimal tree. It can therefore be based on heuristic principles. The basic idea is the following.
Consider the nodes (genuine and special nodes) being distributed on a linear scale, weighted by their
frequencies (or probabilities) of access. Then find the node which is closest to the center of gravity. This
node is called the centroid, and its index is

((Σ i: 1 ≤ i ≤ n : i*a_i) + (Σ i: 1 ≤ i ≤ m : i*b_i)) / W

rounded to the nearest integer. If all nodes have equal weight, then the root of the desired optimal tree
evidently coincides with the centroid. Otherwise (so the reasoning goes) it will in most cases lie in the
close neighborhood of the centroid. A limited search is then used to find the local optimum, whereafter this
procedure is applied to the resulting two subtrees. The likelihood of the root lying very close to the centroid
grows with the size n of the tree. As soon as the subtrees have reached a manageable size, their optimum
can be determined by the above exact algorithm.
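
The centroid computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Walker and Gotlieb algorithm itself: the names `centroid_index` and `build_by_centroid` are hypothetical, W is assumed to be the sum of the weights passed in unless given explicitly, and the recursive builder deliberately omits the limited local search and uses the key weights a_i alone.

```python
import math


def centroid_index(a, b, total_weight=None):
    """Return the 1-based centroid index, rounded to the nearest integer.

    a[i-1] holds a_i for 1 <= i <= n and b[i-1] holds b_i for 1 <= i <= m.
    Assumption: if total_weight is not given, W is the sum of these weights.
    """
    w = total_weight if total_weight is not None else sum(a) + sum(b)
    if w <= 0:
        raise ValueError("total weight W must be positive")
    # Weighted sum of positions: (sum of i*a_i) + (sum of i*b_i).
    moment = sum(i * ai for i, ai in enumerate(a, start=1))
    moment += sum(i * bi for i, bi in enumerate(b, start=1))
    # Round half up; Python's built-in round() rounds halves to even.
    return math.floor(moment / w + 0.5)


def build_by_centroid(keys, a):
    """Build a tree as nested tuples (left, key, right).

    Simplification (an assumption of this sketch): the root is taken to be
    the centroid of the key weights only, with no local search around it.
    """
    if not keys:
        return None
    if sum(a) > 0:
        r = centroid_index(a, [])
    else:
        r = (len(keys) + 1) // 2  # all weights zero: fall back to the middle
    r = min(max(r, 1), len(keys))  # keep the index inside 1..n
    left = build_by_centroid(keys[:r - 1], a[:r - 1])
    right = build_by_centroid(keys[r:], a[r:])
    return (left, keys[r - 1], right)


if __name__ == "__main__":
    # Equal weights: the centroid is the middle key, giving a balanced tree.
    print(build_by_centroid(["Albert", "Ernst", "Peter"], [1, 1, 1]))
```

With equal weights the sketch picks the middle key as root, matching the remark that the root then coincides with the centroid. With skewed weights the root moves toward the heavy end of the scale.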

4.7 B-Trees

So far, we have restricted our discussion to trees in which every node has at most two descendants, i.e.,
to binary trees. This is entirely satisfactory if, for instance, we wish to represent family relationships with a
preference to the pedigree view, in which every person is associated with his parents. After all, no one has
more than two parents. But what about someone who prefers the posterity view? He has to cope with the
fact that some people have more than two children, and his trees will contain nodes with many branches.
For lack of a better term, we shall call them multiway trees.

Of course, there is nothing special about such structures, and we have already encountered all the
programming and data definition facilities to cope with such situations. If, for instance, an absolute upper
limit on the number of children is given (which is admittedly a somewhat futuristic assumption), then one
may represent the children as an array component of the record representing a person. If the number of
children varies strongly among different persons, however, this may result in a poor utilization of available
storage. In this case it will be much more appropriate to arrange the offspring as a linear list, with a pointer
