11.07.2015 Views

Data Structures and Algorithm Analysis - Computer Science at ...

Data Structures and Algorithm Analysis - Computer Science at ...

Data Structures and Algorithm Analysis - Computer Science at ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

364 Chap. 10 Indexingneighboring sibling, if possible. If the sibling is also full, then these two nodes splitinto three. Similarly, when a node underflows, it is combined with its two siblings,<strong>and</strong> the total reduced to two nodes. Thus, the nodes are always <strong>at</strong> least two thirdsfull. 210.5.2 B-Tree <strong>Analysis</strong>The asymptotic cost of search, insertion, <strong>and</strong> deletion of records from B-trees,B + -trees, <strong>and</strong> B ∗ -trees is Θ(log n) where n is the total number of records in thetree. However, the base of the log is the (average) branching factor of the tree.Typical d<strong>at</strong>abase applic<strong>at</strong>ions use extremely high branching factors, perhaps 100 ormore. Thus, in practice the B-tree <strong>and</strong> its variants are extremely shallow.As an illustr<strong>at</strong>ion, consider a B + -tree of order 100 <strong>and</strong> leaf nodes th<strong>at</strong> containup to 100 records. A B + -tree with height one (th<strong>at</strong> is, just a single leaf node) canhave <strong>at</strong> most 100 records. A B + -tree with height two (a root internal node whosechildren are leaves) must have <strong>at</strong> least 100 records (2 leaves with 50 records each).It has <strong>at</strong> most 10,000 records (100 leaves with 100 records each). A B + -tree withheight three must have <strong>at</strong> least 5000 records (two second-level nodes with 50 childrencontaining 50 records each) <strong>and</strong> <strong>at</strong> most one million records (100 second-levelnodes with 100 full children each). A B + -tree with height four must have <strong>at</strong> least250,000 records <strong>and</strong> <strong>at</strong> most 100 million records. Thus, it would require an extremelylarge d<strong>at</strong>abase to gener<strong>at</strong>e a B + -tree of more than height four.The B + -tree split <strong>and</strong> insert rules guarantee th<strong>at</strong> every node (except perhaps theroot) is <strong>at</strong> least half full. So they are on average about 3/4 full. But the internalnodes are purely overhead, since the keys stored there are used only by the tree todirect search, r<strong>at</strong>her than store actual d<strong>at</strong>a. Does this overhead amount to a significantuse of space? No, because once again the high fan-out r<strong>at</strong>e of the tree structuremeans th<strong>at</strong> the vast majority of nodes are leaf nodes. Recall (from Section 6.4) th<strong>at</strong>a full K-ary tree has approxim<strong>at</strong>ely 1/K of its nodes as internal nodes. This meansth<strong>at</strong> while half of a full binary tree’s nodes are internal nodes, in a B + -tree of order100 probably only about 1/75 of its nodes are internal nodes. This means th<strong>at</strong> theoverhead associ<strong>at</strong>ed with internal nodes is very low.We can reduce the number of disk fetches required for the B-tree even moreby using the following methods. First, the upper levels of the tree can be stored inmain memory <strong>at</strong> all times. Because the tree branches so quickly, the top two levels(levels 0 <strong>and</strong> 1) require rel<strong>at</strong>ively little space. If the B-tree is only height four, then2 This concept can be extended further if higher space utiliz<strong>at</strong>ion is required. However, the upd<strong>at</strong>eroutines become much more complic<strong>at</strong>ed. I once worked on a project where we implemented 3-for-4node split <strong>and</strong> merge routines. This gave better performance than the 2-for-3 node split <strong>and</strong> mergeroutines of the B ∗ -tree. However, the spitting <strong>and</strong> merging routines were so complic<strong>at</strong>ed th<strong>at</strong> eventheir author could no longer underst<strong>and</strong> them once they were completed!

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!