11.07.2015 Views

Data Structures and Algorithm Analysis - Computer Science at ...


348 Chap. 10 Indexing

among the cylinders, sorting the records within each cylinder, and updating both the system index table and the within-cylinder block table. Such reorganization was typical of database systems during the 1960s and would normally be done each night or weekly.

10.3 Tree-based Indexing

Linear indexing is efficient when the database is static, that is, when records are inserted and deleted rarely or never. ISAM is adequate for a limited number of updates, but not for frequent changes. Because it has essentially two levels of indexing, ISAM will also break down for a truly large database where the number of cylinders is too great for the top-level index to fit in main memory.

In their most general form, database applications have the following characteristics:

1. Large sets of records that are frequently updated.
2. Search is by one or a combination of several keys.
3. Key range queries or min/max queries are used.

For such databases, a better organization must be found. One approach would be to use the binary search tree (BST) to store primary and secondary key indices. BSTs can store duplicate key values, they provide efficient insertion and deletion as well as efficient search, and they can perform efficient range queries. When there is enough main memory, the BST is a viable option for implementing both primary and secondary key indices.

Unfortunately, the BST can become unbalanced. Even under relatively good conditions, the depth of leaf nodes can easily vary by a factor of two.
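The BST capabilities listed above (duplicate keys, insertion, search, range queries) and the way the tree's shape depends on insertion order can be illustrated with a minimal Python sketch. It assumes integer keys and stores duplicates in the right subtree; the names `BST`, `search`, `range_query`, `height`, and `balanced_order` are hypothetical and chosen only for illustration. `search` also reports how many nodes lie on the root-to-node path it followed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional[Node] = None
    right: Optional[Node] = None


class BST:
    """Unbalanced BST (illustrative sketch); duplicate keys go to the right subtree."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: int) -> None:
        if self.root is None:
            self.root = Node(key)
            return
        cur = self.root
        while True:
            if key < cur.key:
                if cur.left is None:
                    cur.left = Node(key)
                    return
                cur = cur.left
            else:  # key >= cur.key: duplicates are stored to the right
                if cur.right is None:
                    cur.right = Node(key)
                    return
                cur = cur.right

    def search(self, key: int) -> tuple[bool, int]:
        """Return (found, number of nodes visited on the path from the root)."""
        cur, visited = self.root, 0
        while cur is not None:
            visited += 1
            if key == cur.key:
                return True, visited
            cur = cur.left if key < cur.key else cur.right
        return False, visited

    def range_query(self, lo: int, hi: int) -> list[int]:
        """All keys k with lo <= k <= hi, in sorted order, duplicates included."""
        out: list[int] = []

        def visit(node: Optional[Node]) -> None:
            if node is None:
                return
            if lo < node.key:  # left subtree holds only keys < node.key
                visit(node.left)
            if lo <= node.key <= hi:
                out.append(node.key)
            if node.key <= hi:  # right subtree holds only keys >= node.key
                visit(node.right)

        visit(self.root)
        return out

    def height(self) -> int:
        """Number of nodes on the longest root-to-leaf path."""
        def h(node: Optional[Node]) -> int:
            return 0 if node is None else 1 + max(h(node.left), h(node.right))
        return h(self.root)


def balanced_order(lo: int, hi: int) -> list[int]:
    """Middle-key-first insertion order, which yields a perfectly balanced tree."""
    if lo > hi:
        return []
    mid = (lo + hi) // 2
    return [mid] + balanced_order(lo, mid - 1) + balanced_order(mid + 1, hi)


if __name__ == "__main__":
    sorted_tree, balanced_tree = BST(), BST()
    for k in range(1, 16):
        sorted_tree.insert(k)
    for k in balanced_order(1, 15):
        balanced_tree.insert(k)
    for name, tree in (("sorted order", sorted_tree), ("balanced order", balanced_tree)):
        _, visited = tree.search(15)
        print(f"{name}: height={tree.height()}, nodes visited to find 15={visited}")
    print(balanced_tree.range_query(4, 7))
```

With keys 1 to 15, inserting in sorted order produces a degenerate chain of height 15, so finding key 15 visits all 15 nodes, whereas middle-first insertion gives height 4 and a four-node search path. Sorted input is the worst case; the factor-of-two variation described in the text arises from much better-behaved inputs.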
This might not be a significant concern when the tree is stored in main memory because the time required is still Θ(log n) for search and update. When the tree is stored on disk, however, the depth of nodes in the tree becomes crucial. Every time a BST node B is visited, it is necessary to visit all nodes along the path from the root to B. Each node on this path must be retrieved from disk. Each disk access returns a block of information. If a node is on the same block as its parent, then the cost to find that node is trivial once its parent is in main memory. Thus, it is desirable to keep subtrees together on the same block. Unfortunately, a node is often not on the same block as its parent. Thus, each access to a BST node could potentially require that another block be read from disk. Using a buffer pool to store multiple blocks in memory can mitigate disk access problems if BST accesses display good locality of reference. But a buffer pool cannot eliminate disk I/O entirely. The problem becomes greater if the BST is unbalanced, because nodes deep in the tree have the potential to cause many disk blocks to be read. Thus, there are two significant issues that must be addressed to have efficient search from a disk-based
