11.07.2015 Views

Data Structures and Algorithm Analysis - Computer Science at ...


Chap. 10 Indexing

[Figure 10.2 diagram. Second Level Index: 1, 2003, 5894, 10528. Linear Index, Disk Blocks: 1, 2001 | 2003, 5688 | 5894, 9942 | 10528, 10984]

Figure 10.2 A simple two-level linear index. The linear index is stored on disk. The smaller, second-level index is stored in main memory. Each element in the second-level index stores the first key value in the corresponding disk block of the index file. In this example, the first disk block of the linear index stores keys in the range 1 to 2001, and the second disk block stores keys in the range 2003 to 5688. Thus, the first entry of the second-level index is key value 1 (the first key in the first block of the linear index), while the second entry of the second-level index is key value 2003.

second-level index is stored in main memory, accessing a record by this method requires two disk reads: one from the index file and one from the database file for the actual record.

Every time a record is inserted to or deleted from the database, all associated secondary indices must be updated. Updates to a linear index are expensive, because the entire contents of the array might be shifted. Another problem is that multiple records with the same secondary key each duplicate that key value within the index. When the secondary key field has many duplicates, such as when it has a limited range (e.g., a field to indicate job category from among a small number of possible job categories), this duplication might waste considerable space.

One improvement on the simple sorted array is a two-dimensional array where each row corresponds to a secondary key value. A row contains the primary keys whose records have the indicated secondary key value. Figure 10.3 illustrates this approach.
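
As a rough illustration of the row-per-secondary-key layout, here is a minimal Python sketch. It is not taken from the text: the class name `RowIndex`, its methods, the fixed `width` parameter, and the job-category values are all hypothetical, and Python lists with a capacity check stand in for the rows of a true fixed-size two-dimensional array.

```python
from bisect import bisect_left


class RowIndex:
    """Hypothetical secondary index: one row of primary keys per secondary key."""

    def __init__(self, width):
        self.width = width  # assumed fixed row capacity (primary keys per row)
        self.keys = []      # secondary key values, kept sorted, one per row
        self.rows = []      # rows[i] holds the primary keys for keys[i]

    def _find(self, sec_key):
        # Binary search for the row belonging to sec_key.
        i = bisect_left(self.keys, sec_key)
        return i, i < len(self.keys) and self.keys[i] == sec_key

    def insert(self, sec_key, prim_key):
        i, found = self._find(sec_key)
        if not found:
            # A new secondary key value adds a new row; later rows shift down.
            self.keys.insert(i, sec_key)
            self.rows.insert(i, [])
        if len(self.rows[i]) >= self.width:
            raise OverflowError("row for %r is full" % (sec_key,))
        # Only this one row changes; the secondary key is stored once.
        self.rows[i].append(prim_key)

    def delete(self, sec_key, prim_key):
        i, found = self._find(sec_key)
        if found and prim_key in self.rows[i]:
            self.rows[i].remove(prim_key)

    def lookup(self, sec_key):
        i, found = self._find(sec_key)
        return list(self.rows[i]) if found else []


idx = RowIndex(width=2)
idx.insert("engineer", 101)
idx.insert("clerk", 57)
idx.insert("engineer", 204)
print(idx.lookup("engineer"))  # [101, 204]
```

Each secondary key appears exactly once in `keys`, however many records share it, and an insertion or deletion for an existing key touches a single row.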
Now there is no duplication of secondary key values, possibly yielding a considerable space savings. The cost of insertion and deletion is reduced, because only one row of the table need be adjusted. Note that a new row is added to the array when a new secondary key value is added. This might lead to moving many records, but this will happen infrequently in applications suited to using this arrangement.

A drawback to this approach is that the array must be of fixed size, which imposes an upper limit on the number of primary keys that might be associated with a particular secondary key. Furthermore, those secondary keys with fewer records than the width of the array will waste the remainder of their row. A better approach is to have a one-dimensional array of secondary key values, where each secondary key is associated with a linked list. This works well if the index is stored in main memory, but not so well when it is stored on disk because the linked list for a given key might be scattered across several disk blocks.
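
The linked-list variant can be sketched in the same spirit. This is an in-memory illustration under assumptions of our own, not the book's implementation: `Node`, `ListIndex`, and the method names are hypothetical, and new primary keys are pushed onto the front of the list for simplicity.

```python
from bisect import bisect_left


class Node:
    """One primary key in the list attached to a secondary key."""

    def __init__(self, prim_key, nxt=None):
        self.prim_key = prim_key
        self.nxt = nxt


class ListIndex:
    """Hypothetical index: sorted array of secondary keys, each with a linked list."""

    def __init__(self):
        self.keys = []   # one-dimensional array of secondary key values (sorted)
        self.heads = []  # heads[i] is the first node of the list for keys[i]

    def _slot(self, sec_key):
        i = bisect_left(self.keys, sec_key)
        return i, i < len(self.keys) and self.keys[i] == sec_key

    def insert(self, sec_key, prim_key):
        i, found = self._slot(sec_key)
        if not found:
            self.keys.insert(i, sec_key)
            self.heads.insert(i, None)
        # Push onto the front: no fixed width, so no per-key upper limit.
        self.heads[i] = Node(prim_key, self.heads[i])

    def delete(self, sec_key, prim_key):
        i, found = self._slot(sec_key)
        if not found:
            return False
        prev, node = None, self.heads[i]
        while node is not None and node.prim_key != prim_key:
            prev, node = node, node.nxt
        if node is None:
            return False
        if prev is None:
            self.heads[i] = node.nxt  # unlink the head node
        else:
            prev.nxt = node.nxt       # unlink an interior or tail node
        return True

    def lookup(self, sec_key):
        i, found = self._slot(sec_key)
        result = []
        node = self.heads[i] if found else None
        while node is not None:
            result.append(node.prim_key)
            node = node.nxt
        return result


idx = ListIndex()
idx.insert("engineer", 101)
idx.insert("engineer", 204)
print(idx.lookup("engineer"))  # [204, 101]
```

Each list grows only as long as its key requires, so no row space is wasted. The pointer chasing in `lookup` is cheap in main memory, but on disk each `nxt` hop could land in a different block, which is the weakness the text points out.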
