
Data Structures and Algorithm Analysis - Computer Science at ...



Chap. 10 Indexing

could be sorted or organized using a tree structure, thereby imposing a logical order on the records without physically rearranging them. One database might have several associated index files, each supporting efficient access through a different key field.

Each record of a database normally has a unique identifier, called the primary key. For example, the primary key for a set of personnel records might be the Social Security number or ID number for the individual. Unfortunately, the ID number is generally an inconvenient value on which to perform a search because the searcher is unlikely to know it. Instead, the searcher might know the desired employee's name. Alternatively, the searcher might be interested in finding all employees whose salary is in a certain range. If these are typical search requests to the database, then the name and salary fields deserve separate indices. However, key values in the name and salary indices are not likely to be unique.

A key field such as salary, where a particular key value might be duplicated in multiple records, is called a secondary key. Most searches are performed using a secondary key. The secondary key index (or more simply, secondary index) will associate a secondary key value with the primary key of each record having that secondary key value. At this point, the full database might be searched directly for the record with that primary key, or there might be a primary key index (or primary index) that relates each primary key value with a pointer to the actual record on disk. In the latter case, only the primary index provides the location of the actual record on disk, while the secondary indices refer to the primary index.

Indexing is an important technique for organizing large databases, and many indexing methods have been developed. Direct access through hashing is discussed in Section 9.4. A simple list sorted by key value can also serve as an index to the record file. Indexing disk files by sorted lists is discussed in the following section. Unfortunately, a sorted list does not perform well for insert and delete operations. A third approach to indexing is the tree index. Trees are typically used to organize large databases that must support record insertion, deletion, and key range searches.

Section 10.2 briefly describes ISAM, a tentative step toward solving the problem of storing a large database that must support insertion and deletion of records. Its shortcomings help to illustrate the value of tree indexing techniques. Section 10.3 introduces the basic issues related to tree indexing. Section 10.4 introduces the 2-3 tree, a balanced tree structure that is a simple form of the B-tree covered in Section 10.5. B-trees are the most widely used indexing method for large disk-based databases, and for implementing file systems. Since they have such great practical importance, many variations have been invented. Section 10.5 begins with a discussion of the variant normally referred to simply as a "B-tree." Section 10.5.1 presents the most widely implemented variant, the B+-tree.
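The relationship between secondary indices, the primary index, and the record file described above can be illustrated with a small in-memory sketch. This is not code from the text: the record layout, the names `primary_index`, `name_index`, `salary_index`, and the helper functions are all hypothetical, and a list position stands in for a record's location on disk.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical record file; a list position stands in for a disk location.
records = [
    {"id": 1003, "name": "Lee", "salary": 52000},
    {"id": 1001, "name": "Garcia", "salary": 61000},
    {"id": 1002, "name": "Lee", "salary": 48000},
    {"id": 1004, "name": "Okafor", "salary": 61000},
]

# Primary index: each unique primary key maps to one record location.
primary_index = {rec["id"]: pos for pos, rec in enumerate(records)}

# Secondary index on name: a key value maps to the primary keys of
# every record holding that value (names need not be unique).
name_index = defaultdict(list)
for rec in records:
    name_index[rec["name"]].append(rec["id"])

# Secondary index on salary, kept as a list sorted by key value so that
# range searches can use binary search.
salary_index = sorted((rec["salary"], rec["id"]) for rec in records)


def fetch(primary_key):
    """Follow the primary index to the actual record."""
    return records[primary_index[primary_key]]


def find_by_name(name):
    """Secondary index -> primary keys -> primary index -> records."""
    return [fetch(pk) for pk in name_index.get(name, [])]


def find_by_salary_range(low, high):
    """Return all records with low <= salary <= high."""
    start = bisect_left(salary_index, (low,))
    stop = bisect_right(salary_index, (high, float("inf")))
    return [fetch(pk) for _, pk in salary_index[start:stop]]
```

Only `primary_index` knows where a record lives; both secondary indices store primary keys and go through `fetch`. The sorted `salary_index` also hints at the weakness the text mentions: adding a record means inserting into the middle of a sorted list, which shifts every later entry.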
