12.07.2015 Views

A Practical Introduction to Data Structures and Algorithm Analysis




10 Indexing

Many large-scale computing applications are centered around data sets that are too large to fit into main memory. The classic example is a large database of records with multiple search keys, requiring the ability to insert, delete, and search for records. Hashing provides outstanding performance for such situations, but only in the limited case in which all searches are of the form “find the record with key value K.” Unfortunately, many applications require more general search capabilities. One example is a range query search for all records whose key lies within some range. Other queries might involve visiting all records in order of their key value, or finding the record with the greatest key value. Hash tables are not organized to support any of these queries efficiently.

This chapter introduces file structures used to organize a large collection of records stored on disk. Such file structures support efficient insertion, deletion, and search operations, for exact-match queries, range queries, and largest/smallest key value searches.

Before discussing such file structures, we must become familiar with some basic file-processing terminology. An entry-sequenced file stores records in the order that they were added to the file. Entry-sequenced files are the disk-based equivalent to an unsorted list and so do not support efficient search. The natural solution is to sort the records by order of the search key. However, a typical database, such as a collection of employee or customer records maintained by a business, might contain multiple search keys. To answer a question about a particular customer might require a search on the name of the customer. Businesses often wish to sort and output the records by zip code order for a bulk mailing. Government paperwork might require the ability to search by Social Security number. Thus, there might not be a single “correct” order in which to store the records.

Indexing is the process of associating a key with the location of a corresponding data record. Section 8.5 discussed the concept of a key sort, in which an index
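
The idea of associating keys with record locations can be illustrated with a small in-memory sketch. This is not the book's implementation: the record fields, the sample data, and the function names `build_index`, `exact_match`, and `range_query` are all hypothetical, and a Python list stands in for an entry-sequenced file, with a record's list position playing the role of its disk location. Each index is simply a sorted list of (key, position) pairs, so the records themselves never move while several key orders coexist.

```python
import bisect

# Hypothetical records in arrival order (an "entry-sequenced file").
# The list position stands in for the record's location on disk.
records = [
    {"name": "Moreno", "zip": "24060"},  # position 0
    {"name": "Abbott", "zip": "90210"},  # position 1
    {"name": "Tanaka", "zip": "10001"},  # position 2
    {"name": "Keller", "zip": "24073"},  # position 3
]


def build_index(recs, field):
    """Return a list of (key, position) pairs sorted by key."""
    return sorted((rec[field], pos) for pos, rec in enumerate(recs))


def exact_match(index, key):
    """Positions of all records whose key equals `key` (binary search)."""
    # Positions are >= 0, so (key, -1) sorts before every entry for `key`.
    i = bisect.bisect_left(index, (key, -1))
    found = []
    while i < len(index) and index[i][0] == key:
        found.append(index[i][1])
        i += 1
    return found


def range_query(index, low, high):
    """Positions of all records with low <= key <= high, in key order."""
    start = bisect.bisect_left(index, (low, -1))
    found = []
    for key, pos in index[start:]:
        if key > high:
            break
        found.append(pos)
    return found


# One unsorted file, two independent indices over it.
by_name = build_index(records, "name")
by_zip = build_index(records, "zip")

keller = exact_match(by_name, "Keller")           # exact-match query
local = range_query(by_zip, "24000", "24999")     # range query
mailing_order = [pos for _, pos in by_zip]        # visit in zip code order
greatest_name = records[by_name[-1][1]]["name"]   # greatest key value
```

Because each index is kept in key order, the range query, the in-order traversal, and the greatest-key lookup all fall out of the same structure, which is exactly what a hash table does not provide. A sorted list is only a stand-in here; it says nothing about how such an index would be laid out or updated efficiently on disk.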
