Data Structures and Algorithm Analysis - Computer Science at ...

8 File Processing and External Sorting

Earlier chapters presented basic data structures and algorithms that operate on data stored in main memory. Some applications require that large amounts of information be stored and processed, so much information that it cannot all fit into main memory. In that case, the information must reside on disk and be brought into main memory selectively for processing.

You probably already realize that main memory access is much faster than access to data stored on disk or other storage devices. The relative difference in access times is so great that efficient disk-based programs require a different approach to algorithm design than most programmers are used to. As a result, many programmers do a poor job when it comes to file processing applications.

This chapter presents the fundamental issues relating to the design of algorithms and data structures for disk-based applications.[1] We begin with a description of the significant differences between primary memory and secondary storage. Section 8.2 discusses the physical aspects of disk drives. Section 8.3 presents basic methods for managing buffer pools. Section 8.4 discusses the Java model for random access to data stored on disk. Section 8.5 discusses the basic principles for sorting collections of records too large to fit in main memory.

[1] Computer technology changes rapidly. I provide examples of disk drive specifications and other hardware performance numbers that are reasonably up to date as of the time when the book was written. When you read it, the numbers might seem out of date. However, the basic principles do not change. The approximate ratios for time, space, and cost between memory and disk have remained surprisingly steady for over 20 years.

8.1 Primary versus Secondary Storage

Computer storage devices are typically classified into primary or main memory and secondary or peripheral storage. Primary memory usually refers to Random
