12.07.2015 Views

A Practical Introduction to Data Structures and Algorithm Analysis

298 Chap. 8 File Processing and External Sorting

• RandomAccessFile(String name, String mode): Class constructor, opens a disk file for processing.
• read(byte[] b): Read some bytes from the current position in the file. The current position moves forward as the bytes are read.
• write(byte[] b): Write some bytes at the current position in the file (overwriting the bytes already at that position). The current position moves forward as the bytes are written.
• seek(long pos): Move the current position in the file to pos. This allows bytes at arbitrary places within the file to be read or written.
• close(): Close a file at the end of processing.

8.5 External Sorting

We now consider the problem of sorting collections of records too large to fit in main memory. Because the records must reside in peripheral or external memory, such sorting methods are called external sorts. This is in contrast to the internal sorts discussed in Chapter 7, which assume that the records to be sorted are stored in main memory. Sorting large collections of records is central to many applications, such as processing payrolls and other large business databases. As a consequence, many external sorting algorithms have been devised. Years ago, sorting algorithm designers sought to optimize the use of specific hardware configurations, such as multiple tape or disk drives. Most computing today is done on personal computers and low-end workstations with relatively powerful CPUs, but only one or at most two disk drives. The techniques presented here are geared toward optimized processing on a single disk drive.
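As a brief aside, the RandomAccessFile methods listed before Section 8.5 can be exercised with a minimal sketch such as the following. It assumes fixed-size records of four bytes each; the class name RandomAccessDemo, the constant RECORD_SIZE, and the three helper methods are hypothetical names introduced only for illustration. The sketch calls readFully rather than read(byte[] b), because read may return fewer bytes than the array can hold, and it relies on try-with-resources to call close().

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

// Hypothetical demo class; all names here are illustrative assumptions.
class RandomAccessDemo {
    // Assumed fixed record size in bytes.
    static final int RECORD_SIZE = 4;

    // Write `count` records sequentially; record i is filled with the byte value i.
    static void writeRecords(String name, int count) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(name, "rw")) {
            file.setLength(0); // discard any previous contents
            for (int i = 0; i < count; i++) {
                byte[] record = new byte[RECORD_SIZE];
                Arrays.fill(record, (byte) i);
                file.write(record); // current position advances by RECORD_SIZE
            }
        } // close() is called automatically here
    }

    // Read the record at position `index` without touching earlier records.
    static byte[] readRecord(String name, int index) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(name, "r")) {
            file.seek((long) index * RECORD_SIZE); // jump to the record's byte offset
            byte[] record = new byte[RECORD_SIZE];
            file.readFully(record); // fills the whole array or throws EOFException
            return record;
        }
    }

    // Overwrite the record at position `index` in place.
    static void overwriteRecord(String name, int index, byte[] record) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(name, "rw")) {
            file.seek((long) index * RECORD_SIZE);
            file.write(record); // replaces the bytes already at that position
        }
    }

    public static void main(String[] args) throws IOException {
        File temp = File.createTempFile("records", ".bin");
        temp.deleteOnExit();
        String name = temp.getPath();

        writeRecords(name, 8);
        System.out.println(Arrays.toString(readRecord(name, 5)));
        overwriteRecord(name, 5, new byte[] {9, 9, 9, 9});
        System.out.println(Arrays.toString(readRecord(name, 5)));
    }
}
```

Because every record has the same size, the byte offset of record i is simply i times RECORD_SIZE, which is what makes seek useful for reaching arbitrary records.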
This approach allows us to cover the most important issues in external sorting while skipping many less important machine-dependent details. Readers who have a need to implement efficient external sorting algorithms that take advantage of more sophisticated hardware configurations should consult the references in Section 8.6.

When a collection of records is too large to fit in main memory, the only practical way to sort it is to read some records from disk, do some rearranging, then write them back to disk. This process is repeated until the file is sorted, with each record read perhaps many times. Given the high cost of disk I/O, it should come as no surprise that the primary goal of an external sorting algorithm is to minimize the amount of information that must be read from or written to disk. A certain amount of additional CPU processing can profitably be traded for reduced disk access.

Before discussing external sorting techniques, consider again the basic model for accessing information from disk. The file to be sorted is viewed by the programmer as a sequential series of fixed-size blocks. Assume (for simplicity) that each
