11.07.2015 Views

Data Structures and Algorithm Analysis - Computer Science at ...

284 Chap. 8 File Processing and External Sorting

relaxed for special-purpose sorting applications, but ignoring such complications makes the principles clearer.

As explained in Section 8.2, a sector is the basic unit of I/O. In other words, all disk reads and writes are for one or more complete sectors. Sector sizes are typically a power of two, in the range 512 to 16K bytes, depending on the operating system and the size and speed of the disk drive. The block size used for external sorting algorithms should be equal to or a multiple of the sector size.

Under this model, a sorting algorithm reads a block of data into a buffer in main memory, performs some processing on it, and at some future time writes it back to disk. From Section 8.1 we see that reading or writing a block from disk takes on the order of one million times longer than a memory access. Based on this fact, we can reasonably expect that the records contained in a single block can be sorted by an internal sorting algorithm such as Quicksort in less time than is required to read or write the block.

Under good conditions, reading from a file in sequential order is more efficient than reading blocks in random order. Given the significant impact of seek time on disk access, it might seem obvious that sequential processing is faster. However, it is important to understand precisely under what circumstances sequential file processing is actually faster than random access, because it affects our approach to designing an external sorting algorithm.

Efficient sequential access relies on seek time being kept to a minimum. The first requirement is that the blocks making up a file are in fact stored on disk in sequential order and close together, preferably filling a small number of contiguous tracks. At the very least, the number of extents making up the file should be small. Users typically do not have much control over the layout of their file on disk, but writing a file all at once in sequential order to a disk drive with a high percentage of free space increases the likelihood of such an arrangement.

The second requirement is that the disk drive’s I/O head remain positioned over the file throughout sequential processing. This will not happen if there is competition of any kind for the I/O head. For example, on a multi-user time-shared computer the sorting process might compete for the I/O head with the processes of other users. Even when the sorting process has sole control of the I/O head, it is still likely that sequential processing will not be efficient. Imagine the situation where all processing is done on a single disk drive, with the typical arrangement of a single bank of read/write heads that move together over a stack of platters. If the sorting process involves reading from an input file, alternated with writing to an output file, then the I/O head will continuously seek between the input file and the output file. Similarly, if two input files are being processed simultaneously (such as during a merge process), then the I/O head will continuously seek between these two files.
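The effect of head contention can be made concrete with a small simulation. The Python sketch below is not from the text. It assumes a deliberately simplified model in which each file occupies one contiguous run of blocks, a single head serves all requests, and a seek is charged whenever a request is not for the block immediately following the previous one in the same file. The names `count_seeks`, `sequential_scan`, and `alternating_merge` are hypothetical and chosen only for illustration.

```python
def count_seeks(access_order):
    """Count head movements for a sequence of (file_id, block_index) requests.

    Simplified model (an assumption, not a real disk): each file is stored
    contiguously, and a seek is charged whenever the next request is not the
    block immediately following the previous request in the same file.
    """
    seeks = 0
    previous = None
    for file_id, block in access_order:
        if previous is None or previous != (file_id, block - 1):
            seeks += 1
        previous = (file_id, block)
    return seeks


def sequential_scan(file_id, num_blocks):
    """Request every block of one file in order."""
    return [(file_id, b) for b in range(num_blocks)]


def alternating_merge(num_blocks):
    """Alternate block requests between two input files, as a merge might."""
    order = []
    for b in range(num_blocks):
        order.append(("A", b))
        order.append(("B", b))
    return order


blocks = 1000

# One file read start to finish: a single initial seek.
one_file = count_seeks(sequential_scan("A", blocks))

# Two files read one after the other: one seek per file.
two_scans = count_seeks(sequential_scan("A", blocks) + sequential_scan("B", blocks))

# The same two files read in alternation: every request moves the head.
merged = count_seeks(alternating_merge(blocks))

print(one_file, two_scans, merged)  # prints: 1 2 2000
```

Under these assumptions, reading the same 2000 blocks costs 2 seeks when the files are scanned one after the other and 2000 seeks when the reads alternate, even though each file is accessed in strictly sequential order in both cases. A real drive adds rotational delay, track-to-track movement, and caching, so the numbers only illustrate the trend.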
