Data Structures and Algorithm Analysis in C - SVS

Structures, Algorithm Analysis: CHAPTER 7: SORTING

directly addressable. Shellsort compares elements a[i] and a[i - h_k] in one time
unit. Heapsort compares elements a[i] and a[i * 2] in one time unit. Quicksort,
with median-of-three partitioning, requires comparing a[left], a[center], and
a[right] in a constant number of time units. If the input is on a tape, then all
these operations lose their efficiency, since elements on a tape can only be
accessed sequentially. Even if the data is on a disk, there is still a practical
loss of efficiency because of the delay required to spin the disk and move the
disk head.

To see how slow external accesses really are, create a random file that is large,
but not too big to fit in main memory. Read the file in and sort it using an
efficient algorithm. The time it takes to sort the input is certain to be
insignificant compared to the time to read the input, even though sorting is an
O(n log n) operation and reading the input is only O(n).
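The experiment described above can be sketched in C as follows. This is only a minimal sketch under stated assumptions: the helper names write_random_file, read_then_sort, and now_seconds are hypothetical, the records are plain int values, the standard qsort stands in for "an efficient algorithm", and wall-clock time is taken with the C11 timespec_get function. The measured ratio depends heavily on the storage device and on operating system caching, so no particular result is implied.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Comparison callback for qsort: ascending order of ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: write n pseudo-random ints to path.
   Returns 0 on success, -1 on failure. */
int write_random_file(const char *path, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        int v = rand();
        if (fwrite(&v, sizeof v, 1, f) != 1) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: read n ints from path into buf, then sort them,
   reporting the elapsed time of each phase separately.
   Returns 0 on success, -1 on failure. */
int read_then_sort(const char *path, int *buf, size_t n,
                   double *read_s, double *sort_s)
{
    assert(buf != NULL && read_s != NULL && sort_s != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    double t0 = now_seconds();
    size_t got = fread(buf, sizeof *buf, n, f);   /* O(n) input phase */
    fclose(f);
    double t1 = now_seconds();
    if (got != n)
        return -1;

    qsort(buf, n, sizeof *buf, cmp_int);          /* internal sort phase */
    double t2 = now_seconds();

    *read_s = t1 - t0;
    *sort_s = t2 - t1;
    return 0;
}
```

A caller would create the file once with write_random_file, then call read_then_sort and compare the two reported times. A file that was just written is likely to be served from the operating system's cache, which hides the device delay the text is describing.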

7.11.2. Model for External Sorting

The wide variety of mass storage devices makes external sorting much more
device-dependent than internal sorting. The algorithms that we will consider work
on tapes, which are probably the most restrictive storage medium. Since access to
an element on tape is done by winding the tape to the correct location, tapes can
be efficiently accessed only in sequential order (in either direction).

We will assume that we have at least three tape drives to perform the sorting. We
need two drives to do an efficient sort; the third drive simplifies matters. If
only one tape drive is present, then we are in trouble: any algorithm will
require Ω(n^2) tape accesses.

7.11.3. The Simple Algorithm

The basic external sorting algorithm uses the merge routine from mergesort.
Suppose we have four tapes, Ta1, Ta2, Tb1, Tb2, which are two input and two output
tapes. Depending on the point in the algorithm, the a and b tapes are either
input tapes or output tapes. Suppose the data is initially on Ta1. Suppose
further that the internal memory can hold (and sort) m records at a time. A
natural first step is to read m records at a time from the input tape, sort the
records internally, and then write the sorted records alternately to Tb1 and Tb2.
We will call each set of sorted records a run. When this is done, we rewind all
the tapes. Suppose we have the same input as our example for Shellsort.

If m = 3, then after the runs are constructed, the tapes will contain the data
indicated in the following figure.
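The run-construction step can be sketched in C as below. This is an illustrative sketch, not the book's code: the Tape struct is a hypothetical in-memory stand-in for a tape that is only ever appended to, the names build_runs and tape_append are assumptions, records are plain int values, and the standard qsort plays the role of the internal sort. The caller is assumed to give each output tape enough capacity for all n records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an output tape: records are only appended,
   mirroring the sequential-access restriction. */
typedef struct {
    int *data;     /* caller-provided storage, capacity >= n records */
    size_t len;    /* number of records written so far */
} Tape;

/* Comparison callback for qsort: ascending order of ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Append one sorted run to the end of an output tape. */
static void tape_append(Tape *t, const int *run, size_t n)
{
    for (size_t i = 0; i < n; i++)
        t->data[t->len++] = run[i];
}

/* Read the input m records at a time, sort each chunk in memory, and
   write the resulting runs alternately to tb1 and tb2.
   Returns the number of runs produced. */
size_t build_runs(const int *in, size_t n, size_t m, Tape *tb1, Tape *tb2)
{
    assert(m > 0);
    int *buf = malloc(m * sizeof *buf);   /* "internal memory": m records */
    assert(buf != NULL);

    size_t runs = 0;
    for (size_t pos = 0; pos < n; pos += m) {
        size_t len = (n - pos < m) ? n - pos : m;   /* last run may be short */
        for (size_t i = 0; i < len; i++)
            buf[i] = in[pos + i];
        qsort(buf, len, sizeof *buf, cmp_int);
        tape_append(runs % 2 == 0 ? tb1 : tb2, buf, len);
        runs++;
    }

    free(buf);
    return runs;
}
```

For a made-up input 9, 4, 7, 1, 8, 2, 6 with m = 3, the chunks sort to the runs (4, 7, 9), (1, 2, 8), and (6); the first and third go to the first output tape and the second goes to the other.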

