11.07.2015 Views

Data Structures and Algorithm Analysis - Computer Science at ...



300 Chap. 8 File Processing and External Sorting

4096 bytes in size, with the first 4 bytes used to store the block ID corresponding to that buffer. Use the first BufferPool abstract class given in Section 8.3 as the basis for your implementation.

8.4 Implement an external sort based on replacement selection and multiway merging as described in this chapter. Test your program both on files with small records and on files with large records. For what size record do you find that key sorting would be worthwhile?

8.5 Implement a Quicksort for large files on disk by replacing all array access in the normal Quicksort application with access to a virtual array implemented using a buffer pool. That is, whenever a record in the array would be read or written by Quicksort, use a call to a buffer pool function instead. Compare the running time of this implementation with implementations for external sorting based on mergesort as described in this chapter.

8.6 Section 8.5.1 suggests that an easy modification to the basic 2-way mergesort is to read in a large chunk of data into main memory, sort it with Quicksort, and write it out for initial runs. Then, a standard 2-way merge is used in a series of passes to merge the runs together. However, this makes use of only two blocks of working memory at a time. Each block read is essentially random access, because the various files are read in an unknown order, even though each of the input and output files is processed sequentially on each pass. A possible improvement would be, on the merge passes, to divide working memory into four equal sections. One section is allocated to each of the two input files and two output files. All reads during merge passes would be in full sections, rather than single blocks. While the total number of blocks read and written would be the same as a regular 2-way Mergesort, it is possible that this would speed processing because a series of blocks that are logically adjacent in the various input and output files would be read/written each time. Implement this variation, and compare its running time against a standard series of 2-way merge passes that read/write only a single block at a time. Before beginning implementation, write down your hypothesis on how the running time will be affected by this change. After implementing, did you find that this change has any meaningful effect on performance?
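
As a starting point for Exercise 8.6, the section-at-a-time idea might be sketched as follows. This is a minimal illustration in Python, not the book's code: the names `read_sections`, `SectionWriter`, and `merge_runs` are hypothetical, each record is assumed to be a single 4-byte integer key, and only one output stream is shown, whereas a full merge pass would alternate runs between two output files, each with its own section.

```python
import struct

# Assumption: every record is one little-endian 4-byte signed integer key.
RECORD = struct.Struct("<i")


def read_sections(stream, section_bytes):
    """Yield keys from a sorted run, refilling one full section per read.

    section_bytes is assumed to be a multiple of the record size.
    """
    while True:
        chunk = stream.read(section_bytes)
        if not chunk:
            return
        for (key,) in RECORD.iter_unpack(chunk):
            yield key


class SectionWriter:
    """Collect output keys and write them one full section at a time."""

    def __init__(self, stream, section_bytes):
        self.stream = stream
        self.capacity = section_bytes // RECORD.size
        self.pending = []

    def append(self, key):
        self.pending.append(key)
        if len(self.pending) == self.capacity:
            self.flush()

    def flush(self):
        if self.pending:
            fmt = f"<{len(self.pending)}i"
            self.stream.write(struct.pack(fmt, *self.pending))
            self.pending.clear()


def merge_runs(in_a, in_b, out, section_bytes):
    """Merge two sorted runs into out, doing I/O in whole sections."""
    a = read_sections(in_a, section_bytes)
    b = read_sections(in_b, section_bytes)
    writer = SectionWriter(out, section_bytes)
    x = next(a, None)
    y = next(b, None)
    while x is not None and y is not None:
        if x <= y:
            writer.append(x)
            x = next(a, None)
        else:
            writer.append(y)
            y = next(b, None)
    # One run is exhausted; copy the remainder of the other.
    while x is not None:
        writer.append(x)
        x = next(a, None)
    while y is not None:
        writer.append(y)
        y = next(b, None)
    writer.flush()
```

With a working memory of M bytes, `section_bytes` would be roughly M / 4 under the scheme in the exercise. Setting it to a single block size gives the block-at-a-time baseline to compare against. When timing real files, note that Python's own file buffering and the operating system's cache may mask the difference unless they are accounted for.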
