Data Structures and Algorithm Analysis - Computer Science at ...


300 Chap. 8 File Processing and External Sorting

4096 bytes in size, with the first 4 bytes used to store the block ID corresponding to that buffer. Use the first BufferPool abstract class given in Section 8.3 as the basis for your implementation.

8.4 Implement an external sort based on replacement selection and multiway merging as described in this chapter. Test your program both on files with small records and on files with large records. For what size record do you find that key sorting would be worthwhile?

8.5 Implement a Quicksort for large files on disk by replacing all array access in the normal Quicksort application with access to a virtual array implemented using a buffer pool. That is, whenever a record in the array would be read or written by Quicksort, use a call to a buffer pool function instead. Compare the running time of this implementation with implementations for external sorting based on mergesort as described in this chapter.

8.6 Section 8.5.1 suggests that an easy modification to the basic 2-way mergesort is to read a large chunk of data into main memory, sort it with Quicksort, and write it out for initial runs. Then, a standard 2-way merge is used in a series of passes to merge the runs together. However, this makes use of only two blocks of working memory at a time. Each block read is essentially random access, because the various files are read in an unknown order, even though each of the input and output files is processed sequentially on each pass. A possible improvement would be, on the merge passes, to divide working memory into four equal sections. One section is allocated to each of the two input files and two output files. All reads during merge passes would be in full sections, rather than single blocks. While the total number of blocks read and written would be the same as a regular 2-way Mergesort, it is possible that this would speed processing because a series of blocks that are logically adjacent in the various input and output files would be read/written each time. Implement this variation, and compare its running time against a standard series of 2-way merge passes that read/write only a single block at a time. Before beginning implementation, write down your hypothesis on how the running time will be affected by this change. After implementing, did you find that this change has any meaningful effect on performance?
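As a starting point for exercise 8.4, the two phases can be sketched in memory. This is a minimal sketch, assuming integer records and using Python lists as stand-ins for run files. The names `replacement_selection` and `external_sort` are hypothetical, and the standard library's `heapq.merge` stands in for a block-by-block multiway merge of run files on disk.

```python
import heapq
import itertools
from typing import Iterable, List

_END = object()  # sentinel marking the end of the input stream


def replacement_selection(records: Iterable[int], memory: int) -> List[List[int]]:
    """Split `records` into sorted runs while holding at most `memory` records."""
    if memory < 1:
        raise ValueError("memory must hold at least one record")
    it = iter(records)
    heap = list(itertools.islice(it, memory))  # fill working memory
    heapq.heapify(heap)
    pending: List[int] = []  # records too small to join the current run
    runs: List[List[int]] = []
    current: List[int] = []
    while heap:
        smallest = heapq.heappop(heap)
        current.append(smallest)  # "write" the minimum to the current run
        nxt = next(it, _END)
        if nxt is not _END:
            if nxt >= smallest:
                heapq.heappush(heap, nxt)  # can still extend this run
            else:
                pending.append(nxt)  # must wait for the next run
        if not heap:  # current run is finished; pending records seed the next one
            runs.append(current)
            current = []
            heap, pending = pending, []
            heapq.heapify(heap)
    return runs


def external_sort(records: Iterable[int], memory: int) -> List[int]:
    """Phase 1: build runs. Phase 2: merge all runs at once (multiway merge)."""
    runs = replacement_selection(records, memory)
    return list(heapq.merge(*runs))
```

For example, `replacement_selection([5, 3, 8, 1, 9, 2, 7], 3)` yields the runs `[[3, 5, 8, 9], [1, 2, 7]]`. Each output frees exactly one slot for the next input record, so the heap plus the set-aside records never exceed `memory`; on random input, runs built this way average about twice the memory size.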
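The virtual-array idea in exercise 8.5 can be sketched as follows, under stated assumptions: records are Python ints, the "disk" is an in-memory list, and buffers are replaced least-recently-used. `BufferPool`, `get`, `set`, `flush`, and the `reads`/`writes` counters are hypothetical names, not the BufferPool abstract class from Section 8.3, and the partition step is a simple Lomuto scheme rather than necessarily the chapter's Quicksort.

```python
from collections import OrderedDict
from typing import List, Set


class BufferPool:
    """Hypothetical LRU buffer pool; `disk` is a list standing in for a file of records."""

    def __init__(self, disk: List[int], block_size: int, num_buffers: int) -> None:
        self.disk = disk
        self.block_size = block_size
        self.num_buffers = num_buffers
        self.buffers: "OrderedDict[int, List[int]]" = OrderedDict()  # block id -> cached block
        self.dirty: Set[int] = set()  # ids of cached blocks modified since being read
        self.reads = 0  # simulated block reads
        self.writes = 0  # simulated block writes

    def _write_back(self, block_id: int, data: List[int]) -> None:
        start = block_id * self.block_size
        self.disk[start:start + len(data)] = data
        self.writes += 1

    def _block(self, block_id: int) -> List[int]:
        if block_id in self.buffers:
            self.buffers.move_to_end(block_id)  # mark as most recently used
            return self.buffers[block_id]
        if len(self.buffers) >= self.num_buffers:
            victim, data = self.buffers.popitem(last=False)  # evict least recently used
            if victim in self.dirty:
                self._write_back(victim, data)
                self.dirty.discard(victim)
        start = block_id * self.block_size
        self.buffers[block_id] = self.disk[start:start + self.block_size]
        self.reads += 1
        return self.buffers[block_id]

    def get(self, i: int) -> int:
        return self._block(i // self.block_size)[i % self.block_size]

    def set(self, i: int, value: int) -> None:
        block_id = i // self.block_size
        self._block(block_id)[i % self.block_size] = value
        self.dirty.add(block_id)

    def flush(self) -> None:
        """Write every modified cached block back to the disk."""
        for block_id, data in self.buffers.items():
            if block_id in self.dirty:
                self._write_back(block_id, data)
        self.dirty.clear()


def quicksort(pool: BufferPool, lo: int, hi: int) -> None:
    """Lomuto-partition Quicksort in which every array access goes through `pool`."""
    if lo >= hi:
        return
    pivot = pool.get(hi)
    i = lo
    for j in range(lo, hi):
        if pool.get(j) < pivot:
            a, b = pool.get(i), pool.get(j)
            pool.set(i, b)
            pool.set(j, a)
            i += 1
    pool.set(hi, pool.get(i))  # move the first element >= pivot to the end
    pool.set(i, pivot)  # place the pivot in its final position
    quicksort(pool, lo, i - 1)
    quicksort(pool, i + 1, hi)
```

To sort a list `data`, one would call `quicksort(pool, 0, len(data) - 1)` on `pool = BufferPool(data, 16, 4)` and then `pool.flush()`. The `reads` and `writes` counters give a rough measure of block I/O to set against the mergesort-based approaches the exercise asks you to compare.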
