CS137: Electronic Design Automation Today Sequential ... - Caltech

CS137: Electronic Design Automation
Day 12: February 6, 2006
Sorting
CALTECH CS137 Winter2006 -- DeHon

Today
• Sequential Sorting
• Building on Parallel Prefix
• Systolic
  – Sort
  – Priority Queue
• Streaming Sort
• Mesh Sort (Shear Sort)
• Sorting Networks
• Parallel Merge Sort

Sequential Sort
• What's your favorite sequential sort?
• Runtime?

Sequential Merge Sort
• Observe: can merge two sorted lists of length N in O(N) time
• Start with N lists of length 1
• Merge to form N/2 lists of length 2
• Merge to form N/4 lists of length 4
• … how many times?
• Each merge?

Sequential Merge Sort
• Observe: can merge two sorted lists of length N in O(N) time
• Merge successively longer lists
• log(N) merges
• Each takes time O(N)
• Sort in: O(N log(N))

Parallel Sorting

Building on Parallel Prefix
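The parallel versions below are easier to compare against a baseline, so here is a minimal Python sketch of the bottom-up sequential merge sort from the slides above. The lecture gives no code; Python is our choice. `merge` and `merge_sort_bottom_up` are illustrative names. The pass counter exists only to make the log(N) merge count visible.

```python
def merge(left, right):
    """Merge two sorted lists in O(len(left) + len(right)) time (stable)."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])   # at most one of these two is non-empty
    out.extend(right[j:])
    return out


def merge_sort_bottom_up(items):
    """Start with N runs of length 1 and merge pairs of runs until one remains.

    Each pass touches every element once (O(N)) and doubles the run length,
    so there are ceil(log2(N)) passes: O(N log N) in total.
    Returns (sorted_list, number_of_passes).
    """
    runs = [[x] for x in items]
    passes = 0
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge(runs[k], runs[k + 1]))
            else:
                merged.append(runs[k])  # odd run out waits for the next pass
        runs = merged
        passes += 1
    return (runs[0] if runs else []), passes
```

For N = 8 the sketch performs 3 passes. Each pass touches all 8 elements once, which is the O(N) per merge times log(N) merges shape on the slide.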

[Figure: parallel prefix, from Day 9]

Rank Finding
• Looking for the I'th ordered (I'th smallest) element
• Do a prefix-sum on the high bit only
  – Gives the count of elements with the high bit set; m = N minus that count = number of elements ≤ 01111111…
• High-low search on the result
  – I.e., if m ≥ I, recurse on the half with a leading zero
  – If m < I, search for the (I−m)'th element in the half with the high bit true
• Find the I'th element in O(log²(N)) time (one O(log(N)) prefix sum per key bit, for O(log(N))-bit keys)

Rank-based Sort
• In O(log²(N)) time on N processors, can find the I'th element
• Use separate groups of N processors to find the 1st, 2nd, 3rd, … element in parallel
• Also count the number of such elements in O(log(N)) time using parallel prefix
  – Give each a unique offset
• Send each element to its correct position
• O(log²(N)) sorting algorithm with O(N²) processors

Rank Sort Analysis
• Area: N²
• Time: log²(N)
• Work: (N log(N))², the square of the sequential work

Systolic
[Figure: one-dimensional array]

Sort as Data Arrives
• Often receive data as a sequential stream
• Can I sort the data as it arrives?
• Build a systolic solution?
  – Use only local interconnect

Linear Systolic Sort
• Cell traps largest value (passes smaller values on to its neighbor)
[Basic approach from Leighton]
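The Rank Finding and Rank-based Sort slides on this page can be sketched sequentially in Python. This is a hedged illustration, not the parallel algorithm:

- The prefix sum that yields m becomes a plain count.
- Keys are assumed to be non-negative integers that fit in `bits` bits.
- `select_rank` and `rank_sort` are hypothetical names.
- `rank_sort` calls `select_rank` once per rank, standing in for the separate groups of N processors.
- It skips the duplicate-counting/offset step, because in this sequential model the I'th result is written directly to slot I.

```python
def select_rank(values, i, bits):
    """Return the i-th smallest (1-based) of non-negative `bits`-bit integers.

    High-low search from the most significant bit down: m counts candidates
    whose current bit is 0 (a parallel-prefix sum on the real machine).
    Assumes every value is < 2**bits.
    """
    if not 1 <= i <= len(values):
        raise ValueError("rank out of range")
    candidates = list(values)
    for b in range(bits - 1, -1, -1):
        zeros = [v for v in candidates if not (v >> b) & 1]
        ones = [v for v in candidates if (v >> b) & 1]
        m = len(zeros)
        if i <= m:
            candidates = zeros   # answer has a 0 in bit b
        else:
            candidates = ones    # answer has a 1 in bit b; skip the m smaller ones
            i -= m
    return candidates[0]         # all remaining candidates are equal


def rank_sort(values, bits):
    """Rank-based sort: find the 1st, 2nd, ..., N-th element independently."""
    return [select_rank(values, i, bits) for i in range(1, len(values) + 1)]
```

Each call to `select_rank` does `bits` rounds of counting. This mirrors the log(N) prefix sums per rank on the slide when keys have O(log(N)) bits.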

Linear Systolic Sort Analysis
• Area: N
• Time: N
• Work: N²

Priority Queue
• Insert at top
• Extract largest
• With O(N) cells
• O(1) extract
• Allows interleaved inserts/deletes

Priority Queue Idea
• Trap largest
  – Like linear sort
• Largest always at front
  – Always immediately available
• On extract
  – Shift up
[Figure: cell array with the largest value at the front, the next largest behind it, and new values entering at the top]

Priority Queue Cell
• If (Cin = insert)
  – A_local ← largest(A_local, B_in)
  – B_out ← smallest(A_local, B_in)
• If (Cin = extract)
  – A_local ← A_in
  – B_out ← B_in
• Cout ← Cin

Streaming Sort

Streaming Merge Sort
• Can we sort streaming data with O(log(N)) hardware?
• How do you sort efficiently in SCORE?
  – Pipe-and-filter system architecture?
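The Priority Queue Cell rules on this page can be simulated in Python. This is a sketch under stated assumptions:

- `None` marks an empty cell.
- Each insert or extract ripples through the whole array before the next operation begins. The systolic hardware instead pipelines these waves one cell per cycle.
- `SystolicPriorityQueue` is a hypothetical name.

Inserting N values into N cells and then extracting N times is the linear systolic sort, with output in descending order.

```python
class SystolicPriorityQueue:
    """Linear array of cells; cell 0 (the front) always holds the largest value."""

    def __init__(self, num_cells):
        if num_cells < 1:
            raise ValueError("need at least one cell")
        self.cells = [None] * num_cells  # None marks an empty cell

    def insert(self, value):
        # Occupied cells are contiguous from the front, so a full last cell means full.
        if self.cells[-1] is not None:
            raise OverflowError("all cells occupied")
        b = value  # B_in: the value travelling away from the front
        for k, a in enumerate(self.cells):
            if a is None:
                self.cells[k] = b  # the first empty cell traps the value
                return
            # Cin = insert: A_local <- largest(A_local, B_in), B_out <- smallest(...)
            self.cells[k], b = max(a, b), min(a, b)

    def extract_max(self):
        if self.cells[0] is None:
            raise IndexError("queue empty")
        largest = self.cells[0]
        # Cin = extract: each cell takes A_in from the cell behind it (shift up)
        self.cells = self.cells[1:] + [None]
        return largest
```

Because every cell keeps the larger of its value and the incoming one, the array stays sorted from front to back. Extract is therefore O(1) at the front, and inserts and extracts can be interleaved freely.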

Build Merge Tree
[Figure: tree of merge units]

Streaming Sort
• Merge sort the stream
• Observe: early merges run at lower frequency than later ones…
  – so a single merge unit can serve each level of the tree
• After log(N) merges, the output stream is sorted

Streaming Sort Analysis
• Buffer lengths grow by 2× each stage
• Total memory: 2×(N/2) + 2×(N/4) + 2×(N/8) + … ≤ 2N
• Area: log(N) compare/switch units
  – O(N) memory
  – [also true of the sequential case]
• Time: O(N)
• Work: O(N log(N))
  – Work efficient

Mesh Sort

Mesh Sort
• Start with N items in a √N × √N mesh
• Sort into a specified order
• Nearest-neighbor communication only
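A Python sketch of the pipe-and-filter streaming merge sort from the Build Merge Tree and Streaming Sort slides. It assumes Python generators stand in for the SCORE operators and that the stream length n is known in advance.

- Each `merge_stage` buffers two runs of length `run_len` and emits their merge. Buffer sizes therefore double from stage to stage, matching the 2×(N/2) + 2×(N/4) + … memory bound.
- ceil(log2(n)) chained stages leave the stream sorted.
- `heapq.merge` plays the role of the compare/switch unit.
- `merge_stage` and `streaming_sort` are hypothetical names.

```python
import heapq
from itertools import islice


def merge_stage(stream, run_len):
    """One pipeline stage: merge consecutive pairs of sorted runs of length run_len."""
    it = iter(stream)
    while True:
        a = list(islice(it, run_len))  # buffer run A
        if not a:
            return
        b = list(islice(it, run_len))  # buffer run B (may be short or empty at the tail)
        yield from heapq.merge(a, b)   # emits one sorted run of up to 2 * run_len


def streaming_sort(stream, n):
    """Chain ceil(log2(n)) merge stages; stage k merges runs of length 2**k."""
    run_len = 1
    while run_len < n:
        stream = merge_stage(stream, run_len)
        run_len *= 2
    return stream
```

Because each stage is a generator, items flow lazily through all stages, roughly as they would through a hardware pipeline. The sketch does not model cycle-level timing.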

Observation 1
• Can sort m things on a linear array in O(m) time
  – Perform parallel bubble sort in m steps
  – i.e., alternate odd/even swap pairings

Shearsort
• Algorithm: alternate sorting rows and columns for log(N)+1 steps
  – i.e., sort rows on odd steps, columns on even steps
  – Sort odd rows ascending, even rows descending
  – Can use even/odd swapping for the row/column sorts
• O(√N log(N))

Simplifying Lemma
• 0-1 Sorting Lemma: if an oblivious comparison-exchange algorithm sorts all input sets consisting solely of 0's and 1's, then it sorts all input sets with arbitrary values
  – proof in Leighton
• Odd/even swapping is an oblivious comparison-exchange

Shearsort Works?
• General form after column sort:
  – 0 rows
  – Mixed (dirty) rows
  – 1 rows
• Consider all row pairs:
  – 3 cases
    • More zeros, more ones, equal number
  – Row sort puts all zeros on one side, ones on the other (the two rows of a pair are sorted in opposite directions)
  – Column sort: one row of the pair ends up all ones or all zeros
  – Therefore, each row sort plus column sort cuts the number of "dirty" rows in half

[Figure: Dirty Rows after column sort, one pair of rows]
  10001000  --row sort-->  00000011  --column sort-->  00000000
  10101001                 11110000                    11110011

Rounding up Steps
• Each sort: m = √N steps
• log(√N) row/column sort pairs to remove dirty rows
• 2 log(√N) = log(N)
• Total steps: √N log(N)
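A Python sketch of shearsort on an n × n grid (n = √N). Every row and column sort uses odd/even transposition, the parallel bubble sort from Observation 1. Assumptions:

- Rows are 0-indexed, so row 0 (the slide's first, odd row) is sorted ascending.
- The final order is the snake order read out by `snake_order`.
- All function names are hypothetical.

For n a power of two, the 2·log2(n) + 1 sorting steps equal the slide's log(N) + 1.

```python
import math


def odd_even_transposition_sort(line, descending=False):
    """Parallel bubble sort: m rounds of alternating even/odd compare-exchanges.

    Pairs within one round are disjoint, so a linear array does each round in
    one step; m rounds sort m items. Sorts `line` in place and returns it.
    """
    m = len(line)
    for rnd in range(m):
        for i in range(rnd % 2, m - 1, 2):
            wrong = line[i] < line[i + 1] if descending else line[i] > line[i + 1]
            if wrong:
                line[i], line[i + 1] = line[i + 1], line[i]
    return line


def shearsort(grid):
    """Sort an n x n grid (list of row lists) in place into snake order."""
    n = len(grid)
    phases = math.ceil(math.log2(n)) if n > 1 else 0
    for step in range(2 * phases + 1):
        if step % 2 == 0:
            # Row sort: even-index rows ascending, odd-index rows descending
            for r in range(n):
                odd_even_transposition_sort(grid[r], descending=(r % 2 == 1))
        else:
            # Column sort: smallest values move to the top
            for c in range(n):
                col = [grid[r][c] for r in range(n)]
                odd_even_transposition_sort(col)
                for r in range(n):
                    grid[r][c] = col[r]
    return grid


def snake_order(grid):
    """Read the grid left-to-right on even rows, right-to-left on odd rows."""
    return [x for r, row in enumerate(grid) for x in (row if r % 2 == 0 else row[::-1])]
```

Shearsort only performs fixed compare-exchanges, so it is oblivious. By the 0-1 Sorting Lemma, checking every 0-1 input on a small grid (512 cases for 3 × 3) is enough to show it sorts arbitrary values at that size.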

[Figure: merge network with inputs A0 A1 A2 A3 B3 B2 B1 B0]

Systematic Construction: Step 1: Merge Network
• Recursively swap large/small elements from halves of network
  – Merge in log(N) steps

Systematic Construction: Sorting Network
• Perform recursive merging
  – log(N) merge networks
    • Of depth log(N), log(N)-1, …
  – Depth: O(log²(N))
  – Area: O(N log²(N))
• Can be used in pipelined fashion
  – Only using O(N) hardware exclusively per step

Parallel Merge Sort
• With O(N) processors
• Sort in O(log²(N)) steps
• Sequentially executing the O(log²(N)) stages of pairwise swaps of the sorting network
• Randomized algorithm
  – Works in O(log(N)) steps
    • With high probability
  – … see Leighton

Admin
• Wednesday, Friday: NC
• Project: two things due in two weeks
  – Sequential baseline
  – Proposed plan of attack
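The pages defining the sorting network's comparators are not in this extract. The sketch below therefore assumes Batcher's bitonic construction, one standard network that fits the Systematic Construction slides:

- The merge step compares A_i with B reversed (the A0 A1 A2 A3 B3 B2 B1 B0 layout), then applies half-cleaners with shrinking gaps.
- The sorting network recursively merges blocks of size 2, 4, …, N.
- `sorting_network` returns a list of stages, and `run_network` executes them one stage at a time, as the Parallel Merge Sort slide describes for O(N) processors.
- Names are hypothetical, and N must be a power of two.

```python
def sorting_network(n):
    """Comparator stages of a bitonic-style sorting network on n = 2**k wires."""
    if n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = []
    size = 2
    while size <= n:
        # Merge step 1: compare A_i with its mirror in the upper half (B reversed)
        stages.append([(base + i, base + size - 1 - i)
                       for base in range(0, n, size) for i in range(size // 2)])
        # Remaining merge steps: half-cleaners with gaps size/4, ..., 1
        gap = size // 4
        while gap >= 1:
            stages.append([(i, i + gap) for i in range(n) if i % (2 * gap) < gap])
            gap //= 2
        size *= 2
    return stages


def run_network(stages, values):
    """Apply the stages in order; comparators within a stage touch disjoint wires."""
    v = list(values)
    for stage in stages:
        for lo, hi in stage:
            if v[lo] > v[hi]:   # compare-exchange: smaller value to the lower wire
                v[lo], v[hi] = v[hi], v[lo]
    return v
```

For N = 8 this yields 1 + 2 + 3 = 6 stages of 4 comparators each. In general it yields log(N)(log(N)+1)/2 stages of N/2 comparators, matching the O(log²(N)) depth and O(N log²(N)) area on the slide.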