22.03.2013 Views

Parallel Implementation of a Sandpile Model - Oak Ridge National ...

Parallel Implementation of a Sandpile Model - Oak Ridge National ...

Parallel Implementation of a Sandpile Model - Oak Ridge National ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Parallel</strong> <strong>Implementation</strong> <strong>of</strong> a<br />

<strong>Sandpile</strong> <strong>Model</strong><br />

Nathaniel D. Sizemore<br />

Westminster College<br />

Vickie E. Lynch<br />

<strong>Oak</strong> <strong>Ridge</strong> <strong>National</strong> Laboratory<br />

Fusion Energy Division


Goals for the Project<br />

<strong>Sandpile</strong> models are useful for modeling a<br />

variety <strong>of</strong> processes ranging from power<br />

transmission systems to the transport <strong>of</strong><br />

particles in plasmas used in fusion reactions.<br />

However, on PCs and workstations, the<br />

simulations <strong>of</strong>ten take too long to run. The<br />

goal for the project was to reduce the model’s<br />

run time.<br />

Fusion Energy Division


What is a <strong>Sandpile</strong>?<br />

• <strong>Sandpile</strong> models are a subset <strong>of</strong> cellular<br />

automata (such as Conway’s Life<br />

simulations)<br />

• The simulation runs by applying a set <strong>of</strong><br />

rules to each cell over a given number <strong>of</strong><br />

time steps.<br />

Fusion Energy Division


How a <strong>Sandpile</strong> Works<br />

• Particles are “dropped” into the simulation.<br />

• When a cell grows high enough to be<br />

unstable, it falls over, giving part <strong>of</strong> its mass<br />

to neighboring cells. This is called a flip.<br />

• Flips can cause other flips -- this process is<br />

called an avalanche.<br />

Fusion Energy Division


• Z n = h n - h n±1<br />

• If Z n Z crit then<br />

➛h n = h n - N f<br />

“Rules <strong>of</strong> the Game”<br />

➛h n ±1 = H n ±1 + N f<br />

Fusion Energy Division


How does it work?<br />

Fusion Energy Division


<strong>Parallel</strong>ization<br />

• Ways to speed up a job<br />

– Work harder (faster machines)<br />

– Work smarter(optimizing)<br />

– Find more workers<br />

• <strong>Parallel</strong>ization helps speed the program up<br />

by using many processors to work on the<br />

problem<br />

Fusion Energy Division


Optimizing the Serial Code<br />

• Reduced costly file I/O by combining<br />

several output files into one<br />

• Loops combined where possible<br />

• Instead <strong>of</strong> storing the sandpile in several<br />

arrays, a single array <strong>of</strong> a C++ cell class<br />

was used<br />

• Cell objects contained pointers to their<br />

neighbors, similar to a doubly-linked list<br />

Fusion Energy Division


Going from Serial to <strong>Parallel</strong><br />

• Each processor was given L/p <strong>of</strong> the<br />

sandpile<br />

• MPI was used, so global functions (such as<br />

determining the mass <strong>of</strong> the sandpile) were<br />

already optimized<br />

Fusion Energy Division


mcurie -- the Cray T3E<br />

Fusion Energy Division


mcurie -- the Cray T3E<br />

• 696 DEC Alpha 450 MHz processing<br />

elements (nodes)<br />

• 900 Mflops/node peak performance<br />

• Distributed memory: 256 Mb/node<br />

Fusion Energy Division


Comparing Serial Algorithms<br />

execution time (min)<br />

120<br />

100<br />

80<br />

60<br />

40<br />

20<br />

BAC vs. NDS Serial Algorithms<br />

0<br />

- 500 1,000 1,500 2,000 2,500 3,000 3,500<br />

Number <strong>of</strong> cells<br />

BAC serial code<br />

NDS serial code<br />

time steps = 1,000,000<br />

Fusion Energy Division


Run Time Increases Linearly with Number <strong>of</strong> Time Steps<br />

execution time (min)<br />

120<br />

100<br />

80<br />

60<br />

40<br />

20<br />

Time Steps vs. Execution Time<br />

0<br />

0.00E+00 2.00E+05 4.00E+05 6.00E+05 8.00E+05 1.00E+06 1.20E+06<br />

length <strong>of</strong> simulation (time steps)<br />

L=3200<br />

BAC serial code<br />

NDS serial code<br />

sandpileP.x (p=5)<br />

Fusion Energy Division


execution time (min)<br />

120<br />

100<br />

80<br />

60<br />

40<br />

20<br />

Serial vs. <strong>Parallel</strong><br />

Execution TIme Comparison<br />

0<br />

- 500 1,000 1,500 2,000 2,500 3,000 3,500<br />

number <strong>of</strong> cells<br />

BAC (on Mac)<br />

NDS Serial (on Mac)<br />

NDS Serial (on T3E)<br />

parallel; p=21<br />

time steps = 1,000,000<br />

Fusion Energy Division


Optimum Number <strong>of</strong> Processors<br />

execution time<br />

9<br />

8<br />

7<br />

6<br />

5<br />

4<br />

3<br />

2<br />

1<br />

0<br />

Processor Number vs. Execution Time<br />

0 5 10 15 20 25 30 35 40 45<br />

number <strong>of</strong> processors<br />

L=1000<br />

L=3200<br />

Fusion Energy Division


Conclusions<br />

<strong>Parallel</strong>ization was a very effective means <strong>of</strong><br />

improving the run times <strong>of</strong> sandpile<br />

simulations. However, at some point the<br />

communication between processes actually<br />

degrades performance. A rule <strong>of</strong> thumb<br />

suggested for this code is to choose p such<br />

that each processor handles 250-350 cells.<br />

Fusion Energy Division


Future Improvements<br />

•Adding tracer particles to track grains from<br />

insertion to ejection<br />

•further optimization <strong>of</strong> the parallel source<br />

code<br />

•Using PVM to have the program itself<br />

calculate and create the optimum number <strong>of</strong><br />

processes for a given sandpile<br />

Fusion Energy Division

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!