Parallel Implementation of a Sandpile Model - Oak Ridge National ...
Parallel Implementation of a Sandpile Model - Oak Ridge National ...
Parallel Implementation of a Sandpile Model - Oak Ridge National ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<strong>Parallel</strong> <strong>Implementation</strong> <strong>of</strong> a<br />
<strong>Sandpile</strong> <strong>Model</strong><br />
Nathaniel D. Sizemore<br />
Westminster College<br />
Vickie E. Lynch<br />
<strong>Oak</strong> <strong>Ridge</strong> <strong>National</strong> Laboratory<br />
Fusion Energy Division
Goals for the Project<br />
<strong>Sandpile</strong> models are useful for modeling a<br />
variety <strong>of</strong> processes ranging from power<br />
transmission systems to the transport <strong>of</strong><br />
particles in plasmas used in fusion reactions.<br />
However, on PCs and workstations, the<br />
simulations <strong>of</strong>ten take too long to run. The<br />
goal for the project was to reduce the model’s<br />
run time.<br />
Fusion Energy Division
What is a <strong>Sandpile</strong>?<br />
• <strong>Sandpile</strong> models are a subset <strong>of</strong> cellular<br />
automata (such as Conway’s Life<br />
simulations)<br />
• The simulation runs by applying a set <strong>of</strong><br />
rules to each cell over a given number <strong>of</strong><br />
time steps.<br />
Fusion Energy Division
How a <strong>Sandpile</strong> Works<br />
• Particles are “dropped” into the simulation.<br />
• When a cell grows high enough to be<br />
unstable, it falls over, giving part <strong>of</strong> its mass<br />
to neighboring cells. This is called a flip.<br />
• Flips can cause other flips -- this process is<br />
called an avalanche.<br />
Fusion Energy Division
• Z n = h n - h n±1<br />
• If Z n Z crit then<br />
➛h n = h n - N f<br />
“Rules <strong>of</strong> the Game”<br />
➛h n ±1 = H n ±1 + N f<br />
Fusion Energy Division
How does it work?<br />
Fusion Energy Division
<strong>Parallel</strong>ization<br />
• Ways to speed up a job<br />
– Work harder (faster machines)<br />
– Work smarter(optimizing)<br />
– Find more workers<br />
• <strong>Parallel</strong>ization helps speed the program up<br />
by using many processors to work on the<br />
problem<br />
Fusion Energy Division
Optimizing the Serial Code<br />
• Reduced costly file I/O by combining<br />
several output files into one<br />
• Loops combined where possible<br />
• Instead <strong>of</strong> storing the sandpile in several<br />
arrays, a single array <strong>of</strong> a C++ cell class<br />
was used<br />
• Cell objects contained pointers to their<br />
neighbors, similar to a doubly-linked list<br />
Fusion Energy Division
Going from Serial to <strong>Parallel</strong><br />
• Each processor was given L/p <strong>of</strong> the<br />
sandpile<br />
• MPI was used, so global functions (such as<br />
determining the mass <strong>of</strong> the sandpile) were<br />
already optimized<br />
Fusion Energy Division
mcurie -- the Cray T3E<br />
Fusion Energy Division
mcurie -- the Cray T3E<br />
• 696 DEC Alpha 450 MHz processing<br />
elements (nodes)<br />
• 900 Mflops/node peak performance<br />
• Distributed memory: 256 Mb/node<br />
Fusion Energy Division
Comparing Serial Algorithms<br />
execution time (min)<br />
120<br />
100<br />
80<br />
60<br />
40<br />
20<br />
BAC vs. NDS Serial Algorithms<br />
0<br />
- 500 1,000 1,500 2,000 2,500 3,000 3,500<br />
Number <strong>of</strong> cells<br />
BAC serial code<br />
NDS serial code<br />
time steps = 1,000,000<br />
Fusion Energy Division
Run Time Increases Linearly with Number <strong>of</strong> Time Steps<br />
execution time (min)<br />
120<br />
100<br />
80<br />
60<br />
40<br />
20<br />
Time Steps vs. Execution Time<br />
0<br />
0.00E+00 2.00E+05 4.00E+05 6.00E+05 8.00E+05 1.00E+06 1.20E+06<br />
length <strong>of</strong> simulation (time steps)<br />
L=3200<br />
BAC serial code<br />
NDS serial code<br />
sandpileP.x (p=5)<br />
Fusion Energy Division
execution time (min)<br />
120<br />
100<br />
80<br />
60<br />
40<br />
20<br />
Serial vs. <strong>Parallel</strong><br />
Execution TIme Comparison<br />
0<br />
- 500 1,000 1,500 2,000 2,500 3,000 3,500<br />
number <strong>of</strong> cells<br />
BAC (on Mac)<br />
NDS Serial (on Mac)<br />
NDS Serial (on T3E)<br />
parallel; p=21<br />
time steps = 1,000,000<br />
Fusion Energy Division
Optimum Number <strong>of</strong> Processors<br />
execution time<br />
9<br />
8<br />
7<br />
6<br />
5<br />
4<br />
3<br />
2<br />
1<br />
0<br />
Processor Number vs. Execution Time<br />
0 5 10 15 20 25 30 35 40 45<br />
number <strong>of</strong> processors<br />
L=1000<br />
L=3200<br />
Fusion Energy Division
Conclusions<br />
<strong>Parallel</strong>ization was a very effective means <strong>of</strong><br />
improving the run times <strong>of</strong> sandpile<br />
simulations. However, at some point the<br />
communication between processes actually<br />
degrades performance. A rule <strong>of</strong> thumb<br />
suggested for this code is to choose p such<br />
that each processor handles 250-350 cells.<br />
Fusion Energy Division
Future Improvements<br />
•Adding tracer particles to track grains from<br />
insertion to ejection<br />
•further optimization <strong>of</strong> the parallel source<br />
code<br />
•Using PVM to have the program itself<br />
calculate and create the optimum number <strong>of</strong><br />
processes for a given sandpile<br />
Fusion Energy Division