
Amanda Hughes, a PhD student of Dr. Van Briesen, had developed a MATLAB code that could, on her laptop, run 330 MC simulations before running out of memory. Over the last quarter, her original serial MATLAB code was ported to the MATLAB Star-P platform on Pople and parallelized. As part of the porting process:

1) The original code structure was thoroughly reviewed.
2) The code was restructured so that the parallelizable parts were consolidated into a single for loop.
3) During the restructuring, additional vectorization and other code tweaks were performed to improve the serial performance.
4) Finally, the code was parallelized with Star-P by moving the contents of the parallelizable for loop into a function and calling ppeval on that function.

This has enabled simulations with up to 10,000 MC runs. The scaling of computational time with increasing core counts was also found to be very satisfactory.

Note that because the code was restructured to put everything to be parallelized into a single for loop, it will be trivial to run it in parallel on other parallel MATLAB platforms. For example, with the MathWorks Parallel Computing Toolbox, no ppeval call is needed; the for loop simply needs to be replaced by a parfor loop.
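The restructuring pattern itself is not specific to Star-P: isolate the independent iterations in a single loop, move the loop body into a function, and let the parallel runtime distribute the iterations. The actual code is MATLAB, so the sketch below is only a loose analogue of that pattern in C with OpenMP; the function name, variable names, and workload are hypothetical placeholders.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for one Monte Carlo run: each call is
 * independent of the others, which is what makes the single
 * consolidated loop parallelizable. */
static double run_one_mc(int run_id)
{
    double acc = 0.0;
    for (int i = 1; i <= 1000; i++)
        acc += 1.0 / (run_id + i);
    return acc;
}

int main(void)
{
    const int n_runs = 10000;               /* number of MC runs */
    double *results = malloc(n_runs * sizeof *results);

    /* All parallelizable work sits in this one loop; the OpenMP
     * directive plays the role that ppeval/parfor plays in MATLAB. */
    #pragma omp parallel for
    for (int r = 0; r < n_runs; r++)
        results[r] = run_one_mc(r);

    printf("first result: %f\n", results[0]);
    free(results);
    return 0;
}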

PI: Brasseur (Penn State, Biophysics). Villous Motility as a Critical Mechanism for Efficient Nutrient Absorption in the Small Intestine. Continuing through 03/11. Lonnie Crosby (NICS), for computational work, and Amit Chourasia (SDSC), for visualization work, are involved in this ASTA project. This quarter, effort focused on the computational work by Lonnie. The group's code, Intestine3D, initially produced a large amount of metadata traffic on the file system. This traffic was due to the I/O pattern employed by the application, in which files are opened and closed for each read/write operation. Twelve files were identified that could benefit from remaining open for the duration of the application's runtime. This change decreased the application's runtime by 33%.
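The I/O change itself is straightforward to illustrate. The sketch below is not the Intestine3D source; using a hypothetical file name and placeholder data, it contrasts the open/close-per-write pattern that generated the metadata traffic with the keep-the-file-open pattern that replaced it.

#include <stdio.h>

#define NSTEPS 1000

/* Original pattern: every write opens and closes the file,
 * generating a metadata operation on each of the NSTEPS iterations. */
static void write_open_close(void)
{
    for (int step = 0; step < NSTEPS; step++) {
        FILE *fp = fopen("diag.dat", "a");   /* hypothetical file name */
        fprintf(fp, "%d %f\n", step, step * 0.5);
        fclose(fp);
    }
}

/* Revised pattern: the file stays open for the whole run,
 * so the metadata server is touched only on open and close. */
static void write_keep_open(void)
{
    FILE *fp = fopen("diag.dat", "w");
    for (int step = 0; step < NSTEPS; step++)
        fprintf(fp, "%d %f\n", step, step * 0.5);
    fclose(fp);
}

int main(void)
{
    write_open_close();   /* slow: NSTEPS open/close pairs */
    write_keep_open();    /* fast: one open/close pair */
    return 0;
}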

A full performance profile of the application was generated and sent to members of the project. This profile identified an inherent load imbalance in the application; addressing it could improve performance by an estimated 10%. Additionally, implementing MPI-IO parallel I/O to produce concatenated binary files, instead of individual (one per process or group) ASCII files, could improve application performance by about 10%. Suggestions and instructions for implementing MPI-IO parallel I/O were sent to members of the project.
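As a minimal sketch of the MPI-IO suggestion (assuming a fixed-size block of doubles per rank and a hypothetical output file name, not the project's actual data layout), each rank writes its block at a rank-dependent offset into one shared binary file with a collective call, instead of each rank writing its own ASCII file:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Hypothetical per-rank output: a fixed-size block of doubles. */
    const int nlocal = 1024;
    double *buf = malloc(nlocal * sizeof *buf);
    for (int i = 0; i < nlocal; i++)
        buf[i] = rank + 0.001 * i;

    /* All ranks open one shared file instead of one file per rank. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes its block at a rank-dependent byte offset,
     * producing a single concatenated binary file. */
    MPI_Offset offset = (MPI_Offset)rank * nlocal * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, nlocal, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}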

PI: Engel (U. North Carolina, Physics). Systematics of Nuclear Surface Vibrations in Deformed Nuclei. Continuing through 03/11. ASTA staff for this project are Meng-Shiou Wu (NICS) and Victor Eijkhout (TACC). During the first quarter of this ASTA project, the focus was on gathering detailed specifications and understanding the structure of the project's code. Meng-Shiou has been working with the group to explore possible approaches to improving the code's efficiency on Kraken. Jointly with the PI's group, they identified and discussed why the group's approach of using ScaLAPACK was not working and what the possible options are for performing diagonalization on a multi-core XT5 node. Both a shared-memory approach (using threaded libraries) and a distributed-memory approach (using a sub-communicator and redesigning the memory management in the code) were discussed. Several scientific libraries that support multi-core architectures were tested (Cray's LibSci, AMD's ACML, and ATLAS), but very limited or no performance improvement was observed. Work is currently focused on integrating the group's code with a code segment, provided by another research team, that uses master-slave style programming with an MPI sub-communicator.
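The contributed code segment itself is not reproduced in this report. As a generic illustration of the sub-communicator idea, the sketch below splits MPI_COMM_WORLD with MPI_Comm_split so that rank 0 acts as a master while the remaining ranks form a worker sub-communicator in which, for example, a distributed diagonalization could run; the split policy and all names are illustrative assumptions.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Rank 0 is the master; all other ranks join a worker
     * sub-communicator (color 1) where the distributed work
     * would be carried out. */
    int color = (world_rank == 0) ? 0 : 1;
    MPI_Comm subcomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);

    if (world_rank == 0) {
        printf("master: coordinating %d workers\n", world_size - 1);
    } else {
        int sub_rank, sub_size;
        MPI_Comm_rank(subcomm, &sub_rank);
        MPI_Comm_size(subcomm, &sub_size);
        printf("worker %d of %d in sub-communicator\n", sub_rank, sub_size);
    }

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}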
