FY2010 - Oak Ridge National Laboratory


Director’s R&D Fund: Ultrascale Computing and Data Science

Perumalla, K. S. 2010. µπ: A Scalable and Transparent System for Simulating MPI Programs. ICST International Conference on Simulation Tools and Techniques. Torremolinos, Spain.

Perumalla, K. S., and C. Carothers. 2010. Compiler-based Automation Approaches to Reverse Computation. Reverse Computation Workshop (in conjunction with IEEE/ACM/SCS PADS'10), Atlanta, GA, USA, IEEE Computer Society.

Perumalla, K. S., and S. K. Seal. 2010. Reversible Parallel Discrete Event Execution of Large-scale Epidemic Outbreak Models. IEEE/ACM/SCS International Workshop on Principles of Advanced and Distributed Simulation. Atlanta, GA, USA, IEEE Computer Society (Best Paper Finalist).

05550

Computational Biology Toolbox for Ultrascale Computing

Igor B. Jouline, Bhanu Rekepalli, Andrey A. Gorin, and Christian Halloy

Project Description

Insufficient capability to translate exponentially growing genomic data into useful knowledge is the single most pressing grand challenge in biology. The goal of this project is to dramatically improve biological function prediction by building new and improved models for mining genomic data. This goal will be achieved by using the most sensitive data mining tools, organized in a robust, massively parallel computational infrastructure. We will port these tools to a Cray XT5 supercomputer and adapt them for use in developing cloud computing, enabling mining of not only the existing genomic data but also future data sets that will be larger by orders of magnitude.

The project has two types of deliverables: (i) a newly developed toolbox containing the most useful computational biology software, implemented for Cray supercomputers, and (ii) a set of new and improved models for biological function prediction that will become available worldwide through major national and international databases. By investing in this project, ORNL will seize the opportunity to become a leader in ultrascale computational biology and will position our team strategically to compete successfully for major funding from the National Institutes of Health (NIH) and DOE.

Mission Relevance

This project aims to establish ORNL as a world leader in dynamic knowledge discovery based on capabilities for handling diverse genomic data. It will also contribute to developing focused research communities in biology, because the computational biology toolbox developed by the project will be used by a large community of biomedical scientists. The project also addresses major problems facing DOE (bioenergy) and NIH (human health), because improved biological function prediction is urgently needed to solve them.

Results and Accomplishments

Nearly all deliverables planned for Year 1 have been met or exceeded.

BLAST. We optimized the BLAST code. First, we installed the BLAST tool on the Kraken supercomputer and profiled the code to understand its I/O behavior. We optimized the broadcasting of the database to all the nodes in the job and then optimized the I/O in two phases. First, a buffer was created to store all the input query sequences, and a dynamic load balancing algorithm was designed to distribute the work optimally among all the cores of the node. Second, the outputs from each core were put into separate
