FY2010 - Oak Ridge National Laboratory
Director’s R&D Fund: Ultrascale Computing and Data Science
Perumalla, K. S. 2010. µπ: A Scalable and Transparent System for Simulating MPI Programs. ICST International Conference on Simulation Tools and Techniques. Torremolinos, Italy.
Perumalla, K. S., and C. Carothers. 2010. Compiler-based Automation Approaches to Reverse Computation. Reverse Computation Workshop (in conjunction with IEEE/ACM/SCS PADS'10). Atlanta, GA, USA: IEEE Computer Society.
Perumalla, K. S., and S. K. Seal. 2010. Reversible Parallel Discrete Event Execution of Large-scale Epidemic Outbreak Models. IEEE/ACM/SCS International Workshop on Principles of Advanced and Distributed Simulation. Atlanta, GA, USA: IEEE Computer Society (Best Paper Finalist).
05550<br />
Computational Biology Toolbox for Ultrascale Computing<br />
Igor B. Jouline, Bhanu Rekepalli, Andrey A. Gorin, and Christian Halloy<br />
Project Description<br />
Insufficient capability to translate exponentially growing genomic data into useful knowledge is the single most pressing grand challenge in biology. The goal of this project is to dramatically improve biological function prediction by building new and improved models for mining genomic data. We will achieve this goal by using the most sensitive data mining tools, organized in a robust, massively parallel computational infrastructure. We will port these tools to a Cray XT5 supercomputer and adapt them for emerging cloud computing platforms. This will make it possible to mine both existing genomic data and future data sets that will be orders of magnitude larger.
The project has two types of deliverables: (i) a newly developed toolbox containing the most useful computational biology software, implemented for Cray supercomputers, and (ii) a set of new and improved models for biological function prediction that will become available worldwide through major national and international databases. By investing in this project, ORNL will seize the opportunity to become a leader in ultrascale computational biology. It will also strategically position our team to compete successfully for major funding from the National Institutes of Health (NIH) and DOE.
Mission Relevance<br />
This project aims to establish ORNL as a world leader in dynamic knowledge discovery based on capabilities for handling diverse genomic data. It will also help build focused research communities in biology, because a large community of biomedical scientists will use the computational biology toolbox developed by the project. The project also addresses major problems in the missions of DOE (bioenergy) and NIH (human health), because improved biological function prediction is urgently needed to solve them.
Results and Accomplishments<br />
Nearly all deliverables planned for Year 1 have been met or exceeded.
BLAST. We optimized the BLAST code. We began by installing the BLAST tool on the Kraken supercomputer and profiling the code to understand its I/O behavior. We then optimized the broadcasting of the database to all nodes in the job, followed by the I/O, which we optimized in two phases. First, we created a buffer to store all input query sequences. We also designed a dynamic load-balancing algorithm to distribute the work optimally among all the cores of the node. Second, the outputs from each core were put into separate