09.05.2014 Views

FY2010 - Oak Ridge National Laboratory


Director’s R&D Fund: Ultrascale Computing and Data Science

and the international fusion reactor, ITER (which, if successful, could help ensure the energy supply of humanity for centuries to come). By enabling a predictive capability, this project may have a direct impact on the success of ITER and the U.S. magnetic fusion program and may help ensure a good scientific and technological return on the U.S. investment in such an experiment. Relevance to scientific discovery and innovation stems from the strong connection with two offices of the DOE Office of Science: the Office of Fusion Energy Sciences and the Office of Advanced Scientific Computing Research. Success in this project will contribute to the core goals of both offices and to U.S. scientific prominence in the world.

Results and Accomplishments

Several key milestones have been met in this project to date. These include (1) a mathematical proof of the elimination of the finite-grid instability and of exact energy conservation with implicit PIC; (2) generalization of the delta-f energy-conserving approach to full-f; and (3) an assessment of particle subcycling and orbit averaging in implicit PIC. The first accomplishment is a key development, as it demonstrates our premise that fully implicit, energy-conserving PIC can be free of deleterious numerical instabilities, both spatial and temporal. This paves the way to truly efficient kinetic modeling of plasmas, which is the main goal of this project. The second accomplishment enables the treatment of completely general temperature profiles, the lack of which was a major limitation of earlier work. This is another important aspect of this project, necessary for the development of a first-principles predictive capability. The third and final accomplishment proves our premise that accurate integration of orbits in phase space is essential for a reliable long-term simulation using implicit time-stepping techniques. In particular, we have proved that a careful numerical treatment of particle orbits is directly related to good momentum conservation properties, and that lack of care in this regard results in late-time solution degradation. Taken together, these accomplishments place this effort on a very solid scientific foundation and set the stage for strong future impact.
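As a rough illustration of why an implicit, time-centered update can conserve energy exactly while an explicit one cannot, the sketch below integrates a single particle in a harmonic potential. This is a toy analogue under stated assumptions, not the project's implicit PIC scheme: there are no fields, grid, or self-consistent coupling, and the function names (`explicit_euler_step`, `implicit_midpoint_step`, `relative_energy_drift`) are hypothetical.

```python
def energy(x, v):
    """Total energy of a unit-mass, unit-frequency harmonic oscillator."""
    return 0.5 * (x * x + v * v)


def explicit_euler_step(x, v, dt):
    """Explicit Euler: uses only old-time values; energy grows every step."""
    return x + dt * v, v - dt * x


def implicit_midpoint_step(x, v, dt):
    """Implicit midpoint (time-centered) update, solved in closed form.

    The scheme is x1 = x0 + dt*(v0 + v1)/2 and v1 = v0 - dt*(x0 + x1)/2.
    For this linear problem the 2x2 system can be inverted by hand; the
    result is a pure rotation in (x, v), so x^2 + v^2 is preserved.
    """
    a = 0.5 * dt
    denom = 1.0 + a * a
    x_new = ((1.0 - a * a) * x + 2.0 * a * v) / denom
    v_new = ((1.0 - a * a) * v - 2.0 * a * x) / denom
    return x_new, v_new


def relative_energy_drift(step, dt, n_steps, x0=1.0, v0=0.0):
    """Advance n_steps with the given stepper and report |E - E0| / E0."""
    x, v = x0, v0
    e0 = energy(x, v)
    for _ in range(n_steps):
        x, v = step(x, v, dt)
    return abs(energy(x, v) - e0) / e0


if __name__ == "__main__":
    for name, step in [("explicit Euler", explicit_euler_step),
                       ("implicit midpoint", implicit_midpoint_step)]:
        print(name, relative_energy_drift(step, dt=0.5, n_steps=1000))
```

With a deliberately large time step, the explicit update gains energy at every step, whereas the time-centered implicit update holds the energy constant to round-off. Conserving energy does not by itself guarantee accurate orbits, which is consistent with the emphasis above on careful orbit integration.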

05387

Soft-Error Resilience for Future-Generation High-Performance Computing Systems

Christian Engelmann and Sudharshan S. Vazhkudai<br />

Project Description

The premise of this project is that soft errors, that is, uncorrected bit flips in computer chip logic caused by thermal and voltage variations as well as natural radiation, will be the main cause of interruptions in future high-performance computing (HPC) systems due to smaller circuit sizes, lower voltages, and increased component counts. According to the exascale roadmap, vendors must find the right balance between resilience and power consumption. While they will offer extensive soft-error detection support to avoid silent data corruption (SDC), soft-error correction will be limited to save power. This has two consequences. First, fewer soft errors will be masked by the hardware. Second, the risk of SDC remains, as its prevention is still an active area of research. This project targets two different solutions to alleviate the issue of soft errors: (1) checkpoint storage virtualization to improve checkpoint/restart times and (2) software redundancy to eliminate rollback/recovery.
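The idea behind software redundancy can be sketched as follows: run the same computation in several replicas and compare the results, so that a replica hit by a bit flip is outvoted and no rollback to a checkpoint is needed. This is a minimal single-process illustration, not the project's implementation; the names `majority_vote`, `flip_bit`, and `run_redundant`, and the fault-injection parameters, are hypothetical.

```python
from collections import Counter


def majority_vote(results):
    """Return (value, unanimous) for a list of replica results.

    Raises RuntimeError when no value is held by a strict majority,
    i.e. the corruption is detected but cannot be corrected.
    """
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority among replicas")
    return value, count == len(results)


def flip_bit(value, bit):
    """Simulate a soft error by flipping one bit of an integer result."""
    return value ^ (1 << bit)


def run_redundant(compute, data, replicas=3, faulty_replica=None, bit=0):
    """Run `compute(data)` in each replica and vote on the outcome.

    `faulty_replica` and `bit` only exist to inject an illustrative
    soft error into one replica's result.
    """
    results = []
    for r in range(replicas):
        out = compute(data)
        if r == faulty_replica:
            out = flip_bit(out, bit)  # injected fault, for illustration only
        results.append(out)
    return majority_vote(results)


if __name__ == "__main__":
    # One of three replicas is corrupted; the vote still yields the right sum.
    print(run_redundant(sum, [1, 2, 3], faulty_replica=1, bit=4))
```

With three replicas a single corrupted result is both detected and corrected; with two replicas a mismatch can only be detected, which is why the vote raises an error when no strict majority exists. The cost is that every computation is executed multiple times.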
