FY2010 - Oak Ridge National Laboratory
Director’s R&D Fund—<br />
Ultrascale Computing and Data Science<br />
and the international fusion reactor, ITER (which, if successful, could help ensure the energy supply of<br />
humanity for centuries to come). By enabling a predictive capability, this project may have direct impact<br />
on the success of ITER and the U.S. magnetic fusion program and may help ensure a good<br />
scientific/technological return on U.S. investment in such an experiment. Relevance to scientific<br />
discovery and innovation stems from the strong connection with two offices of the DOE Office of<br />
Science: the Office of Fusion Energy Sciences and the Office of Advanced Scientific Computing Research.<br />
Success in this project will advance the core goals of both offices and will contribute to U.S.<br />
scientific prominence in the world.<br />
Results and Accomplishments<br />
Several key milestones have been met in this project to date. These include (1) mathematical proof of<br />
elimination of finite-grid instability and exact energy conservation with implicit PIC; (2) generalization of<br />
the delta-f energy-conserving approach to full-f; and (3) the assessment of particle subcycling and orbit<br />
averaging in implicit PIC. The first accomplishment is a key development, as it demonstrates our premise<br />
that fully implicit, energy-conserving PIC can be free of deleterious numerical instabilities, both spatial<br />
and temporal. This paves the way for truly efficient kinetic modeling of plasmas, which is the main goal of<br />
this project. The second accomplishment enables the treatment of completely general temperature<br />
profiles, which was a major limitation of earlier work. This is another important aspect of this project,<br />
necessary for the development of a first-principles predictive capability. The third and final<br />
accomplishment proves our premise that the accurate integration of orbits in phase space is essential<br />
for reliable long-term simulation using implicit time-stepping techniques. In particular, we have proved<br />
that a careful numerical treatment of particle orbits is directly related to good momentum conservation<br />
properties, and that lack of care in this regard results in late-time solution degradation. Put together, these<br />
accomplishments place this effort on a very solid scientific foundation and set the stage for strong future<br />
impact.<br />
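As a toy illustration of why an implicit time step can conserve energy exactly, the sketch below integrates a single particle in a harmonic potential (unit mass and unit frequency) with the implicit midpoint rule and, for contrast, with explicit Euler. It is not the implicit PIC scheme developed in this project; the test problem, the function names, and the step size are assumptions chosen only for illustration.

```python
def energy(x, v):
    """Total energy of a unit-mass, unit-frequency harmonic oscillator."""
    return 0.5 * (x * x + v * v)


def implicit_midpoint_step(x, v, dt):
    """One implicit midpoint step for x' = v, v' = -x.

    The implicit equations are linear here, so they are solved in closed
    form. The resulting update is a pure rotation in (x, v), which is why
    the quadratic energy is preserved up to round-off.
    """
    a = 0.5 * dt
    denom = 1.0 + a * a
    x_new = ((1.0 - a * a) * x + 2.0 * a * v) / denom
    v_new = ((1.0 - a * a) * v - 2.0 * a * x) / denom
    return x_new, v_new


def explicit_euler_step(x, v, dt):
    """One explicit Euler step; energy grows by (1 + dt^2) every step."""
    return x + dt * v, v - dt * x


def integrate(step, x, v, dt, n_steps):
    """Apply the given one-step method n_steps times."""
    for _ in range(n_steps):
        x, v = step(x, v, dt)
    return x, v


if __name__ == "__main__":
    for name, step in [("implicit midpoint", implicit_midpoint_step),
                       ("explicit Euler", explicit_euler_step)]:
        x, v = integrate(step, 1.0, 0.0, dt=0.1, n_steps=1000)
        print(f"{name}: final energy = {energy(x, v):.6g} (initial 0.5)")
```

In this toy problem the implicit update keeps the energy at its initial value to round-off, while the explicit update drifts without bound. A full PIC code additionally has to handle the particle-field coupling, which this sketch does not attempt.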
05387<br />
Soft-Error Resilience for Future-Generation High-Performance<br />
Computing Systems<br />
Christian Engelmann and Sudharshan S. Vazhkudai<br />
Project Description<br />
The premise of this project is that soft errors, that is, uncorrected bit flips in computer chip logic caused<br />
by thermal and voltage variations as well as natural radiation, will be the main cause of interruptions in<br />
future high-performance computing (HPC) systems due to smaller circuit sizes, lower voltages, and<br />
increased component count. Based on the exascale roadmap, vendors have to find the right balance<br />
between resilience and power consumption. While they will offer extensive soft error detection support to<br />
avoid silent data corruption (SDC), soft error correction will be limited to save power. This has two<br />
consequences. First, fewer soft errors will be masked by the hardware. Second, the risk of SDC remains,<br />
as its prevention is still an active area of research. This project targets two different solutions to alleviate<br />
the issue of soft errors: (1) checkpoint storage virtualization to improve checkpoint/restart times and<br />
(2) software redundancy to eliminate rollback/recovery.<br />
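The idea behind software redundancy can be sketched in a few lines: run the same computation in several replicas and accept the result a majority agrees on, so that a single corrupted replica is outvoted instead of forcing a rollback. The example below is a minimal, single-process illustration under stated assumptions, not the project's implementation. The three-replica setup, the `compute` kernel, and the injected bit flip are all hypothetical stand-ins.

```python
import struct
from collections import Counter


def flip_bit(value, bit):
    """Flip one bit of the IEEE-754 double representation of value."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def majority_vote(results):
    """Return the value reported by a strict majority of replicas.

    Raises RuntimeError when no value has a strict majority, i.e. the
    disagreement is detected but cannot be corrected by voting.
    """
    value, count = Counter(results).most_common(1)[0]
    if 2 * count <= len(results):
        raise RuntimeError("no majority among replicas")
    return value


def compute(data):
    """Hypothetical application kernel: a sum of squares."""
    return sum(x * x for x in data)


def run_redundant(data, corrupt_replica=None, bit=52):
    """Run compute() in three replicas and vote on the results.

    corrupt_replica (assumption, for demonstration only) selects one
    replica whose result gets a single injected bit flip, emulating a
    soft error that the hardware did not correct.
    """
    results = []
    for replica in range(3):
        result = compute(data)
        if replica == corrupt_replica:
            result = flip_bit(result, bit)
        results.append(result)
    return majority_vote(results), results


if __name__ == "__main__":
    voted, results = run_redundant([1.0, 2.0, 3.0], corrupt_replica=1)
    print("replica results:", results)
    print("voted result:", voted)
```

With three replicas a single faulty result is masked, while two replicas would only allow detection. That is the usual trade-off between the resource cost of redundancy and the ability to continue without rollback.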