Annual Report 2010 - Fachgruppe Informatik an der RWTH Aachen ...


Aachen Institute for Advanced Study in Computational Engineering Science (AICES)

Funded by Deutsche Forschungsgemeinschaft (DFG).

AICES is a doctoral program established under the auspices of the Excellence Initiative of the German state and federal governments to meet the future research challenges in computational engineering science. Currently, two members of our group conduct their thesis projects funded through AICES Ph.D. fellowships. One thesis project examines the time-dependent behavior of parallel applications and aims at making the performance analysis more scalable with respect to the length of execution. To be able to apply a previously devised algorithm for the semantic compression of time-series call-path profiles also to C++ codes, where direct instrumentation may cause unacceptably high overhead, a hybrid profiling technique was developed that captures key communication metrics via direct instrumentation, while user-code profiling is accomplished via low-overhead sampling. The other project investigates load and communication imbalance in parallel codes to better understand the formation of performance-degrading wait states. First, a terminology was introduced to classify wait states based on their propagation behavior. Building on earlier work by Meira, Jr. et al., a scalable method was then designed that identifies program wait states, classifies them according to the above-mentioned terminology, and attributes their cost in terms of resource waste to their original cause. By replaying event traces in parallel, it is now possible to identify the processes and call paths responsible for the most severe waiting times even for runs with very large numbers of processes. This work won the best paper award of the International Conference on Parallel Processing (ICPP) 2010 in San Diego, California.
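To make the notion of a wait state more concrete, the following is a minimal illustrative sketch and not the method developed in the thesis project. It assumes a simplified trace in which each message records when the sender entered its send call and when the receiver entered its receive call. A receiver that arrives first waits for a "late sender". The names `Message`, `late_sender_wait`, `wait_by_process`, and `blame_by_sender` are hypothetical, and the sketch charges each wait only to its direct cause, ignoring the propagation of wait states that the actual work classifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One point-to-point message in a simplified, hypothetical event trace."""
    sender: int        # rank of the sending process
    receiver: int      # rank of the receiving process
    send_enter: float  # time at which the sender entered the send call
    recv_enter: float  # time at which the receiver entered the receive call


def late_sender_wait(msg: Message) -> float:
    """Time the receiver spends blocked because the sender started late."""
    return max(0.0, msg.send_enter - msg.recv_enter)


def wait_by_process(messages):
    """Sum the waiting time suffered by each receiving process."""
    totals = {}
    for msg in messages:
        wait = late_sender_wait(msg)
        if wait > 0.0:
            totals[msg.receiver] = totals.get(msg.receiver, 0.0) + wait
    return totals


def blame_by_sender(messages):
    """Charge each wait to the late sender (direct cause only, no propagation)."""
    totals = {}
    for msg in messages:
        wait = late_sender_wait(msg)
        if wait > 0.0:
            totals[msg.sender] = totals.get(msg.sender, 0.0) + wait
    return totals


# Made-up example trace: process 0 is late twice, process 1 is on time.
trace = [
    Message(sender=0, receiver=1, send_enter=5.0, recv_enter=2.0),
    Message(sender=1, receiver=2, send_enter=6.0, recv_enter=6.5),
    Message(sender=0, receiver=2, send_enter=9.0, recv_enter=8.0),
]

if __name__ == "__main__":
    print("waiting time per process:", wait_by_process(trace))
    print("waiting time charged to senders:", blame_by_sender(trace))
```

In this toy trace, processes 1 and 2 suffer the waiting time, while the cost is charged entirely to process 0. A full analysis would additionally follow chains of dependent wait states back to the original imbalance.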

Virtual Institute – High Productivity Supercomputing (VI-HPS)<br />

Funded by the Helmholtz Association and carried out in cooperation with Forschungszentrum Jülich, RWTH Aachen University (Institute for Scientific Computing), TU Dresden, University of Tennessee, TU Munich, and University of Stuttgart.

The mission of this virtual institute is to improve the quality and accelerate the development process of complex simulation programs in science and engineering that are being designed for the most advanced parallel computer systems. For this purpose, we develop and integrate state-of-the-art programming tools for high-performance computing that assist domain scientists in diagnosing programming errors and optimizing the performance of their applications. In these efforts, we place special emphasis on scalability and ease of use. Besides the purely technical development of such tools, the virtual institute also offers training workshops with practical exercises to make more users aware of the benefits they can achieve by using the tools. During the past year, three tuning workshops with hands-on sessions were organized in Munich, in Amsterdam, and at the King Abdullah University of Science and Technology in Saudi Arabia. In addition, two conference tutorials, also with hands-on exercises, were held at the IEEE Conference on Cluster Computing in Heraklion, Greece, and at the ACM/IEEE Conference on Supercomputing (SC10) in New Orleans, USA.
