18.01.2015 Views

TGQR 2010Q4 Report.pdf - Teragridforum.org


| PI Name | RP(s) Involved | Name of software | Nature of improvement |
|---|---|---|---|
| Joseph Hargitai, Albert Einstein College of Medicine | SDSC | Hadoop | SDSC staff worked to integrate Hadoop with the global filesystem, local SSDs, and the scheduling infrastructure. The PI is currently testing the implementation. |
| Catherine Cooper, Washington State U. | TACC | Underworld | TACC staff member Yaakoub El Khamra worked with the research team and the code developer to develop a version of the code that works on Ranger. This is the bleeding-edge version of the code, and its modules are still being modified and tested. The PI is planning to submit an ASTA request in order to receive continued support. |
| Neutron Science TeraGrid Gateway | ORNL | Amber | Consulted with user on desired Amber version. Initiated arrangements to install Amber11 on NSTG cluster. |

Given the strong interest of our user community in exploiting the potential of GPGPU systems, the assistance given by TACC staff members to three research groups on Longhorn is particularly noteworthy:

• Lucas Wilcox (PI Omar Ghattas) ran a GPU-accelerated discontinuous Galerkin (DG) seismic wave propagation code on up to 478 GPUs of Longhorn, achieving excellent strong scaling and near-perfect weak scaling. His largest run, on 478 GPUs, used a mesh with 6 billion nodes and 56 billion degrees of freedom.

• Tom Fogal (PI Hank Childs) ran a GPU-accelerated volume rendering code on an 8192-cubed helium flame combustion dataset on 256 GPUs. This is among the largest datasets ever to be directly visualized, and the GPU-based code achieved performance on 256 GPUs that would require hundreds of thousands of CPU cores to match (the team performed a similar run on Jaguar for comparison).

• David LeBard ran a GPU-accelerated fast-analysis molecular dynamics code on 128 GPUs of Longhorn, achieving speedups of 10x to 100x over CPU-based versions.

5.3.2 More Frequent User Support Issues/Questions

Among the system- or site-specific user issues referred to the RPs for resolution, the most frequent concerned login/access and account management (e.g., TeraGrid Portal passwords versus resource-specific passwords, password resets, security credentials, adding and removing users, locating environment variables, and allocations on machines leaving and joining the TeraGrid). The remaining issues, in decreasing order of frequency, concerned job queuing (e.g., which queues to use for various types of jobs, throughput issues, and scripting and execution issues); software availability and use (finding, setting up, building, optimizing, and running); system availability and performance (e.g., memory limit questions, compute node failures, I/O timeouts, and requests to kill jobs); file systems (e.g., permissions and quotas, corrupted files); and wide-area data transfer and archive systems.
