
Figure 11.2: An AMR volume rendering of a FLASH Astrophysics simulation using vl3.

VASP errors, and updating the database.

The workflow manager was run for three weeks within an allocated sub-queue on Magellan consisting of 80 nodes. During this time the group ran approximately 20,000 VASP calculations and successfully determined the relaxed ground-state energy for 9,000 inorganic crystal structures. The number of cores for each VASP MPI calculation was chosen to match the number of cores on a Magellan node. The project found 8 cores to be a good fit for the wide distribution of computational demands (e.g., number of electrons, k-point mesh) inherent to high-throughput computing.
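As a rough illustration of how such a high-throughput driver could be organized, the sketch below walks a directory of pending structures and launches each one as an 8-core VASP MPI run. The directory layout, the vasp_std executable name, and the flat log file are assumptions made for illustration; the group's actual workflow manager recorded job state in its own database rather than a log.

import subprocess
from pathlib import Path

CORES_PER_CALC = 8  # chosen to match the core count of a Magellan node

def run_vasp(workdir: Path) -> bool:
    """Launch one 8-core VASP MPI calculation in workdir; True on a zero exit code."""
    result = subprocess.run(
        ["mpirun", "-np", str(CORES_PER_CALC), "vasp_std"],
        cwd=workdir,
    )
    return result.returncode == 0

def main() -> None:
    # Each subdirectory of calculations/ is assumed to hold the VASP inputs
    # (INCAR, POSCAR, KPOINTS, POTCAR) for one candidate crystal structure.
    pending = sorted(p for p in Path("calculations").iterdir() if p.is_dir())
    for workdir in pending:
        status = "ok" if run_vasp(workdir) else "failed"
        # A production workflow manager would update a task database here;
        # a flat log file keeps this sketch self-contained.
        with open("run_status.log", "a") as log:
            log.write(f"{workdir.name}\t{status}\n")

if __name__ == "__main__":
    main()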

There were several challenges to running high-throughput quantum mechanical MPI calculations on Magellan. For example, the workflow manager running on the login node would often be killed due to resource limits. Another problem was the robustness of the workflow manager: updates to the job state in the MasterQueue database would occasionally corrupt the database. The virtual cluster on Magellan was essential for running each calculation atomically as its own job, and as a result it dramatically simplified the workflow. It allowed the group to avoid bundling several VASP calculations into a single Portable Batch System (PBS) script. Such bundling adds another layer of complexity whose sole purpose is to work within the constraints of the queue policies of supercomputing centers, which limit the number of jobs per user.
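For contrast, the bundling layer that the virtual cluster made unnecessary might look like the sketch below, which generates one PBS script that runs several independent VASP calculations back to back so that they count as a single job against per-user queue limits. The PBS directives, walltime, and directory names are illustrative assumptions rather than Magellan's actual configuration.

from pathlib import Path

def write_bundled_pbs(workdirs, cores_per_calc=8, script="bundle.pbs"):
    """Write one PBS script that runs several independent VASP calculations
    sequentially, so they are charged against the queue as a single job."""
    lines = [
        "#!/bin/bash",
        "#PBS -N vasp_bundle",
        f"#PBS -l nodes=1:ppn={cores_per_calc}",
        "#PBS -l walltime=12:00:00",  # assumed walltime, not a Magellan policy
        "cd $PBS_O_WORKDIR",
    ]
    for wd in workdirs:
        # Every bundled calculation needs its own bookkeeping: where it ran,
        # whether it failed, and how to report that status back afterwards.
        lines += [
            f"cd {wd}",
            f"mpirun -np {cores_per_calc} vasp_std > vasp.out 2>&1",
            f'echo "{wd} $?" >> $PBS_O_WORKDIR/bundle_status.log',
            "cd $PBS_O_WORKDIR",
        ]
    Path(script).write_text("\n".join(lines) + "\n")

# Example: bundle five pending structures into one submission.
write_bundled_pbs([f"calculations/structure_{i:04d}" for i in range(5)])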

Magellan was an excellent solution for high-throughput computing for the Materials Project. The virtual cluster that NERSC staff provided allowed the users to leverage the computing resources in a flexible way, as per the needs of the application, enabling the group to focus on the challenging materials science problems of data storage and data validation.

The Materials Project workflow is an example of a class of next-generation projects that require access to a large number of resources for a specific period of time for high-throughput computing. There is limited or no support available for such workloads at DOE centers today: job queue policies limit the number of jobs allowed in the queue; there are no tools to enable such high-throughput jobs to seamlessly access a large number of resources; and machine firewall policies often restrict or constrain processes such as workflow managers and connections to external databases. Cloud technologies provide a way to enable customized environments for specific time periods while maintaining isolation to eliminate any impact on other services at the center.

11.1.5 E. coli<br />

During the weekend of June 3–5, 2011, hundreds of scientists worldwide were involved in a spontaneous and decentralized effort to analyze two strains of E. coli implicated in an outbreak of food poisoning in Germany. Both strains had been sequenced only hours before and released over the internet. An Argonne and Virginia Tech team worked throughout the night and through the weekend to annotate the genomes and to compare them to known E. coli strains. During the roughly 48-hour period, many E. coli strains were annotated using Argonne’s RAST annotation system, which used the Magellan testbed at ALCF to

