Magellan Final Report - Office of Science - U.S. Department of Energy
Figure 11.2: An AMR volume rendering of FLASH Astrophysics simulation using vl3.
VASP errors, and updating the database.
The workflow manager was run for three weeks within an allocated sub-queue on Magellan consisting of 80 nodes. During this time the group ran approximately 20,000 VASP calculations and successfully determined the relaxed ground-state energy for 9,000 inorganic crystal structures. The number of cores for each VASP MPI calculation was chosen to match the number of cores on a Magellan node: the project found 8 cores to be a good fit for the wide distribution of computational demands (e.g., number of electrons, k-point mesh) inherent to high-throughput computing.
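The driver loop of such a manager can be sketched as follows. This is a minimal illustration only: the function names, the `mpirun`/`vasp` invocation, and the dictionary-as-database are assumptions for the sketch, not the Materials Project's actual tooling.

```python
import subprocess

def relax_structure(structure_dir, ncores=8, cmd=None):
    """Run one VASP relaxation as a single-node, 8-core MPI job.
    The command is overridable so the loop can be exercised without VASP."""
    if cmd is None:
        cmd = ["mpirun", "-np", str(ncores), "vasp"]  # hypothetical invocation
    result = subprocess.run(cmd, cwd=structure_dir, capture_output=True)
    return result.returncode == 0  # crude stand-in for real VASP error checks

def run_batch(structure_dirs, db, cmd=None):
    """Drain a list of structure directories, recording each outcome
    so the manager can resume where it left off after a restart."""
    for d in structure_dirs:
        db[d] = "relaxed" if relax_structure(d, cmd=cmd) else "error"
    return db
```

One VASP run per 8-core node keeps scheduling simple: every task has the same shape, so throughput scales with the number of nodes in the sub-queue rather than with any per-task tuning.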
There were several challenges in running high-throughput quantum mechanical MPI calculations on Magellan. For example, the workflow manager running on the login node would often be killed due to resource limits. Another problem was the robustness of the workflow manager: updates to the job state in the MasterQueue database would sometimes corrupt the database. The virtual cluster on Magellan was an essential component for computing in an atomic manner (one VASP calculation per job) and, as a result, dramatically simplified the workflow. It allowed the group to avoid bundling several VASP calculations into a single Portable Batch System (PBS) script; such bundling adds another layer of complexity whose sole purpose is to work within the queue policies of supercomputing centers, which limit the number of jobs per user.
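One way to guard that kind of queue bookkeeping against corruption is to validate every state transition before writing it. The sketch below is illustrative only: the state names and the dictionary standing in for the MasterQueue database are assumptions, not the project's actual schema.

```python
# Allowed transitions for a structure's lifecycle in the queue database
# (hypothetical state machine for illustration).
VALID_TRANSITIONS = {
    "PENDING": {"RUNNING"},
    "RUNNING": {"DONE", "ERROR"},
    "ERROR": {"PENDING"},   # failed runs may be requeued
    "DONE": set(),
}

def advance(db, structure, new_state):
    """Move one structure to new_state, rejecting illegal jumps so a
    crashed or duplicated worker cannot leave the database inconsistent."""
    old = db.get(structure, "PENDING")
    if new_state not in VALID_TRANSITIONS[old]:
        raise ValueError(f"illegal transition {old} -> {new_state}")
    db[structure] = new_state
    return db
```

Rejecting out-of-order writes at the database boundary means a worker that dies mid-update can only leave a structure in a well-defined state, which is then easy to requeue.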
Magellan was an excellent solution for high-throughput computing for the Materials Project. The virtual cluster that NERSC staff provided allowed the users to leverage the computing resources in a flexible way, as per the needs of the application, enabling the group to focus on the challenging materials science problems of data storage and data validation.
The Materials Project workflow is an example of a class of next-generation projects that require access to a large number of resources for a specific period of time for high-throughput computing. There is limited or no support for such workloads at DOE centers today: job queue policies limit the number of jobs allowed in the queue; there are no tools that let high-throughput jobs seamlessly access a large number of resources; and machine firewall policies often restrict or constrain processes such as workflow managers and connections to external databases. Cloud technologies provide a way to enable customized environments for specific time periods while maintaining isolation to eliminate any impact on other services at the center.
11.1.5 E. coli<br />
During the weekend of June 3–5, 2011, hundreds of scientists worldwide were involved in a spontaneous and decentralized effort to analyze two strains of E. coli implicated in an outbreak of food poisoning in Germany. Both strains had been sequenced only hours before and released over the internet. An Argonne and Virginia Tech team worked through the night and over the weekend to annotate the genomes and to compare them to known E. coli strains. During the roughly 48-hour period, many E. coli strains were annotated using Argonne's RAST annotation system, which used the Magellan testbed at ALCF to