Magellan Final Report - Office of Science - U.S. Department of Energy

3.1 Collaborations and Synergistic Activities

Magellan research was conducted as a collaboration between ALCF and NERSC and also leveraged the expertise of other collaborators and projects. At a high level, Magellan collaboration activities can be classified into three categories:

• Magellan Core Research. Magellan staff actively performed the research necessary to answer the research questions outlined in the report with respect to understanding the applicability of cloud computing for scientific applications.

• Synergistic Research Activities. Magellan staff participated in a number of key collaborations in research related to cloud computing.

• Magellan Resource Enabling User Research. Magellan resources were used extensively by scientific users in their simulations and data analysis. Magellan staff often provided key user support in facilitating this research.

We outline the key Magellan collaborations in this section. Additional collaborations are highlighted throughout the report. Specifically, the application case studies are highlighted in Chapter 11.

Cloud Benchmarking. The benchmarking of commercial cloud platforms was performed in collaboration with the IT department at Lawrence Berkeley National Laboratory (LBNL), which manages some of the mid-range computing clusters for scientific users; the Advanced Technology Group at NERSC, which studies the requirements of current and emerging NERSC applications to find hardware design choices and programming models; and the Advanced Computing for Science (ACS) Department in the Computational Research Division (CRD) at LBNL, which seeks to create software and tools to enable science on diverse resource platforms. Access to commercial cloud platforms was also possible through collaboration with the IT division and the University of California Center for Information Technology Research in the Interest of Society (CITRIS), which had existing contracts with different providers. This early benchmarking effort resulted in a publication at CloudCom 2010 [46] that was awarded Best Paper.

MapReduce/Hadoop Evaluation. Scientists are struggling with a tsunami of data across many domains. Emerging sensor networks, more capable instruments, and ever-increasing simulation scales are generating data at a rate that exceeds our ability to effectively manage, curate, analyze, and share it. This is exacerbated by limited understanding of, and expertise in, the hardware resources and software infrastructure required for handling these diverse data volumes. A project funded through the Laboratory Directed Research and Development (LDRD) program at Lawrence Berkeley National Laboratory is looking at the role that many new, potentially disruptive technologies can play in accelerating discovery. Magellan resources were used for this evaluation. Additionally, staff worked closely with a summer student on the project, evaluating the specific application patterns that might benefit from Hadoop. Magellan staff also worked closely with the Grid Computing Research Laboratory at SUNY Binghamton on a comparative benchmarking study of MapReduce implementations and an alternate implementation of MapReduce that can work in HPC environments. This collaboration resulted in two publications at Grid 2011 [25, 24].

Collaboration with Joint Genome Institute. The Magellan project leveraged the partnership between the Joint Genome Institute (JGI) and NERSC to benchmark the IMG and MGM pipelines on a variety of platforms. Project personnel were also involved in pilot projects for the Systems Biology Knowledge Base. They provided expertise with technologies such as HBase, which helped guide the deployments on Magellan. Magellan resources were also deployed for use by JGI as a proof of concept of Hardware-as-a-Service. JGI also made extensive use of the Hadoop cluster.
