Magellan Final Report - Office of Science - U.S. Department of Energy
Magellan Final Report<br />
3.1 Collaborations and Synergistic Activities<br />
Magellan research was conducted as a collaboration across both ALCF and NERSC and also leveraged<br />
the expertise of other collaborators and projects. At a high level, Magellan collaboration activities can be<br />
classified into three categories:<br />
• Magellan Core Research. Magellan staff actively performed the research necessary to answer the<br />
research questions outlined in the report with respect to understanding the applicability of cloud<br />
computing for scientific applications.<br />
• Synergistic Research Activities. Magellan staff participated in a number of key collaborations in<br />
research related to cloud computing.<br />
• Magellan Resource Enabling User Research. Magellan resources were used extensively by scientific<br />
users in their simulations and data analysis. Magellan staff often provided key user support in<br />
facilitating this research.<br />
We outline the key Magellan collaborations in this section. Additional collaborations are highlighted<br />
throughout the report as well. Specifically, the application case studies are highlighted in Chapter 11.<br />
Cloud Benchmarking. The benchmarking of commercial cloud platforms was performed in collaboration<br />
with the IT department at Lawrence Berkeley National Laboratory (LBNL), which manages some of<br />
the mid-range computing clusters for scientific users; the Advanced Technology Group at NERSC, which<br />
studies the requirements of current and emerging NERSC applications to identify hardware design choices and<br />
programming models; and the Advanced Computing for Science (ACS) Department in the Computational<br />
Research Division (CRD) at LBNL, which seeks to create software and tools to enable science on diverse<br />
resource platforms. Access to commercial cloud platforms was also possible through collaboration with the<br />
IT division and the University of California Center for Information Technology Research in the Interest of<br />
Society (CITRIS), which had existing contracts with different providers. This early benchmarking effort<br />
resulted in a publication at CloudCom 2010 [46] that was awarded Best Paper.<br />
MapReduce/Hadoop Evaluation. Scientists are struggling with a tsunami of data across many domains.<br />
Emerging sensor networks, more capable instruments, and ever-increasing simulation scales are generating<br />
data at a rate that exceeds our ability to effectively manage, curate, analyze, and share it. This is exacerbated<br />
by limited understanding of, and expertise in, the hardware resources and software infrastructure required<br />
for handling these diverse data volumes. A project funded through the Laboratory Directed Research and<br />
Development (LDRD) program at Lawrence Berkeley National Laboratory is looking at the role that many new,<br />
potentially disruptive technologies can play in accelerating discovery. Magellan resources were used for this<br />
evaluation. Additionally, staff worked closely with a summer student on the project, evaluating the specific<br />
application patterns that might benefit from Hadoop. Magellan staff also worked closely with the Grid Computing<br />
Research Laboratory at SUNY Binghamton on a comparative benchmarking study of MapReduce<br />
implementations and an alternate implementation of MapReduce that can work in HPC environments. This<br />
collaboration resulted in two publications at Grid 2011 [25, 24].<br />
Collaboration with Joint Genome Institute. The Magellan project leveraged the partnership between<br />
the Joint Genome Institute (JGI) and NERSC to benchmark the IMG and MGM pipelines on a variety of<br />
platforms. Project personnel were also involved in pilot projects for the Systems Biology Knowledge Base.<br />
They provided expertise with technologies such as HBase, which was useful in guiding the deployments<br />
on Magellan. Magellan resources were also deployed for use by JGI as a proof-of-concept of<br />
Hardware-as-a-Service. JGI also made extensive use of the Hadoop cluster.<br />