at a virtual OpenCL (VOCL) framework that could support the transparent utilization of local or remote GPUs. This framework, based on the OpenCL programming model, exposes physical GPUs as decoupled virtual resources that can be managed transparently, independently of application execution. The performance of VOCL was evaluated as part of the project using four real-world applications spanning a range of computation and memory access intensities. The work showed that compute-intensive applications can execute with relatively little overhead within the VOCL framework.
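Because VOCL is based on the OpenCL programming model, an application written against the standard OpenCL API is the natural target for this kind of transparency. The C sketch below is illustrative only and is not taken from the VOCL project; it shows the ordinary OpenCL device enumeration an application already performs, which under a virtualization layer such as VOCL could resolve to virtual devices backed by local or remote physical GPUs without changes to the application code.

    #include <stdio.h>
    #include <CL/cl.h>

    /* Illustrative sketch: standard OpenCL device discovery.  Under a
     * framework like VOCL, the devices returned here may be virtual
     * resources backed by local or remote physical GPUs. */
    int main(void) {
        cl_platform_id platform;
        cl_device_id devices[8];
        cl_uint num_devices = 0;

        if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS) {
            fprintf(stderr, "no OpenCL platform found\n");
            return 1;
        }
        if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 8, devices,
                           &num_devices) != CL_SUCCESS) {
            fprintf(stderr, "no GPU devices found\n");
            return 1;
        }
        for (cl_uint i = 0; i < num_devices; i++) {
            char name[256];
            clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
            printf("device %u: %s\n", (unsigned)i, name);
        }
        return 0;
    }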
Virtualization Overhead Benchmarking. The benchmarking of virtualization overheads using both Eucalyptus and OpenStack was performed in collaboration with the Mathematics and Computer Science (MCS) Division at Argonne, which does algorithm development and software design in core areas such as optimization, explores new technologies such as distributed computing and bioinformatics, and performs numerical simulations in challenging areas such as climate modeling; the Advanced Integration Group at ALCF, which designs, develops, benchmarks, and deploys new technology and tools; and the Performance Engineering Group at ALCF, which works to ensure the effective use of applications on ALCF and emerging systems. This work is detailed in Chapter 9.
SuperNova Factory. Magellan project personnel were part of the team of researchers from LBNL who received the Best Paper Award at ScienceCloud 2010. The paper describes the feasibility of porting the Nearby Supernova Factory pipeline to the Amazon Web Services environment and offers detailed performance results and lessons learned from various design options.
MOAB Provisioning. We worked closely with Adaptive Computing's MOAB team to test both bare-metal provisioning and virtual machine provisioning through the MOAB batch queue interface at NERSC. Our early evaluation provides an alternate model for delivering cloud services to HPC center users, allowing them to benefit from customized environments while leveraging many of the services they are already accustomed to, such as high-bandwidth, low-latency interconnects, access to high-performance file systems, and access to archival storage.
Juniper 10GigE. Recent cloud offerings such as Amazon's Cluster Compute instances are based on 10GigE networking infrastructure. The Magellan team at NERSC worked closely with Juniper to evaluate their 10GigE infrastructure on a subset of the Magellan testbed. A detailed benchmarking evaluation of both bare-metal and virtualized configurations was performed and is described in Chapter 9.
IBM GPFS-SNC. Hadoop and the Hadoop Distributed File System (HDFS) show the importance of data locality in file systems when handling workloads with large data volumes. However, HDFS does not provide a POSIX interface, which is a significant challenge for legacy scientific applications. Alternate storage architectures such as IBM's General Parallel File System - Shared Nothing Cluster (GPFS-SNC), a distributed shared-nothing file system, provide many of the features of HDFS, such as data locality and data replication, while preserving the POSIX I/O interface. The Magellan team at NERSC worked closely with the IBM Almaden research team to install and test an early version of GPFS-SNC on Magellan hardware. Storage architectures such as GPFS-SNC hold promise for scientific applications, but a more detailed benchmarking effort, which is outside the scope of Magellan, will be needed.
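To illustrate why preserving the POSIX interface matters for legacy codes, the minimal C sketch below (a hypothetical example, not part of the GPFS-SNC evaluation; the file path is invented) reads a file through the ordinary POSIX calls most scientific applications already use. On a POSIX-compliant file system such as GPFS-SNC this works unchanged, whereas HDFS would require the application to be rewritten against a separate client API.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Illustrative sketch of plain POSIX I/O.  On a POSIX file system
     * (e.g., a GPFS-SNC mount) this runs unmodified; HDFS has no such
     * interface and would need its own client library instead. */
    int main(void) {
        char buf[4096];
        ssize_t n;
        int fd = open("/gpfs/data/input.dat", O_RDONLY);  /* hypothetical path */
        if (fd < 0) {
            perror("open");
            return 1;
        }
        while ((n = read(fd, buf, sizeof(buf))) > 0) {
            /* process n bytes of input here */
        }
        close(fd);
        return 0;
    }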
User Education and Support. User education and support have been critical to the success of the project. Both sites were actively involved in providing user education at workshops and through other forums. Magellan project personnel also engaged heavily with user groups to help them in their evaluation of the cloud infrastructure, and the NERSC project team conducted an initial requirements-gathering survey. At the end of the project, user experiences from both sites were gathered through a survey and case studies, which are described in Chapter 11.