Magellan Final Report - Office of Science - U.S. Department of Energy
challenges. Since each node could potentially run many virtual machine instances, the file system could see a multiplied number of clients. If each node were running eight small instances (one on each core), the file system would have to handle eight times as many clients. Furthermore, virtual machines are often terminated instead of performing a clean shutdown, which could lead to clients frequently joining and leaving the file system cluster. File systems like GPFS and Lustre employ timeout and heartbeat methods that assume a relatively stable client pool. Clients randomly disappearing could lead to long hangs while outstanding locks held by terminated instances were timed out.
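The lease-timeout hang described above can be illustrated with a small sketch. The class and timeout below are purely illustrative, not GPFS or Lustre internals: a lock held by a client that has stopped heartbeating cannot be reclaimed until its lease expires, so other clients stall in the meantime.

```python
import time

LEASE_TIMEOUT = 3.0  # illustrative: seconds a silent client keeps its locks

class LockServer:
    """Toy model of a heartbeat/lease-based lock manager."""

    def __init__(self):
        self.locks = {}      # path -> client id holding the lock
        self.last_beat = {}  # client id -> time of last heartbeat

    def heartbeat(self, client):
        self.last_beat[client] = time.monotonic()

    def acquire(self, client, path):
        """Grant the lock, or refuse until a dead holder's lease expires."""
        holder = self.locks.get(path)
        if holder is not None and holder != client:
            silent_for = time.monotonic() - self.last_beat[holder]
            if silent_for < LEASE_TIMEOUT:
                # The holder may still be alive: the caller must wait out
                # the lease -- the long hang described above.
                return False
            del self.locks[path]  # lease expired; reclaim the stale lock
        self.locks[path] = client
        return True

server = LockServer()
server.heartbeat("vm-1")
server.acquire("vm-1", "/data/file")  # vm-1 holds the lock

# vm-1 is terminated without a clean shutdown, so it stops heartbeating;
# vm-2 is blocked until vm-1's lease times out.
blocked = server.acquire("vm-2", "/data/file")
print(blocked)  # False
```

A terminated instance never releases its locks explicitly, so the server can only distinguish "slow" from "dead" by waiting out the lease, which is why an unstable client pool translates directly into hangs for the surviving clients.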
There are several potential methods to address these issues. One would be to project only the portions of the file system that the user owns into the virtual machine. This could be done by re-exporting the parallel file system with a protocol like NFS. This option was explored by ALCF Magellan staff, but ran into challenges with the Linux kernel and I/O libraries. Another approach would be to forward file I/O operations to a proxy server that has the file systems mounted. The operations would then be performed on the proxy server as the user who owns the virtual machine instance, and the standard file system client would enforce the access controls.
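The proxy approach can be sketched in miniature. In the toy version below, the guest side ships each file operation as a message and a proxy performs it against a local directory standing in for the mounted file system; the names and wire format are illustrative, and a real deployment would additionally validate paths and switch credentials to the instance owner before touching the file system.

```python
import json
import os
import socket
import tempfile
import threading

def proxy_loop(conn, root):
    """Serve read/write requests against the exported subtree.

    A real proxy would also sanitize paths and drop privileges to the
    VM owner's uid so the file system client enforces access control.
    """
    f = conn.makefile("rw")
    for line in f:
        req = json.loads(line)
        path = os.path.join(root, req["path"])
        if req["op"] == "write":
            with open(path, "w") as fh:
                fh.write(req["data"])
            reply = {"ok": True}
        elif req["op"] == "read":
            with open(path) as fh:
                reply = {"ok": True, "data": fh.read()}
        else:
            reply = {"ok": False}
        f.write(json.dumps(reply) + "\n")
        f.flush()

def forward(f, op, path, data=None):
    """Guest-side stub: ship one operation to the proxy, await the reply."""
    f.write(json.dumps({"op": op, "path": path, "data": data}) + "\n")
    f.flush()
    return json.loads(f.readline())

export = tempfile.mkdtemp()  # stands in for the user's mounted subtree
guest, proxy = socket.socketpair()
threading.Thread(target=proxy_loop, args=(proxy, export), daemon=True).start()

guest_f = guest.makefile("rw")
forward(guest_f, "write", "results.txt", "42")
reply = forward(guest_f, "read", "results.txt")
print(reply["data"])  # 42
```

The point of the design is that only the proxy is a real file system client: however many virtual machines connect to it, the parallel file system sees a single stable mount, which is also what makes this approach attractive for the scaling and stability problems described earlier.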
There are existing file system modules that use the Linux FUSE interface to forward I/O over connections such as SSH. Performance over SSH would be poor, but it would at least provide access to data. This method was used by the ALCF/LCRC project that built an extension of the Argonne LCRC Fusion cluster within the OpenStack cloud, and it worked reasonably well. Alternatively, the I/O Forwarding Scalability Layer (IOFSL) project [2] is developing a high-performance I/O forwarding layer that could potentially help. While IOFSL is focused on I/O forwarding for ultra-scale HPC systems, the same mechanisms can be used for virtual machines. These solutions would also help mitigate the scaling and stability challenges, since they would reduce the number of real clients handled by the file system and would act as a firewall between the virtual machines and the file system.
9.8 Summary<br />
Applications with minimal communication and I/O perform well in cloud environments. Our evaluation shows that there is a virtualization overhead that is likely to shrink with ongoing research, but the primary performance impact for HPC applications comes from the absence of high-bandwidth, low-latency interconnects in current virtual machines. Thus, a majority of current DOE HPC applications will not run efficiently in today's cloud environments. Our results show that the difference between interconnects is pronounced even at mid-range concurrencies. Similarly, there is an I/O performance penalty on virtual machines, and the absence of high-performance file systems further impacts the productivity of running scientific applications within the cloud environment.