Proceedings of CSAW'04 - FTP Directory Listing - University of Malta
Low-Latency Message Passing Over Gigabit Ethernet Clusters
Although commodity networks are still dominated by IP over Ethernet, these protocols were not designed for networks running at gigabit speeds and suffer from several inefficiencies that will be discussed later. This led to the development of other interconnect architectures such as Myrinet [2], SCI [10], Giganet cLAN [7], Quadrics [15] and Infiniband [1], which target higher-bandwidth reliable communication without incurring the processing overheads of slower protocols. Our work is geared towards commodity platforms running over a Gigabit Ethernet architecture, since this is one of the most common and affordable cluster architectures available.
2 High Performance Networking Issues
Various performance issues arise when utilizing the standard Ethernet framework for gigabit communication within a cluster of commodity platforms. Although modern Ethernet hardware promises gigabit line-rates, this performance is hardly ever attainable at the application layer. This is due to several overheads imposed by interrupt generation [8], as well as by conventional layered communication protocols such as TCP/IP which, although they provide reliable messaging, add significant protocol processing overheads to high-bandwidth applications. First, we will look at the steps involved in transferring messages over a network using conventional NICs and generic protocols such as TCP/IP.
2.1 Conventional Ethernet NICs
Conventional network cards are cheap and are usually characterised by little onboard memory and limited onboard intelligence, which means that most of the protocol processing must be performed by the host. When the Ethernet controller onboard the NIC receives Ethernet frames, it verifies that the frames are valid (using the CRC) and filters them (using the destination MAC address). When a valid network packet is accepted, the card generates a hardware interrupt to notify the host that a packet has been received. Since hardware interrupts are given high priority, the host CPU suspends its current task and invokes the interrupt handling routine, which checks the network layer protocol header (IP, in our case). After verifying that the packet is valid (using a checksum), it also checks whether that particular IP datagram is destined for this node or whether it should be routed using the IP routing tables. Valid IP datagrams are stored in a buffer (the IP queue) in host memory, and any fragmented IP datagrams are reassembled in the buffer before the appropriate transport layer protocol functions (such as TCP/UDP) are called. Passing IP datagrams to upper layer protocols is done by generating a software interrupt, and this is usually followed by copying the data from the IP buffer to the TCP/UDP buffer.
2.2 Problems with Conventional Protocols
Since traditional operating systems implement TCP/IP at the kernel level, whenever an interrupt is generated the host incurs a vertical switch from user level to kernel level in order to service the interrupt. This means that for each packet received, the host experiences several switching overheads. Moreover, due to the layered protocol approach, each packet requires several memory-to-memory buffer copies before it reaches the application layer. Both memory-to-memory copies and context switches (between kernel and user space) are expensive for the host CPU, which means that the higher the rate of incoming packets, the more host processing power is required to service them.