An Architecture for Dynamic Allocation of Compute Cluster Bandwidth

Since backend services must be created for every transfer request, the maximum amount of latency is added to every connection. Conversely, since backend services are never idle, zero-idle provides the best possible case when minimizing idle transfer resources, and infinite-idle is the worst possible case. These two policies are base cases for evaluating the architecture.

The N-idle policy is a hybrid of zero-idle and infinite-idle. It attempts to optimize connection times by having backend services immediately available for transfers while at the same time putting an upper bound on idle resources. While the number of idle backend services is bounded, the number of working nodes is not, so the policy can scale up to support high transfer loads. Because the value of N is configurable, the site administrator can decide on an appropriate balance between idle resources and low connection times.

Administrators may want more sophisticated policies to address specific needs. Our architecture provides access to information that allows arbitrarily complex policies to be defined, based on intimate knowledge of the transfer service and a site's history. We leave the exploration of such policies for future work.

C. Metrics

We use three criteria to evaluate the system: site bandwidth utilization, connection time, and user bandwidth. The most important metric is site bandwidth utilization. Our main goal in defining this architecture is to use idle network cycles; if we can provide transfer services with access to large amounts of bandwidth, we believe existing applications will make use of it and additional use cases will be discovered, and thus we are successful. In our experiments we measure bandwidth by running daemons on the client machines. These daemons read the raw network usage from the Linux /proc filesystem [39] and use that information to track the number of bytes the machine has transferred over the lifetime of our experiments. We take samples every second and then post-process the results to calculate bandwidth over the experiment's time interval. Because only our jobs are running on the client machines, we know the bytes transferred are a result of our experiments. Our goal is to utilize all available bandwidth, so we must count bytes sent under the IP protocol stack. We are not measuring end-to-end throughput: if TCP must resend packets due to packet loss, we count the resends as additional bytes; our results would otherwise appear to have used fewer network resources than were actually used. This issue has very little effect because all experiments are done on an under-utilized LAN where packet loss is infrequent.

The second metric is connection time. We define connection time as the latency from when a client first connects to the transfer service until the time the transfer begins. Before a transfer can begin, the frontend must find an available backend service. In the worst case, this time will include the latency of the WSRF notification message updating the controller that the frontend service's state changed, the entire length of the LRM's wait queue, and the latency of the backend service request to the LRM. This time has the potential to be high and could have a significant effect on overall system throughput. Further, while this architecture is targeted at batch transfers, sites may wish to support interactive transfer services as well, in which case connection times need to be low. We measure the connection time with a plug-in to globus-url-copy. The time interval begins immediately before the TCP connect function is called, and ends when the FTP login message "230 User logged in" [40] is received.
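The sampling daemons used for the bandwidth metric are small, self-contained programs. The Python sketch below is only a rough illustration of the approach (the interface name and log path are placeholders, not the exact code we ran): it records the raw receive and transmit byte counters from /proc/net/dev once per second for later post-processing.

    # Illustrative sampling daemon: logs raw byte counters once per second.
    # Bandwidth over any interval is later computed as
    #   (bytes_end - bytes_start) / (t_end - t_start).
    import time

    IFACE = "eth0"                      # placeholder interface name
    LOG_PATH = "/tmp/net_samples.log"   # placeholder output file

    def read_counters(iface):
        """Return (rx_bytes, tx_bytes) for iface from /proc/net/dev."""
        with open("/proc/net/dev") as f:
            for line in f:
                if line.strip().startswith(iface + ":"):
                    fields = line.split(":", 1)[1].split()
                    return int(fields[0]), int(fields[8])
        raise ValueError("interface not found: " + iface)

    def main():
        with open(LOG_PATH, "a") as log:
            while True:
                rx, tx = read_counters(IFACE)
                log.write("%f %d %d\n" % (time.time(), rx, tx))
                log.flush()
                time.sleep(1.0)

    if __name__ == "__main__":
        main()

Post-processing then differences consecutive samples to recover per-second bandwidth and aggregates the per-client results over the experiment's time interval.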
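The connection-time measurement itself lives inside the globus-url-copy plug-in and includes the GSI authentication exchange, so it is not reproduced here. The sketch below is only a simplified stand-in that times the analogous interval against a plain FTP server (the host name is a placeholder): the clock starts immediately before the TCP connect and stops when the "230 User logged in" reply is received.

    # Simplified stand-in for the connection-time measurement (illustrative only).
    import time
    from ftplib import FTP

    HOST = "ftp.example.org"   # placeholder server name

    def connection_time(host, user="anonymous", password="guest"):
        ftp = FTP()
        start = time.time()         # immediately before the TCP connect
        ftp.connect(host, 21)
        ftp.login(user, password)   # returns once the "230" login reply arrives
        elapsed = time.time() - start
        ftp.quit()
        return elapsed

    if __name__ == "__main__":
        print("connection time: %.3f s" % connection_time(HOST))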
The last metric is achieved user bandwidth. Our goal is to provide a transfer service that can exploit large amounts of idle network cycles. In order for this service to be useful, we must have applications that wish to use it, and to make the service attractive to applications we must be able to serve each transfer request with enough bandwidth to satisfy its needs. How much bandwidth is needed is a subjective measure; ideally the application would be the bottleneck in the transfer, not the transfer service. As 100 Mbit/s Ethernet cards become outdated and Gbit/s Ethernet cards become standard, even on laptops, we aim to provide near Gbit/s transfers to clients.

VI. RESULTS

We present results for five sets of experiments using our simple LRM. The first experiment evaluates the validity of the LRM we used by comparing it to the PBS system used on the UC TeraGrid. We then measure the connection time as a function of increasingly heavy loads; this experiment is designed to analyze the overhead introduced by our system. We similarly measure the connection time to see how LRM wait queue times affect it. Following this, we measure how our various policies behave under lighter, and more naturally occurring, client loads. The final experiment shows the bandwidth that our system can consume and the throughput it can deliver to individual clients.

In all experiments we took care to prevent any one iteration of tests from affecting the next. For the zero-idle and infinite-idle policies this is not a problem: those two basic policies are not affected by previous client access patterns. However, the N-idle policy can be greatly affected by previous access patterns. If a previous test is still holding a backend service, or if the backend service that was started to replace a used backend service has not yet begun, our results could be corrupted. To protect against this, we verified that the system was in a stable state before running any iteration of our tests. When one test finished, we waited for all resources associated with previous transfers to be cleaned up and for all requested backend services to start before we began the next iteration.

A. Simple LRM Validation

Our first results validate our simple queuing system. Since we intend that our architecture work with standard and widely used LRMs such as PBS, we must show that our system provides a reasonable approximation of the behavior of a standard system like PBS. The difference in which we are
