Dell Power Solutions

HIGH-PERFORMANCE COMPUTING

Planning Considerations for Job Scheduling in HPC Clusters

As cluster installations continue growing to satisfy ever-increasing computing demands, advanced schedulers can help improve resource utilization and quality of service. This article discusses issues related to job scheduling on clusters and introduces scheduling algorithms to help administrators select a suitable job scheduler.

BY SAEED IQBAL, PH.D.; RINKU GUPTA; AND YUNG-CHIN FANG

Cluster installations primarily comprise two types of standards-based hardware components: servers and networking interconnects. Clusters are divided into two major classes: high-throughput computing clusters and high-performance computing clusters. High-throughput computing clusters usually connect a large number of nodes using low-end interconnects. In contrast, high-performance computing clusters connect more powerful compute nodes using faster interconnects than high-throughput computing clusters. Fast interconnects are designed to provide lower latency and higher bandwidth than low-end interconnects.

These two classes of clusters have different scheduling requirements. In high-throughput computing clusters, the main goal is to maximize throughput (that is, jobs completed per unit of time) by reducing load imbalance among compute nodes in the cluster. Load balancing is particularly important if the cluster has heterogeneous compute nodes. In high-performance computing clusters, an additional consideration arises: the need to minimize communication overhead by mapping applications appropriately to the available compute nodes. High-throughput computing clusters are suitable for executing loosely coupled parallel or distributed applications, because such applications do not have high communication requirements among compute nodes during execution time.
High-performance computing clusters are more suitable for tightly coupled parallel applications, which have substantial communication and synchronization requirements.

A resource management system manages the processing load by preventing jobs from competing with each other for limited compute resources. Typically, a resource management system comprises a resource manager and a job scheduler (see Figure 1). Most resource managers have an internal, built-in job scheduler, but system administrators can usually substitute an external scheduler for the internal scheduler to enhance functionality. In either case, the scheduler communicates with the resource manager to obtain information about queues, loads on compute nodes, and resource availability to make scheduling decisions.

Usually, the resource manager runs several daemons on the master node and compute nodes, including a scheduler daemon, which typically runs on the master node. The resource manager also sets up a queuing system for users to submit jobs, and users can query the resource manager to determine the status of their jobs. In addition, a resource manager maintains a list of available compute resources and reports the status of previously submitted jobs to the user. The resource manager helps organize submitted jobs based on priority, resources requested, and availability.

As shown in Figure 1, the scheduler receives periodic input from the resource manager regarding job queues and available resources, and makes a schedule that determines the order in which jobs will be executed. This is done while

www.dell.com/powersolutions Reprinted from Dell Power Solutions, February 2005. Copyright © 2005 Dell Inc. All rights reserved.
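
The division of labor described in the article text, in which the resource manager reports the job queue and available resources and the scheduler decides the execution order, can be illustrated with a small sketch. This is only an assumption-laden illustration, not the behavior of any particular resource manager or scheduler: the `Job` type, the `make_schedule` function, the "larger number means higher priority" convention, and the priority-ordered greedy first-fit policy are all hypothetical and chosen for brevity.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int      # assumed convention: larger value = more urgent
    nodes_needed: int  # number of compute nodes the job requests


def make_schedule(queue, free_nodes):
    """Run one hypothetical scheduling cycle.

    `queue` and `free_nodes` stand in for the information a resource
    manager would report. Returns (started, waiting) lists of job names.
    """
    started, waiting = [], []
    # Stable sort: higher priority first, ties keep submission order.
    for job in sorted(queue, key=lambda j: -j.priority):
        if job.nodes_needed <= free_nodes:
            free_nodes -= job.nodes_needed   # reserve nodes for this job
            started.append(job.name)
        else:
            waiting.append(job.name)         # stays queued for the next cycle
    return started, waiting


queue = [Job("render", 1, 4), Job("cfd", 5, 6), Job("blast", 3, 2)]
started, waiting = make_schedule(queue, free_nodes=8)
print(started, waiting)  # ['cfd', 'blast'] ['render']
```

Note that under this greedy policy a small job can start ahead of a larger, higher-priority job that does not currently fit. Whether that is desirable is a policy decision, which is one reason administrators may prefer to substitute an external scheduler for the built-in one.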
