Dell Power Solutions

HIGH-PERFORMANCE COMPUTING

maintaining job priority in accordance with the site policy that the administrator has established for the amount and timing of resources used to execute jobs. Based on that information, the scheduler decides which job will execute on which compute node and when.

Understanding job scheduling in clusters

When a job is submitted to a resource manager, the job waits in a queue until it is scheduled and executed. The time spent in the queue, or wait time, depends on several factors including job priority, load on the system, and availability of requested resources. Turnaround time represents the elapsed time between when the job is submitted and when the job is completed; turnaround time includes the wait time as well as the job’s actual execution time. Response time represents how fast a user receives a response from the system after the job is submitted.

Resource utilization during the lifetime of the job represents the actual useful work that has been performed. System throughput is defined as the number of jobs completed per unit of time. Mean response time is an important performance metric for users, who expect minimal response time. In contrast, system administrators are concerned with overall resource utilization because they want to maximize system throughput and return on investment (ROI), especially in high-throughput computing clusters.

In a typical production environment, many different jobs are submitted to clusters. These jobs can be characterized by factors such as the number of processors requested (also known as job size, or job width), estimated runtime, priority level, parallel or distributed execution, and specific I/O requirements. During execution, large jobs can occupy significant portions of a cluster’s processing and memory resources.

System administrators can create several types of queues, each with a different priority level and quality of service (QoS).
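The timing metrics defined in this section (wait time, turnaround time, and system throughput) can be made concrete with a small calculation. The following is a minimal sketch, not taken from any resource manager: the `JobRecord` class, its field names, and the sample timestamps are all hypothetical, and throughput is measured over the window from the first submission to the last completion, which is only one of several reasonable conventions.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Timestamps for one completed job, in seconds (hypothetical field names)."""
    submit: float  # job enters the queue
    start: float   # job begins executing on its compute nodes
    end: float     # job completes


def wait_time(job: JobRecord) -> float:
    """Time spent in the queue before execution starts."""
    return job.start - job.submit


def turnaround_time(job: JobRecord) -> float:
    """Wait time plus actual execution time."""
    return job.end - job.submit


def throughput(jobs: list[JobRecord]) -> float:
    """Jobs completed per second, from first submission to last completion."""
    window = max(j.end for j in jobs) - min(j.submit for j in jobs)
    return len(jobs) / window


# Invented sample data: three jobs that ran one after another.
sample = [JobRecord(0, 10, 70), JobRecord(5, 70, 100), JobRecord(20, 100, 220)]
for job in sample:
    print(wait_time(job), turnaround_time(job))
print(throughput(sample))
```

Users would typically look at the per-job wait and turnaround values, while an administrator would be more interested in the throughput figure for the whole window.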
To make intelligent schedules, however, schedulers need information regarding job size, priority, expected execution time (indicated by the user), resource access permission (established by the administrator), and resource availability (automatically obtained by the scheduler).

In high-performance computing clusters, the scheduling of parallel jobs requires special attention because parallel jobs comprise several subtasks. Each subtask is assigned to a unique compute node during execution and nodes constantly communicate among themselves during execution. The manner in which the subtasks are assigned to processors is called mapping. Because mapping affects execution time, the scheduler must map subtasks carefully. The scheduler needs to ensure that nodes scheduled to execute parallel jobs are connected by fast interconnects to minimize the associated communication overhead. For parallel jobs, the job efficiency also affects resource utilization. To achieve high resource utilization for parallel jobs, both job efficiency and advanced scheduling are required. Efficient job processing depends on effective application design.

[Figure 1. Typical resource management system. Diagram labels: User 1 through User n; Users submit jobs; Resource manager; Job queue; Internal job scheduler; External job scheduler; Information is sent to the resource manager; Jobs are assigned to compute nodes; Compute node 1 through Compute node n.]

Under heavy load conditions, the capability to provide a fair portion of the cluster’s resources to each user is important. This capability can be provided by using the fair-share strategy, in which the scheduler collects historical data from previously executed jobs and uses the historical data to dynamically adjust the priority of the jobs in the queue.
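The fair-share idea described above can be sketched in a few lines. This is an illustrative assumption, not the formula used by any particular scheduler: the linear adjustment, the `weight` parameter, the per-user `target_share`, and all function and field names are hypothetical. Historical usage is summed per user, and a queued job's priority is raised when its owner has used less than the target share and lowered when the owner has used more.

```python
from collections import defaultdict


def usage_by_user(history):
    """Sum CPU-hours per user from (user, cpu_hours) records of finished jobs."""
    totals = defaultdict(float)
    for user, cpu_hours in history:
        totals[user] += cpu_hours
    return totals


def fair_share_priority(base_priority, user, usage, target_share, weight=100.0):
    """Adjust a base priority by how far the user's historical share is from target.

    The linear form and the weight are assumptions made for this sketch.
    """
    total = sum(usage.values())
    actual_share = usage.get(user, 0.0) / total if total > 0 else 0.0
    return base_priority + weight * (target_share - actual_share)


def reorder_queue(queue, history, target_share, weight=100.0):
    """Return queued (job_id, user, base_priority) tuples, highest adjusted priority first."""
    usage = usage_by_user(history)
    return sorted(
        queue,
        key=lambda job: fair_share_priority(job[2], job[1], usage, target_share, weight),
        reverse=True,
    )


# Invented data: alice has consumed most of the recent CPU time.
history = [("alice", 60.0), ("alice", 20.0), ("bob", 20.0)]
queue = [("job-1", "alice", 50.0), ("job-2", "bob", 50.0)]
for job in reorder_queue(queue, history, target_share=0.5):
    print(job)
```

With equal base priorities, the job belonging to the lighter historical user moves ahead in the queue; a real scheduler would usually also decay old usage over time, which this sketch omits.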
The capability to dynamically make priority changes helps ensure that resources are fairly distributed among users.

Most job schedulers have several parameters that can be adjusted to control job queues and scheduling algorithms, thus providing different response times and utilization percentages. Usually, high system utilization also means high average response time for jobs, and as system utilization climbs, the average response time tends to increase sharply beyond a certain threshold. This threshold depends on the job-processing algorithms and job profiles. In most cases, improving resource utilization and decreasing job turnaround time are conflicting considerations. The challenge for IT organizations is to maximize resource utilization while maintaining acceptable average response times for users.

Figure 2 summarizes the desirable features of job schedulers. These features can serve as guidelines for system administrators as they select job schedulers.

Using job scheduling algorithms

The parallel and distributed computing community has put substantial research effort into developing and understanding job scheduling algorithms. Today, several of these algorithms have been implemented in both commercial and open source job schedulers. Scheduling algorithms can be broadly divided into two classes: time-sharing and space-sharing. Time-sharing algorithms divide time on a processor into several discrete intervals, or slots. These slots are then assigned to unique jobs. Hence, several jobs at any given time can share the same compute resource. Conversely, space-sharing algorithms give the requested resources to a single job until the job completes execution. Most cluster schedulers operate in space-sharing mode.

Common, simple space-sharing algorithms are first come, first served (FCFS); first in, first out (FIFO); round robin (RR); shortest job first (SJF); and longest job first (LJF).
As the names suggest, FCFS and FIFO execute jobs in the order in which they enter the queue. This is a very simple strategy to implement, and works acceptably well with a low job load. RR assigns jobs to nodes as they arrive in the queue in a cyclical, round-robin manner. SJF periodically sorts the incoming jobs and executes the shortest job first, allowing short jobs to get a good turnaround time. However, this strategy may cause delays for the execution of long (large) jobs. In contrast, LJF commits resources to longest jobs first. The LJF approach tends to maximize system utilization at the cost of turnaround time.

Reprinted from Dell Power Solutions, February 2005. Copyright © 2005 Dell Inc. All rights reserved.

Figure 2. Features of job schedulers

Feature: Comments

Broad scope: The nature of jobs submitted to a cluster can vary, so the scheduler must support batch, parallel, sequential, distributed, interactive, and noninteractive jobs with similar efficiency.

Support for algorithms: The scheduler should support numerous job-processing algorithms, including FCFS, FIFO, SJF, LJF, advance reservation, and backfill. In addition, the scheduler should be able to switch between algorithms and apply different algorithms at different times, or apply different algorithms to different queues, or both.

Capability to integrate with standard resource managers: The scheduler should be able to interface with the resource manager in use, including common resource managers such as Platform LSF, Sun Grid Engine, and OpenPBS (the original, open source version of Portable Batch System).

Sensitivity to compute node and interconnect architecture: The scheduler should match the appropriate compute node architecture to the job profile, for example, by using compute nodes that have more than one processor to provide optimal performance for applications that can use the second processor effectively.

Scalability: The scheduler should be capable of scaling to thousands of nodes and processing thousands of jobs simultaneously.

Fair-share capability: The scheduler should distribute resources fairly under heavy conditions and at different times.

Efficiency: The overhead associated with scheduling should be minimal and within acceptable limits. Advanced scheduling algorithms can take time to run. To be efficient, the scheduling algorithm itself must spend less time running than the expected saving in application execution time from improved scheduling.

Dynamic capability: The scheduler should be able to add or remove compute resources to a job on the fly, assuming that the job can adjust and utilize the extra compute capacity.

Support for preemption: Preemption can occur at various levels; for example, jobs may be suspended while running. Checkpointing (that is, the capability to stop a running job, save the intermediate results, and restart the job later) can help ensure that results are not lost for very long jobs.

Basic scheduling algorithms such as these can be enhanced by combining them with the use of advance reservation and backfill techniques. Advance reservation uses execution time predictions provided by the users to reserve resources (such as CPUs and memory) and to generate a schedule. The backfill technique improves space-sharing scheduling. Given a schedule with advance-reserved, high-priority jobs and a list of low-priority jobs, a backfill algorithm tries to fit the small jobs into scheduling gaps. This allocation does not alter the sequence of jobs previously scheduled, but improves system utilization by running low-priority jobs in between high-priority jobs. To use backfill, the scheduler requires a runtime estimate of the small jobs, which is supplied by the user when jobs are submitted.

[Figure 3. Job scheduling algorithms. Panels: a. Jobs waiting in the queue; b. The queue after each priority group is sorted according to execution time (sorted high-priority jobs, sorted low-priority jobs); c. Longest job first schedule; d. Longest job first and backfill schedule; e. Shortest job first schedule; f. Shortest job first and backfill schedule. Schedule panels plot compute nodes against time.]

Figure 3 illustrates the use of the basic algorithms and the enhancements discussed in this article.
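The queue orderings and the backfill technique described above can be sketched as a small space-sharing simulation. This is a simplified illustration under stated assumptions, not the algorithm of any specific scheduler: `Job`, `order_queue`, `simulate`, and `makespan` are hypothetical names, the four sample jobs are invented (they are not the Figure 3 data), runtimes are treated as exact user estimates, and only the job at the head of the queue holds a reservation. A later job may be backfilled only if it fits in the currently idle processors and is estimated to finish no later than the head job's reserved start, so the head job is never delayed.

```python
import heapq
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    procs: int    # processors requested
    runtime: int  # user-supplied runtime estimate


def order_queue(jobs, policy):
    """Order the queue by FCFS (arrival order), SJF, or LJF."""
    if policy == "FCFS":
        return list(jobs)
    if policy in ("SJF", "LJF"):
        return sorted(jobs, key=lambda j: j.runtime, reverse=(policy == "LJF"))
    raise ValueError(f"unknown policy: {policy}")


def simulate(queue, total_procs, backfill=False):
    """Return {job name: (start, end)} for an already ordered queue."""
    if any(j.procs > total_procs for j in queue):
        raise ValueError("a job requests more processors than the cluster has")
    queue, running = list(queue), []  # running: heap of (end_time, procs)
    free, now, times = total_procs, 0, {}

    def start(job):
        nonlocal free
        times[job.name] = (now, now + job.runtime)
        heapq.heappush(running, (now + job.runtime, job.procs))
        free -= job.procs

    while queue:
        while queue and queue[0].procs <= free:  # start head jobs that fit
            start(queue.pop(0))
        if not queue:
            break
        if backfill:
            # Reserved start of the head job: when enough processors free up.
            avail, reserved = free, now
            for end, procs in sorted(running):
                avail, reserved = avail + procs, end
                if avail >= queue[0].procs:
                    break
            for job in list(queue[1:]):  # fill the gap without delaying the head
                if job.procs <= free and now + job.runtime <= reserved:
                    queue.remove(job)
                    start(job)
        now, released = heapq.heappop(running)  # advance to the next completion
        free += released
    return times


def makespan(times):
    return max(end for _, end in times.values())


# Invented workload on an eight-processor cluster.
jobs = [Job("J1", 6, 5), Job("J2", 6, 4), Job("J3", 2, 3), Job("J4", 2, 2)]
ljf = order_queue(jobs, "LJF")
print(simulate(ljf, 8), makespan(simulate(ljf, 8)))
print(simulate(ljf, 8, backfill=True), makespan(simulate(ljf, 8, backfill=True)))
```

In this invented workload the two narrow jobs run beside J1 while the wide job J2 waits, so backfill shortens the overall schedule without changing when J1 or J2 start. Production schedulers generally keep reservations for more than one job and handle inaccurate runtime estimates, which this sketch does not attempt.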
Figure 3a shows a queue with 11 jobs waiting; the queue has both high-priority and low-priority jobs. Figure 3b shows these jobs sorted according to their estimated execution time.

The example in Figure 3 assumes an eight-processor cluster and considers only two parameters: the number of processors and the estimated execution time. This figure shows the effects of generating schedules using the LJF and SJF algorithms with and without backfill techniques. Sections c through f of Figure 3 indicate that backfill can improve schedules generated by LJF and SJF, either by increasing utilization, decreasing response time, or both. To generate the schedules shown, the low- and high-priority jobs are sorted separately.

Examining a commercial resource manager and an external job scheduler

This section introduces scheduling features of a commercial resource manager, Load Sharing Facility (LSF) from Platform Computing, and an open source job scheduler, Maui.