25.06.2015 Views

Administering Platform LSF - SAS


Suspending Conditions

Viewing suspend reason

The bjobs -lp command shows the load threshold that caused LSF to suspend a job, together with the scheduling parameters.

The use of STOP_COND affects the suspending reasons as displayed by the bjobs command. If STOP_COND is specified in the queue and the loadStop thresholds are not specified, the suspending reasons for each individual load index will not be displayed.

Resuming suspended jobs

Jobs are suspended to prevent overloading hosts, to prevent batch jobs from interfering with interactive use, or to allow a more urgent job to run. When the host is no longer overloaded, suspended jobs should continue running.

When LSF automatically resumes a job, it invokes the RESUME action. The default action for RESUME is to send the signal SIGCONT.
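
As a rough illustration of what delivering SIGCONT does to a stopped process, the following POSIX-only sketch stops a child process and then continues it. It is not LSF code: the helper names are hypothetical, and SIGSTOP is used here only as a stand-in for whatever suspended the job.

```python
import os
import signal
import subprocess
import sys
import time


def stop_then_resume(pid: int, pause_seconds: float = 0.1) -> None:
    """Stop a process, wait briefly, then let it continue (POSIX only)."""
    os.kill(pid, signal.SIGSTOP)   # stand-in for the suspension
    time.sleep(pause_seconds)
    os.kill(pid, signal.SIGCONT)   # the signal the default RESUME action sends


def demo() -> int:
    """Run a short-lived child, suspend and resume it, return its exit code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.5)"]
    )
    stop_then_resume(child.pid)
    return child.wait()
```

The child finishes normally after being continued, which is the behaviour a resumed batch job relies on.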

If there are any suspended jobs on a host, LSF checks the load levels in each dispatch turn.

If the load levels are within the scheduling thresholds for the queue and the host, and all the resume conditions for the queue (RESUME_COND in lsb.queues) are satisfied, the job is resumed.

If RESUME_COND is not defined, then the loadSched thresholds are used to control resuming of jobs: all the loadSched thresholds must be satisfied for the job to be resumed. The loadSched thresholds are ignored if RESUME_COND is defined.
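
The rule above can be sketched as a small decision function. This is a simplified model, not LSF's implementation: the function names are hypothetical, RESUME_COND is represented as a plain Python predicate, and the set of load indices where a larger value means less load (it, tmp, swp, mem) is an assumption made for the sketch.

```python
from typing import Callable, Mapping, Optional

Load = Mapping[str, float]

# Assumption: for these indices a larger value means a less loaded host,
# so the threshold is satisfied when the value is at or above it.
HIGHER_IS_BETTER = {"it", "tmp", "swp", "mem"}


def within_load_sched(load: Load, load_sched: Load) -> bool:
    """True only if every configured loadSched threshold is satisfied."""
    for index, threshold in load_sched.items():
        value = load[index]
        if index in HIGHER_IS_BETTER:
            if value < threshold:
                return False
        elif value > threshold:
            return False
    return True


def can_resume(
    load: Load,
    load_sched: Load,
    resume_cond: Optional[Callable[[Load], bool]] = None,
) -> bool:
    """Decide whether a suspended job may resume on a host."""
    if resume_cond is not None:
        # RESUME_COND is defined: loadSched thresholds are ignored.
        return resume_cond(load)
    # No RESUME_COND: all loadSched thresholds must be satisfied.
    return within_load_sched(load, load_sched)
```

For example, with a 1-minute run queue length of 0.4 and a loadSched threshold of 0.7 for r1m, `can_resume({"r1m": 0.4}, {"r1m": 0.7})` is true; tightening the threshold to 0.3 makes it false unless a RESUME_COND predicate is supplied and holds.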

Jobs from higher priority queues are checked first. To prevent overloading the host again, only one job is resumed in each dispatch turn.
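
A minimal sketch of this selection step follows, assuming a larger number means a higher-priority queue. The `SuspendedJob` type and `pick_job_to_resume` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class SuspendedJob:
    job_id: int
    queue_priority: int  # assumption: larger value = higher-priority queue


def pick_job_to_resume(
    jobs: Iterable[SuspendedJob],
    resumable: Callable[[SuspendedJob], bool],
) -> Optional[SuspendedJob]:
    """Return at most one job to resume in this dispatch turn.

    Jobs from higher-priority queues are checked first; the first one whose
    resume conditions hold is chosen, and the rest wait for a later turn.
    """
    for job in sorted(jobs, key=lambda j: j.queue_priority, reverse=True):
        if resumable(job):
            return job
    return None
```

Returning a single job per call mirrors the one-job-per-turn limit: the load is re-evaluated on the next dispatch turn before anything else is resumed.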

Specifying resume condition

Use RESUME_COND in lsb.queues to specify the condition that must be satisfied on a host if a suspended job is to be resumed.

Only the select section of the resource requirement string is considered when resuming a job. All other sections are ignored.

Viewing resume thresholds

The bjobs -l command displays the scheduling thresholds that control when a job is resumed.
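
To illustrate the earlier point that only the select section of a RESUME_COND resource requirement string is considered, the sketch below pulls that section out of a string and discards the rest. It is a simplified illustration rather than LSF's parser: the function name is hypothetical, and it only handles an explicit `select[...]` section without nested brackets.

```python
import re


def select_section(res_req: str) -> str:
    """Return the contents of the select[...] section, or "" if none is present.

    Other sections in the string (for example rusage[...]) are ignored,
    matching how RESUME_COND is described as being evaluated.
    """
    match = re.search(r"select\[(.*?)\]", res_req)
    return match.group(1).strip() if match else ""
```

Given `"select[it > 10 && ut < 0.2] rusage[mem=100]"`, the function keeps only `it > 10 && ut < 0.2`.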
