25.06.2015 Views

Administering Platform LSF - SAS


Chapter 27

Load Thresholds

MEM=100/10
SWAP=200/30

translates into a loadSched condition of mem >= 100 && swap >= 200 and a loadStop condition of mem < 10 || swap < 30.
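The translation above can be sketched in a few lines of Python. This is only an illustration of the logic, not LSF code: the function names are hypothetical, every threshold line is assumed to carry both a loadSched and a loadStop value, and only resources where a larger value means more available capacity (such as mem and swap) are handled.

```python
def parse_threshold(line):
    """Split a line such as 'MEM=100/10' into (name, load_sched, load_stop).

    Assumes both values are present; hypothetical helper, not an LSF API.
    """
    name, _, values = line.partition("=")
    sched, _, stop = values.partition("/")
    return name.strip().lower(), float(sched), float(stop)


def can_schedule(thresholds, load):
    # loadSched: every resource must be at or above its scheduling threshold.
    return all(load[name] >= sched for name, sched, _ in thresholds)


def should_suspend(thresholds, load):
    # loadStop: one resource below its suspending threshold is enough.
    return any(load[name] < stop for name, _, stop in thresholds)


thresholds = [parse_threshold("MEM=100/10"), parse_threshold("SWAP=200/30")]

print(can_schedule(thresholds, {"mem": 150, "swap": 250}))    # True
print(can_schedule(thresholds, {"mem": 150, "swap": 180}))    # False
print(should_suspend(thresholds, {"mem": 50, "swap": 25}))    # True
print(should_suspend(thresholds, {"mem": 50, "swap": 40}))    # False
```

The scheduling test joins the individual conditions with AND, while the suspending test joins them with OR, which is why a single scarce resource is enough to suspend a job.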

Theory

◆ The r15s, r1m, and r15m CPU run queue length conditions are compared to the effective queue length as reported by lsload -E, which is normalised for multiprocessor hosts. Thresholds for these parameters should be set at levels appropriate for single-processor hosts.

◆ Configure load thresholds consistently across queues. If a low-priority queue has higher suspension thresholds than a high-priority queue, then jobs in the high-priority queue will be suspended before jobs in the low-priority queue.

Configuring load thresholds at host level<br />

A shared resource cannot be used as a load threshold in the Hosts section of the lsf.cluster.cluster_name file.

Configuring suspending conditions at queue level<br />

The condition for suspending a job can be specified using the queue-level STOP_COND parameter. It is defined by a resource requirement string. Only the select section of the resource requirement string is considered when stopping a job. All other sections are ignored.

This parameter provides functionality similar to loadStop, but with more flexibility. If loadStop thresholds have also been specified, a job is suspended if either the STOP_COND expression is TRUE or the loadStop thresholds are exceeded.

Example

This queue will suspend a job based on the idle time for desktop machines and based on availability of swap and memory on compute servers. Assume cs is a Boolean resource defined in the lsf.shared file and configured in the lsf.cluster.cluster_name file to indicate that a host is a compute server:

Begin Queue
.
STOP_COND= select[((!cs && it < 5) || (cs && mem < 15 && swap < 50))]
.
End Queue
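To make the logic of this STOP_COND expression easier to follow, here is a minimal Python sketch of the same Boolean condition, combined with the rule that a job is suspended when either STOP_COND is TRUE or the loadStop thresholds are exceeded. The function names, the dictionary layout, and the sample host values are hypothetical and serve only as illustration; LSF evaluates the select string itself.

```python
def stop_cond(host):
    """Mirror of select[((!cs && it < 5) || (cs && mem < 15 && swap < 50))]."""
    if host["cs"]:
        # Compute server: memory AND swap must both be below their limits.
        return host["mem"] < 15 and host["swap"] < 50
    # Desktop machine: only the idle time (it) matters.
    return host["it"] < 5


def job_is_suspended(host, load_stop_exceeded=False):
    # Either condition on its own is enough to suspend the job.
    return stop_cond(host) or load_stop_exceeded


# Hypothetical load values for two hosts.
desktop = {"cs": False, "it": 2, "mem": 500, "swap": 900}
server = {"cs": True, "it": 0, "mem": 10, "swap": 80}

print(stop_cond(desktop))                                  # True
print(stop_cond(server))                                   # False
print(job_is_suspended(server, load_stop_exceeded=True))   # True
```

On the compute server the condition stays FALSE because swap is still above 50, even though mem has dropped below 15; both parts of the && must hold.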

Viewing host-level and queue-level suspending conditions<br />

The suspending conditions are displayed by the bhosts -l and bqueues -l commands.

Viewing job-level suspending conditions<br />

The thresholds that apply to a particular job are the more restrictive of the host and queue thresholds, and are displayed by the bjobs -l command.
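One way to picture "the more restrictive of the host and queue thresholds" is the short Python sketch below. It rests on an assumption that is not spelled out in the text: that "more restrictive" means the threshold that is harder to satisfy for scheduling and is reached sooner for suspending. The function name and the sample values are hypothetical.

```python
def more_restrictive(host, queue, higher_is_better=True):
    """Combine two (load_sched, load_stop) pairs into the stricter pair.

    For resources where a larger value is better (mem, swap), the stricter
    threshold is the larger one; for run queue lengths such as r1m, where
    a smaller value is better, it is the smaller one.
    """
    pick = max if higher_is_better else min
    return pick(host[0], queue[0]), pick(host[1], queue[1])


# mem: host configured as 100/10, queue as 150/5
print(more_restrictive((100, 10), (150, 5)))                              # (150, 10)

# r1m: host configured as 0.7/2.0, queue as 1.0/1.5
print(more_restrictive((0.7, 2.0), (1.0, 1.5), higher_is_better=False))   # (0.7, 1.5)
```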

