25.06.2015 Views

Administering Platform LSF - SAS



Chapter 8
Understanding Resources

How LSF Uses Resources

◆ Viewing job resource usage
◆ Viewing load on a host

LSF monitors the resources that jobs submitted through it use while they are running. This information is used to enforce resource usage limits and load thresholds, and for fairshare scheduling.

LSF collects information such as:

◆ Total CPU time consumed by all processes in the job
◆ Total resident memory usage in KB of all currently running processes in a job
◆ Total virtual memory usage in KB of all currently running processes in a job
◆ Currently active process group ID in a job
◆ Currently active processes in a job

On UNIX, job-level resource usage is collected by a special process called PIM (Process Information Manager). PIM is managed internally by LSF.
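The job-level totals listed above can be pictured as a roll-up of per-process samples. The following is a minimal Python sketch of that idea only; it is not PIM's actual implementation. The `ProcSample` record, the `summarize_job` function, and the `finished_cpu_seconds` parameter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProcSample:
    """One per-process sample (hypothetical record, not an LSF structure)."""
    pid: int
    pgid: int
    cpu_seconds: float
    rss_kb: int
    vsz_kb: int


def summarize_job(samples, finished_cpu_seconds=0.0):
    """Roll per-process samples up into job-level totals.

    finished_cpu_seconds is an assumed way to carry CPU time from
    processes that have already exited, since total CPU time covers all
    processes in the job while memory covers only the running ones.
    """
    return {
        "cpu_seconds": finished_cpu_seconds + sum(s.cpu_seconds for s in samples),
        "rss_kb": sum(s.rss_kb for s in samples),
        "vsz_kb": sum(s.vsz_kb for s in samples),
        "pgids": sorted({s.pgid for s in samples}),
        "pids": sorted(s.pid for s in samples),
    }


if __name__ == "__main__":
    running = [
        ProcSample(pid=101, pgid=100, cpu_seconds=1.5, rss_kb=2000, vsz_kb=5000),
        ProcSample(pid=102, pgid=100, cpu_seconds=0.5, rss_kb=1000, vsz_kb=3000),
    ]
    print(summarize_job(running, finished_cpu_seconds=2.0))
```

Memory is summed only over the processes passed in, which mirrors the "currently running processes" wording in the list, while CPU time also counts work done by processes that have exited.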

The -l option of the bjobs command displays the current resource usage of the job. PIM samples the usage information every 30 seconds. sbatchd collects it at most once every SBD_SLEEP_TIME (configured in the lsb.params file) and sends it to mbatchd. The update is done only if the value for the CPU time, resident memory usage, or virtual memory usage has changed by more than 10 percent from the previous update, or if a new process or process group has been created.
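The update rule can be sketched in Python as follows. This is an illustration of the rule as stated, not LSF's source code. It assumes that "changed by more than 10 percent" means a relative change in either direction measured against the previously reported value, and that any move away from a previous value of zero counts as a change. The function names and dictionary keys are hypothetical.

```python
def changed_by_more_than(previous, current, fraction=0.10):
    """True if current differs from previous by more than the given fraction.

    Assumption: the change is measured relative to the previous value,
    and any change from a previous value of zero counts.
    """
    if previous == 0:
        return current != 0
    return abs(current - previous) / previous > fraction


def needs_update(previous, current):
    """Decide whether a new usage report should be sent.

    Both arguments are dicts with the hypothetical keys
    cpu_seconds, rss_kb, vsz_kb, pids, and pgids.
    """
    for key in ("cpu_seconds", "rss_kb", "vsz_kb"):
        if changed_by_more_than(previous[key], current[key]):
            return True
    # A process or process group not seen in the previous report.
    if set(current["pids"]) - set(previous["pids"]):
        return True
    if set(current["pgids"]) - set(previous["pgids"]):
        return True
    return False


if __name__ == "__main__":
    last = {"cpu_seconds": 100, "rss_kb": 2000, "vsz_kb": 5000,
            "pids": {101}, "pgids": {100}}
    now = dict(last, rss_kb=2300)  # resident memory grew by 15 percent
    print(needs_update(last, now))
```

Under these assumptions a job whose usage drifts slowly may go several collection intervals without a new report, so the figures shown by bjobs -l can lag the live values somewhat.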

Use bhosts -l to check the load levels on the host, and adjust the suspending conditions of the host or queue if necessary. The bhosts -l command gives the most recent load values used for the scheduling of jobs. A dash (-) in the output indicates that the particular threshold is not defined.

% bhosts -l hostB
HOST: hostB
STATUS   CPUF   JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
ok       20.00  2     2    0      0    0      0      0

CURRENT LOAD USED FOR SCHEDULING:
           r15s  r1m  r15m  ut   pg   io  ls  it  tmp  swp   mem
Total      0.3   0.8  0.9   61%  3.8  72  26  0   6M   253M  297M
Reserved   0.0   0.0  0.0   0%   0.0  0   0   0   0M   0M    0M

LOAD THRESHOLD USED FOR SCHEDULING:
           r15s  r1m  r15m  ut   pg   io  ls  it  tmp  swp   mem
loadSched  -     -    -     -    -    -   -   -   -    -     -
loadStop   -     -    -     -    -    -   -   -   -    -     -
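When this output has to be checked by a script, the two load tables can be read into dictionaries. The Python sketch below is one way to do that, assuming the layout shown in the example (a title line ending in "USED FOR SCHEDULING:", then a header row of load index names, then one row per label). The function names are hypothetical, and the sample text is a copy of the example output rather than live data from a cluster.

```python
SAMPLE = """\
% bhosts -l hostB
HOST: hostB
STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV
ok 20.00 2 2 0 0 0 0 0
CURRENT LOAD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem
Total 0.3 0.8 0.9 61% 3.8 72 26 0 6M 253M 297M
Reserved 0.0 0.0 0.0 0% 0.0 0 0 0 0M 0M 0M
LOAD THRESHOLD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
"""


def parse_load_tables(output):
    """Return {table title: {row label: {load index: value}}}."""
    tables = {}
    current = None   # rows of the table being read
    columns = None   # header of the table being read
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("USED FOR SCHEDULING:"):
            current = tables.setdefault(line.rstrip(":"), {})
            columns = None
        elif current is not None and columns is None:
            columns = line.split()          # header row of index names
        elif current is not None:
            label, *values = line.split()
            if len(values) == len(columns):  # skip rows that do not fit
                current[label] = dict(zip(columns, values))
    return tables


def undefined_thresholds(tables, row="loadSched"):
    """List the load indices whose threshold is a dash (not defined)."""
    thresholds = tables["LOAD THRESHOLD USED FOR SCHEDULING"][row]
    return [name for name, value in thresholds.items() if value == "-"]


if __name__ == "__main__":
    parsed = parse_load_tables(SAMPLE)
    print(parsed["CURRENT LOAD USED FOR SCHEDULING"]["Total"]["mem"])
    print(undefined_thresholds(parsed))
```

Values are kept as strings because the columns mix plain numbers, percentages, and sizes with unit suffixes. Converting them is left to the caller.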
