
Administering Platform LSF - SAS


Configuring Job Control Actions

Several situations may require overriding or augmenting the default actions for job control. For example:

◆ Notifying users when their jobs are suspended, resumed, or terminated
◆ An application holds resources (for example, licenses) that are not freed by suspending the job. The administrator can set up an action that releases the license before the job is suspended and re-acquires it when the job is resumed.
◆ The administrator wants the job checkpointed before being:
  ❖ Suspended when a run window closes
  ❖ Killed when the RUNLIMIT is reached
◆ A distributed parallel application must receive a catchable signal when the job is suspended, resumed, or terminated, so that it can propagate the signal to remote processes.

To override the default actions for the SUSPEND, RESUME, and TERMINATE job controls, specify the JOB_CONTROLS parameter in the queue definition in lsb.queues.

JOB_CONTROLS parameter (lsb.queues)

The JOB_CONTROLS parameter has the following format, where each action is a signal, CHKPNT, or a command:

    Begin Queue
    ...
    JOB_CONTROLS = SUSPEND[signal | CHKPNT | command] \
                   RESUME[signal | command] \
                   TERMINATE[signal | CHKPNT | command]
    ...
    End Queue
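As an illustration of the format, a queue that uses only signal actions might be sketched as follows. The queue name is hypothetical and the signal choices are examples only, not recommended values; confirm that each signal exists on the execution hosts before relying on it.

```
# Sketch only: "signal_demo" is a hypothetical queue name.
Begin Queue
QUEUE_NAME   = signal_demo
JOB_CONTROLS = SUSPEND[SIGTSTP] \
               RESUME[SIGCONT] \
               TERMINATE[SIGTERM]
End Queue
```

Catchable signals such as SIGTSTP and SIGTERM give the application a chance to react, which is what the distributed parallel case described earlier relies on.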

When LSF needs to suspend, resume, or terminate a job, it invokes one of the following actions, as specified by SUSPEND, RESUME, and TERMINATE:

signal
A UNIX signal name (for example, SIGTSTP or SIGTERM). The specified signal is sent to the job.

The same set of signals is not supported on all UNIX systems. To display a list of the symbolic names of the signals (without the SIG prefix) supported on your system, use the kill -l command.
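Because signal support varies by system, it can help to check a name before putting it in JOB_CONTROLS. The following is a minimal sketch: `signal_supported` is a hypothetical helper, not an LSF command, and it accepts names with or without the SIG prefix because some shells print the prefix in `kill -l` output and others do not.

```shell
#!/bin/sh
# List the signal names known to this system's kill.
kill -l

# Hypothetical helper: succeed if the named signal appears in `kill -l`.
# Handles both "1) SIGHUP 2) SIGINT ..." and plain "HUP INT ..." layouts.
signal_supported() {
    name=${1#SIG}                      # drop an optional SIG prefix
    kill -l | tr -s ' \t)' '\n' | sed 's/^SIG//' | grep -qx "$name"
}

# Example check before editing lsb.queues.
if signal_supported SIGTSTP; then
    echo "SIGTSTP is available on this host"
fi
```

Run the check on the execution hosts rather than only on the host where lsb.queues is edited, since the signal is delivered where the job runs.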

CHKPNT
Checkpoint the job. Valid only for SUSPEND and TERMINATE actions.

◆ If the SUSPEND action is CHKPNT, the job is checkpointed and then stopped automatically by sending the SIGSTOP signal to the job.
◆ If the TERMINATE action is CHKPNT, the job is checkpointed and then killed automatically.
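The checkpoint-before-suspend and checkpoint-before-kill cases could be configured along these lines. This is a sketch under assumptions: the queue name is hypothetical, and it presumes that jobs in the queue are checkpointable, which is configured separately and is not covered in this section.

```
# Sketch only: "chkpnt_demo" is a hypothetical queue name.
# Assumes jobs submitted to this queue are checkpointable.
Begin Queue
QUEUE_NAME   = chkpnt_demo
JOB_CONTROLS = SUSPEND[CHKPNT] \
               TERMINATE[CHKPNT]
End Queue
```

RESUME is left out because CHKPNT is not a valid RESUME action, so the default resume action applies.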

command
A /bin/sh command line. Do not quote the command line inside an action definition.
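For the license case described earlier, a command action could be a small /bin/sh script that releases the license and then stops the job itself. The sketch below rests on assumptions that this section does not state: that LSF exposes the job's process group IDs to the action in an LSB_JOBPGIDS environment variable (verify this in the Platform LSF Reference), and that `release_license` stands in for a site-specific step. The KILL variable is a hypothetical hook that allows a dry run.

```shell
#!/bin/sh
# Hypothetical SUSPEND action script; every name here is illustrative.
# Assumption: LSB_JOBPGIDS holds the job's process group IDs, space-separated.

KILL=${KILL:-kill}   # set KILL="echo kill" to print commands instead of running them

release_license() {
    # Placeholder for a site-specific license release step.
    echo "releasing license (placeholder)"
}

suspend_job() {
    release_license || return 1
    for pgid in $LSB_JOBPGIDS; do
        # A negative ID addresses the whole process group.
        $KILL -s STOP -- "-$pgid"
    done
}

# Act only when process group IDs were provided.
if [ -n "${LSB_JOBPGIDS:-}" ]; then
    suspend_job
fi
```

A queue would then name the script in its SUSPEND action, unquoted, as in SUSPEND[/path/to/suspend_action.sh], where the path is hypothetical. A matching RESUME command would re-acquire the license and send SIGCONT.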

See the Platform LSF Reference for information about the lsb.queues file.
