25.06.2015 Views

Administering Platform LSF - SAS

Sending a Signal to a Job

LSF uses signals to control jobs, either to enforce scheduling policies or in response to user requests. The principal signals LSF uses are SIGSTOP to suspend a job, SIGCONT to resume a job, and SIGKILL to terminate a job.

Occasionally, you may want to override the default actions. For example, instead of suspending a job, you might want to kill or checkpoint it. You can override the default job control actions by defining the JOB_CONTROLS parameter in your queue configuration. Each queue can have its own job control actions.
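As a rough illustration of such an override, a queue definition in lsb.queues might look like the fragment below. The queue name is hypothetical, and the fragment assumes that CHKPNT is accepted as a SUSPEND action; check the lsb.queues reference for your LSF version before relying on it.

```
Begin Queue
QUEUE_NAME   = checkpointed
# Assumption: checkpoint the job instead of applying the default
# suspend action when LSF suspends it.
JOB_CONTROLS = SUSPEND[CHKPNT]
End Queue
```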

You can also send a signal directly to a job. You cannot send arbitrary signals to a pending job; most signals are only valid for running jobs. However, LSF does allow you to kill, suspend, and resume pending jobs.

You must be the owner of a job or an LSF administrator to send signals to a job.

Use the bkill -s command to send a signal to a job. If you issue bkill without the -s option, a SIGKILL signal is sent to the specified jobs to kill them. Twenty seconds before SIGKILL is sent, SIGTERM and SIGINT are sent to give the job a chance to catch the signals and clean up.
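A job can only use that grace period if it actually catches SIGTERM or SIGINT. The following is a minimal sketch of a job wrapper that does so; the function name run_job, the scratch directory, and the sleep standing in for the real workload are all hypothetical and not part of LSF.

```shell
#!/bin/sh
# Hypothetical job wrapper (not part of LSF): removes its scratch
# directory when it is asked to terminate.

run_job() {
    scratch_dir=$1
    worker=
    mkdir -p "$scratch_dir"

    # SIGKILL cannot be caught, so cleanup must happen on SIGTERM or SIGINT.
    trap 'kill "$worker" 2>/dev/null || true; rm -rf "$scratch_dir"; exit 143' TERM
    trap 'kill "$worker" 2>/dev/null || true; rm -rf "$scratch_dir"; exit 130' INT

    # Stand-in for the real workload. Running it in the background and
    # waiting lets the shell run the trap as soon as a signal arrives.
    sleep 600 &
    worker=$!
    wait "$worker"

    # Normal completion: clean up as well.
    rm -rf "$scratch_dir"
}
```

A job script would call run_job with a directory of its own choosing. The exit codes 143 and 130 follow the common shell convention of 128 plus the signal number; they are a choice made for this sketch, not something LSF requires.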

On Windows, job control messages replace the SIGINT and SIGTERM signals, but only customized applications are able to process them. Termination is implemented by the TerminateProcess() system call.

Signals on different platforms

LSF translates signal numbers across platforms because different host types may number signals differently. The meaning of a signal number is determined by the host from which the bkill command is issued. For example, signal 18 sent from a SunOS 4.x host means SIGTSTP. If the job is running on HP-UX, where SIGTSTP is defined as signal number 25, LSF sends signal 25 to the job.

Sending a signal to a job

Run bkill -s signal job_id, where signal is either the signal name or the signal number. For example:

% bkill -s TSTP 3421
Job <3421> is being signaled

This sends the TSTP signal to job 3421.

On most versions of UNIX, signal names and numbers are listed in the kill(1) or signal(2) man pages. On Windows, only customized applications are able to process job control messages specified with the -s option.
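Besides the man pages, the shell itself can report which number the local host assigns to a signal name. The sketch below relies only on the POSIX form kill -l number, which prints the name for a given signal number; the helper name signal_number and the upper bound of 64 are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): print the number that the
# local host assigns to a signal name such as TSTP or SIGTSTP.

signal_number() {
    name=${1#SIG}          # accept both "TSTP" and "SIGTSTP"
    n=1
    while [ "$n" -le 64 ]; do
        # kill -l <number> prints the signal name without the SIG prefix.
        if [ "$(kill -l "$n" 2>/dev/null)" = "$name" ]; then
            echo "$n"
            return 0
        fi
        n=$((n + 1))
    done
    return 1               # no signal with that name on this host
}
```

Running signal_number TSTP on two different host types shows why the numbers are not portable. Passing the name rather than the number to bkill -s avoids depending on the local numbering.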
