25.06.2015 Views

Administering Platform LSF - SAS


Chapter 22

Job Requeue and Job Rerun

Exclusive Job Requeue

About exclusive job requeue

You can configure automatic job requeue so that a failed job is not rerun on the same host.

Limitations

◆ If mbatchd is restarted, this feature might not work properly, since LSF forgets which hosts have been excluded. If a job ran on a host and exited with an exclusive exit code before mbatchd was restarted, the job could be dispatched to the same host again after mbatchd is restarted.

◆ Exclusive job requeue does not work for MultiCluster jobs or parallel jobs.

◆ A job terminated by a signal is not requeued.

Configuring exclusive job requeue

Set REQUEUE_EXIT_VALUES in the queue definition (lsb.queues) and define the exit code using parentheses and the keyword EXCLUDE, as shown:

EXCLUDE(exit_code...)

When a job exits with any of the specified exit codes, it will be requeued, but it will not be dispatched to the same host again.

Example

Begin Queue
...
REQUEUE_EXIT_VALUES=30 EXCLUDE(20)
HOSTS=hostA hostB hostC
...
End Queue
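To make the EXCLUDE(...) syntax concrete, here is a minimal Python sketch that splits a REQUEUE_EXIT_VALUES string into ordinary and exclusive exit codes. It is only an illustration, not how LSF parses lsb.queues: the function name `parse_requeue_exit_values` is hypothetical, and the sketch assumes the value contains only space-separated integers and EXCLUDE(...) groups.

```python
import re


def parse_requeue_exit_values(value):
    """Split a REQUEUE_EXIT_VALUES string into (normal, exclusive) code sets.

    Assumption: the value holds only space-separated integers and
    EXCLUDE(...) groups; any other syntax is out of scope for this sketch.
    """
    exclusive = set()
    # Collect the codes listed inside each EXCLUDE(...) group.
    for group in re.findall(r"EXCLUDE\(([^)]*)\)", value):
        exclusive.update(int(code) for code in group.split())
    # Everything left outside EXCLUDE(...) is an ordinary requeue code.
    remainder = re.sub(r"EXCLUDE\([^)]*\)", " ", value)
    normal = {int(code) for code in remainder.split()}
    return normal, exclusive


normal, exclusive = parse_requeue_exit_values("30 EXCLUDE(20)")
print(normal, exclusive)  # {30} {20}
```

For the queue shown above, exit code 30 lands in the ordinary set and exit code 20 in the exclusive set.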

A job in the example queue can be dispatched to hostA, hostB, or hostC.

If a job running on hostA exits with value 30 and is requeued, it can be dispatched to hostA, hostB, or hostC. However, if a job running on hostA exits with value 20 and is requeued, it can only be dispatched to hostB or hostC.

If the job then runs on hostB and exits with a value of 20 again, it can only be dispatched to hostC. Finally, if the job runs on hostC and exits with a value of 20, it cannot be dispatched to any of the hosts, so it remains pending indefinitely.
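The walkthrough above can be traced with a small toy model in Python. This is a sketch under stated assumptions, not LSF's scheduler: the class `ExclusiveRequeueModel` and its methods are hypothetical names, and the model tracks only which hosts remain eligible for a single job. The `mbatchd_restart` method mirrors the documented limitation that excluded hosts are forgotten on restart.

```python
class ExclusiveRequeueModel:
    """Toy model of host eligibility for one job (not LSF's implementation)."""

    def __init__(self, hosts, normal, exclusive):
        self.hosts = list(hosts)          # HOSTS from the queue definition
        self.normal = set(normal)         # ordinary requeue exit codes
        self.exclusive = set(exclusive)   # EXCLUDE(...) exit codes
        self.excluded = set()             # hosts the job may not return to

    def eligible_hosts(self):
        """Hosts the job can still be dispatched to, in queue order."""
        return [h for h in self.hosts if h not in self.excluded]

    def job_exited(self, host, exit_code):
        """Record an exit on `host`; return True if the job is requeued."""
        if exit_code in self.exclusive:
            self.excluded.add(host)       # requeued, but never on this host
            return True
        return exit_code in self.normal   # requeued with no host restriction

    def mbatchd_restart(self):
        """Model the documented limitation: exclusions are forgotten."""
        self.excluded.clear()


model = ExclusiveRequeueModel(["hostA", "hostB", "hostC"], {30}, {20})
model.job_exited("hostA", 30)
print(model.eligible_hosts())  # ['hostA', 'hostB', 'hostC']
model.job_exited("hostA", 20)
print(model.eligible_hosts())  # ['hostB', 'hostC']
model.job_exited("hostB", 20)
model.job_exited("hostC", 20)
print(model.eligible_hosts())  # [] (the job stays pending)
model.mbatchd_restart()
print(model.eligible_hosts())  # ['hostA', 'hostB', 'hostC']
```

An empty eligible list corresponds to the final case in the example, where the job has no host left and stays pending.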

