25.06.2015 Views

Administering Platform LSF - SAS


About Job Requeue


A networked computing environment is vulnerable to failures and temporary conditions in network services or processor resources. For example, you might get stale NFS file handle errors, disk full errors, process table full errors, or network connectivity problems. Your application can also be subject to external conditions such as software license problems, or to an occasional failure caused by a bug in the application itself.

Such errors are usually temporary: they may occur at one time but not another, or on one host but not another. Without automatic recovery, you might discover 12 hours later that all your jobs exited because of a temporary error.

LSF can recover from temporary errors automatically. You can configure specific exit values so that a job that exits with one of them is requeued automatically, as if it had not yet been dispatched, and is retried later. You can also configure the queue so that a requeued job is not dispatched to hosts on which it previously failed.
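To make the requeue decision concrete, the following Python sketch models it. It is a hypothetical simulation, not LSF code: the class and function names (`RequeuePolicy`, `Job`, `handle_exit`, `eligible_hosts`) and the exit values 99, 100, and 30 are illustrative assumptions. In LSF itself this behavior is set in the queue configuration (for example, the `REQUEUE_EXIT_VALUES` parameter in `lsb.queues`); the sketch only mirrors the logic. A requeued job returns to the pending (`PEND`) state. An "exclusive" exit value additionally records the failing host so the job is not dispatched there again.

```python
from dataclasses import dataclass, field


@dataclass
class RequeuePolicy:
    """Hypothetical model of a queue's requeue settings (not an LSF API)."""
    # Exit values that send the job back to PEND on any host.
    requeue_exit_values: set[int] = field(default_factory=set)
    # Exit values that requeue the job AND bar it from the host it failed on.
    exclusive_exit_values: set[int] = field(default_factory=set)


@dataclass
class Job:
    job_id: int
    state: str = "PEND"
    excluded_hosts: set[str] = field(default_factory=set)


def handle_exit(job: Job, policy: RequeuePolicy, exit_code: int, host: str) -> str:
    """Update and return the job state after it exits with exit_code on host."""
    if exit_code == 0:
        job.state = "DONE"
    elif exit_code in policy.exclusive_exit_values:
        job.excluded_hosts.add(host)   # never retry on this host
        job.state = "PEND"             # requeued as if never dispatched
    elif exit_code in policy.requeue_exit_values:
        job.state = "PEND"             # requeued; any host is still eligible
    else:
        job.state = "EXIT"             # not a configured requeue value
    return job.state


def eligible_hosts(job: Job, hosts: list[str]) -> list[str]:
    """Hosts the scheduler may still dispatch this job to."""
    return [h for h in hosts if h not in job.excluded_hosts]


if __name__ == "__main__":
    policy = RequeuePolicy(requeue_exit_values={99, 100}, exclusive_exit_values={30})
    job = Job(job_id=1)
    print(handle_exit(job, policy, 99, "hostA"))    # PEND
    print(handle_exit(job, policy, 30, "hostA"))    # PEND
    print(eligible_hosts(job, ["hostA", "hostB"]))  # ['hostB']
    print(handle_exit(job, policy, 1, "hostB"))     # EXIT
```

Keeping the host-exclusion set on the job rather than on the queue reflects the idea that exclusion applies to the hosts where that particular job failed, not to the hosts for every job in the queue.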
