25.06.2015 Views

Administering Platform LSF - SAS


Chapter 23

Job Checkpoint, Restart, and Migration

Automatically Migrating Jobs

Configuring queue migration threshold

Configuring host migration threshold

Requeuing migrating jobs

Automatic job migration works on the premise that if a job is suspended (SSUSP) for an extended period of time, due to load conditions or any other reason, the execution host is heavily loaded. To allow the job to make progress and to reduce the load on the host, a migration threshold is configured. LSF allows migration thresholds to be configured for queues and hosts. The threshold is specified in minutes.

When configured on a queue, the threshold applies to all jobs submitted to the queue. When defined at the host level, the threshold applies to all jobs running on the host. When a migration threshold is configured on both a queue and a host, the lower threshold value is used. If the migration threshold is set to 0 (zero), the job is migrated immediately upon suspension (SSUSP).
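The threshold rules above can be modelled in a few lines. This is a minimal sketch for illustration, not LSF's implementation: the function names `effective_threshold` and `should_migrate` are hypothetical, `None` is assumed to mean "no threshold configured", and treating the threshold as inclusive (migrate once the suspension time reaches it) is also an assumption.

```python
from typing import Optional


def effective_threshold(queue_mig: Optional[int],
                        host_mig: Optional[int]) -> Optional[int]:
    """Return the migration threshold (minutes) that applies to a job.

    None means "not configured" (an assumption of this sketch).
    When both the queue and the host define a threshold, the lower one wins.
    """
    configured = [t for t in (queue_mig, host_mig) if t is not None]
    return min(configured) if configured else None


def should_migrate(suspended_minutes: float,
                   threshold: Optional[int]) -> bool:
    """True once a job has been in SSUSP for `threshold` minutes or more."""
    if threshold is None:
        return False  # no threshold configured: no automatic migration
    # A threshold of 0 is satisfied as soon as the job is suspended.
    return suspended_minutes >= threshold
```

For example, a queue threshold of 30 and a host threshold of 10 give an effective threshold of 10 minutes, and a threshold of 0 makes `should_migrate(0, 0)` true as soon as the job is suspended.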

You can use bmig at any time to override a configured threshold.

To configure a migration threshold for a queue, edit lsb.queues and specify a threshold for the MIG parameter. For example, to configure a queue to migrate suspended jobs after 30 minutes:

Begin Queue
...
MIG=30 # Migration threshold set to 30 mins
DESCRIPTION=Migrate suspended jobs after 30 mins
...
End Queue

To configure a migration threshold for a host, edit lsb.hosts and specify a threshold in the MIG column for that host. For example, to configure a host to migrate suspended jobs after 30 minutes:

Begin Host
HOST_NAME    r1m    pg    MIG    # Keywords
...
hostA        5.0    18    30
...
End Host
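To make the column layout of the Host section concrete, the following sketch reads the MIG value for each host from text in the format shown above. It is an illustrative helper only, not part of LSF: the function name `read_host_mig` is hypothetical, and it assumes that the first line after `Begin Host` is the keyword row, that `#` starts a comment, and that the elided `...` lines are not present in the input.

```python
from typing import Dict


def read_host_mig(lsb_hosts_text: str) -> Dict[str, int]:
    """Map host name -> MIG threshold (minutes) from a Host section."""
    thresholds: Dict[str, int] = {}
    columns = None
    in_host_section = False
    for raw in lsb_hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        lowered = [f.lower() for f in fields]
        if lowered == ["begin", "host"]:
            in_host_section, columns = True, None
        elif lowered == ["end", "host"]:
            in_host_section = False
        elif in_host_section and columns is None:
            columns = fields  # keyword row, e.g. HOST_NAME r1m pg MIG
        elif (in_host_section and len(fields) == len(columns)
              and "HOST_NAME" in columns and "MIG" in columns):
            value = fields[columns.index("MIG")]
            if value.isdigit():  # skip entries that are not plain minutes
                thresholds[fields[columns.index("HOST_NAME")]] = int(value)
    return thresholds


sample = """Begin Host
HOST_NAME    r1m    pg    MIG    # Keywords
hostA        5.0    18    30
End Host
"""
host_mig = read_host_mig(sample)
```

With the sample above, `host_mig` maps `hostA` to 30, matching the 30-minute threshold in the example.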

By default, LSF restarts or reruns a migrating job on the next available host, bypassing all pending jobs.

You can configure LSF to requeue migrating jobs rather than immediately restarting them. Jobs will be requeued in PEND state and ordered according to their original submission time and priority. To requeue migrating jobs, edit lsf.conf and set LSB_MIG2PEND=1.

Additionally, you can configure LSF to requeue migrating jobs to the bottom of the queue by editing lsf.conf and setting LSB_MIG2PEND=1 and LSB_REQUEUE_TO_BOTTOM=1.
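The three placement behaviours described above can be compared with a small model. This is a sketch under stated assumptions, not LSF's scheduler: the `Job` class and `place_migrating_job` function are hypothetical, a higher `priority` number is assumed to run first, and the pending list is assumed to be already in dispatch order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Job:
    job_id: int
    priority: int       # higher runs first (assumption of this sketch)
    submit_time: float  # original submission time


def place_migrating_job(pending: List[Job], job: Job,
                        mig2pend: bool = False,
                        requeue_to_bottom: bool = False) -> List[Job]:
    """Return the dispatch order after `job` starts migrating."""
    if not mig2pend:
        # Default: the job is restarted ahead of all pending jobs.
        return [job] + pending
    if requeue_to_bottom:
        # LSB_MIG2PEND=1 and LSB_REQUEUE_TO_BOTTOM=1: go to the bottom.
        return pending + [job]
    # LSB_MIG2PEND=1 only: order by priority, then original submission time.
    return sorted(pending + [job],
                  key=lambda j: (-j.priority, j.submit_time))


a = Job(job_id=1, priority=50, submit_time=100.0)
b = Job(job_id=2, priority=50, submit_time=300.0)
m = Job(job_id=3, priority=50, submit_time=200.0)  # the migrating job

default_order = [j.job_id for j in place_migrating_job([a, b], m)]
pend_order = [j.job_id for j in place_migrating_job([a, b], m, mig2pend=True)]
bottom_order = [j.job_id for j in place_migrating_job(
    [a, b], m, mig2pend=True, requeue_to_bottom=True)]
```

In this model the migrating job comes first by default, lands between the two pending jobs when only `mig2pend` is set (its original submission time falls between theirs), and goes last when both flags are set.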

