25.06.2015 Views

Administering Platform LSF - SAS


Chapter 23

Job Checkpoint, Restart, and Migration

Making Jobs Checkpointable

Manually

At job submission

After job submission

Automatically

Making a job checkpointable involves specifying the location of a checkpoint directory to LSF. This can be done manually on the command line or automatically through configuration.

Manually making a job checkpointable involves specifying the checkpoint directory on the command line. LSF creates the directory if it does not exist. A job can be made checkpointable at job submission or after submission.

Use the -k "checkpoint_dir" option of bsub to specify the checkpoint directory for a job at submission. For example, to specify my_dir as the checkpoint directory for my_job:

% bsub -k "my_dir" my_job
Job <job_ID> is submitted to default queue <queue_name>.

Use the -k "checkpoint_dir" option of bmod to specify the checkpoint directory for a job after submission. For example, to specify my_dir as the checkpoint directory for a job with job ID 123:

% bmod -k "my_dir" 123
Parameters of job <123> are being changed
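The two manual cases differ only in the command used: bsub -k while the job is being submitted, bmod -k once it has a job ID. A minimal shell sketch of that choice follows. The function name chkpnt_cmd is hypothetical and not part of LSF. It only prints the command line, so the result can be reviewed before it is run on a host where LSF is installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): print the command that makes a job
# checkpointable, using only the -k option described above.
#
#   chkpnt_cmd submit <checkpoint_dir> <command>   prints a bsub -k command
#   chkpnt_cmd modify <checkpoint_dir> <job_id>    prints a bmod -k command
chkpnt_cmd() {
    if [ "$#" -ne 3 ]; then
        echo "usage: chkpnt_cmd submit|modify checkpoint_dir command_or_job_id" >&2
        return 2
    fi
    mode=$1
    dir=$2
    target=$3
    case "$mode" in
        # At job submission: the target is the command to run.
        submit) printf 'bsub -k "%s" %s\n' "$dir" "$target" ;;
        # After submission: the target is the ID of the existing job.
        modify) printf 'bmod -k "%s" %s\n' "$dir" "$target" ;;
        *)
            echo "chkpnt_cmd: unknown mode: $mode" >&2
            return 2
            ;;
    esac
}
```

For instance, chkpnt_cmd submit my_dir my_job prints the bsub command from the first example, and chkpnt_cmd modify my_dir 123 prints the bmod command from the second.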

Automatically making a job checkpointable involves submitting the job to a queue that is configured for checkpointable jobs. To configure a queue, edit lsb.queues and specify the checkpoint directory for the CHKPNT parameter on the queue. The checkpoint directory must already exist; LSF does not create it.

For example, to configure a queue for checkpointable jobs using a directory named my_dir:

Begin Queue
...
CHKPNT = my_dir
DESCRIPTION = Make jobs checkpointable using "my_dir"
...
End Queue
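Because LSF does not create the directory named by CHKPNT, it can help to create it before the queue definition is put into use. The sketch below is one possible way to do that, not an LSF tool: the function name prepare_chkpnt_dir is hypothetical, and the writability test is an assumption about what a usable checkpoint directory needs. It creates the directory if it is missing and prints the matching CHKPNT line for lsb.queues.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): make sure a queue's checkpoint
# directory exists, then print the CHKPNT line to place in lsb.queues.
prepare_chkpnt_dir() {
    dir=${1:-}
    if [ -z "$dir" ]; then
        echo "usage: prepare_chkpnt_dir checkpoint_dir" >&2
        return 2
    fi
    # LSF does not create this directory for a queue, so create it here.
    mkdir -p -- "$dir" || return 1
    # Assumption: the directory should be writable. This checks only the
    # current user, so treat it as a rough sanity check.
    if [ ! -w "$dir" ]; then
        echo "prepare_chkpnt_dir: $dir is not writable" >&2
        return 1
    fi
    printf 'CHKPNT = %s\n' "$dir"
}
```

Running prepare_chkpnt_dir my_dir creates my_dir in the current directory if needed and prints the CHKPNT line used in the queue definition above.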

