Administering Platform LSF - SAS

Chapter 23 Job Checkpoint, Restart, and Migration

Enabling Periodic Checkpointing

Periodic checkpointing creates a checkpoint file at regular time intervals while your job runs. You can enable periodic checkpointing in LSF in two ways: manually on the command line, or automatically through configuration. Automatic periodic checkpointing is discussed in “Automatically Checkpointing Jobs” on page 318. LSF can checkpoint only checkpointable jobs, as described in “Making Jobs Checkpointable” on page 315.

To enable periodic checkpointing manually, specify a checkpoint period in minutes.

At job submission

To enable periodic checkpointing at job submission, use the -k "checkpoint_dir checkpoint_period" option of bsub. For example, to checkpoint my_job every 2 hours (120 minutes):

% bsub -k "my_dir 120" my_job
Job is submitted to default queue .

After job submission

To enable periodic checkpointing after submission, use the -p period option of bchkpnt. When you specify a checkpoint period after submission, LSF checkpoints the job immediately, then checkpoints it again after the specified period. For example, to periodically checkpoint a job with job ID 123 every 2 hours (120 minutes):

% bchkpnt -p 120 123
Job is being checkpointed
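The timing described above can be illustrated with a small calculation. The sketch below lists the expected offsets, in minutes after bchkpnt -p is run, of the first few checkpoints: one immediately, then one after each further period. It assumes the job keeps running and that the period is not changed in the meantime. The function ckpt_offsets is a hypothetical helper for a POSIX shell, not an LSF command. LSF itself schedules the actual checkpoints.

```shell
#!/bin/sh
# ckpt_offsets PERIOD COUNT
# Hypothetical helper, not part of LSF. Prints the offsets (in minutes after
# running "bchkpnt -p PERIOD") of the first COUNT checkpoints: the first one
# is immediate (offset 0), and each later one follows one PERIOD after the last.
ckpt_offsets() {
    period=$1
    count=$2
    i=0
    out=""
    while [ "$i" -lt "$count" ]; do
        out="$out$((i * period)) "
        i=$((i + 1))
    done
    # Drop the trailing space before printing.
    echo "${out% }"
}

# Example for "bchkpnt -p 120 123":
#   ckpt_offsets 120 3    # prints: 0 120 240
```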

You can also use the -p option of bchkpnt to change a checkpoint period. For example, to change the checkpoint period of a job with job ID 123 to every 4 hours (240 minutes):

% bchkpnt -p 240 123
Job is being checkpointed
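Both bsub -k and bchkpnt -p take the checkpoint period in minutes. If you plan periods in hours, a small helper can do the conversion for you. The sketch below is for a POSIX shell and accepts whole hours only. The function name ckpt_minutes is hypothetical and is not part of LSF.

```shell
#!/bin/sh
# ckpt_minutes HOURS
# Hypothetical helper, not part of LSF: converts a checkpoint period given in
# whole hours into the minutes expected by "bsub -k" and "bchkpnt -p".
ckpt_minutes() {
    hours=$1
    # Accept only whole numbers without leading zeros
    # (shell arithmetic would read a value such as 08 as octal).
    case "$hours" in
        ''|*[!0-9]*|0[0-9]*)
            echo "ckpt_minutes: invalid number of hours: $hours" >&2
            return 1
            ;;
    esac
    echo $((hours * 60))
}

# Usage on an LSF host:
#   bsub -k "my_dir $(ckpt_minutes 2)" my_job   # same as: bsub -k "my_dir 120" my_job
#   bchkpnt -p "$(ckpt_minutes 4)" 123          # same as: bchkpnt -p 240 123
```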

Disabling periodic checkpointing

To disable periodic checkpointing, specify a period of 0 (zero). For example, to disable periodic checkpointing for a job with job ID 123:

% bchkpnt -p 0 123
Job is being checkpointed
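Enabling, changing, and disabling periodic checkpointing with bchkpnt -p differ only in the period value. A script can therefore wrap all three in one function that checks the value before calling bchkpnt. The sketch below is for a POSIX shell on a host where the LSF commands are on the PATH. The function name set_ckpt_period and its argument order are hypothetical.

```shell
#!/bin/sh
# set_ckpt_period JOB_ID PERIOD_MINUTES
# Hypothetical wrapper, not part of LSF, around "bchkpnt -p":
#   PERIOD > 0  enables periodic checkpointing or changes the period
#   PERIOD = 0  disables periodic checkpointing
set_ckpt_period() {
    job_id=$1
    period=$2
    # Reject periods that are not non-negative whole numbers of minutes.
    case "$period" in
        ''|*[!0-9]*)
            echo "set_ckpt_period: invalid period: $period" >&2
            return 1
            ;;
    esac
    bchkpnt -p "$period" "$job_id"
}

# Usage:
#   set_ckpt_period 123 240   # checkpoint job 123 every 4 hours
#   set_ckpt_period 123 0     # disable periodic checkpointing for job 123
```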

Administering Platform LSF 317
