25.06.2015 Views

Administering Platform LSF - SAS


Creating Custom echkpnt and erestart for Application-level Checkpointing<br />


Different applications may have different checkpointing implementations, and therefore their own custom echkpnt and erestart programs.<br />

You can write your own echkpnt and erestart programs to checkpoint your specific applications, and tell <strong>LSF</strong> which program to use for which application.<br />

◆ “Writing custom echkpnt and erestart programs”<br />

◆ “Configuring <strong>LSF</strong> to recognize the custom echkpnt and erestart”<br />

Writing custom echkpnt and erestart programs<br />

Programming language<br />

You can write your own echkpnt and erestart interfaces in C or Fortran.<br />

Name<br />

Name the programs echkpnt.method_name and erestart.method_name, where method_name identifies the application that the programs are written for.<br />

For example, if your custom echkpnt is for my_app, the programs are named echkpnt.my_app and erestart.my_app.<br />

Location<br />

Place echkpnt.method_name and erestart.method_name in <strong>LSF</strong>_SERVERDIR. You can specify a different directory with LSB_ECHKPNT_METHOD_DIR, set either as an environment variable or in lsf.conf.<br />

The combination of method name (LSB_ECHKPNT_METHOD, set in lsf.conf or as an environment variable) and location (LSB_ECHKPNT_METHOD_DIR) must be unique in the cluster. For example, two echkpnt programs can share the name echkpnt.mymethod as long as they are in different directories defined with LSB_ECHKPNT_METHOD_DIR.<br />

The checkpoint method directory should be accessible by all users who need to run the custom echkpnt and erestart programs.<br />

Supported syntax for echkpnt<br />

Your echkpnt.method_name must accept command lines in the following syntax, because these are the options that echkpnt uses to communicate with your echkpnt.method_name:<br />

echkpnt [-c] [-f] [-k | -s] [-d checkpoint_dir] [-x] process_group_ID<br />

For more details on echkpnt syntax, see the echkpnt(8) man page.<br />

Supported syntax for erestart<br />

Your erestart.method_name must accept command lines in the following syntax, because these are the options that erestart uses to communicate with your erestart.method_name:<br />

erestart [-c] [-f] checkpoint_dir<br />

For more details, see the erestart(8) man page.<br />
