
Changing Default LIM Behavior to Improve Performance

LSF_MASTER_LIST Defined

When LSF_MASTER_LIST is defined, LSF only rejects candidate master hosts listed in LSF_MASTER_LIST from the cluster if:

◆ The number of load indices in lsf.cluster.cluster_name or lsf.shared for master candidates is different from the number of load indices in the lsf.cluster.cluster_name or lsf.shared files of the elected master.

A warning is logged in the log file lim.log.master_host_name, and the cluster continues to run, but without the rejected hosts.

If you want the rejected hosts to be part of the cluster, ensure that the number of load indices in lsf.cluster.cluster_name and lsf.shared is identical for all master candidates, then restart the LIMs on the master and all master candidates:

% lsadmin limrestart hostA hostB hostC
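For example, before restarting you could confirm that the definitions match by comparing checksums of the two files on each master candidate. This is only a suggested check, not part of the documented procedure; md5sum is a standard utility, and the use of LSF_CONFDIR as the configuration directory is an assumption about your setup:

% md5sum $LSF_CONFDIR/lsf.shared
% md5sum $LSF_CONFDIR/lsf.cluster.cluster_name

Run the same commands on every master candidate and compare the output. Identical checksums are stricter than the actual requirement (only the number of load indices must match), but they are the simplest sufficient check; if a host differs, copy the files from the elected master before running lsadmin limrestart.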

LSF_MASTER_LIST defined, and master host goes down

If LSF_MASTER_LIST is defined and the elected master host goes down, and if the number of load indices in lsf.cluster.cluster_name or lsf.shared for the newly elected master is different from the number of load indices in the files of the master that went down, LSF rejects all master candidates that do not have the same number of load indices in their files as the newly elected master. LSF also rejects all slave-only hosts. This could leave only the newly elected master as part of the cluster.

A warning is logged in the log file lim.log.new_master_host_name, and the cluster continues to run, but without the rejected hosts.

To resolve this, restart all LIMs from the current master host:

% lsadmin limrestart all
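After the restart, one way to verify that the rejected hosts have rejoined is with lshosts, a standard LSF command that lists the hosts currently in the cluster; using it here is a suggested check rather than part of the documented recovery procedure:

% lshosts

A master candidate still missing from the output most likely has a mismatched set of load indices in its lsf.cluster.cluster_name or lsf.shared file.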

All slave-only hosts are again considered part of the cluster. Master candidates with a different number of load indices in their lsf.cluster.cluster_name or lsf.shared files are still rejected.

When the master that went down comes back up, you have the same situation as described in "LSF_MASTER_LIST Defined" above. You need to ensure that the load indices defined in lsf.cluster.cluster_name and lsf.shared are identical for all master candidates, then restart the LIMs on all master candidates.
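For reference, LSF_MASTER_LIST itself is set in lsf.conf as a space-separated list of master candidate host names. A minimal example, reusing the hypothetical host names from the commands above:

LSF_MASTER_LIST="hostA hostB hostC"

The list is ordered by preference: the first available host in the list becomes the master, and the others take over in order if it goes down.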
