25.06.2015 Views

Administering Platform LSF - SAS


Chapter 41

Error and Event Logging

Network partitioning

We assume that network partitioning does not cause a cluster to split into two independent clusters, each simultaneously running mbatchd.

Such a split can nevertheless happen with certain network topologies and failure modes. For example, suppose connectivity is lost between the first master, M1, and both the file server and the secondary master, M2. Both M1 and M2 then run the mbatchd service, with M1 logging events to LSB_LOCALDIR and M2 logging to LSB_SHAREDIR. When connectivity is restored, the changes made by M2 to LSB_SHAREDIR are lost when M1 updates LSB_SHAREDIR from its copy in LSB_LOCALDIR.

The archived event files are available only in LSB_LOCALDIR, so during network partitioning, commands such as bhist cannot access these files. As a precaution, periodically copy the archived files from LSB_LOCALDIR to LSB_SHAREDIR.

Setting an event update interval

If NFS traffic is too high and you want to reduce network traffic, use EVENT_UPDATE_INTERVAL in lsb.params to specify how often to back up the data and synchronize the LSB_SHAREDIR and LSB_LOCALDIR directories.

The directories are always synchronized when data is logged to the files, or when mbatchd is started on the first LSF master host.
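As an illustration, a minimal lsb.params fragment might look like the following. The value 300 is an arbitrary example, not a recommendation, and this sketch assumes the interval is expressed in seconds; confirm the unit and valid range in the lsb.params reference for your LSF version.

```
# lsb.params (fragment)
Begin Parameters
# Assumed unit: seconds. 300 is an illustrative value only.
EVENT_UPDATE_INTERVAL = 300
End Parameters
```

A longer interval means fewer writes to the shared directory, at the cost of LSB_SHAREDIR lagging further behind LSB_LOCALDIR between synchronizations.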

Automatic archiving and duplicate logging<br />

Event logs<br />

Archived event logs, lsb.events.n, are not replicated to LSB_SHAREDIR. If LSF starts a new event log while the file server containing LSB_SHAREDIR is down, you might notice a gap in the historical data in LSB_SHAREDIR.
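Because archived logs exist only in LSB_LOCALDIR, the periodic copy recommended earlier can be scripted. The following is a minimal POSIX shell sketch, not an LSF-provided tool: the function name copy_archived_events and its two directory arguments are hypothetical, and you would pass whichever directories actually hold the lsb.events.n files on your cluster.

```shell
#!/bin/sh
# copy_archived_events SRC_DIR DEST_DIR   (hypothetical helper, not part of LSF)
# Copies archived event logs (lsb.events.1, lsb.events.2, ...) from SRC_DIR
# to DEST_DIR. The active lsb.events file is left alone. A file is copied
# only if DEST_DIR has no copy or the existing copy differs in content.
copy_archived_events() {
    src=$1
    dest=$2

    # Both directories must already exist.
    if [ ! -d "$src" ] || [ ! -d "$dest" ]; then
        return 1
    fi

    for f in "$src"/lsb.events.[0-9]*; do
        # If the glob matched nothing, $f is the literal pattern; skip it.
        [ -f "$f" ] || continue
        name=$(basename "$f")

        # Skip files whose identical copy is already in the destination.
        if [ -f "$dest/$name" ] && cmp -s "$f" "$dest/$name"; then
            continue
        fi

        # -p preserves modification times and permissions.
        cp -p "$f" "$dest/$name" || return 1
    done
    return 0
}
```

For example, a cron job on the first master host could call copy_archived_events with the local log directory and the shared log directory as its two arguments. Comparing contents rather than only names means a file is copied again if its contents change under the same name.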

Configuring duplicate logging<br />

To enable duplicate logging, set LSB_LOCALDIR in lsf.conf to a directory on the first master host (the first host configured in lsf.cluster.cluster_name) that will be used to store the primary copies of lsb.events. This directory should exist only on the first master host.

1. Edit lsf.conf and set LSB_LOCALDIR to a local directory that exists only on the first master host.

2. Use the commands lsadmin reconfig and badmin mbdrestart to make the changes take effect.
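As a sketch of step 1, the lsf.conf entry might look like the fragment below. The path shown is a hypothetical example; substitute a local directory that exists only on your first master host.

```
# lsf.conf (fragment)
# Hypothetical path: a local (non-shared) directory on the first master host
LSB_LOCALDIR=/usr/share/lsbatch/loginfo
```

After saving the file, run lsadmin reconfig and then badmin mbdrestart, as described in step 2, so that the new setting takes effect.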
