25.06.2015 Views

Administering Platform LSF - SAS


Chapter 42

Troubleshooting and Error Messages

Batch daemons die quietly

First, check the sbatchd and mbatchd error logs. Then run the following command to check the configuration.

% badmin ckconfig

This command reports most errors. Also check whether there is any email in the LSF administrator’s mailbox. If mbatchd is running but sbatchd dies on some hosts, mbatchd may not have been configured to use those hosts.

See “Host not used by LSF” on page 531.
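When comparing the configuration check against the daemon error logs, it can help to keep the output of the check in a file. The following is a minimal sketch, assuming only that badmin is on the PATH. The function name save_ckconfig and the output file are hypothetical, and nothing is assumed about the exit status of badmin ckconfig beyond passing it through unchanged.

```shell
# save_ckconfig OUTFILE
# Runs "badmin ckconfig" (the command named above), saves everything it
# prints to OUTFILE, and returns badmin's own exit status unchanged.
# The function name and OUTFILE are illustrative, not part of LSF.
save_ckconfig() {
    outfile=$1
    if [ -z "$outfile" ]; then
        echo "usage: save_ckconfig OUTFILE" >&2
        return 2
    fi
    badmin ckconfig >"$outfile" 2>&1
    rc=$?
    echo "badmin ckconfig exited with status $rc; output saved to $outfile"
    return "$rc"
}
```

For example, save_ckconfig /tmp/ckconfig.out leaves a copy of the report that can be read alongside the sbatchd and mbatchd error logs.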

sbatchd starts but mbatchd does not<br />

Check whether LIM is running. You can test this by running the lsid command. If LIM is not running properly, follow the suggestions in this chapter to fix the LIM first. It is possible that mbatchd is temporarily unavailable because the master LIM is temporarily unknown, causing the following error message.

sbatchd: unknown service

Check whether services are registered properly. See “Registering Service Ports” on page 85 for information about registering LSF services.

Host not used by LSF

If you configure a list of server hosts in the Host section of the lsb.hosts file, mbatchd allows sbatchd to run only on the hosts listed. If you try to configure an unknown host in the HostGroup or HostPartition sections of the lsb.hosts file, or as a HOSTS definition for a queue in the lsb.queues file, mbatchd logs the following message.

mbatchd on host: LSB_CONFDIR/cluster/configdir/file(line #): Host hostname is not used by lsbatch; ignored

If you start sbatchd on a host that is not known by mbatchd, mbatchd rejects the sbatchd. The sbatchd logs the following message and exits.

This host is not used by lsbatch system.
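The two messages quoted in this section can be picked out of a log file mechanically. The following is a minimal sketch, assuming the log is a plain text file whose path you supply. The function name scan_host_errors and the wording of the hints are illustrative, and the match patterns come only from the message text quoted above.

```shell
# scan_host_errors LOGFILE
# Prints one hint for each line of LOGFILE that contains either of the
# "not used by lsbatch" messages quoted above.
# Returns 0 if at least one message was found, 1 if none, 2 if unreadable.
scan_host_errors() {
    logfile=$1
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    found=1
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            # The more specific sbatchd message must be tested first,
            # because it also contains the mbatchd pattern.
            *"This host is not used by lsbatch system"*)
                echo "sbatchd rejected: host unknown to mbatchd"
                found=0 ;;
            *"is not used by lsbatch"*)
                echo "mbatchd ignored an unknown host in its configuration"
                found=0 ;;
        esac
    done <"$logfile"
    return "$found"
}
```

Either hint points to the same remedy: make sure the host has been added to the configuration and that the configuration has been reloaded.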

Both of these errors are most often caused by not running the following commands, in order, after adding a host to the configuration.

lsadmin reconfig

badmin reconfig

You must run both of these before starting the daemons on the new host.
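The required order can be captured in a small wrapper so that the second command runs only if the first succeeds. This is a sketch under stated assumptions: the function name reconfig_new_host is hypothetical, a nonzero exit status from either command is treated as failure, and both commands may ask for confirmation when run interactively.

```shell
# reconfig_new_host
# Runs the two reconfiguration commands in the order given above and
# stops at the first one that fails.
# Assumption: each command returns a nonzero exit status on failure.
reconfig_new_host() {
    if ! lsadmin reconfig; then
        echo "lsadmin reconfig failed; not running badmin reconfig" >&2
        return 1
    fi
    if ! badmin reconfig; then
        echo "badmin reconfig failed" >&2
        return 1
    fi
    echo "Reconfiguration finished; daemons can now be started on the new host"
}
```

Run reconfig_new_host as the LSF administrator after editing the configuration files, and start the daemons on the new host only once it reports success.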

