OpenVMS Cluster Systems - OpenVMS Systems - HP


Cluster Troubleshooting

C.5 Startup Procedures Fail to Complete

IF: The startup procedures fail to complete after a period that is normal for your site.
THEN: Try to access the procedures from another OpenVMS Cluster computer and make appropriate adjustments. For example, verify that all required devices are configured and available. One cause of such a failure could be the lack of some system resource, such as NPAGEDYN or page file space.

IF: You suspect that the value for the NPAGEDYN parameter is set too low.
THEN: Perform a conversational bootstrap operation to increase it. Use SYSBOOT to check the current value, and then double the value.

IF: You suspect a shortage of page file space, and another OpenVMS Cluster computer is available.
THEN: Log in on that computer and use the System Generation utility (SYSGEN) to provide adequate page file space for the problem computer.
Note: Insufficient page-file space on the booting computer might cause other computers to hang.

IF: The computer still cannot complete the startup procedures.
THEN: Contact your Compaq support representative.

C.6 Diagnosing LAN Component Failures

Section D.5 provides troubleshooting techniques for LAN component failures (for example, broken LAN bridges). That appendix also describes techniques for using the Local Area OpenVMS Cluster Network Failure Analysis Program.

Intermittent LAN component failures (for example, packet loss) can cause problems in the NISCA transport protocol that delivers System Communications Services (SCS) messages to other nodes in the OpenVMS Cluster. Appendix F describes troubleshooting techniques and requirements for LAN analyzer tools.

C.7 Diagnosing Cluster Hangs

Conditions like the following can cause an OpenVMS Cluster computer to suspend process or system activity (that is, to hang):

Condition                                     Reference
Cluster quorum is lost.                       Section C.7.1
A shared cluster resource is inaccessible.    Section C.7.2

C.7.1 Cluster Quorum is Lost

The OpenVMS Cluster quorum algorithm coordinates activity among OpenVMS Cluster computers and ensures the integrity of shared cluster resources. (The quorum algorithm is described fully in Chapter 2.) Quorum is checked after any change to the cluster configuration (for example, when a voting computer leaves or joins the cluster). If quorum is lost, process and I/O activity on all computers in the cluster is blocked.

Information about the loss of quorum and about clusterwide events that cause loss of quorum is sent to the OPCOM process, which broadcasts messages to designated operator terminals. The information is also broadcast to each computer’s operator console (OPA0), unless broadcast activity is explicitly disabled on that terminal. However, because quorum may be lost before OPCOM has been able to inform the operator terminals, the messages sent to OPA0 are the most reliable source of information about events that cause loss of quorum.

If quorum is lost, you might add or reboot a node with additional votes.
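
The effect of adding or rebooting a voting node can be illustrated with a small Python sketch. It is a simplified model, not the cluster software's actual logic: it assumes quorum is fixed at the value derived from the EXPECTED_VOTES system parameter, (EXPECTED_VOTES + 2) / 2 rounded down, and it ignores quorum disks and dynamic quorum adjustment. The node names NODEA and NODEB and the function names are hypothetical.

```python
def quorum_for(expected_votes):
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def has_quorum(member_votes, quorum):
    """True when the votes held by current members meet or exceed quorum."""
    return sum(member_votes.values()) >= quorum


# Hypothetical three-node cluster, one vote per node, EXPECTED_VOTES = 3.
quorum = quorum_for(3)

# Two voting nodes have left; only NODEA remains, so activity is blocked.
members = {"NODEA": 1}
blocked = not has_quorum(members, quorum)

# Rebooting NODEB returns its vote to the cluster and quorum is regained.
members["NODEB"] = 1
restored = has_quorum(members, quorum)

print(quorum, blocked, restored)
```

In this model a single remaining vote falls short of a quorum of 2, and one additional vote is enough to let activity resume.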

