02.12.2012 Views

OpenVMS Cluster Systems - OpenVMS Systems - HP


Troubleshooting the NISCA Protocol

F.2 Addressing LAN Communication Problems

• Retransmission

A well-configured OpenVMS Cluster system should not perform excessive retransmissions between nodes. Retransmissions between any nodes that occur more frequently than once every few seconds deserve network investigation.
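The rule of thumb above can be turned into a simple check on two samples of a retransmission counter. The following is a minimal sketch, not part of any OpenVMS utility: the function names are hypothetical, the counter values are assumed to have been read by hand from whatever display you use to monitor the virtual circuit, and the 3-second threshold is only one assumed reading of "once every few seconds".

```python
def retransmit_interval(count_start, count_end, elapsed_seconds):
    """Average number of seconds between retransmissions in a sampling window.

    Returns None when no retransmissions occurred during the window.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = count_end - count_start
    if delta < 0:
        raise ValueError("counter decreased; it may have been reset")
    if delta == 0:
        return None
    return elapsed_seconds / delta


def needs_investigation(count_start, count_end, elapsed_seconds,
                        min_interval_seconds=3.0):
    """True when retransmissions occur more often than once per
    min_interval_seconds (an assumed threshold, see the text above)."""
    interval = retransmit_interval(count_start, count_end, elapsed_seconds)
    return interval is not None and interval < min_interval_seconds
```

For example, a counter that rises from 100 to 160 over a 60-second window averages one retransmission per second, so needs_investigation(100, 160, 60) is True, whereas a rise from 100 to 102 over the same window is not flagged.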

Diagnosing failures at this level becomes more complex because the errors are usually intermittent. Moreover, even though PEDRIVER is aware when a channel is unavailable and performs error recovery based on this information, it does not provide notification when a channel failure occurs; PEDRIVER provides notification only for virtual circuit failures.

However, the Local Area OpenVMS Cluster Network Failure Analysis Program (LAVC$FAILURE_ANALYSIS), available in SYS$EXAMPLES, can help you use PEDRIVER information about channel status. The LAVC$FAILURE_ANALYSIS program (documented in Appendix D) analyzes long-term channel outages, such as hard failures in LAN network components that occur during run time.

This program uses tables in which you describe your LAN hardware configuration. During a channel failure, PEDRIVER uses the hardware configuration represented in the table to isolate which component might be causing the failure. PEDRIVER reports the suspected component through an OPCOM display. You can then isolate the LAN component for repair or replacement.
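To make the idea of table-driven isolation concrete, here is a minimal sketch of one way such an analysis could work. It is an illustration only, not the LAVC$FAILURE_ANALYSIS algorithm, whose actual logic is documented in Appendix D and may differ. The function name, the channel names, and the component names are all hypothetical. The assumption is that any component lying on a working channel is presumed healthy, so the suspects are the components that appear only on failed channels.

```python
def suspect_components(paths, failed_channels):
    """Return the components that lie on at least one failed channel
    and on no working channel.

    paths           -- dict: channel name -> set of component names it traverses
    failed_channels -- iterable of channel names that are currently down
    """
    failed = set(failed_channels)
    unknown = failed - set(paths)
    if unknown:
        raise KeyError(f"channels not described in the table: {sorted(unknown)}")

    on_failed, on_working = set(), set()
    for channel, components in paths.items():
        if channel in failed:
            on_failed |= set(components)
        else:
            on_working |= set(components)
    # A component seen on any working channel is presumed healthy.
    return on_failed - on_working


# Hypothetical configuration table: three nodes, two segments, one bridge.
paths = {
    "NODEA-NODEB": {"ADAPTER_A", "SEGMENT_1", "ADAPTER_B"},
    "NODEA-NODEC": {"ADAPTER_A", "SEGMENT_1", "BRIDGE_1", "SEGMENT_2", "ADAPTER_C"},
    "NODEB-NODEC": {"ADAPTER_B", "SEGMENT_1", "BRIDGE_1", "SEGMENT_2", "ADAPTER_C"},
}
suspects = suspect_components(paths, {"NODEA-NODEC", "NODEB-NODEC"})
print(sorted(suspects))
```

With both channels to NODEC down and the NODEA-NODEB channel still working, the sketch narrows the suspects to ADAPTER_C, BRIDGE_1, and SEGMENT_2. The more completely the table describes the configuration, the smaller the suspect set can become.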

Reference: Section F.7 addresses the kinds of problems you might find in the NISCA protocol and provides methods for diagnosing and solving them.

F.2.6 Checking System Parameters

Table F–3 describes several system parameters relevant to the recovery and failover time limits for LANs in an OpenVMS Cluster.

Table F–3 System Parameters for Timing

Parameter: RECNXINTERVAL
Use: Defines the amount of time to wait before removing a node from the OpenVMS Cluster after detection of a virtual circuit failure, which could result from a LAN bridge failure. If your network uses multiple paths and you want the OpenVMS Cluster to survive failover between LAN bridges, make sure the value of RECNXINTERVAL is greater than the time it takes to fail over those paths.
Reference: The formula for calculating this parameter is discussed in Section 3.4.7.

Parameter: MVTIMEOUT
Use: Defines the amount of time the OpenVMS operating system tries to recover a path to a disk before returning failure messages to the application. Relevant when an OpenVMS Cluster configuration is set up to serve disks over either the Ethernet or FDDI. MVTIMEOUT is similar to RECNXINTERVAL except that RECNXINTERVAL is CPU to CPU, and MVTIMEOUT is CPU to disk.
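The RECNXINTERVAL guidance amounts to a comparison against the slowest failover in the network. The sketch below assumes you have measured or estimated the failover time of each redundant path yourself. The function name and the path labels are hypothetical, both quantities are taken to be in seconds, and the formula from Section 3.4.7 is not reproduced here.

```python
def paths_at_risk(recnxinterval_seconds, failover_seconds_by_path):
    """Return, sorted by name, the paths whose failover time is not strictly
    less than RECNXINTERVAL.

    A node could be removed from the cluster before any of these paths
    finishes failing over.
    """
    return sorted(
        path
        for path, failover in failover_seconds_by_path.items()
        if failover >= recnxinterval_seconds
    )
```

For instance, with RECNXINTERVAL at 20 seconds and two hypothetical bridge failovers measured at 12 and 35 seconds, only the 35-second path is reported. An empty result means the current value exceeds every failover time you supplied.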

