OpenVMS Cluster Systems - OpenVMS Systems - HP

Troubleshooting the NISCA Protocol
F.2 Addressing LAN Communication Problems

This message is displayed when PEDRIVER has recently had to perform an excessively high rate of packet retransmissions on the LAN path consisting of the local device, the intervening network, and the device on the remote node. The message indicates that the LAN path has degraded and is approaching, or has reached, the point where reliable communications with the remote node are no longer possible. It is likely that the virtual circuit to the remote node will close if the losses continue. Furthermore, continued operation with high LAN packet losses can significantly reduce performance because of the communication delays that result from packet loss detection timeouts and packet retransmission.

The corrective steps to take are:

1. Check the local and remote LAN device error counts to see whether a problem exists on the devices. Issue the following commands on each node:

   $ SHOW DEVICE local-device-name
   $ MC SCACP
   SCACP> SHOW LAN device-name
   $ MC LANCP
   LANCP> SHOW DEVICE device-name/COUNT

2. If device error counts on the local devices are within normal bounds, contact your network administrators to request that they diagnose the LAN path between the devices.

F.2.4 Preliminary Network Diagnosis

If the symptoms and preliminary diagnosis indicate that you might have a network problem, troubleshooting LAN communication failures should start with the step-by-step procedures described in Appendix C. Appendix C helps you diagnose and solve common Ethernet and FDDI LAN communication failures during the following stages of OpenVMS Cluster activity:

• When a computer or a satellite fails to boot
• When a computer fails to join the OpenVMS Cluster
• During run time when startup procedures fail to complete
• When an OpenVMS Cluster hangs

The procedures in Appendix C require that you verify a number of parameters during the diagnostic process.
Because system parameter settings play a key role in effective OpenVMS Cluster communications, Section F.2.6 describes several system parameters that are especially important to the timing of LAN bridges, disk failover, and channel availability.

F.2.5 Tracing Intermittent Errors

Because PEDRIVER communication is based on channels, LAN network problems typically fall into these areas:

• Channel formation and maintenance
  Channels are formed when HELLO datagram messages are received from a remote system. A failure can occur when the HELLO datagram messages are not received or when the channel control message contains the wrong data.

• Retransmission
  A well-configured OpenVMS Cluster system should not perform excessive retransmissions between nodes. Retransmissions between any nodes that occur more frequently than once every few seconds deserve network investigation.

Diagnosing failures at this level becomes more complex because the errors are usually intermittent. Moreover, even though PEDRIVER is aware when a channel is unavailable and performs error recovery based on this information, it does not provide notification when a channel failure occurs; PEDRIVER provides notification only for virtual circuit failures.

However, the Local Area OpenVMS Cluster Network Failure Analysis Program (LAVC$FAILURE_ANALYSIS), available in SYS$EXAMPLES, can help you use PEDRIVER information about channel status. The LAVC$FAILURE_ANALYSIS program (documented in Appendix D) analyzes long-term channel outages, such as hard failures in LAN network components that occur during run time.

This program uses tables in which you describe your LAN hardware configuration. During a channel failure, PEDRIVER uses the hardware configuration represented in the table to isolate which component might be causing the failure. PEDRIVER reports the suspected component through an OPCOM display. You can then isolate the LAN component for repair or replacement.

Reference: Section F.7 addresses the kinds of problems you might find in the NISCA protocol and provides methods for diagnosing and solving them.

F.2.6 Checking System Parameters

Table F–3 describes several system parameters relevant to the recovery and failover time limits for LANs in an OpenVMS Cluster.

Table F–3 System Parameters for Timing

Parameter: RECNXINTERVAL
Use: Defines the amount of time to wait before removing a node from the OpenVMS Cluster after detection of a virtual circuit failure, which could result from a LAN bridge failure. If your network uses multiple paths and you want the OpenVMS Cluster to survive failover between LAN bridges, make sure the value of RECNXINTERVAL is greater than the time it takes to fail over those paths.
Reference: The formula for calculating this parameter is discussed in Section 3.4.7.

Parameter: MVTIMEOUT
Use: Defines the amount of time the OpenVMS operating system tries to recover a path to a disk before returning failure messages to the application. Relevant when an OpenVMS Cluster configuration is set up to serve disks over either the Ethernet or FDDI. MVTIMEOUT is similar to RECNXINTERVAL except that RECNXINTERVAL is CPU to CPU, and MVTIMEOUT is CPU to disk.

(continued on next page)
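The RECNXINTERVAL guidance above reduces to one comparison: the parameter must be greater than the slowest path failover you expect. The Python sketch below illustrates only that comparison and is not part of OpenVMS. The function name and the sample measurements are hypothetical, it assumes both values are in seconds, and it does not reproduce the formula from Section 3.4.7.

```python
def recnxinterval_covers_failover(recnxinterval_s, failover_times_s):
    """Return True if RECNXINTERVAL exceeds the slowest measured path failover.

    recnxinterval_s  -- configured RECNXINTERVAL value (assumed to be in seconds)
    failover_times_s -- measured LAN path failover times in seconds (hypothetical input)
    """
    if not failover_times_s:
        raise ValueError("at least one measured failover time is required")
    # The guidance asks for "greater than", so an equal value does not pass.
    return recnxinterval_s > max(failover_times_s)


# Hypothetical failover measurements in seconds, for illustration only.
measured = [12.0, 18.5, 9.0]
print(recnxinterval_covers_failover(20, measured))
print(recnxinterval_covers_failover(15, measured))
```

A check like this is only as good as the failover times fed into it. Those times should come from measuring the actual LAN bridge failover in your network.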