
7.8.1.0 - Force10 Networks


C-Series RPMs have one CPU, the CP. The CP on the RPM communicates with the LP via IPC. As on the E-Series, the CP monitors the health of the other processors by sending heartbeat messages. If any CPU fails to acknowledge a set number of consecutive heartbeat messages, or the CP itself fails to send heartbeat messages (IPC timeout), the primary RPM requests a failover to the standby RPM, and FTOS displays a message similar to Message 7.
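The heartbeat check described above can be sketched as a per-processor counter of consecutive missed acknowledgments, where a single unresponsive CPU is enough to trigger a failover request. This is a minimal illustration, not FTOS code: the class name `IpcHealthMonitor`, the target names `lp0` and `lp1`, and the limit of three missed heartbeats are hypothetical, because the guide does not state the actual threshold.

```python
# Hypothetical limit: the guide only says "a consecutive number" of heartbeats.
MISSED_HEARTBEAT_LIMIT = 3


class IpcHealthMonitor:
    """Sketch of the primary CP tracking heartbeat acks from other CPUs."""

    def __init__(self, targets: list[str], limit: int = MISSED_HEARTBEAT_LIMIT) -> None:
        self.limit = limit
        # Consecutive heartbeats each target has failed to acknowledge.
        self.missed = {target: 0 for target in targets}

    def record_round(self, acked: set[str]) -> list[str]:
        """Process one heartbeat round and return the targets that timed out."""
        timed_out = []
        for target in self.missed:
            if target in acked:
                self.missed[target] = 0  # an ack resets the consecutive count
            else:
                self.missed[target] += 1
                if self.missed[target] >= self.limit:
                    timed_out.append(target)
        return timed_out


# Usage: one unresponsive CPU is enough to request a failover.
monitor = IpcHealthMonitor(["lp0", "lp1"])  # hypothetical target names
for acked in [{"lp0", "lp1"}, {"lp0"}, {"lp0"}, {"lp0"}]:
    if monitor.record_round(acked):
        print("IPC timeout: request failover to the standby RPM")
        break
```

Resetting the counter on any acknowledgment is what makes the limit count *consecutive* misses rather than total misses.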

Message 7 RPM Failover due to IPC Timeout

%RPM1-P:CP %IPC-2-STATUS: target rp2 not responding
%RPM0-S:CP %RAM-6-FAILOVER_REQ: RPM failover request from active peer: Auto failover on failure
%RPM0-S:CP %RAM-6-ELECTION_ROLE: RPM0 is transitioning to Primary RPM.
%RPM0-P:CP %TSM-6-SFM_SWITCHFAB_STATE: Switch Fabric: UP

In addition to IPC, the CP on each RPM sends heartbeat messages to the CP on its peer RPM via a process called Inter-RPM Communication (IRC). If the primary RPM fails to acknowledge a set number of consecutive heartbeat messages (IRC timeout), the standby RPM responds by assuming the role of primary RPM, and FTOS displays a message similar to Message 8.
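The IRC case differs from the IPC case in which RPM acts: the standby promotes itself instead of waiting for a failover request from the primary. A minimal sketch of that role change, assuming a hypothetical `IrcPeerWatch` class and a limit of three lost keepalives (the guide gives no number):

```python
from enum import Enum

# Hypothetical limit: the guide does not state how many keepalives may be lost.
IRC_MISS_LIMIT = 3


class Role(Enum):
    PRIMARY = "Primary"
    STANDBY = "Standby"


class IrcPeerWatch:
    """Sketch of a standby RPM watching its primary peer over IRC."""

    def __init__(self, limit: int = IRC_MISS_LIMIT) -> None:
        self.role = Role.STANDBY
        self.limit = limit
        self.consecutive_misses = 0

    def on_keepalive(self, acknowledged: bool) -> Role:
        """Record the outcome of one keepalive and return the current role."""
        if self.role is Role.PRIMARY:
            return self.role  # already took over; no peer left to watch
        if acknowledged:
            self.consecutive_misses = 0
        else:
            self.consecutive_misses += 1
            if self.consecutive_misses >= self.limit:
                # IRC timeout: the standby assumes the primary role on its own.
                self.role = Role.PRIMARY
        return self.role


watch = IrcPeerWatch()
for acked in [True, False, False, False]:
    role = watch.on_keepalive(acked)
print(role.value)  # Primary
```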

Message 8 RPM Failover due to IRC Timeout

20:29:07: %RPM1-S:CP %IRC-4-IRC_WARNLINKDN: Keepalive packet 7 to peer RPM is lost
20:29:07: %RPM1-S:CP %IRC-4-IRC_COMMDOWN: Link to peer RPM is down
%RPM1-S:CP %RAM-4-MISSING_HB: Heartbeat lost with peer RPM. Auto failover on heart beat lost.
%RPM1-S:CP %RAM-6-ELECTION_ROLE: RPM1 is transitioning to Primary RPM.
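The messages in Message 7 and Message 8 share a common layout: an optional timestamp, the RPM slot and its role, the CPU, then a facility, a numeric level, and a mnemonic. The parser below is a sketch based only on the sample lines in this section. It assumes that `P` and `S` mark the primary and standby RPM (RPM0 logs as `-S` before its transition and as `-P` after it) and that the digit is a syslog-style severity. The names `parse_rpm_log` and `RpmLogEntry` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the sample messages in this guide (an assumption):
# [HH:MM:SS: ]%RPM<slot>-<P|S>:<cpu> %<FACILITY>-<level>-<MNEMONIC>: <text>
LOG_RE = re.compile(
    r"^(?:(?P<time>\d{2}:\d{2}:\d{2}): )?"
    r"%RPM(?P<slot>\d+)-(?P<role>[PS]):(?P<cpu>\w+) "
    r"%(?P<facility>[A-Z_]+)-(?P<level>\d)-(?P<mnemonic>[A-Z0-9_]+): "
    r"(?P<text>.*)$"
)


class RpmLogEntry(NamedTuple):
    time: Optional[str]  # only some messages carry a timestamp
    slot: int            # RPM slot, e.g. 0 or 1
    role: str            # assumed: "P" = primary, "S" = standby
    cpu: str             # e.g. "CP"
    facility: str        # e.g. "IPC", "IRC", "RAM"
    level: int           # assumed syslog-style severity (0-7)
    mnemonic: str        # e.g. "ELECTION_ROLE"
    text: str


def parse_rpm_log(line: str) -> Optional[RpmLogEntry]:
    """Return the fields of one RPM message, or None if the layout differs."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    return RpmLogEntry(
        time=m["time"],
        slot=int(m["slot"]),
        role=m["role"],
        cpu=m["cpu"],
        facility=m["facility"],
        level=int(m["level"]),
        mnemonic=m["mnemonic"],
        text=m["text"],
    )


entry = parse_rpm_log("%RPM0-S:CP %RAM-6-ELECTION_ROLE: RPM0 is transitioning to Primary RPM.")
print(entry.facility, entry.mnemonic)  # RAM ELECTION_ROLE
```

Filtering parsed entries on the `RAM` facility with mnemonics such as `FAILOVER_REQ`, `MISSING_HB`, or `ELECTION_ROLE` is one way to pick failover events out of a longer log.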

IPC and IRC timeouts and failover behavior<br />

IPC or IRC timeouts can occur because heartbeat messages and acknowledgements are lost or arrive out of sequence, or because a software or hardware failure impacts IPC or IRC. Table 20 describes the failover behavior for the possible failure scenarios.
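Message 8 shows that keepalives carry a number ("Keepalive packet 7"), which suggests one way an out-of-sequence acknowledgement can still lead to a timeout: an ack counts only if it matches the keepalive currently awaiting one. The sketch below assumes that matching rule and a hypothetical limit of three consecutive misses; `KeepaliveTracker` is an illustrative name, and the guide does not document FTOS's actual rules.

```python
from typing import Optional

# Hypothetical limit: the guide does not state the real threshold.
KEEPALIVE_MISS_LIMIT = 3


class KeepaliveTracker:
    """Sketch of matching acks to numbered keepalives (assumed rules)."""

    def __init__(self, limit: int = KEEPALIVE_MISS_LIMIT) -> None:
        self.limit = limit
        self.next_seq = 0
        self.outstanding: Optional[int] = None  # keepalive awaiting an ack
        self.consecutive_misses = 0

    def send(self) -> int:
        """Send the next keepalive; an unanswered previous one counts as lost."""
        if self.outstanding is not None:
            self.consecutive_misses += 1
        self.outstanding = self.next_seq
        self.next_seq += 1
        return self.outstanding

    def receive_ack(self, seq: int) -> bool:
        """Accept only the ack for the outstanding keepalive."""
        if seq != self.outstanding:
            return False  # late or out-of-sequence ack: ignored, counter kept
        self.outstanding = None
        self.consecutive_misses = 0
        return True

    @property
    def timed_out(self) -> bool:
        return self.consecutive_misses >= self.limit


tracker = KeepaliveTracker()
for _ in range(4):
    tracker.send()              # keepalives 0..3 go unanswered
print(tracker.receive_ack(0))   # False: the ack for keepalive 0 arrives too late
print(tracker.timed_out)        # True
```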

Table 20 Failover Behaviors

Platform: C-Series, E-Series
Failover trigger: CP task crash on the primary RPM
Failover behavior: The standby RPM detects the IRC timeout and initiates failover, and the failed RPM reboots itself after saving a CP application core dump.

Platform: C-Series, E-Series
Failover trigger: CP IRC timeout for a non-task crash reason on the primary RPM
Failover behavior: The standby RPM detects the IRC timeout and initiates failover. FTOS saves a CP trace log, the CP IPC-related system status, and a CP application core dump. Then the failed RPM reboots itself.

Platform: E-Series
Failover trigger: RP task or kernel crash on the primary RPM
Failover behavior: The CP on the primary RPM detects the RP IPC timeout and notifies the standby RPM. The standby RPM initiates a failover. FTOS saves an RP application or kernel core dump, the CP trace log, and the CP IPC-related system status. Then the new primary RPM reboots the failed RPM.
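Table 20 can also be read as a lookup from trigger to outcome. The sketch below encodes it as plain data; the platform sets follow the C and E platform markers in the table (the RP trigger applies only to the E-Series, since a C-Series RPM has a single CPU, the CP). The names `FailoverCase`, `FAILOVER_BEHAVIORS`, and `triggers_for` are hypothetical and not an FTOS interface.

```python
from typing import NamedTuple


class FailoverCase(NamedTuple):
    platforms: frozenset[str]
    detected_by: str
    saved: tuple[str, ...]  # diagnostics FTOS saves before the reboot
    rebooted_by: str


# One entry per row of Table 20, keyed by failover trigger.
FAILOVER_BEHAVIORS = {
    "CP task crash on the primary RPM": FailoverCase(
        platforms=frozenset({"C-Series", "E-Series"}),
        detected_by="standby RPM (IRC timeout)",
        saved=("CP application core dump",),
        rebooted_by="failed RPM (reboots itself)",
    ),
    "CP IRC timeout for a non-task crash reason on the primary RPM": FailoverCase(
        platforms=frozenset({"C-Series", "E-Series"}),
        detected_by="standby RPM (IRC timeout)",
        saved=("CP trace log", "CP IPC-related system status", "CP application core dump"),
        rebooted_by="failed RPM (reboots itself)",
    ),
    "RP task or kernel crash on the primary RPM": FailoverCase(
        platforms=frozenset({"E-Series"}),
        detected_by="CP on the primary RPM (RP IPC timeout), which notifies the standby",
        saved=("RP application or kernel core dump", "CP trace log", "CP IPC-related system status"),
        rebooted_by="new primary RPM",
    ),
}


def triggers_for(platform: str) -> list[str]:
    """List the Table 20 triggers that apply to one platform."""
    return [t for t, case in FAILOVER_BEHAVIORS.items() if platform in case.platforms]


for trigger in triggers_for("C-Series"):
    print(trigger)
```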

FTOS Configuration Guide, version 7.8.1.0
