09.03.2015 Views

VSAN-Troubleshooting-Reference-Manual


Diagnostics and Troubleshooting Reference Manual – Virtual SAN

Observations when a storage controller fails

In this particular case, an LSI MegaRAID storage controller ran into problems because it was using an old driver/firmware combination with Virtual SAN. The following are vmkernel.log samples taken from each of the hosts in the Virtual SAN cluster.

Controller resets:

2014-08-24T17:00:25.940Z cpu29:33542)megasas: Found FW in FAULT state, will reset adapter.
2014-08-24T17:00:25.940Z cpu29:33542)megaraid_sas: resetting fusion adapter.
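When checking several hosts for the same symptom, it can help to scan each vmkernel.log for these reset messages. The following is a minimal sketch, not a VMware tool: the function name `find_controller_resets` is hypothetical, and the marker strings are simply copied from the samples above, so other driver versions may word them differently.

```python
# Marker substrings copied from the log samples above (an assumption:
# other megaraid_sas driver versions may use different wording).
RESET_MARKERS = ("Found FW in FAULT state", "resetting fusion adapter")


def find_controller_resets(lines):
    """Return (timestamp, message) pairs for lines that mention a controller reset."""
    hits = []
    for line in lines:
        if any(marker in line for marker in RESET_MARKERS):
            # The timestamp is the first whitespace-delimited field of the entry.
            timestamp, _, rest = line.strip().partition(" ")
            hits.append((timestamp, rest))
    return hits
```

Passing the lines of a log file (for example, `find_controller_resets(open(path))`, where `path` is wherever the log was copied) yields one pair per matching entry.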

I/Os fail due to the controller issue (Cmd 0x2a is a SCSI write, Cmd 0x28 is a SCSI read):

2014-08-24T17:00:25.940Z cpu34:9429858)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x2a (0x4136803d32c0, 0) to dev "naa.50015178f3636429" on path "vmhba0:C0:T4:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
2014-08-24T17:00:25.940Z cpu34:9429858)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.50015178f3636429" state in doubt; requested fast path state update...
2014-08-24T17:00:25.940Z cpu34:9429858)ScsiDeviceIO: 2324: Cmd(0x41368093bd80) 0x2a, CmdSN 0x648c4f3b from world 0 to dev "naa.50015178f3636429" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2014-08-24T17:00:25.940Z cpu34:9429858)ScsiDeviceIO: 2324: Cmd(0x4136e17d15c0) 0x2a, CmdSN 0x648c4ee8 from world 0 to dev "naa.50015178f3636429" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2014-08-24T17:00:25.940Z cpu34:9429858)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x2a (0x4136e2370d40, 0) to dev "naa.50015178f3636429" on path "vmhba0:C0:T4:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
2014-08-24T17:00:25.940Z cpu34:9429858)ScsiDeviceIO: 2324: Cmd(0x41370c3043c0) 0x2a, CmdSN 0x648c4f3a from world 0 to dev "naa.50015178f3636429" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2014-08-24T17:00:25.940Z cpu34:9429858)ScsiDeviceIO: 2324: Cmd(0x4136e17d4680) 0x2a, CmdSN 0x648c4eeb from world 0 to dev "naa.50015178f3636429" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2014-08-24T17:00:25.940Z cpu34:9429858)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x2a (0x4136e07e1700, 0) to dev "naa.50015178f3636429" on path "vmhba0:C0:T4:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2014-08-24T17:00:25.940Z cpu34:9429858)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x4136e884c500, 0) to dev "naa.5000c500583c4b1f" on path "vmhba0:C0:T6:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
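To see at a glance which devices and paths are affected, the NMP failure entries above can be broken into fields. This is a rough sketch under stated assumptions: each log entry has first been joined onto a single line, the regular expression is inferred only from the samples shown here, and the names `NMP_FAILURE`, `OPCODES` and `parse_nmp_failure` are hypothetical. The H, D and P values are returned as plain integers without interpretation.

```python
import re

# Pattern inferred from the nmp_ThrottleLogForDevice samples above (an assumption,
# not an official log grammar). \s+ tolerates the extra spacing left by line joins.
NMP_FAILURE = re.compile(
    r'Cmd\s+(?P<opcode>0x[0-9a-fA-F]+)\s+\(.*?\)\s+to\s+dev\s+"(?P<device>[^"]+)"\s+'
    r'on\s+path\s+"(?P<path>[^"]+)"\s+Failed:\s+'
    r'H:(?P<host>0x[0-9a-fA-F]+)\s+D:(?P<dev>0x[0-9a-fA-F]+)\s+P:(?P<plugin>0x[0-9a-fA-F]+)'
)

# Only the two SCSI opcodes that appear in the samples.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def parse_nmp_failure(line):
    """Return a dict of fields for an NMP failure entry, or None if the line does not match."""
    match = NMP_FAILURE.search(line)
    if match is None:
        return None
    opcode = int(match.group("opcode"), 16)
    return {
        "command": OPCODES.get(opcode, hex(opcode)),
        "device": match.group("device"),
        "path": match.group("path"),
        "host_status": int(match.group("host"), 16),
        "device_status": int(match.group("dev"), 16),
        "plugin_status": int(match.group("plugin"), 16),
    }
```

Applied to the entries above, this would show writes failing to one device on path vmhba0:C0:T4:L0 and a read failing to a second device on vmhba0:C0:T6:L0, both behind the same adapter, vmhba0.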

While the MegaRAID firmware is resetting, the controller takes too long to come back online for Virtual SAN to hold off, so the outstanding I/O eventually fails. The retries are visible in the logs, as is the message that the maximum number of kernel-level retries was exceeded:

2014-08-24T17:00:30.845Z cpu38:33542)megasas: Waiting for FW to come to ready state
[ … ]
2014-08-24T17:00:30.912Z cpu20:33167)LSOMCommon: IORETRYCompleteIO:389: Throttled: 0x413701b8cc40 IO type 265 (READ) isOdered:NO since 30001 msec status Maximum kernel-level retries exceeded
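The timing of this failure can be read off the log entries themselves. The sketch below is illustrative only: it assumes each entry has been joined onto one line, the pattern is inferred from the single LSOM sample above, and `parse_timestamp` and `parse_lsom_retry` are hypothetical helper names. It extracts the I/O type, the `since ... msec` value and the status text, and converts the leading timestamp so that two entries can be compared.

```python
import re
from datetime import datetime

# Pattern inferred from the one IORETRYCompleteIO sample above (an assumption).
LSOM_RETRY = re.compile(
    r'IO type (?P<io_code>\d+) \((?P<io_name>\w+)\).*?'
    r'since (?P<msec>\d+) msec status (?P<status>.+)$'
)


def parse_timestamp(line):
    """Parse the leading UTC timestamp of a vmkernel.log entry."""
    return datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S.%fZ")


def parse_lsom_retry(line):
    """Return the I/O type, elapsed milliseconds and status of an LSOM retry entry."""
    match = LSOM_RETRY.search(line.strip())
    if match is None:
        return None
    return {
        "io_type": match.group("io_name"),
        "elapsed_msec": int(match.group("msec")),
        "status": match.group("status"),
    }
```

On the samples in this section, subtracting the timestamp of the "Found FW in FAULT state" entry from that of the LSOM entry gives roughly five seconds, while the `since 30001 msec` field reads as though the I/O itself had been outstanding for about 30 seconds. That reading of the field is an interpretation of the log text, not something the manual states.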

VMWARE STORAGE BU DOCUMENTATION / 161
