VSAN-Troubleshooting-Reference-Manual
Diagnostics and Troubleshooting Reference Manual – Virtual SAN<br />
Observations when a storage controller fails<br />
In this particular case, an LSI MegaRAID storage controller running an outdated<br />
driver/firmware combination had issues in Virtual SAN. The following are vmkernel.log<br />
samples taken from each of the hosts in the Virtual SAN cluster.<br />
Controller resets:<br />
2014-08-24T17:00:25.940Z cpu29:33542)megasas: Found FW in FAULT state, will reset<br />
adapter.<br />
2014-08-24T17:00:25.940Z cpu29:33542)megaraid_sas: resetting fusion adapter.<br />
I/Os fail due to controller issue (SCSI write is Cmd 0x2a, SCSI read is Cmd 0x28; host status H:0x8 indicates a reset):<br />
2014-08-24T17:00:25.940Z cpu34:9429858)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x2a<br />
(0x4136803d32c0, 0) to dev "naa.50015178f3636429" on path "vmhba0:C0:T4:L0" Failed:<br />
H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL<br />
2014-08-24T17:00:25.940Z cpu34:9429858)WARNING: NMP:<br />
nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.50015178f3636429" state in<br />
doubt; requested fast path state update...<br />
2014-08-24T17:00:25.940Z cpu34:9429858)ScsiDeviceIO: 2324: Cmd(0x41368093bd80) 0x2a,<br />
CmdSN 0x648c4f3b from world 0 to dev "naa.50015178f3636429" failed H:0x8 D:0x0 P:0x0<br />
Possible sense data: 0x0 0x0 0x0.<br />
2014-08-24T17:00:25.940Z cpu34:9429858)ScsiDeviceIO: 2324: Cmd(0x4136e17d15c0) 0x2a,<br />
CmdSN 0x648c4ee8 from world 0 to dev "naa.50015178f3636429" failed H:0x8 D:0x0 P:0x0<br />
Possible sense data: 0x0 0x0 0x0.<br />
2014-08-24T17:00:25.940Z cpu34:9429858)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x2a<br />
(0x4136e2370d40, 0) to dev "naa.50015178f3636429" on path "vmhba0:C0:T4:L0" Failed:<br />
H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL<br />
2014-08-24T17:00:25.940Z cpu34:9429858)ScsiDeviceIO: 2324: Cmd(0x41370c3043c0) 0x2a,<br />
CmdSN 0x648c4f3a from world 0 to dev "naa.50015178f3636429" failed H:0x8 D:0x0 P:0x0<br />
Possible sense data: 0x0 0x0 0x0.<br />
2014-08-24T17:00:25.940Z cpu34:9429858)ScsiDeviceIO: 2324: Cmd(0x4136e17d4680) 0x2a,<br />
CmdSN 0x648c4eeb from world 0 to dev "naa.50015178f3636429" failed H:0x8 D:0x0 P:0x0<br />
Possible sense data: 0x0 0x0 0x0.<br />
2014-08-24T17:00:25.940Z cpu34:9429858)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x2a<br />
(0x4136e07e1700, 0) to dev "naa.50015178f3636429" on path "vmhba0:C0:T4:L0" Failed:<br />
H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL<br />
2014-08-24T17:00:25.940Z cpu34:9429858)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28<br />
(0x4136e884c500, 0) to dev "naa.5000c500583c4b1f" on path "vmhba0:C0:T6:L0" Failed:<br />
H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL<br />
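When many such entries appear, it can help to tally them per device rather than read them one by one. The following is a minimal Python sketch, not a VMware tool: the function names (`parse_failure`, `summarize_failures`) and the regular expression are assumptions made for illustration, and it assumes each log entry has been unwrapped onto a single line, as in an actual vmkernel.log. Only the two SCSI opcodes and the two host status values seen in this excerpt are mapped to names; anything else is reported as raw hex.

```python
import re
from collections import Counter

# Standard SCSI opcodes that appear in the excerpt.
SCSI_OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
# Partial host-status table: only the values relevant to this excerpt.
HOST_STATUS = {0x0: "OK", 0x8: "RESET"}

# Matches both the NMP ("Cmd 0x2a (...) ... Failed: H:..") and the
# ScsiDeviceIO ("Cmd(0x...) 0x2a, ... failed H:..") entry layouts.
FAILURE_RE = re.compile(
    r'Cmd(?:\(0x[0-9a-f]+\))?\s+(?P<opcode>0x[0-9a-f]+)'
    r'.*?to dev "(?P<device>[^"]+)"'
    r'.*?[Ff]ailed:?\s+H:(?P<host>0x[0-9a-f]+)'
    r'\s+D:(?P<dev>0x[0-9a-f]+)\s+P:(?P<plugin>0x[0-9a-f]+)'
)


def parse_failure(line):
    """Return a dict describing a failed SCSI command, or None if no match."""
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    opcode = int(m.group("opcode"), 16)
    host = int(m.group("host"), 16)
    return {
        "device": m.group("device"),
        "opcode": SCSI_OPCODES.get(opcode, hex(opcode)),
        "host_status": HOST_STATUS.get(host, hex(host)),
        "device_status": int(m.group("dev"), 16),
        "plugin_status": int(m.group("plugin"), 16),
    }


def summarize_failures(lines):
    """Count failures per (device, opcode, host status)."""
    counts = Counter()
    for line in lines:
        rec = parse_failure(line)
        if rec is not None:
            counts[(rec["device"], rec["opcode"], rec["host_status"])] += 1
    return counts


# Three entries from the excerpt, unwrapped onto single lines.
SAMPLE = [
    '2014-08-24T17:00:25.940Z cpu34:9429858)NMP: nmp_ThrottleLogForDevice:2321: '
    'Cmd 0x2a (0x4136803d32c0, 0) to dev "naa.50015178f3636429" on path '
    '"vmhba0:C0:T4:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL',
    '2014-08-24T17:00:25.940Z cpu34:9429858)ScsiDeviceIO: 2324: Cmd(0x41368093bd80) 0x2a, '
    'CmdSN 0x648c4f3b from world 0 to dev "naa.50015178f3636429" failed H:0x8 D:0x0 P:0x0 '
    'Possible sense data: 0x0 0x0 0x0.',
    '2014-08-24T17:00:25.940Z cpu34:9429858)NMP: nmp_ThrottleLogForDevice:2321: '
    'Cmd 0x28 (0x4136e884c500, 0) to dev "naa.5000c500583c4b1f" on path '
    '"vmhba0:C0:T6:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL',
]

if __name__ == "__main__":
    for key, count in summarize_failures(SAMPLE).items():
        print(key, count)
```

Failures with the same host status spread across several devices behind one adapter (here, two devices on vmhba0) point at the controller rather than at an individual disk.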
While the MegaRAID firmware is resetting, Virtual SAN holds off and retries the<br />
outstanding I/O, but the I/O eventually fails because the controller takes too long to<br />
come back online. The retries are visible in the logs, as is the message that the<br />
maximum number of kernel-level retries was exceeded:<br />
2014-08-24T17:00:30.845Z cpu38:33542)megasas: Waiting for FW to come to ready state<br />
[ … ]<br />
2014-08-24T17:00:30.912Z cpu20:33167)LSOMCommon: IORETRYCompleteIO:389: Throttled:<br />
0x413701b8cc40 IO type 265 (READ) isOdered:NO since 30001 msec status Maximum<br />
kernel-level retries exceeded<br />
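To relate the controller reset to the retry failures, the timestamps of the two kinds of entries can be compared. The sketch below is illustrative only: the function names and marker strings are assumptions chosen to match the messages shown here, the entries are assumed to be unwrapped onto single lines, and the "since N msec" field is read as the time the I/O had been outstanding, which is an interpretation rather than something the log states explicitly.

```python
import re
from datetime import datetime

TS_RE = re.compile(r'^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z')
SINCE_RE = re.compile(r'since (\d+) msec')
# Marker strings copied from the messages in this section.
RESET_MARKER = "Found FW in FAULT state"
RETRY_MARKER = "Maximum kernel-level retries exceeded"


def timestamp(line):
    """Parse the leading UTC timestamp of a vmkernel.log entry, or return None."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def retry_failures_after_reset(lines):
    """Return (seconds since the last controller reset, outstanding msec) pairs."""
    last_reset = None
    results = []
    for line in lines:
        ts = timestamp(line)
        if ts is None:
            continue
        if RESET_MARKER in line:
            last_reset = ts
        elif RETRY_MARKER in line and last_reset is not None:
            m = SINCE_RE.search(line)
            outstanding = int(m.group(1)) if m else None
            results.append(((ts - last_reset).total_seconds(), outstanding))
    return results


# Entries from this section, unwrapped onto single lines.
SAMPLE_LOG = [
    '2014-08-24T17:00:25.940Z cpu29:33542)megasas: Found FW in FAULT state, '
    'will reset adapter.',
    '2014-08-24T17:00:30.845Z cpu38:33542)megasas: Waiting for FW to come to ready state',
    '2014-08-24T17:00:30.912Z cpu20:33167)LSOMCommon: IORETRYCompleteIO:389: Throttled: '
    '0x413701b8cc40 IO type 265 (READ) isOdered:NO since 30001 msec status '
    'Maximum kernel-level retries exceeded',
]

if __name__ == "__main__":
    for seconds, outstanding in retry_failures_after_reset(SAMPLE_LOG):
        print(f"{seconds:.3f} s after reset, I/O outstanding {outstanding} msec")
```

On the sample entries, the retry failure is logged about five seconds after the reset is first reported, while the I/O itself is reported as 30001 msec old, so under the interpretation above it was already stalled before the reset message appeared.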
VMWARE STORAGE BU DOCUMENTATION / 161