VSAN-Troubleshooting-Reference-Manual
Diagnostics and Troubleshooting Reference Manual – Virtual SAN
2014-08-24T17:00:30.912Z cpu33:7027198)WARNING: LSOM: LSOMEventNotify:4570: VSAN device 52378176-a9da-7bce-0526-cdf1d863b3b5 is under permanent error.
2014-08-24T17:00:30.912Z cpu33:7027198)WARNING: LSOM: RCVmfsIoCompletion:99: Throttled: VMFS IO failed. Wake up 0x4136af9a69c0 with status Maximum kernel-level retries exceeded
2014-08-24T17:00:30.912Z cpu33:7027198)WARNING: LSOM: RCDrainAfterBERead:5070: Changing the status of child state from Success to Maximum kernel-level retries exceeded
Eventually, the firmware reset on the controller completed, but by this time it was
too late: Virtual SAN had already marked the disks as failed:
2014-08-24T17:00:49.279Z cpu21:33542)megasas: FW now in Ready state
2014-08-24T17:00:49.299Z cpu21:33542)megasas:IOC Init cmd success
2014-08-24T17:00:49.320Z cpu36:33542)megaraid_sas: Reset successful.
When a controller ‘wedges’ like this, Virtual SAN will retry I/O for a finite amount of
time. In this case, it took a full 24 seconds (24,000 ms) for the adapter to come back
online after resetting. This was too long for Virtual SAN: the maximum retries
threshold was exceeded, which in turn led Virtual SAN to mark the disks as
DEGRADED.
Virtual SAN is responding as designed here. The problem is that the firmware
crashed. This particular issue was resolved by using the recommended versions of
the MegaRAID driver and firmware as per the VMware Compatibility Guide.
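When reviewing an incident like this one, it can help to measure the gap between two log events directly rather than reading it off by eye. The following is a minimal sketch in Python, assuming the vmkernel log lines begin with an ISO 8601 UTC timestamp followed by a space, as in the excerpts above. The function names are hypothetical and are not part of any VMware tool. Note that it measures only the interval between the first LSOM warning and the "Reset successful" message shown in this section; the 24-second figure quoted in the text presumably counts from the start of the reset, which is not part of this excerpt.

```python
from datetime import datetime

# Log lines copied from the excerpts above (wrapped lines joined).
LOG_LINES = [
    "2014-08-24T17:00:30.912Z cpu33:7027198)WARNING: LSOM: LSOMEventNotify:4570: "
    "VSAN device 52378176-a9da-7bce-0526-cdf1d863b3b5 is under permanent error.",
    "2014-08-24T17:00:49.279Z cpu21:33542)megasas: FW now in Ready state",
    "2014-08-24T17:00:49.320Z cpu36:33542)megaraid_sas: Reset successful.",
]


def parse_timestamp(line):
    """Return the leading UTC timestamp of a log line as a datetime."""
    stamp = line.split(" ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def seconds_between(lines, first_marker, second_marker):
    """Seconds from the first line containing first_marker to the first
    line containing second_marker (hypothetical helper)."""
    first = next(line for line in lines if first_marker in line)
    second = next(line for line in lines if second_marker in line)
    return (parse_timestamp(second) - parse_timestamp(first)).total_seconds()


gap = seconds_between(LOG_LINES, "under permanent error", "Reset successful")
print(f"{gap:.3f} s between permanent error and reset completion")
```

For the lines above, the disks were already flagged with a permanent error roughly 18 seconds before the controller reported a successful reset, which illustrates why the reset completed too late to avoid the DEGRADED state.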
Storage controller replacement<br />
In general, a controller should be replaced with one of the same make and model.
Administrators should not swap a pass-through controller for a RAID 0
controller, or vice versa.
Expectations when a drive is reporting errors<br />
In this scenario, a disk drive is reporting errors due to bad blocks. If a read I/O
accessing component data on behalf of a virtual machine fails in such a manner,
Virtual SAN will check other replicas of the component to satisfy the read. If another
mirror can satisfy the read, Virtual SAN will attempt to write the good data onto the
disk that reported the bad read. If this procedure succeeds, the disk does not enter a
DEGRADED state. Note that this is not the behavior if a read error occurs when
accessing Virtual SAN metadata. If a read of metadata, or any write, fails, Virtual SAN
will mark all components of the disk as DEGRADED. It treats the disk as failed and
the data on it as no longer usable. Upon entering the DEGRADED state, Virtual SAN will
restore I/O flow immediately (by taking the bad component out of the active set of
the affected object) and try to re-protect the object by creating a new replica of the
component somewhere else in the cluster.
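The decision flow described above can be summarized as a small model. This is an illustrative sketch only, not Virtual SAN code: the names IoKind, Outcome and handle_io_error are hypothetical, and the case where no other replica can satisfy a failed data read is left as UNSPECIFIED because the text above does not describe it.

```python
from enum import Enum


class IoKind(Enum):
    DATA_READ = "read of component data"
    METADATA_READ = "read of Virtual SAN metadata"
    WRITE = "any write"


class Outcome(Enum):
    REPAIRED_IN_PLACE = "good data rewritten; disk not DEGRADED"
    DISK_DEGRADED = "all components on the disk marked DEGRADED"
    UNSPECIFIED = "not covered by the text above"


def handle_io_error(kind, replica_can_serve_read=False, rewrite_succeeds=False):
    """Model the outcome of a failed I/O on a disk with bad blocks."""
    if kind is IoKind.DATA_READ:
        if not replica_can_serve_read:
            # Assumption: the source does not say what happens here.
            return Outcome.UNSPECIFIED
        if rewrite_succeeds:
            # The read was served from another mirror and the good data
            # was written back to the disk that reported the bad read.
            return Outcome.REPAIRED_IN_PLACE
        # The write-back is itself a write, and any failed write
        # degrades the disk.
        return Outcome.DISK_DEGRADED
    # A failed metadata read or a failed write degrades the disk. The bad
    # component then leaves the active set of the affected object, and a
    # new replica is created elsewhere in the cluster.
    return Outcome.DISK_DEGRADED


print(handle_io_error(IoKind.DATA_READ, replica_can_serve_read=True,
                      rewrite_succeeds=True).name)
print(handle_io_error(IoKind.METADATA_READ).name)
```

The first call models a recoverable bad-block read and the second a metadata read failure, which always results in the disk being treated as failed.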
VMWARE STORAGE BU DOCUMENTATION / 162