VSAN-Troubleshooting-Reference-Manual
Diagnostics and Troubleshooting Reference Manual – Virtual SAN
What happens when the entire cluster network fails?
A failure of this nature will result in a complete network partition, with each host
residing in its own partition. To an isolated host, it will look like all the other hosts
have failed. Behavior is similar to what was described in the previous section, but in
this case, since no quorum can be achieved for any object, no rebuilding takes place.
When the network issue is resolved, Virtual SAN will first establish a new cluster
and components will start to resync. Since the whole cluster was down, there won’t
have been many changes, but Virtual SAN ensures that components are
synchronized against the latest, most up-to-date copy of a component.
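
The quorum reasoning above can be illustrated with a small sketch. It assumes a simplified model in which every component carries one vote and an object has quorum only in a partition that can reach more than half of its components. The host names, the object layout, and the has_quorum helper are hypothetical and are not part of any Virtual SAN tool.

```python
# Simplified, hypothetical model: one vote per component, one component per host.
def has_quorum(component_hosts, partition):
    """Return True if `partition` reaches more than half of the object's components."""
    reachable = sum(1 for host in component_hosts if host in partition)
    return reachable * 2 > len(component_hosts)

# Assumed layout: two replicas and a witness spread over three hosts.
object_layout = ["esxi-01", "esxi-02", "esxi-03"]

# Entire cluster network fails: every host sits alone in its own partition.
partitions = [{"esxi-01"}, {"esxi-02"}, {"esxi-03"}, {"esxi-04"}]

# No single-host partition holds a majority, so no rebuild can start anywhere.
quorum_anywhere = any(has_quorum(object_layout, p) for p in partitions)
print("quorum in any partition:", quorum_anywhere)
```

In this model a partition containing two of the three hosts would have quorum, which is why a partial partition behaves differently from a complete network failure.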
What happens when a storage I/O controller fails?<br />
If the controller in a single-controller configuration fails permanently, then
every disk group on the host will be impacted. This condition should be reasonably
easy to diagnose. The behavior of Virtual SAN when a storage I/O controller fails
will be similar to having all flash cache devices and all disks fail in all disk groups. As
described above, components will be marked as DEGRADED in this situation
(permanent error) and component rebuilding should begin immediately.
It might be difficult to determine if there is a flash cache device failure or a storage<br />
I/O controller failure in a single controller configuration, where there is also only<br />
one disk group with one flash device in the host. Both failures impact the whole disk<br />
group. The VMkernel log files on the ESXi host may help locate
the root cause. Third-party tools from the controller vendor that query
the status of the hardware might also be helpful.
If there are multiple disk groups on a host with a single controller, and all devices in
every disk group are impacted, then you might assume that the common controller
is the root cause.
If there is a single disk group on a host with a single controller, and all devices in<br />
that disk group are impacted, additional research will be necessary to determine if<br />
the storage I/O controller is the culprit, or if it is the flash cache device that is at<br />
fault.<br />
Lastly, if there are multiple controllers on the host, and only the devices sitting
behind one controller are impacted, then you might assume that this controller is the
root cause.
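
The three rules of thumb above can be summarized as a small triage sketch. This is only an illustration of the reasoning in this section, not a VMware utility; the function name, its parameters, and the returned strings are hypothetical, and the inputs are assumed to have been gathered by hand from the host.

```python
def likely_suspect(num_controllers, num_disk_groups,
                   all_groups_impacted, impacted_controllers):
    """Suggest where to look first, following the rules of thumb in this section.

    num_controllers      -- storage I/O controllers on the host
    num_disk_groups      -- disk groups on the host
    all_groups_impacted  -- True if every device in every disk group is impacted
    impacted_controllers -- controllers that have impacted devices behind them
    """
    # Multiple controllers, but only the devices behind one are impacted.
    if num_controllers > 1 and impacted_controllers == 1:
        return "controller"
    # One controller, several disk groups, everything impacted.
    if num_controllers == 1 and num_disk_groups > 1 and all_groups_impacted:
        return "controller"
    # One controller, one disk group: the failures look alike from Virtual SAN.
    if num_controllers == 1 and num_disk_groups == 1 and all_groups_impacted:
        return "controller or flash cache device"
    return "undetermined"

print(likely_suspect(1, 2, True, 1))
print(likely_suspect(1, 1, True, 1))
print(likely_suspect(2, 2, False, 1))
```

A result of "controller or flash cache device" or "undetermined" means the VMkernel logs and the controller vendor's tools are still needed to narrow down the cause.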
VMWARE STORAGE BU DOCUMENTATION / 71