09.03.2015 Views

VSAN-Troubleshooting-Reference-Manual



Diagnostics and Troubleshooting Reference Manual – Virtual SAN

What happens when the entire cluster network fails?

A failure of this nature will result in a complete network partition, with each host residing in its own partition. To an isolated host, it will look like all the other hosts have failed. Behavior is similar to what was described in the previous section, but in this case, since no quorum can be achieved for any object, no rebuilding takes place. When the network issue is resolved, Virtual SAN will first establish a new cluster and components will start to resync. Since the whole cluster was down, there won’t have been many changes, but Virtual SAN ensures that components are synchronized against the latest, most up-to-date copy of a component.
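The quorum argument above can be illustrated with a minimal sketch. It assumes the commonly described rule that an object stays accessible only while more than half of its votes are reachable; the host names, the vote layout and the `has_quorum` helper are hypothetical and are not part of any Virtual SAN tooling.

```python
from typing import Dict, Set


def has_quorum(votes_by_host: Dict[str, int], reachable_hosts: Set[str]) -> bool:
    """Return True if strictly more than half of the object's votes are reachable.

    Assumption: one simple majority rule per object; real vote assignment
    depends on the storage policy and the Virtual SAN version.
    """
    total = sum(votes_by_host.values())
    reachable = sum(v for host, v in votes_by_host.items() if host in reachable_hosts)
    return 2 * reachable > total


# Hypothetical object: two replicas and one witness, each on a different host.
votes = {"esxi-01": 1, "esxi-02": 1, "esxi-03": 1}

# Complete network failure: every host sits alone in its own partition.
isolated_views = [has_quorum(votes, {host}) for host in votes]

# Single host failure: the two surviving hosts still see each other.
two_host_view = has_quorum(votes, {"esxi-01", "esxi-02"})
```

In this layout every isolated host sees only one of three votes, so no partition holds a majority and nothing is rebuilt. With a single failed host, the remaining two hosts still hold a majority.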

What happens when a storage I/O controller fails?

If the controller in a single controller configuration suffers a permanent failure, then every disk group on the host will be impacted. This condition should be reasonably easy to diagnose. The behavior of Virtual SAN when a storage I/O controller fails will be similar to having all flash cache devices and all disks fail in all disk groups. As described above, components will be marked as DEGRADED in this situation (permanent error) and component rebuilding should be immediate.

It might be difficult to determine if there is a flash cache device failure or a storage I/O controller failure in a single controller configuration, where there is also only one disk group with one flash device in the host. Both failures impact the whole disk group. The VMkernel log files on the ESXi host may be able to assist with locating the root cause. Also, leveraging third party tools from the controller vendor to query the status of the hardware might be helpful.

If there are multiple disk groups on a host with a single controller, and all devices in both disk groups are impacted, then you might assume that the common controller is a root cause.

If there is a single disk group on a host with a single controller, and all devices in that disk group are impacted, additional research will be necessary to determine if the storage I/O controller is the culprit, or if it is the flash cache device that is at fault.

Lastly, if there are multiple controllers on the host, and only the devices sitting behind one controller are impacted, then you might assume that this controller is a root cause.
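The three cases above can be written down as a small decision helper. This is only a sketch of the reasoning in the text, not a VMware tool: the `Device` record, the result strings and the `likely_root_cause` function are hypothetical, and the inventory of which devices are impacted is assumed to have been gathered by hand. The sketch also assumes a controller is only suspected when every device behind it is impacted, which is slightly stricter than the wording above.

```python
from dataclasses import dataclass
from typing import List

SUSPECT_CONTROLLER = "suspect storage I/O controller"
CONTROLLER_OR_FLASH = "controller or flash cache device: investigate further"
INCONCLUSIVE = "inconclusive"
NO_FAILURE = "no impacted devices"


@dataclass
class Device:
    name: str         # hypothetical device label
    controller: str   # controller the device sits behind
    disk_group: str   # disk group the device belongs to
    impacted: bool    # True if the device is reported as failed


def likely_root_cause(devices: List[Device]) -> str:
    """Apply the rules of thumb from the text to one host's devices."""
    impacted = [d for d in devices if d.impacted]
    if not impacted:
        return NO_FAILURE

    controllers = {d.controller for d in devices}
    impacted_controllers = {d.controller for d in impacted}

    if len(controllers) > 1:
        # Multiple controllers: suspect one only if the failures are confined
        # to it and (assumption) every device behind it is impacted.
        if len(impacted_controllers) == 1:
            ctrl = next(iter(impacted_controllers))
            if all(d.impacted for d in devices if d.controller == ctrl):
                return SUSPECT_CONTROLLER
        return INCONCLUSIVE

    # Single controller from here on.
    if len(impacted) < len(devices):
        return INCONCLUSIVE
    disk_groups = {d.disk_group for d in devices}
    if len(disk_groups) > 1:
        # All devices in several disk groups failed: the shared controller is suspect.
        return SUSPECT_CONTROLLER
    # One disk group: a flash cache device failure looks the same.
    return CONTROLLER_OR_FLASH


# Hypothetical host: one controller, two disk groups, everything impacted.
host = [
    Device("flash-0", "ctrl-0", "dg-0", True),
    Device("disk-0", "ctrl-0", "dg-0", True),
    Device("flash-1", "ctrl-0", "dg-1", True),
    Device("disk-1", "ctrl-0", "dg-1", True),
]
result = likely_root_cause(host)
```

The helper deliberately returns "investigate further" for the single disk group case, since the text notes that VMkernel logs or vendor tools are needed to tell the two failures apart.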

VMware Storage BU Documentation / 71
