VSAN-Troubleshooting-Reference-Manual
Diagnostics and Troubleshooting Reference Manual – Virtual SAN
7. Understanding expected failure behavior
When doing failure testing with Virtual SAN, or indeed, when failures occur on the
cluster during production, it is important to understand the expected behavior for
different failure scenarios.
Disk is pulled unexpectedly from ESXi host
The Virtual SAN Administrators Guide provides guidelines on how to correctly
decommission a device from a disk group. VMware recommends following tried and
tested procedures for removing devices from Virtual SAN.
In Virtual SAN hybrid configurations, when a magnetic disk is unexpectedly pulled
from an ESXi host (where it is being used to contribute storage to Virtual SAN)
without first decommissioning the disk, all the Virtual SAN components that reside
on the disk go ABSENT and are inaccessible. The same holds true for Virtual SAN
all-flash configurations when a flash device in the capacity layer is pulled.
The ABSENT state is chosen over DEGRADED because Virtual SAN knows the disk is
not lost, but rather just removed. If the disk gets reinserted into the server before
the 60-minute timeout, no harm is done, and Virtual SAN simply syncs it back up.
Virtual SAN returns to full redundancy without wasting resources on an expensive
rebuild.
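The timeout behavior described above can be sketched as a small decision function. This is an illustrative model only, not Virtual SAN code: the names `Action`, `REPAIR_DELAY_MINUTES` and `absent_component_action` are hypothetical, and treating exactly 60 minutes as "timeout expired" is an assumption made for the sketch.

```python
from enum import Enum
from typing import Optional

# 60-minute timeout taken from the text; the constant name is hypothetical.
REPAIR_DELAY_MINUTES = 60


class Action(Enum):
    WAIT = "wait"        # components stay ABSENT, no rebuild started yet
    RESYNC = "resync"    # same disk returned in time, only a sync is needed
    REBUILD = "rebuild"  # timeout expired, components are rebuilt elsewhere


def absent_component_action(minutes_since_pull: float,
                            reinserted_at: Optional[float] = None) -> Action:
    """Model the expected handling of components on an unexpectedly pulled disk.

    minutes_since_pull: minutes elapsed since the disk was removed.
    reinserted_at: minute mark at which the same disk was put back in the
        same host, or None if it has not been reinserted.
    """
    if reinserted_at is not None and reinserted_at > minutes_since_pull:
        raise ValueError("reinserted_at cannot be later than minutes_since_pull")
    # Disk came back before the timeout: sync it back up, no rebuild.
    if reinserted_at is not None and reinserted_at < REPAIR_DELAY_MINUTES:
        return Action.RESYNC
    # Timeout passed without the disk returning in time (boundary is assumed).
    if minutes_since_pull >= REPAIR_DELAY_MINUTES:
        return Action.REBUILD
    # Still inside the window and the disk is still missing.
    return Action.WAIT


if __name__ == "__main__":
    print(absent_component_action(10).name)                    # WAIT
    print(absent_component_action(30, reinserted_at=25).name)  # RESYNC
    print(absent_component_action(61).name)                    # REBUILD
```

The sketch only captures the timing decision. It does not model whether the object stays accessible while the component is ABSENT, which depends on the VM's storage policy.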
Expected behaviors:
• If the VM has a policy that includes NumberOfFailuresToTolerate=1 or greater,
  the VM's objects will still be accessible from another ESXi host in the Virtual
  SAN cluster.
• The disk state is marked as ABSENT and can be verified via the vSphere Web
  Client UI.
• All in-flight I/O is halted while Virtual SAN reevaluates the availability of the
  object without the failed component as part of the active set of components.
• All components residing on this disk will be marked as ABSENT in the UI.
• If Virtual SAN concludes that the object is still available (based on an available
  full mirror copy and witness/votes), in-flight I/O is restarted.
• The typical time from physical removal of the drive, through Virtual SAN
  processing this event and marking the component ABSENT, to halting and
  restoring I/O flow, is approximately 5-7 seconds.
• If the same disk is placed back on the same host within 60 minutes, no new
  components will be rebuilt.
• If 60 minutes pass, and the original disk has not been reinserted in the host,
  components on the removed disk will be built elsewhere in the cluster, if
VMWARE STORAGE BU DOCUMENTATION / 62