09.03.2015 Views

VSAN-Troubleshooting-Reference-Manual


Diagnostics and Troubleshooting Reference Manual – Virtual SAN

7. Understanding expected failure behavior

When doing failure testing with Virtual SAN, or indeed when failures occur on the cluster during production, it is important to understand the expected behavior for different failure scenarios.

Disk is pulled unexpectedly from ESXi host

The Virtual SAN Administrators Guide provides guidelines on how to correctly decommission a device from a disk group. VMware recommends following tried and tested procedures for removing devices from Virtual SAN.

In Virtual SAN hybrid configurations, when a magnetic disk is pulled unexpectedly from an ESXi host (where it is being used to contribute storage to Virtual SAN) without first being decommissioned, all the Virtual SAN components that reside on the disk go ABSENT and are inaccessible. The same holds true for Virtual SAN all-flash configurations when a flash device in the capacity layer is pulled.
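Whether an object survives the loss of those components depends on what remains. The Python sketch below models the availability rule given in the expected behaviors for this scenario. It assumes an object stays accessible while at least one full mirror copy is ACTIVE and, as is commonly described for Virtual SAN, more than half of the object's votes are still ACTIVE. The `Component` class, the one-vote-per-component layout and the helper functions are hypothetical illustrations, not a Virtual SAN API. Each mirror is also simplified to a single component, with no striping.

```python
from dataclasses import dataclass


# Hypothetical model for illustration only; not a Virtual SAN API.
@dataclass
class Component:
    name: str
    kind: str               # "replica" (full mirror copy) or "witness"
    votes: int
    state: str = "ACTIVE"   # "ACTIVE" or "ABSENT"


def is_accessible(components: list[Component]) -> bool:
    """Assumed rule: at least one full replica is ACTIVE and the ACTIVE
    components hold more than half of the object's total votes."""
    total = sum(c.votes for c in components)
    active_votes = sum(c.votes for c in components if c.state == "ACTIVE")
    has_full_copy = any(
        c.kind == "replica" and c.state == "ACTIVE" for c in components
    )
    return has_full_copy and active_votes * 2 > total


def pull_disk(components: list[Component], names_on_disk: set[str]) -> None:
    """Mark every component that lived on the pulled disk as ABSENT."""
    for c in components:
        if c.name in names_on_disk:
            c.state = "ABSENT"


# Simplified NumberOfFailuresToTolerate=1 layout: two mirrors plus a witness.
obj = [
    Component("replica-a", "replica", 1),
    Component("replica-b", "replica", 1),
    Component("witness", "witness", 1),
]
pull_disk(obj, {"replica-a"})
print(is_accessible(obj))  # True: replica-b is a full copy, 2 of 3 votes
pull_disk(obj, {"replica-b"})
print(is_accessible(obj))  # False: no full copy remains, 1 of 3 votes
```

With one failure tolerated, losing the disk that holds one mirror leaves the object available. A second loss does not.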

The ABSENT state is chosen over DEGRADED because Virtual SAN knows the disk is not lost, but rather just removed. If the disk is reinserted into the server before the 60-minute timeout expires, no harm is done and Virtual SAN simply syncs it back up. Virtual SAN returns to full redundancy without wasting resources on an expensive rebuild.
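The difference between ABSENT and DEGRADED comes down to whether Virtual SAN waits before rebuilding. The Python sketch below expresses that choice as a small decision function built around the 60-minute timeout from the text. It assumes, as is generally described for Virtual SAN, that a DEGRADED component (a device that has failed) gets no grace period and is rebuilt right away. The function name, its parameters and the returned action strings are hypothetical and do not correspond to any Virtual SAN interface. Times are plain minutes from an arbitrary reference point.

```python
from typing import Optional

# Timeout taken from the text: a removed (ABSENT) disk has 60 minutes to return.
REPAIR_DELAY_MINUTES = 60


def action_for(state: str, removed_at: float, now: float,
               reinserted_at: Optional[float] = None) -> str:
    """Hypothetical decision rule for the components on a removed disk."""
    if state == "DEGRADED":
        # Assumption: a failed device is not expected to come back,
        # so its components are rebuilt immediately.
        return "rebuild"
    deadline = removed_at + REPAIR_DELAY_MINUTES
    # Disk came back before both the current time and the deadline:
    # resync the stale components instead of rebuilding them.
    if reinserted_at is not None and reinserted_at <= min(now, deadline):
        return "resync"
    if now < deadline:
        return "wait"       # still inside the timeout; components stay ABSENT
    return "rebuild"        # timeout expired; rebuild components elsewhere


print(action_for("ABSENT", 0, 30))                    # wait
print(action_for("ABSENT", 0, 45, reinserted_at=40))  # resync
print(action_for("ABSENT", 0, 61))                    # rebuild
```

The resync path is the "no harm is done" case from the text. Only the stale data on the returning disk is synchronized, and no full copies are built elsewhere.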

Expected behaviors:

- If the VM has a policy that includes NumberOfFailuresToTolerate=1 or greater, the VM’s objects will still be accessible from another ESXi host in the Virtual SAN cluster.
- The disk state is marked as ABSENT, which can be verified in the vSphere Web Client UI.
- All in-flight I/O is halted while Virtual SAN reevaluates the availability of the object without the failed component as part of the active set of components.
- All components residing on this disk will be marked as ABSENT in the UI.
- If Virtual SAN concludes that the object is still available (based on an available full mirror copy and witness/votes), in-flight I/O is restarted.
- The typical elapsed time, covering physical removal of the drive, Virtual SAN processing the event, marking the component ABSENT, and halting and restoring the I/O flow, is approximately 5-7 seconds.
- If the same disk is placed back in the same host within 60 minutes, no new components will be rebuilt.
- If 60 minutes pass and the original disk has not been reinserted in the host, components on the removed disk will be built elsewhere in the cluster, if

VMWARE STORAGE BU DOCUMENTATION / 62
