23.07.2014 Views

Lustre 1.6 Operations Manual


CHAPTER 18<br />

<strong>Lustre</strong> Recovery<br />

This chapter describes how to recover <strong>Lustre</strong> and includes the following sections:<br />

■ Recovering <strong>Lustre</strong><br />
■ Types of Failure<br />

<strong>Lustre</strong> provides substantial recovery support for node and network failures and returns the cluster to a reliable, functional state. When <strong>Lustre</strong> is in recovery mode, the servers (MDS/OSS) have determined that the filesystem was stopped in an unclean state. In other words, unsaved data may still be in the client cache. To save this data, the filesystem restarts in recovery mode and has the clients write that data to disk.<br />

18.1 Recovering <strong>Lustre</strong><br />

In <strong>Lustre</strong> recovery mode, the servers attempt to contact all clients and request that they replay their transactions.<br />

If all clients are contacted and are recoverable (they have not rebooted), recovery proceeds and the filesystem comes back with the cached client-side data safely saved to disk.<br />

If one or more clients cannot reconnect (because of hardware failures or client reboots), the recovery process times out and all clients are expelled. In this case, any unsaved data in the client cache is not written to disk and is lost. This data loss is an unfortunate side effect of keeping <strong>Lustre</strong> data consistent on disk.<br />
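The all-or-nothing rule above can be illustrated with a small toy model. This is a minimal sketch, not <strong>Lustre</strong> code: the `Client`, `RecoveryResult`, and `run_recovery` names are hypothetical, and it assumes that "the recovery process times out" can be modeled simply as "at least one client is unreachable". It shows only the outcome described in this section: either every client reconnects and its cached data is saved, or one client is missing and all clients are expelled with their cached data lost.<br />

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Client:
    """Hypothetical model of a Lustre client during server recovery."""
    name: str
    reachable: bool  # False models a hardware failure or a client reboot
    cached_writes: List[str] = field(default_factory=list)  # unsaved data in the client cache


@dataclass
class RecoveryResult:
    completed: bool      # True if recovery finished normally
    saved: List[str]     # cached data written to disk
    lost: List[str]      # cached data discarded
    expelled: List[str]  # names of expelled clients


def run_recovery(clients: List[Client]) -> RecoveryResult:
    """Toy model of the all-or-nothing recovery outcome (assumption, not Lustre's API).

    Every client is asked to reconnect and replay its transactions. If any
    client cannot reconnect, the recovery window is treated as timing out:
    every client is expelled and all cached data is lost.
    """
    if all(c.reachable for c in clients):
        # All clients reconnected: their cached writes are replayed to disk.
        saved = [w for c in clients for w in c.cached_writes]
        return RecoveryResult(completed=True, saved=saved, lost=[], expelled=[])
    # At least one client is missing, so recovery times out for everyone.
    lost = [w for c in clients for w in c.cached_writes]
    return RecoveryResult(
        completed=False, saved=[], lost=lost, expelled=[c.name for c in clients]
    )


if __name__ == "__main__":
    ok = run_recovery([Client("c1", True, ["a"]), Client("c2", True, ["b"])])
    print(ok.completed, ok.saved)  # True ['a', 'b']
    bad = run_recovery([Client("c1", True, ["a"]), Client("c2", False, ["b"])])
    print(bad.completed, bad.lost, bad.expelled)  # False ['a', 'b'] ['c1', 'c2']
```

In the second call, client `c1` reconnected successfully, yet its cached data is still lost because `c2` did not return. This mirrors the behavior described above, where one unrecoverable client causes all clients to be expelled.<br />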
