23.07.2014 Views

Lustre 1.6 Operations Manual

Lustre 1.6 Operations Manual

Lustre 1.6 Operations Manual

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

8.1.5 Roles of Nodes in a Failover<br />

A failover pair of nodes can be configured in two ways – active / active and<br />

active / passive. An active node actively serves data while a passive node is idle,<br />

standing by to take over in the event of a failure. In the following example, using<br />

two OSTs (both of which are attached to the same shared disk device), the following<br />

failover configurations are possible:<br />

■ active / passive - This configuration has two nodes out of which only one is<br />

actively serving data all the time.<br />

In case of a failure, the other node takes over.If the active node fails, the OST in<br />

use by the active node will be taken over by the passive node, which now<br />

becomes active. This node serves most services that were on the failed node.<br />

■<br />

active / active - This configuration has two nodes actively serving data all the<br />

time. In case of a failure, one node takes over for the other.<br />

To configure this for the shared disk, the shared disk must provide multiple<br />

partitions; each OST is the primary server for one partition and the secondary<br />

server for the other partition. The active / passive configuration doubles the<br />

hardware cost without improving performance, and is seldom used for OST<br />

servers.<br />

8.2 OST Failover<br />

The OST has two operating modes: failover and failout. The default mode is<br />

failover. In this mode, the clients reconnect after a failure, and the transactions,<br />

which were in progress, are completed. Data on the OST is written synchronously,<br />

and the client replays uncommitted transactions after the failure.<br />

In the failout mode, when any communication error occurs, the client attempts to<br />

reconnect, but is unable to continue with the transactions that were in progress<br />

during the failure. Also, if the OST actually fails, data that has not been written to<br />

the disk (still cached on the client) is lost. Applications usually see an EIO for<br />

operations done on that OST until the connection is reestablished. However, the<br />

LOV layer on the client avoids using that OST. Hence, the operations such as file<br />

creates and fsstat still succeed. The failover mode is the current default, while the<br />

failout mode is seldom used.<br />

Chapter 8 Failover 8-5

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!