23.07.2014 Views

Lustre 1.6 Operations Manual


20.1.3 Adaptive Timeouts in Lustre

Lustre 1.6.5 introduces an adaptive mechanism to set RPC timeouts. This feature causes servers to track actual RPC completion times and to report estimated completion times for future RPCs back to clients. The clients use these estimates to set their future RPC timeout values. If server request processing slows down for any reason, the RPC completion estimates increase, and the clients allow more time for RPC completion.
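The feedback loop described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Lustre's implementation: the class `ServiceEstimator`, the function `client_timeout`, the "slowest recent sample" rule, and the `history`, `margin`, and `floor` values are all hypothetical and chosen for illustration.

```python
from collections import deque


class ServiceEstimator:
    """Toy server-side tracker of recent RPC completion times (seconds)."""

    def __init__(self, history=8, initial=1.0):
        # `history` and `initial` are illustrative values, not Lustre defaults.
        self.samples = deque([initial], maxlen=history)

    def record(self, completion_time):
        """Record how long one RPC actually took to complete."""
        self.samples.append(completion_time)

    def estimate(self):
        """Report the slowest recent completion time as the estimate (assumed rule)."""
        return max(self.samples)


def client_timeout(server_estimate, margin=1.25, floor=1.0):
    """Derive a client timeout from a server estimate, with hypothetical headroom."""
    return max(floor, server_estimate * margin)


server = ServiceEstimator()

# The server slows down: the estimate rises, so the client allows more time.
for seconds in (0.5, 0.8, 4.0):
    server.record(seconds)
slow_timeout = client_timeout(server.estimate())

# The server speeds up: old samples age out and the timeout shrinks again.
for seconds in [0.4] * 8:
    server.record(seconds)
fast_timeout = client_timeout(server.estimate())
```

In this model the timeout grows as soon as one slow RPC is observed, but shrinks only after the slow sample has aged out of the history window, which keeps the client from reacting too eagerly to a brief speed-up.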

If RPCs queued on the server approach their timeouts, then the server sends an early reply to the client, telling the client to allow more time. In this manner, clients avoid RPC timeouts and disconnect/reconnect cycles. Conversely, as a server speeds up, RPC timeout values decrease, allowing faster detection of non-responsive servers and faster attempts to reconnect to a server's failover partner.
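The early-reply behavior can be sketched in the same spirit. Everything here is an assumption made for illustration: the `PendingRpc` class, the `server_should_send_early_reply` helper, the `warn_margin` threshold, and the amount of extra time granted are hypothetical and do not reflect Lustre's wire protocol or its actual thresholds.

```python
from dataclasses import dataclass


@dataclass
class PendingRpc:
    """Client-side view of one outstanding RPC (all times in seconds)."""
    sent_at: float
    timeout: float

    @property
    def deadline(self):
        return self.sent_at + self.timeout

    def on_early_reply(self, extra_time):
        """The server asked for more time: push the deadline out."""
        self.timeout += extra_time

    def expired(self, now):
        return now >= self.deadline


def server_should_send_early_reply(queued_rpc, now, warn_margin=2.0):
    """True when a queued RPC is within `warn_margin` seconds of its deadline.

    `warn_margin` is a hypothetical threshold, not a Lustre tunable.
    """
    return queued_rpc.deadline - now <= warn_margin


rpc = PendingRpc(sent_at=0.0, timeout=10.0)
now = 8.5  # the request is still queued and the deadline is close

if server_should_send_early_reply(rpc, now):
    rpc.on_early_reply(extra_time=10.0)  # the client now waits until t = 20.0
```

Without the early reply, this request would have expired at t = 10.0 and the client would have gone through a disconnect/reconnect cycle even though the server was still working on it.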

Caution – In Lustre 1.6.5, adaptive timeouts are disabled by default, so that users applying this maintenance release are not required to use them. Adaptive timeouts will be enabled by default in Lustre 1.8.

In previous Lustre versions, the static obd_timeout (/proc/sys/lustre/timeout) value was used as the maximum completion time for all RPCs; this value also affected the client-server ping interval and initial recovery timer. Now, with adaptive timeouts, obd_timeout is only used for the ping interval and initial recovery estimate. When a client reconnects during recovery, the server uses the client's timeout value to reset the recovery wait period; i.e., the server learns how long the client had been willing to wait, and takes this into account when adjusting the recovery period.
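As a small illustration, the static obd_timeout can be read from the /proc path named above. The helper names `read_obd_timeout` and `recovery_wait` are hypothetical, and the rule in `recovery_wait` (never wait less than a reconnecting client was willing to wait) is only an assumed reading of the paragraph above, not the server's actual algorithm.

```python
from pathlib import Path

# Path as given in the text; it exists only on nodes with Lustre loaded.
OBD_TIMEOUT_PATH = Path("/proc/sys/lustre/timeout")


def read_obd_timeout(path=OBD_TIMEOUT_PATH):
    """Return the static obd_timeout in seconds, or None if it is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Lustre is not present on this machine, or the contents are unexpected.
        return None


def recovery_wait(current_wait, client_timeout):
    """Assumed rule: extend the recovery wait to cover the client's own timeout."""
    return max(current_wait, client_timeout)


obd_timeout = read_obd_timeout()
if obd_timeout is None:
    print("obd_timeout not available on this node")
else:
    print(f"obd_timeout = {obd_timeout} s")
```

On a machine without Lustre the helper returns `None` instead of raising, so the snippet can be run safely anywhere.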

