23.07.2014 Views

Lustre 1.6 Operations Manual

28.5 Handling Timeouts

Timeouts are the most common cause of hung applications. After a timeout involving an MDS or failover OST, applications attempting to access the disconnected resource wait until the connection is re-established.

When a client performs a remote operation, it gives the server a reasonable amount of time to respond. If the server does not reply, whether because of a network outage, a hung server, or any other reason, a timeout occurs, which requires recovery.

If a timeout occurs, a message similar to the following appears on the client console and in /var/log/messages:

LustreError: 26597:(client.c:810:ptlrpc_expire_one_request()) @@@ timeout
req@a2d45200 x5886/t0 o38->mds_svc_UUID@NID_mds_UUID:12 lens 168/64 ref 1 fl
RPC:/0/0 rc 0
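
To spot such messages in a large log, a small filter can help. The following is a minimal Python sketch, not part of Lustre. The regular expression and the helper name find_timeouts are illustrative, and the field layout is inferred only from the single sample message above, so other Lustre versions may format the line differently. It assumes the message occupies one line in the log; the sample is wrapped here only for display.

```python
import re

# Pattern inferred from the one sample message in this section (an assumption,
# not an official format description). It captures the process ID, the source
# location, the request opcode, and the target the request was sent to.
TIMEOUT_RE = re.compile(
    r"LustreError: (?P<pid>\d+):"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)"
    r".*?\btimeout\b"
    r".*?\bo(?P<opcode>\d+)->(?P<target>\S+)"
)


def find_timeouts(lines):
    """Yield a dict of fields for each line that looks like a request timeout."""
    for text in lines:
        match = TIMEOUT_RE.search(text)
        # Keep only messages logged by the function named in the sample.
        if match and match.group("func") == "ptlrpc_expire_one_request":
            yield match.groupdict()


# The sample message from this section, joined into a single line.
sample = (
    "LustreError: 26597:(client.c:810:ptlrpc_expire_one_request()) @@@ timeout "
    "req@a2d45200 x5886/t0 o38->mds_svc_UUID@NID_mds_UUID:12 lens 168/64 ref 1 fl "
    "RPC:/0/0 rc 0"
)

for hit in find_timeouts([sample]):
    print(hit["pid"], hit["opcode"], hit["target"])
```

To scan a real log, pass an open file object instead of the sample list, for example find_timeouts(open("/var/log/messages")). Because the pattern is matched with search, any syslog prefix before "LustreError" is ignored.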

Lustre 1.6 Operations Manual • September 2008
