
Lustre 1.6 Operations Manual


Note – Changing the adaptive timeouts status at runtime may cause transient timeouts, reconnects, recovery, and so on.

20.1.3.2 Interpreting Adaptive Timeout Information

Adaptive timeout information can be read from /proc/fs/lustre/*/timeouts files (for each service and client) or with the lctl tool.

This is an example from /proc/fs/lustre/*/timeouts files:

cfs21:~# cat /proc/fs/lustre/ost/OSS/ost_io/timeouts

This is an example using the lctl tool:

$ lctl get_param -n ost.*.ost_io.timeouts

This is the sample output:

service : cur 33 worst 34 (at 1193427052, 0d0h26m40s ago) 1 1 33 2

The ost_io service on this node is currently reporting an estimate of 33 seconds. The worst RPC service time was 34 seconds, and it occurred about 26 minutes ago (0d0h26m40s).
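
The fields described above can be pulled out of a timeouts line with a small script. The following is a minimal sketch, not part of Lustre: the function name parse_timeout_entry and the dictionary keys are hypothetical, and the pattern assumes the line layout shown in the sample output.

```python
import re
from datetime import timedelta

# Assumed layout (taken from the sample output):
#   <name> : cur <n> worst <n> (at <epoch>, <d>d<h>h<m>m<s>s ago) <bin> <bin> ...
ENTRY_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*cur\s+(?P<cur>\d+)\s+worst\s+(?P<worst>\d+)\s+"
    r"\(at\s+(?P<at>\d+),\s*(?P<d>\d+)d(?P<h>\d+)h(?P<m>\d+)m(?P<s>\d+)s\s+ago\)"
    r"\s*(?P<bins>(?:\d+\s*)*)$"
)


def parse_timeout_entry(line):
    """Parse one timeouts line into a dict (hypothetical helper)."""
    m = ENTRY_RE.match(line)
    if m is None:
        raise ValueError("unrecognized timeouts line: %r" % line)
    return {
        "name": m["name"],                # e.g. "service" or "portal 6"
        "cur": int(m["cur"]),             # current estimate, in seconds
        "worst": int(m["worst"]),         # worst service time, in seconds
        "worst_at": int(m["at"]),         # epoch time of the worst value
        "worst_age": timedelta(
            days=int(m["d"]), hours=int(m["h"]),
            minutes=int(m["m"]), seconds=int(m["s"]),
        ),
        "history": [int(x) for x in m["bins"].split()],  # per-bin maxima
    }


if __name__ == "__main__":
    sample = "service : cur 33 worst 34 (at 1193427052, 0d0h26m40s ago) 1 1 33 2"
    entry = parse_timeout_entry(sample)
    print(entry["cur"], entry["worst"], entry["worst_age"], entry["history"])
```

On a live node, the same function could be fed the output of the cat or lctl commands shown earlier, one line at a time.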

The output also provides a history of service times. In the example, the adaptive_timeout_history period is divided into 4 "bins", and the maximum RPC time in each bin is reported. From 0-150 seconds, the maximum RPC time was 1 second, with the same result from 150-300 seconds. From 300-450 seconds, the worst (maximum) RPC time was 33 seconds, and from 450-600 seconds the worst time was 2 seconds. The current estimated service time is the maximum value of the 4 bins (33 seconds in this example).
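
The bin arithmetic in this paragraph can be sketched as follows. This is an illustration only: the function name describe_history is hypothetical, and the 600-second history window is an assumption inferred from the 0-600 second range in the example (4 bins of 150 seconds each).

```python
def describe_history(history, history_seconds=600):
    """Pair each bin with the time range it covers and return the estimate.

    history          -- per-bin maximum RPC times, in the order printed
    history_seconds  -- total history window (assumed 600 s, as in the example)
    """
    if not history:
        raise ValueError("history must contain at least one bin")
    width = history_seconds // len(history)      # 600 // 4 = 150 seconds per bin
    ranges = [
        (i * width, (i + 1) * width, value)      # (start, end, max RPC time)
        for i, value in enumerate(history)
    ]
    estimate = max(history)                      # current estimate = largest bin
    return ranges, estimate


if __name__ == "__main__":
    ranges, estimate = describe_history([1, 1, 33, 2])
    for start, end, value in ranges:
        print("%d-%d s: max RPC time %d s" % (start, end, value))
    print("current estimate: %d s" % estimate)
```

With the sample bins 1 1 33 2, the estimate comes out as 33 seconds, matching the cur value reported on the same line.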

Service times (as reported by the servers) are also tracked in the client OBDs:

cfs21:~# cat /proc/fs/lustre/osc/lustre-OST0001-osc-ce129800/timeouts

last reply : 1193428639, 0d0h00m00s ago

network : cur 1 worst 2 (at 1193427053, 0d0h26m26s ago) 1 1 1 1

portal 6 : cur 33 worst 34 (at 1193427052, 0d0h26m27s ago) 33 33 33 2

portal 28 : cur 1 worst 1 (at 1193426141, 0d0h41m38s ago) 1 1 1 1

portal 7 : cur 1 worst 1 (at 1193426141, 0d0h41m38s ago) 1 0 1 1

portal 17 : cur 1 worst 1 (at 1193426177, 0d0h41m02s ago) 1 0 0 1

In this case, the entry for portal 6, the OST_IO_PORTAL (see lustre/include/lustre/lustre_idl.h), shows the history of what the ost_io portal has reported as the service estimate.
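
To compare portals on a client, the whole timeouts file can be parsed at once. The sketch below is illustrative and makes some assumptions: the helper names parse_client_timeouts and slowest_portal are hypothetical, the sample text is copied from the output above, and lines without a cur/worst pair (such as "last reply") are simply skipped.

```python
import re

# Sample copied from the client OBD output shown above.
SAMPLE = """\
last reply : 1193428639, 0d0h00m00s ago
network : cur 1 worst 2 (at 1193427053, 0d0h26m26s ago) 1 1 1 1
portal 6 : cur 33 worst 34 (at 1193427052, 0d0h26m27s ago) 33 33 33 2
portal 28 : cur 1 worst 1 (at 1193426141, 0d0h41m38s ago) 1 1 1 1
portal 7 : cur 1 worst 1 (at 1193426141, 0d0h41m38s ago) 1 0 1 1
portal 17 : cur 1 worst 1 (at 1193426177, 0d0h41m02s ago) 1 0 0 1
"""

# Assumed layout: <name> : cur <n> worst <n> (<details>) <bin> <bin> ...
LINE_RE = re.compile(
    r"^(?P<name>.+?)\s*:\s*cur\s+(?P<cur>\d+)\s+worst\s+(?P<worst>\d+)\s+"
    r"\(.*?\)\s*(?P<bins>[\d\s]*)$"
)


def parse_client_timeouts(text):
    """Return {name: {"cur", "worst", "history"}} for every cur/worst line."""
    entries = {}
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # e.g. the "last reply" line or a blank line
        entries[m["name"]] = {
            "cur": int(m["cur"]),
            "worst": int(m["worst"]),
            "history": [int(x) for x in m["bins"].split()],
        }
    return entries


def slowest_portal(entries):
    """Name of the portal entry with the largest current estimate."""
    portals = {k: v for k, v in entries.items() if k.startswith("portal")}
    return max(portals, key=lambda name: portals[name]["cur"])


if __name__ == "__main__":
    parsed = parse_client_timeouts(SAMPLE)
    name = slowest_portal(parsed)
    print(name, parsed[name])
```

For the sample above, the largest current estimate belongs to portal 6, which is the OST_IO_PORTAL entry discussed in the text.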

20-8 Lustre 1.6 Operations Manual • September 2008
