23.07.2014 Views

Lustre 1.6 Operations Manual


live_router_check_interval, dead_router_check_interval, auto_down, check_routers_before_use and router_ping_timeout

In a routed Lustre setup with nodes on different networks, such as TCP/IP and Elan, the router checker checks the status of each router. Currently, only clients using the sock LND and Elan LND avoid failed routers. We are working on extending this behavior to all types of LNDs. The auto_down parameter enables (1) or disables (0) the automatic marking of router state.

The live_router_check_interval parameter specifies the interval, in seconds, at which the router checker pings live routers.

In the same way, the dead_router_check_interval parameter sets the interval for checking dead routers.

You can set the timeout that the router checker applies when checking live or dead routers with the router_ping_timeout parameter. The router pinger sends a ping message to a live router once every live_router_check_interval seconds, and to a dead router once every dead_router_check_interval seconds. If it does not get a reply from the router within router_ping_timeout seconds, it considers the router to be down.
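
The ping schedule and the up/down decision described above can be sketched as follows. This is only an illustrative model, not the LNET implementation: the `RouterState` class and both function names are hypothetical, and the numeric values in the usage note are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class RouterState:
    """Hypothetical per-router record kept by a router checker."""
    alive: bool = True
    last_ping_sent: float = float("-inf")  # time the most recent ping was sent


def ping_due(state, now, live_interval, dead_interval):
    """Return True when the next ping to this router should be sent.

    Live routers are pinged every live_interval seconds,
    dead routers every dead_interval seconds.
    """
    interval = live_interval if state.alive else dead_interval
    return now - state.last_ping_sent >= interval


def record_ping_result(state, reply_delay, router_ping_timeout):
    """Mark the router up or down based on one ping.

    reply_delay is the number of seconds until a reply arrived,
    or None if no reply arrived at all.
    """
    state.alive = reply_delay is not None and reply_delay <= router_ping_timeout
    return state.alive
```

For example, with `router_ping_timeout` set to 10, `record_ping_result(state, 2.0, 10)` leaves the router marked up, while `record_ping_result(state, None, 10)` marks it down.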

The last parameter is check_routers_before_use, which is off by default. If it is turned on, you must also set dead_router_check_interval to a positive integer value.
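
As a rough illustration of how the five parameters fit together, the sketch below checks the dependency just described and renders a single options line. It assumes that the parameters are passed to the lnet kernel module on an `options lnet` line in the modprobe configuration. The helper name `render_lnet_options` and all numeric values are hypothetical and are not recommended settings.

```python
def render_lnet_options(params):
    """Validate router-checker settings and render one 'options lnet' line.

    params maps parameter names to integer values.
    """
    # check_routers_before_use requires a positive dead_router_check_interval.
    if params.get("check_routers_before_use", 0) and \
            params.get("dead_router_check_interval", 0) <= 0:
        raise ValueError(
            "check_routers_before_use=1 needs dead_router_check_interval > 0")
    # auto_down is a simple on/off switch (1/0).
    if "auto_down" in params and params["auto_down"] not in (0, 1):
        raise ValueError("auto_down must be 0 or 1")
    return "options lnet " + " ".join(f"{k}={v}" for k, v in params.items())


# Illustrative values only.
example = {
    "live_router_check_interval": 60,
    "dead_router_check_interval": 60,
    "auto_down": 1,
    "check_routers_before_use": 1,
    "router_ping_timeout": 50,
}
print(render_lnet_options(example))
```

Turning on check_routers_before_use while leaving dead_router_check_interval unset or zero makes the helper raise `ValueError`, which mirrors the rule stated above.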

The router checker gets the following variables for each router:

■ Last time that it was disabled

■ Duration of time for which it is disabled

The initial time to disable a router should be one minute (enough to plug in a cable after removing it). If the router is administratively marked as "up", the router checker clears the timeout. When a route is disabled (and possibly new), its "sent packets" counter is set to 0. When the route is first reused (that is, when its disable time is found to have elapsed), the sent-packets counter is incremented to 1, and it is incremented again for each further use of the route. Once the route has been used successfully for 100 packets, the sent-packets counter has a value of 100. At that point, set the timeout to 0 (zero), so that future errors no longer double the timeout.
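
The bookkeeping in the preceding paragraph can be modeled roughly as shown below. The text does not spell out the full algorithm, so this sketch makes two assumptions: each error doubles the current timeout, and an error on a route whose timeout is 0 starts again from one minute. The `RouteState` class and its method names are hypothetical.

```python
from dataclasses import dataclass

INITIAL_DISABLE_SECONDS = 60  # "one minute" from the text
PACKETS_TO_RESET = 100        # successful packets before the timeout is cleared


@dataclass
class RouteState:
    disabled_at: float = 0.0   # last time the route was disabled
    disable_for: float = 0.0   # duration for which it is disabled (the timeout)
    sent_packets: int = 0

    def on_error(self, now):
        # Assumption: double the timeout on each error, or start at one minute.
        if self.disable_for:
            self.disable_for *= 2
        else:
            self.disable_for = INITIAL_DISABLE_SECONDS
        self.disabled_at = now
        self.sent_packets = 0  # counter restarts whenever the route is disabled

    def usable(self, now):
        # The route may be reused once its disable time has elapsed.
        return now - self.disabled_at >= self.disable_for

    def on_packet_sent(self, now):
        if not self.usable(now):
            return False
        self.sent_packets += 1
        if self.sent_packets >= PACKETS_TO_RESET:
            self.disable_for = 0.0  # later errors start from one minute again
        return True

    def mark_up(self):
        # Administrative "up" clears the timeout.
        self.disable_for = 0.0
```

In this model, two errors in a row leave the route disabled for 120 seconds, while 100 successful packets after the disable time has elapsed return the timeout to 0.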

Note – The router_ping_timeout is consistent with the default LND timeouts. You may have to increase it on very large clusters if the LND timeout is also increased. For larger clusters, we suggest increasing the check interval.

Chapter 5 Configuring the Lustre Network 5-7
