
Lustre 1.6 Operations Manual

5.2.3 Downed Routers

There are two mechanisms to update the health status of a peer or a router:

■ LNET can actively check the health status of all routers and mark them as dead or alive automatically. By default, this is off. To enable it, set auto_down and, if desired, check_routers_before_use. This initial check may cause a pause equal to router_ping_timeout at system startup if there are dead routers in the system.

■ When there is a communication error, all LNDs notify LNET that the peer (not necessarily a router) is down. This mechanism is always on, and there is no parameter to turn it off. However, if you set the LNET module parameter auto_down to 0, LNET ignores all such peer-down notifications.
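As a rough illustration, the module parameters named above could be set on the lnet options line of the module configuration file. The file path and the values shown here are assumptions for the sake of the example, not recommended settings; check the parameter reference for your release before using them.

```
# /etc/modprobe.conf (assumed location; values are illustrative only)
# auto_down=1                  act on peer-down notifications, per the text above
# check_routers_before_use=1   check routers at startup before using them
# router_ping_timeout=50       seconds to wait for a router ping reply (example value)
options lnet auto_down=1 check_routers_before_use=1 router_ping_timeout=50
```

With check_routers_before_use set, a dead router at startup may delay startup by up to router_ping_timeout, as described above.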

The two mechanisms (the router pinger and LND notifications) differ in several key ways:

■ The router pinger checks only routers for their health, while LNDs notice all dead peers, regardless of whether they are routers.

■ The router pinger actively checks router health by sending pings, but LNDs notice a dead peer only when there is network traffic going on.

■ The router pinger can bring a router from alive to dead or vice versa, but LNDs can only bring a peer down.
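The differences listed above can be sketched as a small toy model. This is not LNET code: the class HealthTable, its method names, and the NIDs used are all hypothetical, and the router pinger is simply assumed to be enabled already. The model only shows which mechanism is allowed to change which peer's state.

```python
from dataclasses import dataclass, field


@dataclass
class HealthTable:
    """Toy model of peer health tracking (hypothetical, not the LNET implementation)."""
    auto_down: bool = True                      # models the auto_down module parameter
    routers: set = field(default_factory=set)   # NIDs that are routers
    alive: dict = field(default_factory=dict)   # NID -> True (alive) / False (dead)

    def add_peer(self, nid, is_router=False):
        # Every peer starts out alive in this model.
        self.alive[nid] = True
        if is_router:
            self.routers.add(nid)

    def lnd_notify_down(self, nid):
        # LND notification: applies to any peer, but can only mark it down,
        # and is ignored entirely when auto_down is 0.
        if self.auto_down and nid in self.alive:
            self.alive[nid] = False

    def router_ping(self, nid, replied):
        # Router pinger: applies to routers only, and can move a router
        # from alive to dead or from dead back to alive.
        if nid in self.routers:
            self.alive[nid] = replied


table = HealthTable()
table.add_peer("192.168.0.1@tcp", is_router=True)
table.add_peer("192.168.0.20@tcp")

table.lnd_notify_down("192.168.0.20@tcp")     # non-router peer is marked down
table.router_ping("192.168.0.20@tcp", True)   # no effect: not a router
table.router_ping("192.168.0.1@tcp", False)   # router is marked dead
table.router_ping("192.168.0.1@tcp", True)    # router is marked alive again
```

In this sketch a non-router peer that has been marked down never comes back through either mechanism, which mirrors the third difference: only the router pinger can restore a peer, and it looks only at routers.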

Chapter 5 Configuring the Lustre Network
