Lustre 1.6 Operations Manual


18.2.4 Network Partition

The partition can be transient. Lustre recovery occurs in the following sequence:

■ Clients can detect a "harmless partition" upon reconnecting. Dropped-reply cases require ReplyReconstruction.
■ Servers evict clients.
■ ClientUpcall may try other routers. Arbitrary configuration changes are possible.
■ The message ‘Failed Recovery - ENOTCONN’ is given for evicted clients.
■ The process invalidates all entries and locks.

Eventually, the filesystem finishes recovering and returns to normal operation. You can check the progress of Lustre recovery by looking at the recovery_status proc entry for each device on the OSSs, for example:

cat /proc/fs/lustre/obdfilter/ost1/recovery_status
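To look at every device on an OSS rather than a single one, the same check can be wrapped in a small loop. The following is a minimal sketch, not part of the manual. It assumes that each recovery_status file contains a line of the form "status: <STATE>" (for example RECOVERING or COMPLETE). The function name show_recovery_status and the LUSTRE_PROC variable are hypothetical and exist only so the path can be overridden for a dry run.

```shell
#!/bin/sh
# Sketch: print "<device> <status>" for every OST device found on this OSS.
# LUSTRE_PROC is a hypothetical override; the default is the path used in the text.
LUSTRE_PROC="${LUSTRE_PROC:-/proc/fs/lustre}"

show_recovery_status() {
    for f in "$LUSTRE_PROC"/obdfilter/*/recovery_status; do
        # Skip when the glob matched nothing or the file is unreadable.
        [ -r "$f" ] || continue
        # The device name is the directory that holds recovery_status.
        dev=$(basename "$(dirname "$f")")
        # Assumption: the file carries a "status: <STATE>" line.
        status=$(sed -n 's/^status:[[:space:]]*//p' "$f" | head -n 1)
        printf '%s %s\n' "$dev" "${status:-unknown}"
    done
}

show_recovery_status
```

On a node without any obdfilter devices the loop prints nothing. A device whose file lacks a status line is reported as "unknown" instead of being silently dropped.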

The filesystem may get stuck in recovery if any server is down or has hit a Lustre bug (LBUG); check /proc/fs/lustre/health_check.
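The health check can be scripted so that it returns an exit code usable from monitoring tools. This is a hedged sketch: it assumes that a healthy node reports exactly the word "healthy" in health_check and that anything else indicates a problem. The function name check_health, the LUSTRE_PROC override, and the exit codes are illustrative choices, not something the manual defines.

```shell
#!/bin/sh
# Sketch: classify the content of health_check on the local server.
# LUSTRE_PROC is a hypothetical override; the default is the path used in the text.
LUSTRE_PROC="${LUSTRE_PROC:-/proc/fs/lustre}"

check_health() {
    f="$LUSTRE_PROC/health_check"
    if [ ! -r "$f" ]; then
        echo "health_check not readable"
        return 2
    fi
    content=$(cat "$f")
    # Assumption: a healthy node reports exactly "healthy".
    if [ "$content" = "healthy" ]; then
        echo "healthy"
        return 0
    fi
    # Anything else is passed through for the operator to inspect.
    echo "unhealthy: $content"
    return 1
}

check_health || echo "this node needs attention"
```

Exit code 0 means healthy, 1 means the file reported something else, and 2 means the file could not be read, for example because the Lustre modules are not loaded on that node.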

18-4 Lustre 1.6 Operations Manual • September 2008
