23.07.2014 Views

Lustre 1.6 Operations Manual


In many cases, the extent of corruption is small (some unlinked files or directories, or perhaps some parts of an inode table have been wiped out). If there are serious filesystem problems, e2fsck may need to use a backup superblock (it reports when it does). Using a backup superblock causes all of the "group summary" information to be incorrect. In and of itself, this is not a serious error, because the information is redundant and e2fsck can reconstruct it. If the primary superblock is not valid, then there is some corruption at the start of the device and some amount of data may be lost. The data is somewhat protected from beginning-of-device corruption (which is one of the more common cases) by the large journal placed at the start of the filesystem.
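If e2fsck does not pick a backup superblock on its own, one can be named explicitly with its -b option. The following is a minimal sketch of how candidate locations might be found. The helper name list_backup_superblocks is hypothetical, and it assumes dumpe2fs prints lines of the form "Backup superblock at N, ..." as e2fsprogs normally does.

```shell
# Hypothetical helper: read dumpe2fs output on stdin and print the block
# number of every backup superblock it mentions, one per line.
list_backup_superblocks() {
    sed -n 's/^ *Backup superblock at \([0-9][0-9]*\),.*/\1/p'
}

# Possible usage (not run here; /dev/sda and block 32768 are examples only):
#   dumpe2fs /dev/sda 2>/dev/null | list_backup_superblocks
#   e2fsck -fn -b 32768 /dev/sda    # -n keeps this pass read-only
```

If dumpe2fs cannot read the device at all, running mke2fs -n with the same parameters used when the filesystem was created also lists the backup superblock locations without writing anything.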

Such a check usually takes about 4 hours for a 1 TB MDS device or a 2 TB OST device, but the time varies with the number of files and the amount of data in the filesystem. If there are severe problems with the filesystem, the check can take 8-12 hours to complete.
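Because a check can run for hours, it can be useful to do a read-only pass first and note how long it takes before committing to a repair. The sketch below assumes a wrapper named timed_readonly_check, which is hypothetical. It relies only on the standard e2fsck options -f (force a check) and -n (answer "no" to all questions, so the device is not modified).

```shell
# Hypothetical wrapper: run a forced, read-only e2fsck on the given device
# and report the exit status and the elapsed wall-clock time in seconds.
timed_readonly_check() {
    trc_dev=$1
    trc_start=$(date +%s)
    e2fsck -fn "$trc_dev"
    trc_status=$?
    trc_end=$(date +%s)
    echo "e2fsck exit status ${trc_status} after $((trc_end - trc_start)) seconds"
    return "$trc_status"
}

# Possible usage (not run here; /dev/sda is an example device):
#   timed_readonly_check /dev/sda 2>&1 | tee /root/e2fsck.sda.log
```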

Depending on the type of corruption, it is sometimes helpful to use debugfs to examine the filesystem directly and learn more about the corruption. In the session below, the script command records everything that follows to a file for later reference.

[root@mds]# script /root/debugfs.sda
[root@mds]# debugfs /dev/sda
debugfs 1.35-lfsk8 (05-Feb-2005)
debugfs> stats
{shows superblock and group summary information}
debugfs> ls
{shows directory listing}
debugfs> stat <inum>
{shows inode information for inode number <inum>}
debugfs> stat name
{shows inode information for inode "name"}
debugfs> cd dir
{change into directory "dir", "ROOT" is start of Lustre-visible namespace}
debugfs> quit
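The same requests can also be issued one at a time without an interactive session, which is convenient when collecting output to send to support. This is a sketch under stated assumptions: debugfs_request is a hypothetical helper name. It uses the debugfs options -c (catastrophic mode, which opens the device read-only and skips reading the bitmaps) and -R (run a single request and exit).

```shell
# Hypothetical helper: run one debugfs request against a device without
# opening it for writing.
debugfs_request() {
    dr_dev=$1
    dr_request=$2
    debugfs -c -R "$dr_request" "$dr_dev"
}

# Possible usage (not run here; /dev/sda is an example device):
#   debugfs_request /dev/sda "stats"   > /root/debugfs.sda.stats 2>&1
#   debugfs_request /dev/sda "ls ROOT" > /root/debugfs.sda.ls 2>&1
```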

Once you have assessed the damage (possibly with the assistance of Lustre Support, depending on the nature of the corruption), the next step is to fix it. It is often prudent to first make a backup of the filesystem metadata (time and space permitting), in case there is a problem or it is unclear whether e2fsck will take the correct action (in most cases it will). To make a metadata backup, run:

[root@mds]# e2image /dev/sda /bigplace/sda.e2image
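A small guard around this command can prevent an earlier image from being overwritten by accident. The following is only a sketch: backup_metadata is a hypothetical helper name, and refusing to overwrite is a design choice made here, not part of e2image itself.

```shell
# Hypothetical helper: save filesystem metadata with e2image, but refuse to
# replace an image file that already exists.
backup_metadata() {
    bm_dev=$1
    bm_image=$2
    if [ -e "$bm_image" ]; then
        echo "refusing to overwrite existing image: $bm_image" >&2
        return 1
    fi
    e2image "$bm_dev" "$bm_image"
}

# Possible usage (not run here; paths match the example above):
#   backup_metadata /dev/sda /bigplace/sda.e2image
```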

Appendix D Lustre Knowledge Base D-31
