Lustre 1.6 Operations Manual

23.2.8 Adding Debugging to the Lustre Source Code

In the Lustre source code, the debug infrastructure provides a number of macros which aid in debugging or reporting serious errors. All of these macros depend on having the DEBUG_SUBSYSTEM variable set at the top of the file:

#define DEBUG_SUBSYSTEM S_PORTALS

■ LBUG: A panic-style assertion in the kernel which causes Lustre to dump its circular log to the /tmp/lustre-log file. This file can be retrieved after a reboot. LBUG freezes the thread to allow capture of the panic stack. A system reboot is needed to clear the thread.

■ LASSERT: Validates a given expression as true, otherwise calls LBUG. The failed expression is printed on the console, although the values that make up the expression are not printed.

■ LASSERTF: Similar to LASSERT but allows a free-format message to be printed, like printf/printk.

■ CDEBUG: The basic, most commonly used debug macro. It takes just one more argument than standard printf: the debug type. The message is added to the debug log with the debug mask set accordingly. Later, when a user retrieves the log for troubleshooting, they can filter based on this type.

CDEBUG(D_INFO, "This is my debug message: the number is %d\n", number);

■ CERROR: Behaves similarly to CDEBUG, but unconditionally prints the message in the debug log and to the console. This is appropriate for serious errors or fatal conditions:

CERROR("Something very bad has happened, and the return code is %d.\n", rc);

■ ENTRY and EXIT: Add messages to aid in call tracing (takes no arguments). When using these macros, cover all exit conditions to avoid confusion when the debug log reports that a function was entered, but never exited.

■ LDLM_DEBUG and LDLM_DEBUG_NOLOCK: Used when tracing MDS and VFS operations for locking. These macros build a thin trace that shows the protocol exchanges between nodes.

■ DEBUG_REQ: Prints information about the given ptlrpc_request structure.

■ OBD_FAIL_CHECK: Allows insertion of failure points into the Lustre code. This is useful to generate regression tests that can hit a very specific sequence of events. This works in conjunction with "sysctl -w lustre.fail_loc={fail_loc}" to set a specific failure point for which a given OBD_FAIL_CHECK will test.

■ OBD_FAIL_TIMEOUT: Similar to OBD_FAIL_CHECK. Useful to simulate hung, blocked or busy processes or network devices. If the given fail_loc is hit, OBD_FAIL_TIMEOUT waits for the specified number of seconds.
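
The ideas behind several of the macros above (a per-file subsystem tag, typed debug messages that can be filtered when the log is read, entry and exit tracing on every return path, and a settable failure point) can be sketched in user-space C. This is only an illustration under stated assumptions: every SK_ and sk_ name, the type and subsystem values, and the 0x101 failure id are hypothetical stand-ins, not the Lustre macros or their implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; NOT the Lustre definitions. */
#define SK_D_TRACE 0x1u          /* assumed debug type: call tracing */
#define SK_D_INFO  0x2u          /* assumed debug type: general info */
#define SK_S_DEMO  0x1u          /* assumed subsystem id */

/* Set once at the top of the file, as DEBUG_SUBSYSTEM is in the text. */
#define SK_DEBUG_SUBSYSTEM SK_S_DEMO

#define SK_LOG_SLOTS 8
#define SK_MSG_LEN   64

struct sk_entry {
    unsigned subsys;
    unsigned type;
    char     text[SK_MSG_LEN];
};

static struct sk_entry sk_log[SK_LOG_SLOTS]; /* small circular log */
static unsigned sk_log_count;                /* messages recorded so far */
static unsigned sk_fail_loc;                 /* armed failure point, 0 = none */

/* Record one message, overwriting the oldest slot once the log is full. */
void sk_debug_msg(unsigned subsys, unsigned type, const char *fmt, ...)
{
    struct sk_entry *e = &sk_log[sk_log_count % SK_LOG_SLOTS];
    va_list ap;

    assert(fmt != NULL);
    e->subsys = subsys;
    e->type = type;
    va_start(ap, fmt);
    vsnprintf(e->text, sizeof(e->text), fmt, ap);
    va_end(ap);
    sk_log_count++;
}

/* Count retained messages whose type matches the filter mask. */
unsigned sk_count_matching(unsigned mask)
{
    unsigned kept = sk_log_count < SK_LOG_SLOTS ? sk_log_count : SK_LOG_SLOTS;
    unsigned i, n = 0;

    for (i = 0; i < kept; i++)
        if (sk_log[i].type & mask)
            n++;
    return n;
}

#define SK_CDEBUG(type, ...) \
    sk_debug_msg(SK_DEBUG_SUBSYSTEM, (type), __VA_ARGS__)
#define SK_ENTRY()        SK_CDEBUG(SK_D_TRACE, "entered %s", __func__)
#define SK_EXIT()         SK_CDEBUG(SK_D_TRACE, "leaving %s", __func__)
#define SK_FAIL_CHECK(id) (sk_fail_loc == (id))

/* Hypothetical operation: every return path is paired with SK_EXIT(). */
int sk_do_write(int len)
{
    SK_ENTRY();
    if (SK_FAIL_CHECK(0x101u)) {
        SK_CDEBUG(SK_D_INFO, "injected failure, len=%d", len);
        SK_EXIT();
        return -1;
    }
    SK_CDEBUG(SK_D_INFO, "wrote %d bytes", len);
    SK_EXIT();
    return len;
}
```

In this sketch, setting sk_fail_loc to 0x101 before calling sk_do_write() forces the error path, loosely mirroring how a regression test arms a failure point through lustre.fail_loc. Calling sk_count_matching(SK_D_TRACE) afterwards shows that each "entered" message has a matching "leaving" message, which is the property the ENTRY and EXIT guidance asks for.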

Lustre 1.6 Operations Manual • September 2008
