27.12.2014 Views

QLogic OFED+ Host Software User Guide, Rev. B


F–Troubleshooting

QLogic MPI Troubleshooting

NOTE:

It is important that /dev/shm be writable by all users, or else error messages like the ones in this section can be expected. Also, non-QLogic MPIs that use PSM may be more prone to stale shared memory files when processes are abnormally terminated.
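One way to check the /dev/shm requirement before launching a job is sketched below in Python. This is only an illustration, not a QLogic tool: the function name shm_is_world_writable is hypothetical, and the check assumes that "writable by all users" means the directory carries both the write and search permission bits for "others".

```python
import os
import stat


def shm_is_world_writable(path="/dev/shm"):
    """Return True if `path` is a directory that any user may create files in.

    Assumption: "writable by all users" is read here as the write (w) and
    search (x) permission bits being set for "others" on the directory.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # A missing directory cannot be writable.
        return False
    needed = stat.S_IWOTH | stat.S_IXOTH
    return stat.S_ISDIR(mode) and (mode & needed) == needed
```

Calling shm_is_world_writable() with no argument inspects /dev/shm itself; a False result suggests fixing the directory permissions before looking further into the error messages in this section.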

gdb Gets SIG32 Signal Under mpirun -debug with the PSM Receive Progress Thread Enabled

When you run mpirun -debug and the PSM receive progress thread is enabled, gdb (the GNU debugger) reports the following error:

(gdb) run
Starting program: /usr/bin/osu_bcast < /dev/null
[Thread debugging using libthread_db enabled]
[New Thread 46912501386816 (LWP 13100)]
[New Thread 1084229984 (LWP 13103)]
[New Thread 1094719840 (LWP 13104)]
Program received signal SIG32, Real-time event 32.
[Switching to Thread 1084229984 (LWP 22106)]
0x00000033807c0930 in poll () from /lib64/libc.so.6

This signal is generated when the main thread cancels the progress thread. To fix this problem, disable the receive progress thread when debugging an MPI program. Add the following line to $HOME/.mpirunrc:

export PSM_RCVTHREAD=0
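Because this setting is only wanted while debugging, it can help to make the change easy to undo. The following Python helpers are a minimal sketch of that idea: one appends the export line to a run-control file and the other takes it out again. The names disable_rcvthread and restore_rcvthread are hypothetical and not part of the QLogic software; the caller supplies the file path (for example, the .mpirunrc file in the home directory).

```python
from pathlib import Path

# The exact line this guide says to add to $HOME/.mpirunrc.
RCV_LINE = "export PSM_RCVTHREAD=0"


def disable_rcvthread(rc_path):
    """Append the export line to rc_path unless it is already present."""
    rc = Path(rc_path)
    lines = rc.read_text().splitlines() if rc.exists() else []
    if RCV_LINE not in (ln.strip() for ln in lines):
        lines.append(RCV_LINE)
        rc.write_text("\n".join(lines) + "\n")


def restore_rcvthread(rc_path):
    """Remove every copy of the export line, keeping all other lines."""
    rc = Path(rc_path)
    if not rc.exists():
        return
    kept = [ln for ln in rc.read_text().splitlines() if ln.strip() != RCV_LINE]
    rc.write_text(("\n".join(kept) + "\n") if kept else "")
```

Both helpers leave unrelated lines in the file alone, and calling disable_rcvthread twice does not duplicate the line.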

NOTE:

Remove the export PSM_RCVTHREAD=0 line from $HOME/.mpirunrc after you finish debugging the MPI program. If this line is not removed, the PSM receive progress thread will remain disabled for all later runs. To check whether the receive progress thread is enabled, look for output similar to the following when using the mpirun -verbose flag:

idev-17:0.env PSM_RCVTHREAD Recv thread flags (0 disables thread) => 0x1

The value 0x1 indicates that the receive thread is currently enabled. A value of 0x0 indicates that the receive thread is disabled.
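When the verbose output is long, the check can be automated. The Python sketch below scans captured mpirun -verbose output for a PSM_RCVTHREAD line and reads the value after "=>". It assumes that other lines of this kind follow the same shape as the sample above, and it treats any nonzero value as "enabled", which goes beyond the two values (0x1 and 0x0) that this guide documents. The function name rcvthread_enabled is hypothetical.

```python
import re

# Matches e.g. "... PSM_RCVTHREAD Recv thread flags (0 disables thread) => 0x1"
_RCV_RE = re.compile(r"PSM_RCVTHREAD\b.*=>\s*(0[xX][0-9a-fA-F]+|\d+)")


def rcvthread_enabled(verbose_output):
    """Return True or False for the first PSM_RCVTHREAD line, or None if absent."""
    for line in verbose_output.splitlines():
        match = _RCV_RE.search(line)
        if match:
            token = match.group(1)
            # Accept both hexadecimal ("0x1") and plain decimal ("0") values.
            if token.lower().startswith("0x"):
                value = int(token, 16)
            else:
                value = int(token)
            # Assumption: any nonzero flag value means the thread is enabled.
            return value != 0
    return None
```

A None result means no PSM_RCVTHREAD line was found, for example because the job was not run with the -verbose flag.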

F-24 D000046-005 B
