27.12.2014 Views

QLogic OFED+ Host Software User Guide, Rev. B



F–Troubleshooting

QLogic MPI Troubleshooting

General Error Messages

The following message may be generated by ipath_checkout or mpirun:

PSM found 0 available contexts on InfiniPath device

The most likely cause is that the cluster has processes using all the available PSM contexts.
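As a minimal sketch, saved output from ipath_checkout or mpirun can be scanned for this message. Only the message string comes from this guide; the function name `find_context_exhaustion` and the sample output are hypothetical and exist purely for illustration.

```python
# The message text is quoted from this guide; everything else is illustrative.
NO_CONTEXTS_MESSAGE = "PSM found 0 available contexts on InfiniPath device"


def find_context_exhaustion(output):
    """Return the 1-based line numbers in `output` that contain the message."""
    return [
        number
        for number, line in enumerate(output.splitlines(), start=1)
        if NO_CONTEXTS_MESSAGE in line
    ]


# Hypothetical captured output, not real tool output.
sample_output = "\n".join([
    "starting job",
    "node01: PSM found 0 available contexts on InfiniPath device",
])
hits = find_context_exhaustion(sample_output)
```

A non-empty result suggests checking whether other jobs on the affected node are still holding PSM contexts.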

Error Messages Generated by mpirun

The following sections describe the mpirun error messages. These messages fall into one of these categories:

• Messages from the QLogic MPI (InfiniPath) library

• MPI messages

• Messages relating to the InfiniPath driver and InfiniBand links

Messages generated by mpirun follow one of these formats:

program_name: message

function_name: message

Messages can also have different prefixes, such as ipath_ or psm_, which indicate where in the software the errors are occurring.
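The "name: message" layout described above can be split mechanically. The following is a rough sketch, assuming that the name part contains no spaces or colons and that a prefix such as ipath_ or psm_ appears at the start of the name. The function `parse_mpirun_message` and the example names are hypothetical, not part of the QLogic software.

```python
import re

# Prefixes named in this guide. Treating them as the start of the
# program or function name is an assumption.
KNOWN_PREFIXES = ("ipath_", "psm_")

# "name: message", where name is a program or function name
# containing no whitespace and no colon (an assumption).
LINE_PATTERN = re.compile(r"^(?P<name>[^\s:]+):\s*(?P<message>.*)$")


def parse_mpirun_message(line):
    """Split a line into (name, message, prefix); return None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    name = match.group("name")
    # First known prefix the name starts with, or None if there is none.
    prefix = next((p for p in KNOWN_PREFIXES if name.startswith(p)), None)
    return name, match.group("message"), prefix
```

For instance, a line beginning with a hypothetical function name such as `psm_example_fn:` would be reported with the prefix `psm_`, pointing to the PSM layer as the place where the error occurred.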

Messages from the QLogic MPI (InfiniPath) Library

Messages from the QLogic MPI (InfiniPath) library appear in the mpirun output.

The following messages indicate that rank values received during connection setup were higher than the number of ranks (as indicated in the mpirun startup code):

sender rank rank is out of range (notification)

sender rank rank is out of range (ack)

The following error messages indicate internal problems and must be reported to Technical Support:

unknown frame type type

[n] Src lid error: sender: x, exp send: y

Frame receive from unknown sender. exp. sender = x, came from y

Failed to allocate memory for eager buffer addresses: str

The following error messages usually indicate a hardware or connectivity problem:

Failed to get IB Unit LID for any unit

Failed to get our IB LID

Failed to get number of Infinipath units

In these cases, try to reboot. If that does not work, call Technical Support.
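The groupings in this section can be turned into a small lookup for triaging mpirun output. This is only a sketch: the patterns are derived from the message templates listed above, the placeholders (rank, type, n, x, y, str) are matched loosely because their exact formats are not specified here, and the names `RULES` and `classify` are hypothetical.

```python
import re

# Category labels summarising the advice given in this section.
RANK_RANGE = "rank out of range during connection setup"
INTERNAL = "internal problem: report to Technical Support"
HARDWARE = "hardware or connectivity: reboot, then call Technical Support"

# Each rule pairs a pattern built from a message template with its category.
# Wildcards stand in for placeholders whose format is an assumption.
RULES = [
    (re.compile(r"sender rank .+ is out of range \((notification|ack)\)"), RANK_RANGE),
    (re.compile(r"unknown frame type"), INTERNAL),
    (re.compile(r"Src lid error: sender:"), INTERNAL),
    (re.compile(r"Frame receive from unknown sender"), INTERNAL),
    (re.compile(r"Failed to allocate memory for eager buffer addresses"), INTERNAL),
    (re.compile(r"Failed to get IB Unit LID for any unit"), HARDWARE),
    (re.compile(r"Failed to get our IB LID"), HARDWARE),
    (re.compile(r"Failed to get number of Infinipath units"), HARDWARE),
]


def classify(line):
    """Return the category of the first matching rule, or None if none match."""
    for pattern, category in RULES:
        if pattern.search(line):
            return category
    return None
```

Lines that return None are not covered by this section and may belong to the other mpirun message categories.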

D000046-005 B F-25
