QLogic OFED+ Host Software User Guide, Rev. B
F–Troubleshooting
QLogic MPI Troubleshooting
This message occurs when a cable is disconnected, a switch is rebooted, or there are other problems with the link. The job continues retrying until the quiescence interval expires. See the mpirun -q option for information on quiescence.
If a hardware problem occurs, an error similar to the following is displayed:
infinipath: [error strings] Hardware error
In this case, the MPI program terminates. The error string may provide additional information about the problem. To further determine the source of the problem, examine the syslog on the node that reported the problem.
MPI Stats
The -print-stats option to mpirun writes a listing of various MPI statistics to stderr. The following is example output for the -print-stats option (written as -M in the commands below) from an eight-rank run of the HPCC benchmark, with one rank per node, using the following command:
$ mpirun -np 8 -ppn 1 -m machinefile -M ./hpcc
STATS: MPI Statistics Summary (max,min @ rank)
STATS: Eager count sent (max=171.94K @ 0, min=170.10K @ 3, med=170.20K @ 5)
STATS: Eager bytes sent (max=492.56M @ 5, min=491.35M @ 0, med=491.87M @ 1)
STATS: Rendezvous count sent (max= 5735 @ 0, min= 5729 @ 3, med= 5731 @ 7)
STATS: Rendezvous bytes sent (max= 1.21G @ 4, min= 1.20G @ 2, med= 1.21G @ 0)
STATS: Expected count received(max=173.18K @ 4, min=169.46K @ 1, med=172.71K @ 7)
STATS: Expected bytes received(max= 1.70G @ 1, min= 1.69G @ 2, med= 1.70G @ 7)
STATS: Unexpect count received(max= 6758 @ 0, min= 2996 @ 4, med= 3407 @ 2)
STATS: Unexpect bytes received(max= 1.48M @ 0, min=226.79K @ 5, med=899.08K @ 2)
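Because the statistics are written to stderr, they can be separated from the program's normal output. The following Python sketch runs an mpirun command, captures its stderr, and keeps only the lines that begin with STATS:. The helper names stats_lines and run_with_stats are hypothetical, and the sketch assumes that every statistics line starts with the STATS: prefix exactly as in the output above.

```python
import subprocess
from typing import List, Sequence


def stats_lines(stderr_text: str) -> List[str]:
    """Keep only the lines that start with the 'STATS:' prefix."""
    return [line for line in stderr_text.splitlines() if line.startswith("STATS:")]


def run_with_stats(cmd: Sequence[str]) -> List[str]:
    """Run cmd, capture its stderr, and return only the STATS lines.

    stdout is not captured, so the program's normal output still reaches the
    terminal. Any non-STATS messages on stderr are discarded by this helper.
    """
    result = subprocess.run(cmd, stderr=subprocess.PIPE, text=True, check=False)
    return stats_lines(result.stderr)


# Hypothetical usage; requires QLogic MPI and the machinefile from the example:
# for line in run_with_stats(
#     ["mpirun", "-np", "8", "-ppn", "1", "-m", "machinefile", "-M", "./hpcc"]
# ):
#     print(line)
```

Since this helper drops every stderr line without the prefix, other error messages from the job are hidden; keep the full stderr as well if you are also troubleshooting failures.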
By default, -M is equivalent to -M=mpi, which reports only MPI-level statistics. The mpirun man page lists other, lower-level categories of statistics that can be requested. In the following example, -M=mpi,ipath adds the InfiniPath low-level protocol statistics to the MPI-level statistics:
$ mpirun -np 8 -ppn 1 -m machinefile -M=mpi,ipath hpcc
STATS: MPI Statistics Summary (max,min @ rank)
STATS: Eager count sent (max=171.94K @ 0, min=170.10K @ 3, med=170.22K @ 1)
STATS: Eager bytes sent (max=492.56M @ 5, min=491.35M @ 0, med=491.87M @ 1)
STATS: Rendezvous count sent (max= 5735 @ 0, min= 5729 @ 3, med= 5731 @ 7)
STATS: Rendezvous bytes sent (max= 1.21G @ 4, min= 1.20G @ 2, med= 1.21G @ 0)
STATS: Expected count received(max=173.18K @ 4, min=169.46K @ 1, med=172.71K @ 7)
STATS: Expected bytes received(max= 1.70G @ 1, min= 1.69G @ 2, med= 1.70G @ 7)
STATS: Unexpect count received(max= 6758 @ 0, min= 2996 @ 4, med= 3407 @ 2)
STATS: Unexpect bytes received(max= 1.48M @ 0, min=226.79K @ 5, med=899.08K @ 2)
STATS: InfiniPath low-level protocol stats
STATS: pio busy count (max=190.01K @ 0, min=155.60K @ 1, med=160.76K @ 5)
STATS: scb unavail exp count (max= 9217 @ 0, min= 7437 @ 7, med= 7727 @ 4)
STATS: tid update count (max=292.82K @ 6, min=290.59K @ 2, med=292.55K @ 4)
STATS: interrupt thread count (max= 941 @ 0, min= 335 @ 7, med= 439 @ 2)
STATS: interrupt thread success(max= 0.00 @ 3, min= 0.00 @ 1, med= 0.00 @ 0)
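Each counter line reports the maximum, minimum, and median value across ranks, together with the rank that reported it, so comparing the max and min columns is one quick way to spot uneven communication between ranks. The following Python sketch, with hypothetical helper names group_by_section and imbalanced, groups the STATS lines by their section header and flags counters whose max/min ratio exceeds a chosen threshold. It assumes that K, M, and G are decimal multipliers (the guide does not say), which can shift a ratio slightly only when max and min carry different suffixes, and it skips counters whose minimum is zero.

```python
import re
from typing import Dict, List, Tuple

# Assumption: K/M/G are decimal multipliers (10^3, 10^6, 10^9).
SUFFIX = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}
# Matches one "max= 5735 @ 0" style field; the value may be followed by K/M/G.
FIELD = re.compile(r"(max|min|med)=\s*([\d.]+)([KMG]?)\s*@\s*(\d+)")

# counter name -> {"max" | "min" | "med": (value, rank)}
Counters = Dict[str, Dict[str, Tuple[float, int]]]


def group_by_section(lines: List[str]) -> Dict[str, Counters]:
    """Map each section header to the counters listed under it."""
    sections: Dict[str, Counters] = {}
    current = ""
    for raw in lines:
        if not raw.startswith("STATS:"):
            continue
        body = raw[len("STATS:"):].strip()
        fields = FIELD.findall(body)
        if not fields:
            # Lines without max/min/med values are treated as section headers.
            current = body
            sections.setdefault(current, {})
            continue
        # The counter name is everything before the opening parenthesis.
        name = body[: body.index("(")].strip()
        sections.setdefault(current, {})[name] = {
            key: (float(num) * SUFFIX[unit], int(rank))
            for key, num, unit, rank in fields
        }
    return sections


def imbalanced(
    sections: Dict[str, Counters], threshold: float = 1.5
) -> List[Tuple[str, str, float]]:
    """Return (section, counter, max/min ratio) for ratios above threshold."""
    flagged = []
    for section, counters in sections.items():
        for name, stats in counters.items():
            low = stats["min"][0]
            if low > 0:  # avoid dividing by zero (e.g. interrupt thread success)
                ratio = stats["max"][0] / low
                if ratio > threshold:
                    flagged.append((section, name, ratio))
    return flagged
```

With the default threshold of 1.5, the second example output above flags Unexpect count received, Unexpect bytes received, and interrupt thread count. A high ratio is only a prompt for closer inspection, not evidence of a problem by itself.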
D000046-005 B