27.12.2014 Views

QLogic OFED+ Host Software User Guide, Rev. B


F–Troubleshooting

QLogic MPI Troubleshooting

MPI Stats

This message occurs when a cable is disconnected, a switch is rebooted, or there are other problems with the link. The job continues retrying until the quiescence interval expires. See the mpirun -q option for information on quiescence.

If a hardware problem occurs, an error similar to the following is displayed:

infinipath: [error strings ] Hardware error

In this case, the MPI program terminates. The error string may provide additional information about the problem. To further determine the source of the problem, examine syslog on the node reporting the problem.
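As a rough aid for the syslog check described above, the following Python sketch filters log lines that resemble the hardware error message. The pattern is modeled only on the example message in this section, and the default log path /var/log/messages is an assumption that varies by distribution; find_hardware_errors and scan_syslog are hypothetical helper names, not part of the QLogic software.

```python
import re

# Pattern modeled on the example message in the text:
#   infinipath: [error strings ] Hardware error
# The exact wording in a real syslog may differ (assumption).
HW_ERROR = re.compile(r"infinipath:.*Hardware error", re.IGNORECASE)


def find_hardware_errors(lines):
    """Return the log lines that look like InfiniPath hardware errors."""
    return [line.rstrip("\n") for line in lines if HW_ERROR.search(line)]


def scan_syslog(path="/var/log/messages"):
    """Scan a syslog file; the default path is an assumption, not a documented location."""
    with open(path, errors="replace") as log:
        return find_hardware_errors(log)
```

Run scan_syslog() on the node that reported the problem, passing the path of that node's syslog file if it differs from the assumed default.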

The -print-stats option to mpirun writes a listing of various MPI statistics to stderr. The following example output is from an eight-rank run of the HPCC benchmark, started with the command below, which uses the -M form of the option:

$ mpirun -np 8 -ppn 1 -m machinefile -M ./hpcc

STATS: MPI Statistics Summary (max,min @ rank)
STATS: Eager count sent         (max=171.94K @ 0, min=170.10K @ 3, med=170.20K @ 5)
STATS: Eager bytes sent         (max=492.56M @ 5, min=491.35M @ 0, med=491.87M @ 1)
STATS: Rendezvous count sent    (max= 5735 @ 0, min= 5729 @ 3, med= 5731 @ 7)
STATS: Rendezvous bytes sent    (max= 1.21G @ 4, min= 1.20G @ 2, med= 1.21G @ 0)
STATS: Expected count received  (max=173.18K @ 4, min=169.46K @ 1, med=172.71K @ 7)
STATS: Expected bytes received  (max= 1.70G @ 1, min= 1.69G @ 2, med= 1.70G @ 7)
STATS: Unexpect count received  (max= 6758 @ 0, min= 2996 @ 4, med= 3407 @ 2)
STATS: Unexpect bytes received  (max= 1.48M @ 0, min=226.79K @ 5, med=899.08K @ 2)
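Because the statistics are written to stderr as plain text, it can be convenient to convert each line into numbers for comparison across runs. The following Python sketch parses one STATS line of the form shown above. It is based only on the sample output in this section: parse_stats_line is a hypothetical helper, and treating K, M, and G as decimal multipliers is an assumption, since the text does not say whether mpirun uses powers of 1000 or 1024.

```python
import re

# Assumption: K, M and G are decimal multipliers; the text does not say
# whether mpirun uses powers of 1000 or 1024.
SUFFIX = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}

# A data line has a statistic name followed by "(max=..., min=..., med=...)".
LINE = re.compile(r"^STATS:\s*(?P<name>.*?)\s*\((?P<body>.*max=.*)\)\s*$")
FIELD = re.compile(r"(max|min|med)=\s*([\d.]+)([KMG]?)\s*@\s*(\d+)")


def parse_stats_line(line):
    """Parse one STATS line into (name, {key: (value, rank)}).

    Returns None for lines that carry no max/min/med fields, such as the
    summary header line.
    """
    match = LINE.match(line.strip())
    if match is None:
        return None
    fields = {
        key: (float(number) * SUFFIX[suffix], int(rank))
        for key, number, suffix, rank in FIELD.findall(match.group("body"))
    }
    return match.group("name"), fields
```

Each entry in the returned dictionary pairs the scaled value with the rank that reported it, so fields["max"] for the Rendezvous count sent line above holds the value 5735 and rank 0.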

By default, -M is equivalent to -M=mpi, which reports only MPI-level statistics. The man page describes the other, lower-level categories of statistics that are available. Here is another example, which requests both the mpi and ipath categories:

$ mpirun -np 8 -ppn 1 -m machinefile -M=mpi,ipath hpcc

STATS: MPI Statistics Summary (max,min @ rank)
STATS: Eager count sent         (max=171.94K @ 0, min=170.10K @ 3, med=170.22K @ 1)
STATS: Eager bytes sent         (max=492.56M @ 5, min=491.35M @ 0, med=491.87M @ 1)
STATS: Rendezvous count sent    (max= 5735 @ 0, min= 5729 @ 3, med= 5731 @ 7)
STATS: Rendezvous bytes sent    (max= 1.21G @ 4, min= 1.20G @ 2, med= 1.21G @ 0)
STATS: Expected count received  (max=173.18K @ 4, min=169.46K @ 1, med=172.71K @ 7)
STATS: Expected bytes received  (max= 1.70G @ 1, min= 1.69G @ 2, med= 1.70G @ 7)
STATS: Unexpect count received  (max= 6758 @ 0, min= 2996 @ 4, med= 3407 @ 2)
STATS: Unexpect bytes received  (max= 1.48M @ 0, min=226.79K @ 5, med=899.08K @ 2)
STATS: InfiniPath low-level protocol stats
STATS: pio busy count           (max=190.01K @ 0, min=155.60K @ 1, med=160.76K @ 5)
STATS: scb unavail exp count    (max= 9217 @ 0, min= 7437 @ 7, med= 7727 @ 4)
STATS: tid update count         (max=292.82K @ 6, min=290.59K @ 2, med=292.55K @ 4)
STATS: interrupt thread count   (max= 941 @ 0, min= 335 @ 7, med= 439 @ 2)
STATS: interrupt thread success (max= 0.00 @ 3, min= 0.00 @ 1, med= 0.00 @ 0)
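When more than one category is requested, the output contains several sections, each introduced by a header line. The following Python sketch groups captured STATS lines by section. It assumes, from the sample output alone, that a header is any STATS line without a max= field; group_stats_sections is a hypothetical helper, and real output may contain line types not shown here.

```python
def group_stats_sections(lines):
    """Group STATS lines under the most recent header line.

    Assumption: a header is a STATS line without a "max=" field, as in the
    sample output; the real format may include other line types.
    """
    sections = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("STATS:"):
            continue  # ignore anything else the job wrote to stderr
        text = line[len("STATS:"):].strip()
        if "max=" not in text:
            # Header line: start a new section.
            current = text
            sections[current] = []
        elif current is not None:
            sections[current].append(text)
    return sections
```

For the output above, the result has two keys, one for the MPI summary and one for the InfiniPath low-level protocol statistics, each mapping to the statistic lines listed under it.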

F-30 D000046-005 B
