QLogic OFED+ Host Software User Guide, Rev. B
E–Integration with a Batch Queuing System
The following command terminates all processes using the QLogic interconnect:
# /sbin/fuser -k /dev/ipath
For more information, see the man pages for fuser(1) and lsof(8).
Note that hard, explicit program termination, such as kill -9 on the mpirun process ID (PID), may leave QLogic MPI unable to guarantee that the shared memory file under /dev/shm is properly removed. If many stale files accumulate on a node, an error message similar to the following can appear at startup:
node023:6.Error creating shared memory object in shm_open(/dev/shm
may have stale shm files that need to be removed):
If this occurs, administrators should clean up all stale files by using this command:
# rm -rf /dev/shm/psm_shm.*
See “Error Creating Shared Memory Object” on page F-23 for more information.
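The cleanup pattern can be tried safely against a temporary directory standing in for /dev/shm (the psm_shm.* names follow the guide; the scratch directory is only for illustration):

```shell
# Simulate stale PSM shared-memory segments in a scratch directory,
# then remove them with the same wildcard the guide recommends.
shm=$(mktemp -d)                       # stand-in for /dev/shm
touch "$shm/psm_shm.0" "$shm/psm_shm.1"
rm -rf "$shm"/psm_shm.*                # the cleanup command
ls -A "$shm"                           # prints nothing: directory is empty
rmdir "$shm"
```

On production nodes, substitute /dev/shm for the scratch directory and run the rm command as root, after confirming no MPI jobs are still running.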
Lock Enough Memory on Nodes when Using SLURM
This section is identical to information provided in “Lock Enough Memory on Nodes When Using a Batch Queuing System” on page F-22. It is repeated here for your convenience.
QLogic MPI requires the ability to lock (pin) memory during data transfers on each compute node. This is normally done via /etc/initscript, which is created or modified during the installation of the infinipath RPM (setting a limit of 128 MB with the command ulimit -l 131072).
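As a rough sketch (the exact file installed by the RPM may differ), /etc/initscript raises the memlock limit before init launches each program:

```sh
# /etc/initscript -- sketch; the infinipath RPM's actual contents may differ.
ulimit -l 131072    # allow each process to lock up to 128 MB (131072 KB)
eval exec "$4"      # run the command init intended to start
```

Because /etc/initscript is read by init, the node must be rebooted for the new limit to take effect after installation.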
Some batch systems, such as SLURM, propagate the user’s environment from the node where you start the job to all the other nodes. For these batch systems, you may need to make the same change on the node from which you start your batch jobs.
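With SLURM, one way to make sure the limit is in effect is to raise it in the job script itself before launching; the following is a hedged sketch (the node count and application name are hypothetical):

```sh
#!/bin/bash
#SBATCH --nodes=2
# Raise the memlock limit in the script's shell; SLURM propagates the
# resource limits of the submitting environment to the compute nodes.
ulimit -l 131072
mpirun -np 2 ./my_mpi_app    # hypothetical MPI application
```

If the ulimit command fails, the hard limit on the submitting node is still too low and /etc/initscript (or the node's limits configuration) must be corrected first.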
If this file is not present, or the node has not been rebooted after the infinipath RPM was installed, a failure message similar to the following is generated when an MPI program is run:
$ mpirun -np 2 -m ~/tmp/sm mpi_latency 1000 1000000<br />
iqa-19:0.ipath_userinit: mmap of pio buffers at 100000 failed:<br />
Resource temporarily unavailable<br />
iqa-19:0.Driver initialization failure on /dev/ipath<br />
iqa-20:1.ipath_userinit: mmap of pio buffers at 100000 failed:<br />
Resource temporarily unavailable<br />
iqa-20:1.Driver initialization failure on /dev/ipath<br />
D000046-005 B E-5