QLogic OFED+ Host Software User Guide, Rev. B

E–Integration with a Batch Queuing System


The following command terminates all processes using the QLogic interconnect:

# /sbin/fuser -k /dev/ipath

For more information, see the man pages for fuser(1) and lsof(8).
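For example, to see which processes currently hold the device open before terminating them, you can run fuser in verbose mode (the user, PID, and command shown below are illustrative):

# /sbin/fuser -v /dev/ipath
                     USER        PID ACCESS COMMAND
/dev/ipath:          jdoe      12345 f.... mpi_latency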

Note that hard and explicit program termination, such as kill -9 on the mpirun Process ID (PID), may result in QLogic MPI being unable to guarantee that the /dev/shm shared memory file is properly removed. As many stale files accumulate on each node, an error message can appear at startup:

node023:6.Error creating shared memory object in shm_open(/dev/shm may have stale shm files that need to be removed):

If this occurs, administrators should clean up all stale files by using this command:

# rm -rf /dev/shm/psm_shm.*

See “Error Creating Shared Memory Object” on page F-23 for more information.
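Before deleting anything, you can verify that stale segment files are actually present. While no MPI jobs are running on the node, any remaining segment files are stale:

# ls -l /dev/shm/psm_shm.*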

Lock Enough Memory on Nodes when Using SLURM

This section is identical to information provided in “Lock Enough Memory on Nodes When Using a Batch Queuing System” on page F-22. It is repeated here for your convenience.

QLogic MPI requires the ability to lock (pin) memory during data transfers on each compute node. This is normally done via /etc/initscript, which is created or modified during the installation of the infinipath RPM (setting a limit of 128 MB, with the command ulimit -l 131072).
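The following is a minimal sketch of the relevant /etc/initscript content, assuming the standard initscript(5) layout; the file installed by the RPM may contain additional settings:

# /etc/initscript: run by init(8) in place of every command it spawns.
# Raise the locked-memory limit to 128 MB so QLogic MPI can pin buffers.
ulimit -l 131072
# Hand control to the command init was about to run ($4 is the process field).
eval exec "$4"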

Some batch systems, such as SLURM, propagate the user’s environment from the node where you start the job to all the other nodes. For these batch systems, you may need to make the same change on the node from which you start your batch jobs.
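For example, you can check and raise the limit in the shell from which you submit jobs before starting them (the value printed is illustrative; raising the soft limit above the hard limit requires administrator action):

$ ulimit -l
64
$ ulimit -l 131072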

If this file is not present or the node has not been rebooted after the infinipath RPM has been installed, a failure message similar to the following will be generated when an MPI job is run:

$ mpirun -np 2 -m ~/tmp/sm mpi_latency 1000 1000000
iqa-19:0.ipath_userinit: mmap of pio buffers at 100000 failed: Resource temporarily unavailable
iqa-19:0.Driver initialization failure on /dev/ipath
iqa-20:1.ipath_userinit: mmap of pio buffers at 100000 failed: Resource temporarily unavailable
iqa-20:1.Driver initialization failure on /dev/ipath
