
2 RUNNING MOLPRO

There are also some parts of the code that can take advantage of shared-memory parallelism through the OpenMP protocol, although these are somewhat limited, and this facility is not at present recommended. It should be noted that some parts of the code remain unparallelized, or only partly parallelized, and therefore run with replicated work. Additionally, some of the parts that have been parallelized rely on fast inter-node communication and can be very inefficient across ordinary networks. Some caution and experimentation is therefore needed to avoid wasting resources in a multiuser environment.

MOLPRO effects interprocess cooperation through the ppidd library, which, depending on how it was configured and built, draws on either the Global Arrays parallel toolkit or pure MPI. ppidd is described in Comput. Phys. Commun. 180, 2673-2679 (2009). Global Arrays handles distributed data objects using whatever one-sided remote memory access facilities are provided and supported. In the case of the MPI implementation, there is a choice between using MPI-2 one-sided memory access and devoting some of the processes to act as data 'helpers'. It is generally found that performance is significantly better, and competitive with Global Arrays, if at least one dedicated helper is used, and in some cases it is advisable to specify more. The scalable limit is to devote one core on each node of a typical multi-core cluster, but in most cases it is possible to manage with fewer, thereby leaving more cores available for computation. This aspect of configuration can be tuned through the *-helper-server options described below.
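As an illustration of the core arithmetic, a sketch of such a launch might look as follows. The cluster geometry and input file name are hypothetical, and --node-helper-server is assumed here to be one of the *-helper-server options described below:

    # Hypothetical 4-node cluster with 16 cores per node: 64 processes in
    # total, one per node acting as a data helper, leaving 60 processes
    # for computation.
    molpro -n 64/16 --node-helper-server input.inp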

Molpro can be compiled in three different ways:

1. Serial execution only. In this case, no parallelism is possible at run time.

2. 'MPP': a number of copies of the program simultaneously execute a single task. For example, a single CCSD(T) calculation can run in parallel, with the work divided between the processors in order to reduce the elapsed time.

3. 'MPPX': a number of copies of the program run in serial, executing identical independent tasks. An example of this is the calculation of gradients and frequencies by finite differences: for the initial wavefunction calculation, the computation is replicated on all processes, but thereafter each process works in serial on a different displaced geometry. At present, this is implemented only for numerical gradients and Hessians.

Which of these three modes is available is fixed at compilation time and is reported in the job output. The options for selecting the number and location of processors, described below, are identical for MPP and MPPX.

2.2.1 Specifying parallel execution

The following additional options for the molpro command may be used to specify and control parallel execution.

-n | --tasks tasks/tasks per node:smp threads
    tasks specifies the number of parallel processes to be set up, and defaults to 1. tasks per node sets the number of GA (or MPI-2) processes to run on each node, where appropriate; the default is installation dependent. In some environments (e.g., IBM running under LoadLeveler, or a PBS batch job), the value given by -n is capped at the maximum allowed by the environment; in such circumstances it can be useful to give a very large value for -n so that the number of processes is controlled by the batch job specification. smp threads relates to the shared-memory (OpenMP) parallelism described above, giving the number of OpenMP threads per process.
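For example, the following launches illustrate the documented syntax; the input file name h2o.inp and the process counts are hypothetical:

    molpro -n 8 h2o.inp         # 8 parallel processes
    molpro -n 16/8:2 h2o.inp    # 16 processes, 8 per node, 2 OpenMP threads each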
