hpc news spring 2000 - EPCC - University of Edinburgh



HPC news

Issue 12 April 2000



A new version of Cray's development software was recently installed on the T3E. This was notable for us because the new version of MPI which it includes was developed by Stephen Booth here at EPCC. In the last edition of HPC News, readers can find a description of the hardware-based optimisations which Stephen made.


In fact, the original MPI for the T3D, on which the T3E version is based, was also developed at EPCC. We see these as examples of the way that EPCC has always promoted the spread of new computing technologies. In this issue, and in EPCC News as well, we can see how this is continuing, above all in EPCC's work with clusters of commodity processors, with the Java Grande Forum and in grid computing.


Mention of the grid almost by definition implies

cooperation between sites, both in the UK and abroad.

The UKHEC collaboration between EPCC, Daresbury

Laboratory and the MRCCS at Manchester will be taking

the grid and much else forward within the UK. Some

aspects of EPCC's wider European work can also be found

in this issue.

Meanwhile EPCC's HPCI Centre and the in-depth team continue to provide, and broaden, their support for users of HPC services in the UK.

MPI optimisation
EPCC's European initiatives
Weather forecasting
TRACS: project report
Java and the grid



Weather forecasts on commodity clusters

Exploring the performance of the Unified Model on various platforms


Photo: Ian Bell, Bureau of Meteorology Training Centre, Melbourne, Australia

EPCC is supporting the activities of the University Global

Atmospheric Modelling Program (UGAMP) under the

auspices of the High Performance Computing Initiative

and through links to the Centre for Global Atmospheric

Modelling (CGAM) based at the Department of

Meteorology within the University of Reading.

UGAMP is funded by the Natural Environment Research Council and aims to develop a better understanding of the atmosphere and its links to other systems. Accordingly, UGAMP is involved in the modelling of large-scale atmospheric processes. The role of CGAM is to coordinate modelling developments and to pursue research fundamental to UGAMP, and it is this work that EPCC is supporting.

The UGAMP consortium use the UK Meteorological Office's Unified Model (UM) for large-scale atmospheric modelling. UGAMP are keen to investigate the use and performance of the UM on "departmental scale" HPC platforms, including workstation clusters, Beowulf-style PC clusters and modest SMP servers. Although a portable version of the UM exists, the code has been optimised to run on the Cray T3E system; as a consequence there are a number of unanswered questions about its performance on these different, lower-cost platforms.


The aim of the current work under HPCI is thus to produce a theoretical model of the UM's performance across various platforms and to determine the sustainable costs of architectural and configuration policies. A greater understanding is required both of the performance of the UM and of the technology it runs on now and may run on in the future. An initial market survey of departmental platforms and tools is under way, and EPCC is developing a methodology for assessing the performance of the UM on a range of suitable systems. The aim is to take steps towards a full performance model of the UM.

CGAM and EPCC will apply this methodology to a range of available platforms, including a Sun SMP server system, networks of workstations and a 16-node Beowulf cluster currently being built in Edinburgh by EPCC.

EPCC's HPCI centre: http://www.epcc.ed.ac.uk/hpci/

UGAMP: http://www.met.rdg.ac.uk/ugamp/ugamp.html

Natural Environment Research Council (NERC):


Centre for Global Atmospheric Modelling (CGAM)



Shearing of layered systems


Fig. 1: System of dimers in the lamellar phase

Amphiphiles are an extremely important class of molecules, with numerous applications and a very rich physical behaviour. The term amphiphile is used for molecules which consist of a water-soluble (hydrophilic) part and an oil-soluble (lipophilic) part, and which are able to influence the properties of interfaces and surfaces. Immiscible fluids like oil and water form such an interface, and we often need to add something to such a mixture to suppress its tendency to separate into macroscopic phases (e.g. when washing dishes). The effect of adding an amphiphile like SDS (sodium dodecyl sulphate) to a mixture of oil and water can be quite dramatic: even a small amount lowers the interfacial tension by several orders of magnitude.

The name amphiphile itself originates from ancient Greek and means "both-loving". Depending on their concentration, amphiphilic systems exhibit many different phases that are ordered on length scales large compared with molecular distances. Here we consider only the lamellar or smectic phase, in which the system has a layered structure. Systems showing such phases include tensides, diblock copolymers (two different kinds of polymer joined together), liquid crystals (LC) and LC polymers.

It is known from many experiments that shear can have a strong influence on the structure of complex fluids. For example, D. Roux et al. found a transition to a multilamellar vesicle phase in a quaternary mixture under shear, which they called the onion phase. U. Wiesner et al. found two reorientations of a lamellar diblock copolymer melt under oscillatory shear, from parallel to perpendicular and back to parallel with respect to the shear plane.

The standard hydrodynamic theory of smectic A phases has so far provided no clue to the mechanism behind these findings, although there are a few promising approaches. A numerical approach like ours gives insight into physical details of the problem which an experiment cannot provide, since one has access to every single molecule at each time step. If the simulations reproduce the experimental findings within the idealisations that have been made, one should be able to deduce a probable mechanism.

In a simple picture, one can imagine a mixture of A and B particles, with the amphiphiles as A-B molecules, where A denotes the A-philic and B the A-phobic part. We introduced a simple model which retains the essential physics of interest while disregarding the chemical details. Such an approximation is needed because no existing computer can simulate even the minimum number of molecules required with detailed modelling down to the atomistic level.

In our present research we consider a pure melt of A-B dimers, although other system compositions can be implemented straightforwardly. This A-B dimer system exhibits a phase transition from an isotropic to a lamellar phase, and in the lamellar phase the system was investigated under shear. Figure 1 shows a lamellar system. We find behaviour similar to that seen in experiments and, in particular, in agreement with the analytical results of Auernhammer et al.

Second European Workshop on OpenMP
14-15 September 2000, University of Edinburgh


Following the success of the First European Workshop on OpenMP at Lund University in Sweden, EPCC will be hosting a second workshop at the University of Edinburgh.


OpenMP has recently emerged as the definitive standard

for shared memory parallel programming. For the first

time, it is possible to write parallel programs which are

portable across the majority of shared-memory parallel

computers. This workshop will be a forum for discussion

of the latest developments in OpenMP and its

applications. Topics of interest include:

• OpenMP implementations
• Proposals for, and evaluation of, language extensions
• Applications development experiences
• Comparisons with other approaches such as MPI and
• Benchmarking and performance studies
• Compilers, debuggers and performance analysis tools for OpenMP
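As a minimal sketch of the portability described above (not material from the workshop itself), the C fragment below parallelises a loop with a single OpenMP directive. The function names `sum_of_squares` and `available_threads` are hypothetical. Compiled with OpenMP enabled (for example `gcc -fopenmp`), the loop iterations are shared among threads; compiled without it, the pragma is ignored and the same source runs serially.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical example: returns 1*1 + 2*2 + ... + n*n.
 * With OpenMP enabled the iterations are divided among threads, and
 * reduction(+:sum) gives each thread a private partial sum which is
 * added into sum when the loop finishes.
 * Without OpenMP the pragma is ignored and the loop runs serially. */
long long sum_of_squares(long long n) {
    long long sum = 0;
#pragma omp parallel for reduction(+:sum)
    for (long long i = 1; i <= n; i++)
        sum += i * i;
    return sum;
}

/* Maximum number of threads OpenMP would use, or 1 in a serial build. */
int available_threads(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

The `reduction` clause is what keeps the parallel version correct: without it, threads would race on the shared `sum`. Either build returns 333833500 for `sum_of_squares(1000)`.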


Further information and timetables:


Large-scale structures require large system sizes, which implies considerable computational effort. To investigate the system's properties under shear, we typically simulate from 100,000 particles on 64 T3E processors up to 2,000,000 particles on 512 processors. For this task we adopted M. Pütz's highly optimised parallel molecular dynamics code for the simulation of monodisperse polymer melts. At EPCC it was ported to MPI and extended with a shearing module based on the algorithm of F. Müller-Plathe. Unlike the methods usually applied, this algorithm does not pre-impose the shear profile. This seems to be crucial when investigating the propagation of defect structures in the lamellar phase under shear.
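The shearing module itself is not described here, so the following C sketch only illustrates the general idea behind F. Müller-Plathe's reverse non-equilibrium approach as we understand it: rather than imposing a velocity profile, the simulation periodically exchanges momentum between particles in two slabs of the box, and the shear profile develops in response. It assumes equal particle masses, a box of height `lz` cut into `nslab` (at least 2) slabs along the gradient direction, and one particular sign convention for which slab receives positive momentum. The type and function names are hypothetical and this is not the EPCC code.

```c
#include <stddef.h>

/* Hypothetical particle record: z is the coordinate along the velocity
 * gradient, vx the velocity in the flow direction. All particles are
 * assumed to have the same mass. */
typedef struct {
    double z;
    double vx;
} particle;

/* Slab index of a particle in a box of height lz cut into nslab slabs. */
static int slab_of(const particle *p, double lz, int nslab) {
    int s = (int)(p->z / lz * nslab);
    if (s < 0) s = 0;                 /* clamp rounding at the box edges */
    if (s >= nslab) s = nslab - 1;
    return s;
}

/* One momentum-exchange step in the spirit of Mueller-Plathe's method:
 * in slab 0, find the particle with the most negative vx; in the middle
 * slab, find the one with the most positive vx; swap their vx values.
 * Repeating this every few time steps drives a momentum flux through
 * the fluid, and the velocity profile develops in response instead of
 * being prescribed. With equal masses the swap leaves total momentum
 * and kinetic energy unchanged.
 * Returns the x-momentum per unit mass moved into slab 0, or 0.0 if no
 * useful pair was found (nothing is changed in that case). */
double exchange_momenta(particle *p, size_t n, double lz, int nslab) {
    const int mid = nslab / 2;
    size_t lo = n, hi = n;            /* n means "not found yet" */
    for (size_t i = 0; i < n; i++) {
        int s = slab_of(&p[i], lz, nslab);
        if (s == 0 && (lo == n || p[i].vx < p[lo].vx)) lo = i;
        if (s == mid && (hi == n || p[i].vx > p[hi].vx)) hi = i;
    }
    if (lo == n || hi == n || p[lo].vx >= p[hi].vx)
        return 0.0;
    const double v_lo = p[lo].vx, v_hi = p[hi].vx;
    p[lo].vx = v_hi;
    p[hi].vx = v_lo;
    return v_hi - v_lo;
}
```

Summing the returned values over a run gives the total momentum transferred, that is, the momentum flux the algorithm imposes; the velocity gradient is then measured rather than set.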

Further details

E-mail: soddeman@mpip-mainz.mpg.de


In 1997, Thomas Soddemann received his physics diploma from the Albert-Ludwigs Universität Freiburg in Germany.

Since September 1997, he has been working as a PhD student in

Prof. K. Kremer's theory group at the Max-Planck Institute for

Polymer Research, Mainz, Germany. In his thesis he will investigate

layered systems under steady state non-equilibrium conditions with

large scale molecular dynamics simulations.

In early 1999 he visited the Physics Department at the University of

Edinburgh for six weeks as a participant in the TRACS scheme. There

he continued his work under the supervision of Prof Mike Cates,

while making use of EPCC's computing resources with the assistance

of J-C Desplat.

Optimised MPI


In issue 11 of HPC News, Dr Stephen Booth of EPCC discussed the optimisations he has made to the T3E's MPI implementation.

This version of MPI has now been released by Cray, and forms part of the default version of the MPT library installed on the EPCC T3E. Optimisations include reductions in message latency and improvements in the performance of barriers. Stephen discussed his work on MPI in a talk at the Parallel Libraries and Tools Seminar at Daresbury in November last year. The optimisations take advantage of some of the advanced features of the T3E's communication hardware.

The T3E version of MPI was originally developed for Cray at EPCC. We are pleased to be able to continue our association with this work.


Stephen Booth's paper:



The work described here was presented by Paul Graham at the European Grid Workshop at the ISThmus 2000 Conference in Poznan, Poland, in April 2000.


Java, HPC and the grid
MARTIN

The next big advance in the area of high performance computing (HPC) will involve connecting compute resources, data resources, experimental instruments and post-processing visualisation equipment into so-called computational grids. It is widely accepted that Java will be a useful technology for developing the middleware that supports grid computing. However, the use of Java as a language for the high-performance codes themselves is much more controversial.

Java, with its promise of platform independence and rapid

development time, is becoming the most widely-used

programming language. According to a recent article in

IT-Director.com, 36.8% of the sampled programming job

advertisements required Java skills.

Java's prevalence will continue to increase because it is already the most popular teaching language at universities, it is available on the largest number of platforms and it is of strategic importance to some of the largest IT companies, including Sun, IBM and Oracle.

The Java Grande Forum (JGF) is an organisation which believes that Java is potentially the best language for programming HPC applications. It aims to promote the use of the language and works with the standards organisations to improve Java in this respect. Despite the opportunities Java presents, there are still drawbacks to its use for HPC.

Portability



One of the biggest attractions of Java is its portability. The same could be claimed for C, C++ and FORTRAN, but in practice true portability has been achieved in these languages only by expert programmers.

Portability is of particular benefit to HPC users because of the short lifetime of machines, the heterogeneity promised by grid computing and the maintenance advantages of having only a single version of the code.

A project carried out at EPCC with Hitachi Europe demonstrated the potential power of Java's portability. A simple harness was provided which allowed the same piece of code to be run, without modification or even recompilation, in serial or in parallel on MPP and SMP systems.


Network centricity

Java is a very network-centric language, which makes it attractive for building not only middleware but also network-aware applications. Such applications could be much more tightly coupled to the middleware layers, and could support features such as remote visualisation or computational steering over a grid infrastructure.

Software engineering

Another important advantage of Java for any kind of

programming is that the language forces the use of good

object-orientated and software engineering practices and

encourages the efficient reuse of code. Java's memory

management and type checking help to reduce

programming errors. Industrial metrics indicate that for

all of these reasons Java has reduced development time on

major codes.

Typically this argument has not been of great interest to

academics, for whom labour is cheap (post-graduate

students etc.) and compute time expensive. However,

machines are getting faster and compute time cheaper.

Moreover academics want to publish early. Since novel

applications usually spend much more time being

developed than they do being run, reducing development

time can be the best way to produce faster results, even if

run time is increased as a result.

Problems with performance

When the use of Java for HPC is suggested, performance is often the first objection. The Java Virtual Machine (JVM) that gives Java its powerful portability is an interpreting system. Early JVMs were very slow, giving between a tenth and a hundredth of the performance of equivalent optimised C or Fortran. However, the gap is closing rapidly: Just-In-Time compilers (JITs) are bringing Java to within a factor of two to three of native performance. Static Java compilers such as the one produced by IBM have achieved 70-100% of the performance of optimised native code.

Furthermore, JITs are continuing to improve. Optimists such as Bill Joy, one of the original developers of Java, claim that JITs will ultimately outperform static compilers. The argument is that, because it has much more information about the run-time behaviour of the code, a JIT can carry out optimisations which a static compiler cannot.


Sceptics are less sure about this claim, but it is clear that the operation of a JIT is well suited to large numerical simulations. The big problem that JITs face is that they have to trade off run time against compile and optimisation time. Large numerical simulations often spend most of their time in a small kernel, so it will usually be extremely worthwhile for the JIT to compile and optimise this kernel to the highest degree.

New FFT library adapts to machine architecture on the T3E and Lomond
GAVIN

The FFTW library is now available at EPCC both on the T3E and on Lomond, the Sun SMP cluster which is the University of Edinburgh's internal HPC resource.

FFTW, which stands for Fastest Fourier Transform in the West, is a library for computing multi-dimensional discrete Fourier transforms of both real and complex data.

The library uses an explicit divide-and-conquer approach which takes advantage of the memory hierarchy and of the sizes of the caches at different levels. Its building blocks are highly optimised routines for computing small transforms, known as codelets. The codelets are C routines produced automatically by a code generator written in the Caml-Light dialect of ML.

When the FFTW library is called for the first time for a particular FFT, it uses an internal interpreter to assess the run-time characteristics of the current architecture and compiler. It does this by measuring the wall-clock time taken to compute this particular FFT for many different factorisations. Using a dynamic programming algorithm, it then builds what is called a plan, which specifies which of the many codelets should be used for the given FFT.

This initial calculation of the plan can be slow, but the result can be saved to a file (FFTW calls this saved information "wisdom") so that it need not be calculated again. In all subsequent runs that use the same FFT size, the plan is re-used.

This technique produces an FFT which is fast enough to compete with vendors' own libraries on a wide variety of platforms.

It is common for FFT libraries to be limited to input data whose size is a power of 2. FFTW, however, can work with data of arbitrary size, although sizes whose prime factors are all 2, 3, 5 or 7 are fastest. The FFTW library can also be used with distributed data, in which case it employs MPI for communication. On SMP machines, Cilk or some flavour of threads (e.g. POSIX threads) is used.

The library may be used by C, C++ or FORTRAN programs. For double precision FFTs, use these link options: -ldfftw -lm

For single precision, use: -lsfftw -lm

On the T3E the fftw module must be loaded first.

The FFTW package was developed at MIT by Matteo Frigo and Steven G. Johnson.

FFTW: http://www.fftw.org

Cilk: http://supertech.lcs.mit.edu/cilk/

In order to track changes and highlight particular issues, the JGF has a benchmarking initiative led by EPCC.

Numerical issues

In the view of the JGF, the most serious immediate

problems are related to numerical issues: the need to

ensure that the standard can support the efficient

representation of floating point and complex numbers and

arrays. There is a tradeoff in this area between the exact

reproducibility of results on the one hand, and on the

other the performance benefits offered by the use of the

underlying hardware.
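The trade-off can be seen in a few lines of C. Floating-point addition is not associative, so summing the same values in a different order, as a vectorising compiler or a parallel reduction may do, can change the answer. The sketch assumes IEEE 754 double precision evaluated without extra precision (as on SSE2 hardware) and a compiler that has not been told to relax floating-point rules (no `-ffast-math`); the function names are hypothetical.

```c
#include <stddef.h>

/* Left-to-right sum: ((a[0] + a[1]) + a[2]) + ... */
double sum_forward(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Right-to-left sum of the same values: ((a[n-1] + a[n-2]) + ...) + a[0] */
double sum_backward(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = n; i-- > 0; )
        s += a[i];
    return s;
}

/* With v = {1.0, 1e17, -1e17}:
 *   forward:  1.0 + 1e17 rounds back to 1e17 (adjacent doubles near 1e17
 *             are 16 apart), so the final result is 0.0;
 *   backward: -1e17 + 1e17 is exactly 0.0, and adding 1.0 gives 1.0. */
```

Neither result is wrong by IEEE rules; each is correctly rounded for its own order of operations. Guaranteeing bit-for-bit identical results therefore means fixing the order and precision of every operation, which rules out some of the faster options the hardware offers.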

Need for parallel programming standards

Aside from threads, which are built into the Java language, it is still an open question which paradigms, tools and libraries should be used to write parallel Java code. There are no standards here, and this makes it difficult for parallel programmers to take the plunge, since they cannot be certain that the software and APIs they use will continue to be supported. The Java Grande Forum is working on an MPI-like standard and, led by EPCC, an OpenMP-like standard.



There are many strong reasons why Java may become the

language of choice for HPC programming, particularly in

the heterogeneous world promised by Grid-based

computing. However, further developments and

refinements of the language will be crucial to its

widespread use for numerical programming.

Javagrande Forum:


Javagrande benchmarking initiative:

