
The University of Edinburgh

Issue 49, Autumn 2003


2 TOG: scheduling jobs through the Grid

3 PGPGrid: realistic animations

4 QCDOC: arrival of the first chips

5 RealityGrid: a computational steering portal

6 GridWeaver: weaving the fabric of Grid computing

7 FirstDIG: First Data Investigation on the Grid

8 OGSA-DAI: an end and a new beginning

9 ODD-Genes: OGSA-DAI Demo for Genetics

10 Binary XML: describing the data on the Grid

11 EPCC joins the Globus Alliance

12 HPCx: 9 months on

14 The MSc in HPC: a second successful year

16 MS.NetGrid: Grid Services and Microsoft .NET

2003 – A year of delivery


Editorial Alison Kennedy

EPCC has been well established as the leading cross-disciplinary

European centre applying novel computing

techniques to real-world problems for a number of years.

We have partners, collaborators and customers in all areas of

science, industry and technology. During the past two years,

we have steadily expanded both the scope and the range of

our activities, to apply our expertise and talents to emerging

opportunities in e-Science and the Grid. 2003 has been a Year

of Delivery, during which we demonstrated our ability to

deliver on these new projects. I would like to highlight two of

these projects to illustrate our success.

HPCx, the new flagship HPC system for UK researchers

went into full operation in December 2002, funded by the

UK Research Councils and operated jointly by EPCC and

CCLRC’s Daresbury Laboratory, with technology supplied by

IBM. The HPCx service not only provides researchers with

access to one of the best high-end computers in the world,

but also has set about the task of assisting users in improving

the performance of key codes to make efficient use of the

capability of the new service. The two partners are now

providing support for 19 research consortia, who used more

than 5.4 million processor hours of HPCx time in the first

nine months of this year.

EPCC has also been making a major name for itself in Grid

middleware. In September 2003, EPCC was invited to join

the Globus Alliance, a tightly integrated consortium dedicated

to collaborative design, development, testing and support

of the open source Globus Toolkit, the de facto standard

Grid software. This provides a measure of how far EPCC

and the University of Edinburgh have come in the two years

in which we have been participating in Grid middleware

projects. EPCC developers play a leading role in the Global

Grid Forum Database Access and Integration Services (DAIS)

working group that is writing specifications for this important

Grid area. EPCC is the primary implementer of a reference

implementation, OGSA-DAI, through work funded by the UK

e-Science Grid Core Programme.

EPCC’s success continues to be dependent on close

partnership with academic, industrial end-user and

technology partners. In each of these highlighted projects

(as in the rest of our activities), EPCC has worked in close

collaboration with a range of partners with complementary

skills. The benefits of collaborative working are easy to

identify: increased availability of human and other resources

for technically demanding or time-critical projects,

elimination of duplication of effort, opportunities to specialise

in areas where we are strong. This leads to effective and

efficient delivery on key projects.

Through e-Science, high-performance computing and grid

technology are spreading rapidly beyond their traditional scientific

domains. EPCC looks forward to playing a globally significant

role in their continuing deployment, creating and exploiting

information infrastructures in support of scientific research

and commercial computing.

TOG: scheduling jobs through the Grid

Terry Sloan

The Transfer-queue Over Globus (TOG) software from the

EPCC Sun Data and Compute Grids project allows two or

more enterprises to schedule jobs across shared resources. It

integrates Grid Engine V5.3 and the Globus Toolkit V2.x to

allow a Grid Engine at one site to pass jobs to Grid Engines

at collaborating remote sites when local resources are busy

or when jobs are too compute intensive. TOG uses Globus

(via the Java COG kit) to securely submit and control jobs on

remote compute resources (using Globus job manager), and

to securely transfer jobs’ input and output data to and from

the remote compute resource (using GridFTP).

Since July 2003, TOG has been publicly available from the

open source Grid Engine community website.

The EPCC Sun Data and Compute Grids project is funded

by the UK e-Science Core Programme. The partners are

the National e-Science Centre (represented in this project

by EPCC) and Sun Microsystems. It aims to develop a fully

Globus-enabled compute and data scheduler based around

Grid Engine, Globus and a wide variety of data technologies.

The project is now developing the JOb Scheduling

Hierarchically (JOSH) system. This integrates access to data

sources and is based on OGSA-compliant Globus Toolkit

V3. JOSH queries child-distributed resource managers

such as Grid Engine at collaborating sites and forwards job

specifications to the most appropriate sites according to user

job requirements.
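The forwarding decision described above can be pictured with a toy sketch. The site names, resource fields and scoring rule below are invented for illustration; JOSH's actual selection logic is not reproduced here:

```python
# Toy version of forwarding a job spec to the most appropriate site:
# query each collaborating site's resource manager state, filter by the
# job's requirements, and pick the least loaded match.
def choose_site(sites, job):
    """Pick the site with the most free slots among those meeting the job's needs."""
    suitable = [s for s in sites
                if s["free_slots"] >= job["slots"]
                and s["memory_gb"] >= job["memory_gb"]]
    if not suitable:
        return None   # no remote site fits: queue the job locally
    return max(suitable, key=lambda s: s["free_slots"])

sites = [
    {"name": "edinburgh", "free_slots": 2,  "memory_gb": 4},
    {"name": "glasgow",   "free_slots": 16, "memory_gb": 8},
    {"name": "london",    "free_slots": 64, "memory_gb": 2},
]
job = {"slots": 4, "memory_gb": 8}
assert choose_site(sites, job)["name"] == "glasgow"
```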

TOG is used in the OGSA-DAI Demo for Genetics (see

page 9). TOG is also providing the grid infrastructure for a

UK-Australia pilot project in e-Social Science looking at grid-enabled

fusion of global data and local knowledge for business

and commerce.


For further information, see:



PGPGrid: realistic animations

Kostas Kavoussanakis

Since 1908, when Emile Cohl made ‘Fantasmagorie’,

the first animated film, animation has progressed

hand-in-hand with technology. The use of computers

in animation is widespread, almost compulsory

nowadays, as it is the recognised means to achieve

the realism viewers have come to expect.


The 3D-Matic Laboratory at the University of

Glasgow has designed and constructed a facility that

captures the motion of actors and abstracts this into

models, ready to be used for character animation.

This facility is unique in that it captures 3D, as

opposed to 2D images, thus opening up enormous

creative opportunities to the animators. However, this

process is both computationally- and data-intensive.

It would help if the capture rig could exploit the

network and processing capabilities of the Grid.

While technology has pushed the boundaries of

viewing quality, animation projects are still run

in the same way as they have always been. All

animators involved in a project are crammed into

the same space, such as an aircraft hangar, and

work under constant supervision, towards the final

goal. While this method works for big companies,

it has the drawback that

animators are required to

relocate for every big project.

It would be beneficial for the

companies and people involved

if animators could use modern

technologies to work remotely

yet productively.

In general, the process of setting

up the virtual organisation

required for the whole

animation project – which

involves the coordination and

management of various, highly

specialised organisations

– fails to take advantage

of modern communication

and management tools.


Pepper’s Ghost Productions Ltd

(PGP) is a UK-based, digital

animation company. As is typical

of such businesses, it owns a farm of processors

which is fully utilised approximately 20% of the time.

However, when the animators get a contract, they

need all the processing power they can get. It would

thus be desirable for the company to have access to

large amounts of processing power, without incurring

the purchase, maintenance and upgrade costs

associated with computing hardware.

Currently this is addressed by commercial

organisations that specialise in rendering (the highly

compute-intensive process that turns models to

images). The problem with this approach is that the

animator relinquishes control of the process, meaning

mistakes cannot be detected and corrected until the

expensive rendering process has been completed.

PGPGrid aims to address the issues discussed

above. EPCC will provide 3D-Matic access to high-performance

computers to allow experimentation

with remote processing. This will provide insight into

the benefits and bottlenecks associated with the use

of batch-computing systems as well as the network

for such intensive jobs.

We will also provide access for PGP to drive remote

rendering from their base in

London. We aim to produce

a demo-movie combining the

results of these two exercises.

In the meanwhile, we show on

this page how a capture from

3D-Matic’s rig can be combined

with PGP’s rendering technology

to put a living person’s head into


Finally, we will investigate the

process of setting up virtual

organisations in the animation

business and produce a best-practice guide.


PGPGrid is funded by the UK

Grid Core Programme and

Pepper’s Ghost Productions Ltd.

For further information, see:



QCDOC: arrival of the

first chips

Bálint Joó

A buzz of excitement was created this May in the offices of QCDOC

project members worldwide by the delivery of the first five QCDOC

chips from IBM, marking the culmination of nearly three years of

design work by the QCDOC Project.

QCDOC stands for Quantum Chromodynamics (QCD)

On a Chip, hardware dedicated to calculations of lattice

QCD (LQCD) – a version of the theory of sub-atomic

particles, such as quarks and gluons, amenable to solution by computer simulation.


Computationally, LQCD calculations involve the evaluation of

very difficult integrals by statistical sampling methods.

However, sampling force fields in a vacuum is numerically

very expensive. For accurate simulations many extremely

large systems of linear equations (with dimensions of the

order of millions) need to be solved for each sample.

Solving these equations is a nearest-neighbour problem,

meaning that to perform the computation efficiently using

large numbers of processors, one needs to be able to

completely overlap the computation and communication

performed by each processor.
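The overlap idea can be sketched in a one-dimensional toy (an illustration of the principle only, not QCDOC's SCU/DMA machinery): interior points of a processor's local block depend only on locally held data, so they can be updated while the boundary ('halo') values are in flight between neighbours.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def exchange_halos(left_edge_from_neighbour, right_edge_from_neighbour):
    # Stand-in for a network transfer returning the neighbours' edge values.
    return left_edge_from_neighbour, right_edge_from_neighbour

def average_step(local):
    local = np.asarray(local, dtype=float)
    new = np.empty_like(local)
    with ThreadPoolExecutor() as pool:
        # 1. Post the "communication"...
        comm = pool.submit(exchange_halos, 0.0, 16.0)
        # 2. ...update interior points while the halos are (notionally) in flight...
        new[1:-1] = 0.5 * (local[:-2] + local[2:])
        # 3. ...then complete the boundary points once the halos arrive.
        left_halo, right_halo = comm.result()
    new[0] = 0.5 * (left_halo + local[1])
    new[-1] = 0.5 * (local[-2] + right_halo)
    return new

new = average_step([1.0, 2.0, 4.0, 8.0])
assert np.allclose(new, [1.0, 2.5, 5.0, 10.0])
```

When the communication cannot be hidden behind the interior update in this way, latency dominates, which is the problem the QCDOC hardware is designed to remove.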

As the most demanding LQCD problems require the

distribution of a fixed global problem size over as many

processors as possible, the key performance factor for a

LQCD calculation is communications latency. This problem

is difficult to surmount with networking hardware in current

computational clusters. The only other alternative, apart

from buying a commercial supercomputer at great cost, is

to build a special purpose computer with communications

hardware matched to the specific needs of LQCD. Other

advantages of building special purpose computers over

clusters are their lower power consumption per node, smaller


physical footprint and higher performance-to-price ratio. The

disadvantage is the effort required to design and build such

a machine, and the need to create the necessary software to

make it usable.

The QCDOC collaboration consists of the theoretical particle

physics research groups of Columbia University, Brookhaven

National Laboratory and Yale University from the USA, the

RIKEN Laboratory at Brookhaven from Japan, and UKQCD

(the UK consortium of lattice physicists). The project was

spearheaded by the Columbia group, which has a history

of successfully building special purpose QCD machines,

such as the QCDSP, which won the Gordon Bell Award at

Supercomputing 2000.

The other collaborating institutions have provided funding

and, more importantly, motivated members of staff whose

contribution to the current project has been invaluable.

The design philosophy of the QCDOC machine was to have

the essential processing elements integrated onto a single

chip, known as an Application Specific Integrated Circuit

(ASIC). Each such ASIC would form a processing node.

Two nodes would be mounted on a single daughterboard.

The machine could then be scaled up to several thousands of

nodes by plugging 32 daughterboards into a motherboard,

plugging up to 8 motherboards into a cabinet, and potentially

cabling together several cabinets to make the final systems.

The communications system was designed to sustain at

least 50% of peak performance over the complete system

thus built. The final machine is expected to provide a

performance-price ratio of under 1 US dollar/Mflop, with a

power consumption of about 5W per node.

The heart of each processing node is an IBM PowerPC 440

embedded processor, coupled to a floating-point unit (see

figure schematic of QCDOC ASIC). The target clock speed

for each processor is 500MHz, giving a peak performance

of 1Gflops. Additionally each node has 4MB of embedded

DRAM (EDRAM). Up to 2GB of further memory can be

provided off-chip through DDR DRAM memory DIMMs

mounted on the daughterboard.

Continued opposite.



RealityGrid: a computational steering portal

Paul Graham

RealityGrid is an EPSRC-funded project aiming to Grid-enable

the realistic modelling and simulation of complex

condensed matter structures. The long-term goal is to provide

generic technology for Grid-based scientific, medical and

commercial activities. It is a collaboration between physical

scientists, computer scientists and software engineers from

several institutions.

One aspect of EPCC’s contribution to the project is the

development of a prototype computational steering portal.

A specific goal of the RealityGrid project is to produce

an architecture, along with the supporting API, for the

submission and computational steering of simulations on

remote compute resources. The idea of the portal is to allow

scientists to access these simulations straightforwardly via the

API, giving them the ability to review results and ‘steer’ the

computation by adjusting key parameters, enabling remote

investigation of the solution space. Currently the portal is

being developed as a Java applet to allow it to be run in a

standard web browser, encouraging portability and ease of use.


One of the key features of the portal will be the ability

to traverse the simulation solution space via a ‘checkpoint

tree’. At key stages of the simulation run (typically when a

parameter is adjusted by the scientist) the simulation will be

checkpointed – that is, enough information about the run

will be stored to enable it to be restarted in a repeatable

manner from that point. Then the simulation will be allowed

to continue. However, the scientist may wish to examine the

simulation with a different value for a parameter, and can thus

rewind to the checkpoint and start a new simulation with the

new value. After several iterations of this there will be many

checkpoints and multiple routes through the simulation space,

and one of the challenges for the portal work is to make this

easily navigable. The use of checkpointing in this way should

limit the use of unnecessary compute cycles, saving both time

and money for the scientists.
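The checkpoint tree can be pictured as a simple data structure. This is a hypothetical sketch only; the names are invented and it is not the RealityGrid API. Each checkpoint records the steerable parameters and enough restart data to resume, and 'rewinding' creates a new branch from an earlier node:

```python
class Checkpoint:
    """One saved simulation state, with enough data to restart from it."""
    def __init__(self, params, state, parent=None):
        self.params = dict(params)   # steerable parameters at this point
        self.state = state           # opaque restart data
        self.parent = parent         # previous checkpoint (None for the root)
        self.children = []           # branches explored from here
        if parent is not None:
            parent.children.append(self)

    def branch(self, **new_params):
        """'Rewind' to this checkpoint and restart with adjusted parameters."""
        params = {**self.params, **new_params}
        return Checkpoint(params, self.state, parent=self)

    def path_from_root(self):
        """The route through solution space that produced this checkpoint."""
        node, route = self, []
        while node is not None:
            route.append(node.params)
            node = node.parent
        return list(reversed(route))

# A scientist runs with temperature=300, checkpoints, then explores a branch:
root = Checkpoint({"temperature": 300}, state=b"...")
hot = root.branch(temperature=350)       # rewind and try a new value
assert hot.path_from_root()[0]["temperature"] == 300
```

Making such a tree of branches easy to navigate is exactly the challenge the portal work addresses.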

Other proposed features of the portal include resource

discovery and authentication, job submission and parameter

monitoring. Via the computational steering API, the scientist’s

application will be able to make available its monitored and

adjustable parameters to the portal, allowing the scientist to

track them ‘live’ from their web browser and monitor the

simulation’s progress. Ultimately it is envisaged that the portal

will be linked with a visualisation tool, enabling the scientist

to remotely visualise the simulation’s progress.

For further information, see:

QCDOC continued

Low latency communication is provided by the Serial

Communication Unit (SCU) controlling High Speed Serial

Line (HSSL) components with an aggregate bandwidth of up

to 12Gbit/s over 12 bidirectional links (satisfying the nearest-neighbour

communication pattern of LQCD). The SCU can

access the EDRAM using Direct Memory Access (DMA) to

allow communications to proceed without the attention of the

CPU. The nodes also contain an on-chip Ethernet controller,

which allows the chips to be completely controlled over Ethernet.


While the majority of the components on the ASIC are

standard IBM cores, a number of components had to be

specially designed in close collaboration between IBM

Research at Yorktown Heights and the QCDOC team.

The 12 bidirectional links on each node are to be connected

in a 6-dimensional torus topology, of which two are to be

used to partition the final machine (without recabling),

providing the network for computation.

In addition, an Ethernet tree will connect the QCDOC

machine to a front-end computer to allow booting,

management, diagnostics, I/O and job submission.


At the time of writing, more chips have arrived from IBM

and the design of cabinets to hold multiple motherboards is

complete. The nodes, daughterboards and motherboard are

undergoing hardware testing.

So far the nodes have performed to specification. Work

is beginning on the assembly of a 128-node prototype at

Columbia. If all goes well, a 10Tflops peak speed machine

will be available for UKQCD, to be housed in EPCC, by

summer 2004.




GridWeaver: weaving the fabric of Grid computing

George Beckett

The challenge of Grid


Researchers, hardware

manufacturers, and

end-users alike dream

of the new scientific

and commercial world

of Grid computing

– a world where

resources are readily

available and service provision is regulated and guaranteed.

The GridWeaver project questions the ability of current

technology to respond to the demands placed on it by this

dream. We neither doubt the vision, nor underestimate the

availability of computing power. What interests us is the

fabric: not only the resources and the middleware, but also

the glue that holds these together.

Large installations are not unusual in the modern computing

world. Methods and tools for managing them vary widely

between organisations, though we have shown that the

underlying philosophy of most is the same, with a dependence

on a locally generated process, accompanied by heroics

from technical staff. However, the Grid spans institutional

boundaries, making management impractical with currently

available technology.

We provide Use Cases that highlight fundamental issues

relating to automated fabric configuration:

• scaling problems associated with the anticipated expansion

in diversity and dynamism of fabrics

• the inevitability of conflicts and errors occurring in fabric configurations


• the ever-increasing importance of security, stability, and

non-disruptive ‘always on’ operation

• the need to accommodate software/hardware failures as an expected part of normal operation


• the impracticality of relying upon technical staff to manage a

whole fabric explicitly.

A paradigm shift

The drive towards Grid-based computing is one leading

indicator that computing is in the initial phases of its next

paradigm shift, where we raise the abstraction level at

which we manage our computing resources. Under the new

paradigm, we will no longer think of individual machines

within a fabric, but instead focus on collections of computers

managed as single entities. The ultimate goal of our research

is to provide fabric configuration and management solutions

that are capable of handling the full complexity and diversity

found in emerging computing fabrics, and which are able to

automate many node-centric activities that would today need

to be performed by people.

There is a useful analogy in the way that a computer operating

system works. A computer consists of many discrete

components (eg CPUs, memory, secondary storage). The role

of the operating system is to orchestrate the activities of these

separate components, make them function as a whole, and

offer an abstraction that makes it appear as if each application

owns the computer. In the new computing paradigm,

individual nodes are just like components of a larger, abstract,

distributed computer that is managed as one entity.

Showcasing configuration management

Based on our two existing, complementary tools (LCFG and

SmartFrog) we have created a testbed on which to showcase

concepts and solutions for automatic configuration to the

Grid community.

The GridWeaver team built GPrint, a prototype Grid service

that demonstrates large-scale, automatic system configuration.

Using LCFG/SmartFrog as the deployment environment, we

configure and manage a complex adaptive printing service.

GPrint is able to: set up computer resources ‘from the ground

up’; automate the configuration of complex services such as

the Globus toolkit; and respond autonomically to events on

the system such as hardware failure.
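The autonomic behaviour can be pictured with a toy reconfiguration loop. All names below are invented, and none of the LCFG or SmartFrog specifics are shown: the fabric is described declaratively, and an event such as a node failure triggers a replan of which node runs each service.

```python
def plan(desired_services, healthy_nodes):
    """Map every desired service onto a healthy node, round-robin."""
    if not healthy_nodes:
        raise RuntimeError("no healthy nodes left in the fabric")
    return {svc: healthy_nodes[i % len(healthy_nodes)]
            for i, svc in enumerate(sorted(desired_services))}

nodes = ["node-a", "node-b"]
services = {"print-spooler", "print-gateway"}

before = plan(services, nodes)
nodes.remove("node-a")          # event: hardware failure on node-a
after = plan(services, nodes)   # autonomic response: replan on what's left

assert set(after.values()) == {"node-b"}
```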

Where next

We believe that it is possible to implement a new production

fabric management system, based on the best of the currently

available technology, and augmented by the techniques

developed during the GridWeaver project. This should

provide a significant improvement on existing tools. However,

we believe that some of the currently unsolved problems

would still render such a system unable to fully meet the

requirements of Grid computing. A future effort could

include research into a number of key areas: configuration

languages; peer-to-peer technologies; modelling temporal

dependencies and managing workflows to effect smooth changes.


GridWeaver is funded by the UK e-Science Core Programme

and HP Labs.

Reports and other deliverables (including a video presentation of

GPrint) can be found at the project website:




FirstDIG: First Data Investigation on the Grid

Terry Sloan

The transformation of the Grid

from an enabling technology in

the scientific domain to a widely-used business tool is a key

requirement if the success expected by the UK Government

is to be realised. To date the UK e-Science Programme

has involved a large number of companies in collaborative

projects. Many of these companies are either IT-related or

classic early adopters, and few of them are in the service

industries. First plc, however, has a clear business problem

which the Grid can help it solve and for this reason it is

collaborating in this project.

OGSA, an extension of XML-based Web services, provides a

clear entry point for many companies into the Grid domain.

Data Access and Integration (OGSA-DAI) services are a key

component of this.

Formed in 1995, First plc operates worldwide in many

different transport sectors. The company runs over 10,000

vehicles in the UK and has 23% of the market, making it the

UK’s largest operator. First is represented in this project by

First South Yorkshire buses.

Many businesses turn to data mining to better inform their

decision-making. In the transport industry, the huge range

of fragmented data sources has hindered its adoption. In the

FirstDIG project we intend to demonstrate how OGSA-DAI

services can provide cost-effective access to disparate data

sources and hence enable data mining of these sources to

answer specific business questions.

The FirstDIG project is a collaboration between First

plc and the National e-Science Centre

(represented by EPCC). It aims to deploy an

early implementation of OGSA-DAI within

the First South Yorkshire bus operational

environment. The project has two central aims:

• To demonstrate the deployment of OGSA-DAI services in a commercial environment, and learn from this process.

• To answer specific business questions posed by the First group through a short data mining analysis using the OGSA-DAI service-enabled data sources.

The data sources to be used in the project are from the following systems:

• Customer contact – this records correspondence with customers including commendations and complaints.

• Vehicle mileage – this records the daily vehicle mileage for bus services.

• Ticket revenue – this contains the daily tickets sold and the money taken for the bus services.

• Schedule adherence – a satellite tracking system that records whether a bus is arriving and departing on time from a bus stop.

These data sources range from SQL sources to ODBC sources to COBOL files and are located at various company sites. It is precisely these issues that any technology must address in order to be applicable and useful to business.

Through data mining and statistical analyses the FirstDIG project aims to answer specific business questions posed by First. This will require the consolidation of data from the customer contact system, the satellite tracking schedule adherence system, the mileage records system and the revenue system. The project will determine if OGSA-DAI can be satisfactorily used to extract and consolidate the necessary data from these systems to answer these business questions.

For more information on the project and its deliverables, please access the project website at:
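The consolidation of data described above can be pictured with a toy example (the field names and figures are invented): records from separately managed systems are joined on a shared key, so that one business question can be asked of all of them at once.

```python
import csv, io

# Two tiny "systems", exported as CSV text with a shared (route, date) key.
mileage = "route,date,miles\n52,2003-09-01,410\n"
revenue = "route,date,tickets,takings\n52,2003-09-01,1290,856.50\n"

def load(text):
    """Index a CSV export by its (route, date) key."""
    return {(r["route"], r["date"]): r for r in csv.DictReader(io.StringIO(text))}

def consolidate(*tables):
    """Merge records from every table that share the same key."""
    merged = {}
    for table in tables:
        for key, record in table.items():
            merged.setdefault(key, {}).update(record)
    return merged

day = consolidate(load(mileage), load(revenue))[("52", "2003-09-01")]
# A business question spanning both systems: revenue per mile that day.
assert round(float(day["takings"]) / float(day["miles"]), 2) == 2.09
```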



[Figure: OGSA-DAI service interaction. 1. A data resource registers with a registry. 2a. A client queries the registry for data resources about "X" and 2b. receives a factory handle. 3a. The client requests access to the data; 3b. the factory creates a GridDataService and 3c. returns a GDS handle. 4a/4b. The client submits SQL, XPath or XQuery statements to the Grid Data Service, which runs them against back-end databases such as MySQL or Oracle, and 4c. returns the results as XML. The original figure distinguishes SOAP messages, service creation and API calls.]



OGSA-DAI: an end and a new beginning

Mario Antonioletti & Neil Chue Hong

The funding for the first phase of OGSA-DAI has now ended. A further phase

of development is scheduled to start in October under DAIT (DAI Two). So,

what is OGSA-DAI? What has been achieved, and what is planned for DAIT?

OGSA-DAI is adding Data Access and Integration capabilities

to the Grid. The middleware produced, freely available from

the project website, provides the base services that allow

access to data resources, such as databases, through a common

interface. You can thus talk to a MySQL, Oracle, DB2, or

Xindice Database Management System (DBMS) without

having to worry about the DBMS's specific connection

requirements, although it may be beneficial to tailor the

database query to the database being accessed.
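The idea can be sketched as follows. This is a deliberately simplified, hypothetical illustration, not the OGSA-DAI API: the client names a logical data resource and submits a statement, and a registry of connection factories hides which DBMS sits behind the name.

```python
import sqlite3

class GridDataService:
    """Uniform front end: the caller only supplies a query language statement."""
    def __init__(self, connect):
        self._connect = connect

    def perform(self, statement):
        conn = self._connect()
        try:
            return conn.execute(statement).fetchall()
        finally:
            conn.close()

# One registry entry per logical resource; only the registry knows the DBMS.
REGISTRY = {
    "genes-db": lambda: sqlite3.connect(":memory:"),
    # "tickets-db": lambda: some_oracle_connect(...),   # hypothetical
}

def resolve(resource_name):
    """Look up a logical name and hand back a ready-to-use service."""
    return GridDataService(REGISTRY[resource_name])

svc = resolve("genes-db")
rows = svc.perform("SELECT 1 + 1")
assert rows == [(2,)]
```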

What OGSA-DAI buys you is the ability to benefit from the

location independence of Grid Services – you do not need

to know where your data resources are located, or what

DBMS they are running on, merely what language (eg SQL)

is required to access them, and what type of information you

are looking for. The base services – and the corresponding

interfaces – may then be used, or extended, to build higher-level

services that offer more sophisticated capabilities such as

distributed query processing, an early prototype of which is

available from the project website, and data federation.

Using the Globus Toolkit 3 (GT3), which implements the

Global Grid Forum’s (GGF) Open Grid Service Infrastructure

recommendations, a combined team of EPCC and IBM UK

developers have produced three major releases of the OGSA-

DAI distribution. The final OGSA-DAI release coincided

with the first production release of GT3 in July of this year.

Overall, over 1,000 copies of the OGSA-DAI distribution have

been downloaded from the project website.

The close collaboration with Globus facilitated EPCC’s entry

into the Globus Alliance (see page 11) and the experience and

expertise of team members has allowed them to contribute

positively to the GGF Database Access and Integration

Services (DAIS) standardisation effort which is seen as a key

part of the Open Grid Services Architecture being envisioned

by GGF. At a UK level OGSA-DAI achieved a high profile

at this year’s All Hands e-Science event held at Nottingham,

where the technology was demonstrated as part of the ODD-

Genes project (see page 9) and described in a mini-workshop.

So where next? DAIT will provide funding for a further

two years of research and development. This will guarantee

continuity of support to an increasing base of established

OGSA-DAI users and will encourage more widespread

adoption by developing effective OGSA-DAI applications

and collaborating with groups adopting this technology.

In addition to this, DAIT aims to extend the framework

and functionality of OGSA-DAI to support not only more

databases but also file systems. Through DAIT, OGSA-

DAI will continue to be involved in the development

of internationally agreed standards for data access and

integration services. In the future, it is expected that software

developed by DAIT will become a reference implementation

of the data access and integration standards developed by

DAIS, and ultimately a core part of the Globus Toolkit.

For further information, see:



ODD-Genes –

OGSA-DAI Demo for Genetics

Rob Baxter

Figure 1: Data Resource

Discovery with the

ODD-Genes application


Figure 2: The ODD-Genes

application demonstrator

with the accompanying

OGSA-DAI Visualiser.

The OGSA-DAI Demo for Genetics (ODD-Genes) is a

genetics data analysis application built using data access and

job submission middleware from the OGSA-DAI and Sun

Data and Compute Grid projects at EPCC, running on top

of the Globus toolkit. ODD-Genes enables researchers at

the Scottish Centre for Genomic Technology and Informatics

(GTI) and the MRC Human Genetics Unit in Edinburgh to

automate major microarray data analysis tasks securely and

seamlessly. HPC resources at EPCC are used to perform

tightly linked queries on gene identifiers against remote,

independently managed databases, hugely enriching the

information available on individual genes.

ODD-Genes demonstrates the power of OGSA-DAI by

enabling researchers in genetics to perform new kinds of

data analysis. This greatly enhances their ability to understand

the wealth of data in post-genomic bioinformatics. Also, by

making use of the new Transfer queue Over Globus (TOG)

(see page 2), developed to enhance Sun’s Grid Engine,

ODD-Genes harnesses the results of two EPCC Grid Core

Programme projects to create new ways of doing e-Science.

ODD-Genes enables genetics researchers to:

• Perform high-speed batch analysis of microarray data on the Grid


• Browse the results of previous analyses stored in a database

• View data from arbitrary databases as HTML

• Discover related databases out on the Grid

• Perform coupled queries on newly-discovered databases to

provide a richer analysis of gene data.

OGSA-DAI enables ODD-Genes by providing:

• Automated data discovery through database registries

• Standardised, uniform access to arbitrary databases on the Grid


• Automatic translation of query results from XML to HTML

• The quickest way to make data accessible on the Grid

• The ability to compress large result sets prior to returning them.


A part of ODD-Genes normally hidden from the user is the

OGSA-DAI Visualiser. This monitors all OGSA-DAI traffic

to and from the ODD-Genes application and displays it

in real time on a separate display. Interactions with databases,

data service factories and registries can all be watched as they

happen, or recorded and played back later for demonstration purposes.


For more information on ODD-Genes, please access the project

website at:



Binary XML: describing the data on the Grid

Martin Westhead

To today’s Internet infrastructure, the data that it moves

around is just so many bits. The infrastructure is pretty

good at ensuring that the order and integrity of the bits is

maintained as they are moved. However, the information

about the structure, format and meaning of those bits, which

is essential to do anything with them, is primarily embedded

implicitly in the applications that read the data.

As we build the Grid, the next generation of Internet

infrastructure, it is becoming increasingly clear that this

infrastructure will need to understand more about the

structure and semantics of the bits that it is manipulating.

The Grid community is developing a range of technologies

that will allow the results of a computational job to be sent

to a consumer with complete transparency over where or

even when that job was executed. It seems clear that the

consumer should be able to interpret correctly the results

regardless of the byte-order those results are written in, or

even what coordinate system they might use. In today's Grid,

where the consumer is almost always a human being with

additional knowledge about the data, this information is rarely

critical, because the human ensures necessary conversions

and constraints are applied to the data. However, as

we move to increased automatic data manipulation,

knowing about the data representation becomes critical.

BinX is the Binary XML description language,

which aims to provide a canonical description for

data stored in binary files. BinX exists today as an

XML language and a set of libraries and tools.

The figure illustrates how the BinX file is used

to describe the format of the binary file. The

description file on the left provides an XML

description of the binary file on the right. In

this case the BinX description is identifying an

integer followed by a float followed by an array of


doubles. The description can be further annotated to describe

properties of the values such as the byte order or meaning of

the field.

BinX provides the ability to describe three levels of features in

a binary file:

• The underlying physical representation (eg bit/byte ordering)

• The primitive types used (eg IEEE float, integer)

• The structure of the data itself (eg array, list of fields, table).

At this level the language allows new types to be constructed

from the existing primitives.
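The figure's example (an integer, then a float, then an array of doubles) can be sketched as a BinX-style description. The element and attribute names below are illustrative only, and are not guaranteed to match the exact BinX schema:

```xml
<!-- Hypothetical BinX-style description: element and attribute names
     here are illustrative, not the official BinX schema. -->
<binx byteOrder="bigEndian">
  <integer-32 varName="count"/>
  <float-32 varName="scale"/>
  <arrayFixed varName="samples">
    <double-64/>
    <dim name="i" count="10"/>
  </arrayFixed>
</binx>
```

Annotations such as the `byteOrder` attribute capture the physical-representation level, the element names the primitive-type level, and the nesting the structural level.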

In many applications we anticipate that BinX descriptions will

be included as part of application-specific metadata.

This technology allows the construction of tools that can

read a very wide range of file formats. Such tools could

be presented as Web services and could have functionality

to convert between formats, to extract pieces of the data

(eg slices or diagonals of an array), or to browse the file.

DataBinX, for example, is a tool that generates an XML

version of a binary data file based on the BinX description.

[Figure: a BinX description file (left) describing the layout of a binary data file (right)]


EPCC joins the Globus Alliance

Mario Antonioletti, Neil Chue Hong, Mark Parsons

EPCC and Prof Malcolm Atkinson

from NeSC have joined the Globus

Alliance. The original Globus Project comprised two US

institutions: the University of Chicago (operator of Argonne

National Laboratory) and the University of Southern

California’s Information Sciences Institute. Now, through

the Globus Alliance, the membership has been extended

to include the University of Edinburgh and the Swedish

Center for Parallel Computers (PDC) at KTH, Stockholm,

which contribute database-integration and security expertise

respectively.
Like the original Globus Project, the Globus Alliance will be

a tightly integrated consortium dedicated to collaborative

design, development, testing and support of the open source

Globus Toolkit, the de facto standard for Grid software.

This new partnership recognises EPCC’s leading role in the

production of Grid middleware, and the centre’s experience

in software development and training. It also honours the

efforts of EPCC and NeSC within the Global Grid Forum

Database Access and Integration Services (DAIS), the working

group that is writing specifications for this important Grid

area. EPCC, together with IBM UK, has been the primary

implementer of a DAIS reference implementation, OGSA-

DAI, funded by the UK e-Science Core Programme (see

page 8).

Since its inception in 1995 the Globus Project has established

itself as the leading enabler for providing the required

infrastructure, and vision, to make the Grid a reality. The

Globus Toolkit has become the key enabling software that

lets people share resources securely across corporate,

institutional, and geographic boundaries without sacrificing

local autonomy. It has been deployed broadly worldwide

for both science and industry, and has developed a strong

community of contributors and users.

The Globus Alliance’s new governing board, which now

includes Malcolm Atkinson and Mark Parsons of Edinburgh,

is structured to continue this tradition of community

engagement and careful design. This board takes on ultimate

responsibility for Globus Toolkit design and governance.

Closer to the coalface, EPCC project managers and

developers will cooperate with the other members of the

Globus Alliance to take forward the development of a new

generation of Grid technologies.

Binary XML continued

In addition to the library for accessing BinX-described

files, BinX provides utilities for tasks such as the automatic

construction of a DataBinX view of the data. These utilities

can be used as part of an end-user application. As the library

is able to parse the BinX document, fetch the data, and

output the results, a BinX application may require very little

application code.
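The idea can be illustrated with a minimal, hand-rolled sketch in Python (standard library only; this is not the BinX C++ API): given a description of "an integer, then a float, then an array of doubles", a generic description-driven reader needs almost no application code.

```python
import struct

# A toy description in the spirit of BinX (NOT the BinX format itself):
# each entry is (field name, struct format character, element count).
DESCRIPTION = [("count", "i", 1), ("scale", "f", 1), ("samples", "d", 3)]

def read_described(data: bytes, description, byte_order=">"):
    """Decode 'data' field by field according to 'description'.

    byte_order uses struct's prefixes: '>' big-endian, '<' little-endian,
    mirroring the byte-order annotation a BinX description can carry.
    """
    result, offset = {}, 0
    for name, fmt, n in description:
        full = byte_order + fmt * n
        values = struct.unpack_from(full, data, offset)
        offset += struct.calcsize(full)
        result[name] = values[0] if n == 1 else list(values)
    return result

# Build a sample binary blob in memory and decode it with the description.
blob = struct.pack(">ifddd", 3, 2.5, 1.0, 2.0, 3.0)
print(read_described(blob, DESCRIPTION))
# {'count': 3, 'scale': 2.5, 'samples': [1.0, 2.0, 3.0]}
```

Because the description, not the application, carries the layout and byte order, the same reader handles any file for which a description exists.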

The library is currently being applied to various data

convergence requirements within the Astrogrid project.

Astrogrid deals with a small number of complex, table-based

data formats, including an XML representation and a self-describing

binary table format.

BinX is being used in the Astronomy testbed to:

• Convert data between different formats.

• Provide an efficient representation for transportation of

large XML files.

• Provide SAX events for applications – BinX provides

middleware to read data from any specified binary files and

return SAX events to the applications.

• Database query and dispatcher – this allows the querying

of a binary file using an XPath query constructed against the

XML view of the data.

The Data Format Description Language (DFDL – pronounced

daffodil) is a Global Grid Forum standards activity which

is building on BinX, and other work, to provide a general

and extensible platform for describing data formats. As the

DFDL standard emerges, BinX will aim to provide a reference implementation.

BinX is currently being developed as part of the eDIKT project

at the National e-Science Centre in Edinburgh. The

first release of a C++ library is now available from:



HPCx: 9 months on

Adrian Jackson

HPCx, Europe’s largest academic

supercomputer, which is ranked 12th in the

Top500 list of the world’s most powerful

computers, is well established now.

It has been operating smoothly for over nine

months at 80% capacity on average, providing

an invaluable computing resource for research

groups across the UK.

The main focus of HPCx is capability

computing; ie the exploitation of very large

amounts of computing power to solve

individual problems. To facilitate this, HPCx

has created a number of teams that liaise

with users, providing tools and techniques

to increase the performance of their applications,

as well as actively working on user

applications where requested.

HPCx Teams

HPCx is managed and supported by cross-organisational

teams comprised of staff from both EPCC and CCLRC’s

Daresbury Laboratories. There are four key teams involved

in the everyday operation of HPCx: Application Support,

Terascaling Applications, Software Engineering, and

Operations and Systems. As well as these four support

teams there is also an Applications Outreach team (led by

Dr Richard Blake, Daresbury) focused on increasing the

utilisation of HPCx resources both by existing HPC users,

and communities not currently exploiting HPC resources.

Application Support

Leader: Dr David Henty (EPCC)

The Application Support team has the responsibility of

providing frontline support for users. This involves running

the helpdesk that facilitates the solving of users’ problems by

matching individual queries with experienced staff. It also

includes providing training for users, and ensuring that HPCx

is meeting users’ requirements and needs. The team consists

of the members of the other HPCx teams, allowing it to

utilise the expertise of all the support staff.

Terascaling Applications

Leader: Dr Martyn Guest (Daresbury)

Whilst the Applications Support team is charged with solving

users’ problems on a day-to-day basis, the main mechanism

for user collaboration within HPCx is the Terascaling

Applications team. This team is responsible for ensuring

that users can effectively exploit the Teraflop capacity of

HPCx. Its members have a detailed knowledge of the

low-level architecture of the machine (both hardware and

software), allowing them to work with users to maximise

the performance of users’ codes. The aim of the terascaling

team is to help users run their codes on a large fraction of the

machine (ie greater than 512 processors).

Software Engineering

Leader: Dr Stephen Booth (EPCC)

The Terascaling team is responsible for the performance, and

scaling, of user codes. The Software Engineering team, on

the other hand, focuses on providing solutions that ensure

the optimal use of HPCx facilities and minimise system-specific

features. This involves investigating and improving the

performance of system libraries, enabling HPCx for the Grid,

and facilitating advanced data management techniques.

Operations and Systems

Leader: Mr Mike Brown (EPCC)

The Systems group is responsible for the day-to-day operation

of the machine, from hardware to software, and beyond. As

HPCx is located in a machine room at Daresbury, the Systems

group is based there too. They provide 24-hour cover and support for

the machine, as well as having responsibility for the smooth

deployment and maintenance of HPCx’s hardware and

software environments.



HPCx and the Future

HPCx has a 6-year, 3-phase life-cycle. The machine is

currently in its first phase, which consists of 40 IBM p690

Regatta-H frames, each containing 32 1.3 GHz Power4

processors and 32GB of memory, connected via IBM’s Colony

switch. This configuration has a nominal peak performance

of 6.6 TeraFlop/s, and a sustained Linpack performance of

3.24 TeraFlop/s. To ensure that HPCx will remain a world-leading

computational resource, it will undergo two hardware

refreshes, progressing the machine to a peak performance of

22 TeraFlop/s in 2006.

The first hardware refresh is scheduled to take place in

mid-2004, taking the machine to 11 TeraFlop/s peak performance,

and will involve upgrading all aspects of the machine. The

current plan is to double the performance of the existing

machine by replacing the Colony switch with IBM’s new

Federation network. The current complement of 40 p690

Regatta frames will be upgraded to 48 Regatta-H+ frames,

containing the new 1.8 GHz Power4 processors.
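As a back-of-envelope check, the quoted peak figures follow from simple arithmetic, assuming the POWER4 core's two fused multiply-add units (ie four floating-point operations per clock cycle):

```python
# Sanity check of the quoted HPCx peak figures.
# Assumption: each POWER4 processor performs 4 flops/cycle (2 FMA units).
FLOPS_PER_CYCLE = 4

def peak_tflops(frames, procs_per_frame, clock_ghz):
    """Nominal peak in TFlop/s for a cluster of SMP frames."""
    return frames * procs_per_frame * clock_ghz * FLOPS_PER_CYCLE / 1000.0

phase1 = peak_tflops(40, 32, 1.3)   # ~6.66, matching the quoted 6.6 TFlop/s
phase2 = peak_tflops(48, 32, 1.8)   # ~11.06, matching the quoted 11 TFlop/s
print(round(phase1, 2), round(phase2, 2))
```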

To facilitate this upgrade, the HPCx team already has two

Regatta-H+ frames for advanced software testing, and will

shortly be given a Federation switch. However, even with

advanced access to these technologies, the hardware refresh

will not be a trivial matter. Co-ordinating the upgrade of

1280 processors, the addition of a further 256 processors,

and the complete change of the switch hardware raises

significant logistical issues. Add to this the complete change

of the software environment necessitated by the hardware

upgrade, and we can see that HPCx’s Operations and Systems

group will have their hands full for the foreseeable future!

Towards Capability Computing


10th December, 2003

The Merrison Lecture Theatre, Daresbury Laboratory

The first HPCx Annual Seminar will take place on Wednesday 10th

December 2003 in the Merrison Lecture Theatre at Daresbury Laboratory.

One of the key challenges for the HPCx service is to deliver on the

capability aspirations of the UK HPC community across a broad

spectrum of scientific and engineering disciplines. Hence the focus of

the seminar will be on achieving terascaling performance on capability systems.

Registration is free for UK academics and should be done online.

For programme details, venue information and a registration form, see:



The MSc in HPC:

a second successful

year David Henty

Since the middle of May our Masters students have been working hard on

their MSc dissertations, covering a wide range of topics in high performance

computing and e-Science. I am very pleased to write that all eight students

successfully completed their dissertations and have been awarded the MSc

in High Performance Computing by the University of Edinburgh.

2003/2004 intake

Fourteen new students started on the MSc in October

this year: five from the UK, four from the EU and five

from overseas. Six of them are entirely self-funded,

although Jianrong Chen (third from the left) managed

to secure the University of Edinburgh China and Hong

Kong Scholarship for 2003. This is some achievement

given that there is only a single award covering all

Masters courses at the entire University.

With three distinctions from 13 students in the first two

years, and the class size continuing to grow, the MSc

programme is clearly going from strength to strength.

For more information on the MSc and details of how to apply, see:

Forthcoming MSc courses

The courses are open to all and have a limited number of free places for academics. Please

note that the course dates are provisional, and may be subject to change.

Course details and registration can be accessed at:


04–06 Nov Shared Memory Programming

11–13 Nov Message Passing Programming

25–27 Nov Parallel Decomposition


27–29 Jan Object Oriented Programming for HPC

10–12 Feb Hardware, Compilers and Performance


02–04 March Applied Numerical Algorithms

13–15 April Exploiting the Computational Grid

EPCC also runs a guest lecture series on Fridays at 3.30pm, which is open to all.



Dissertation Topics

This year’s dissertation titles (from left to right in the group

photo) were:

• Christos Kartsaklis: Porting T3E MPI to HPCx using LAPI

• James Dobson: Parallelisation of a Large-Scale Java Simulation Code

• Xing Chen: Investigation of Parallel Sparse Matrix Solvers

• Anjan Pakhira: Computational Steering on the Grid Using …

• Seung Hwan Jin: Performance Evaluation of Scientific Java …

• Eric Saunders: Using the Grid to bring HPC to the Biology Workbench

• Adrian Mouat: …

• Jake Duthie: Mixed-Mode Programming on Clustered SMP Systems

Two students (Jake and Christos) submitted excellent work

and were awarded overall Distinctions. Christos is currently

working on a short-term project at EPCC. Two other students

have already taken up e-Science positions elsewhere in the

UK – Anjan as an RA working on Grid Visualisation and Eric

as a PPARC e-Science PhD student.

Bringing the Grid to the Biology Workbench

Eric Saunders

A recently developed

medical imaging technique

called Optical Projection

Tomography has provided

an important new tool

for scientists interested

in the detailed imaging of

3D biological specimens.

However, the reconstruction algorithm that builds the

3D image ‘stack’ from a dataset of captured CCD images

is computationally intensive, creating a bottleneck in the

analysis process. A parallelised implementation of the code,

developed at EPCC, allows significant performance gains

when run across a network of computational nodes.

The project provides an OGSI-compliant Grid service,

implemented using Globus 3 (a state-of-the-art Grid

technology) to demonstrate the feasibility of remote

submission of captured CCD image files to a parallel

reconstruction code. The project has provided a practical,

proof-of-concept demonstration of the Grid service model.

The delivered system allows remote service invocation,

file transfer across a network, and remote invocation

of reconstruction jobs, utilising a standard interface for

compatibility with other Grid services. The Grid services

model was found to provide a good fit to the problem space,

although successful leveraging of the functionality provided by

Globus 3 proved more difficult than anticipated. The system

has the potential to be upgraded to a production Grid in the

future, but this is likely to require significant resources.

Mixed-Mode Programming on a Clustered System

Jake Duthie

Clustered SMP Systems are becoming the architecture of

choice for supercomputers in the HPC industry, and hence

the question of how best to program for such machines is

becoming more important.

This project analysed one such programming style, the

Mixed-Mode model, which uses both MPI and OpenMP in

a single source to take advantage of the underlying machine

configuration. The primary point of comparison was a

Pure MPI version of the parallel implementation, which is

the programming norm for such systems. Four codes were

used in the testing process: a program written specifically for

this project based on a standard iterative algorithm; and three

codes taken from an existing benchmark suite. In addition

to a comparison of the execution times, hardware counters

and other system tools were used where appropriate in order

to develop a complete understanding of the performance

characteristics. The

system used to gather

all of the data for this

project was an IBM

p690 Cluster.

In general, Mixed-Mode was found to

be a less efficient

programming choice

than Pure MPI,

with the OpenMP threads encountering problems with

both computational scalability and in making effective use

of the communication library. However, one Mixed code

from the benchmark suite was able to obtain a performance

improvement of 35% over its MPI version, because it

employed overlapped communication/computation

functionality, and was also able to replace explicit

communications with direct reads/writes to memory.



MS.NetGrid: Grid Services and Microsoft .NET

Daragh


The emerging Open Grid Services Architecture (OGSA)

and its underlying infrastructure – the Open Grid

Services Infrastructure (OGSI) – between them define

common and open standards for developing services to be

made available on the Grid.

These specifications define the operation of Grid services

in terms of well-established Web services technologies.

Microsoft .NET is a platform for developing software in a

highly distributed environment, using web services. There

is thus considerable interest in the possibility of leveraging

.NET in Grid services.

This collaboration between Microsoft Research Limited

and NeSC (represented in this project by EPCC and

funded by the UK DTI) aims to exploit this interest by:

• Developing an implementation of OGSI using .NET


• Developing a suite of Grid Service demonstrators

– including OGSA-DAI demonstrators – that can be

deployed under this OGSI implementation

• Developing training courses and materials to educate

and inform the UK e-Science community about .NET

and its applicability to Grid applications

• Delivering training courses to delegates from the UK

e-Science community.

The goal of our project is to provide a practical demonstration

to the UK e-Science community of the applicability of Microsoft

.NET technologies to the hosting, development and deployment

of Grid services.

A complementary goal is facilitating understanding of the

Grid and e-Science within Microsoft.

Our project – which started in March 2003 – has a

duration of 12 months and will make its deliverables

freely available to encourage further use of Microsoft

.NET within the UK e-Science programme. In addition,

to facilitate the development of an OGSI .NET

community, we are fostering close links with Globus,

as well as with the Grid Computing Group at the

University of Virginia, who are also developing an OGSI

implementation on .NET.

Our first ‘OGSI and Microsoft .NET’ training course

was held at the National e-Science Centre in Edinburgh

on September the 9th and 10th 2003. The course was

well received by the attendees, who felt it aided their

understanding of both OGSI and .NET. Further courses

will be held in November 2003, January 2004 and

February 2004.

For further information on these courses, including registration

details, visit the NeSC website at:

Further details of the MS.NETGrid project, our software and our

training materials, are available at:

