28.08.2016 Views

Iraqi Kurdistan All in the Timing

GEO_ExPro_v12i6

GEO_ExPro_v12i6

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

As a rough simplification, <strong>the</strong> modern<br />

supercomputer is divided up <strong>in</strong>to nodes. Each<br />

node may have multiple processors which are<br />

<strong>the</strong> hearts of <strong>the</strong> node. Each processor has<br />

access to memory on its node – but does not<br />

have access to <strong>the</strong> memory on <strong>the</strong> o<strong>the</strong>r nodes.<br />

Communication of <strong>in</strong>formation can occur<br />

between processors with<strong>in</strong> or across nodes.<br />

Communication across nodes takes place<br />

through a high-speed network, but is usually<br />

considerably slower than communication<br />

with<strong>in</strong> a node. Data storage has to be accessible<br />

from all nodes <strong>in</strong> order to <strong>in</strong>put and output<br />

data. In order to benefit from a supercomputer,<br />

<strong>the</strong> programs need to utilise <strong>the</strong> resources<br />

of multiple nodes <strong>in</strong> parallel. The network<br />

connects <strong>the</strong> nodes to make larger parallel<br />

computer clusters.<br />

Node 1 Node 2 Node 3 Node n<br />

Memory Memory Memory Memory<br />

p1 p2 p3 pn p1 p2 p3 pn p1 p2 p3 pn p1 p2 p3 pn<br />

processors processors processors processors<br />

NETWORK<br />

<strong>in</strong>dependent sub-problems, and some<br />

sort of communication must take place<br />

between nodes. A common example<br />

is a problem where each processor is<br />

assigned an <strong>in</strong>dependent computation,<br />

but where <strong>the</strong> results from all processors<br />

must be comb<strong>in</strong>ed to form <strong>the</strong> solution.<br />

A simple example of such a problem<br />

is <strong>the</strong> evaluation of an area under a<br />

curve. Numerically, this is usually<br />

solved by divid<strong>in</strong>g <strong>the</strong> area up <strong>in</strong>to small<br />

rectangles: comput<strong>in</strong>g <strong>the</strong> area for each<br />

rectangle and <strong>the</strong>n summ<strong>in</strong>g toge<strong>the</strong>r all<br />

<strong>in</strong>dividual rectangle areas. Table 1 shows<br />

an excerpt from a C program perform<strong>in</strong>g<br />

this computation on a s<strong>in</strong>gle processor.<br />

Basically <strong>the</strong> programmer has used a loop<br />

to perform <strong>the</strong> computation of <strong>the</strong> area of<br />

each small rectangle, start<strong>in</strong>g with <strong>the</strong> first<br />

rectangle and sequentially add<strong>in</strong>g toge<strong>the</strong>r<br />

all rectangles to form <strong>the</strong> total area.<br />

Solv<strong>in</strong>g <strong>the</strong> same problem <strong>in</strong> a parallel<br />

way on a supercomputer with many<br />

processors requires some tools not<br />

present <strong>in</strong> conventional programm<strong>in</strong>g<br />

languages. Although a large number of<br />

computer languages have been designed<br />

and implemented for this purpose, one<br />

of <strong>the</strong> most popular approaches consists<br />

of us<strong>in</strong>g a conventional programm<strong>in</strong>g<br />

language, but extend<strong>in</strong>g <strong>the</strong> language<br />

Table 1: An excerpt from a program written <strong>in</strong><br />

<strong>the</strong> C language to compute <strong>the</strong> area under a<br />

curve def<strong>in</strong>ed by <strong>the</strong> function f(x)=4/(1+x 2 ) <strong>in</strong><br />

<strong>the</strong> <strong>in</strong>terval [0,1] us<strong>in</strong>g <strong>the</strong> rectangle (midpo<strong>in</strong>t)<br />

method; n is <strong>the</strong> number of rectangles used and<br />

h is <strong>the</strong> width of each rectangle. The third l<strong>in</strong>e is<br />

<strong>the</strong> start of a loop do<strong>in</strong>g <strong>the</strong> area computation<br />

and i is a variable loop<strong>in</strong>g from 1 and up to n. The<br />

expression i++ means that i is <strong>in</strong>cremented by<br />

one for each time <strong>the</strong> loop is executed. The last<br />

l<strong>in</strong>e computes <strong>the</strong> f<strong>in</strong>al answer, which <strong>in</strong> this case<br />

is <strong>the</strong> first few digits of <strong>the</strong> famous number, pi.<br />

with a small collection of functions<br />

capable of send<strong>in</strong>g and synchronis<strong>in</strong>g<br />

data or messages between processors, a<br />

so-called message pass<strong>in</strong>g <strong>in</strong>terface or<br />

commonly referred to as MPI. It turns<br />

out that by us<strong>in</strong>g this simple approach a<br />

large number of parallel problems can be<br />

efficiently programmed.<br />

The example given <strong>in</strong> <strong>the</strong> first table<br />

can be parallelised by <strong>the</strong> computer<br />

n=1000000;<br />

h=1/n;<br />

for (i = 1; i

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!