Iraqi Kurdistan All in the Timing
GEO_ExPro_v12i6
As a rough simplification, the modern supercomputer is divided into nodes. Each node may have multiple processors, which are the hearts of the node. Each processor has access to the memory on its own node, but not to the memory on the other nodes. Information can be communicated between processors within a node or across nodes. Communication across nodes takes place through a high-speed network, but is usually considerably slower than communication within a node. Data storage has to be accessible from all nodes for input and output of data. To benefit from a supercomputer, a program needs to utilise the resources of multiple nodes in parallel; the network connects the nodes into larger parallel computer clusters.
Figure: Schematic of a supercomputer: nodes (Node 1 to Node n), each with its own memory and processors (p1 to pn), connected by a network.
Few real problems can be divided into completely independent sub-problems, and some sort of communication must take place between nodes. A common example is a problem where each processor is assigned an independent computation, but where the results from all processors must be combined to form the solution.
A simple example of such a problem is the evaluation of the area under a curve. Numerically, this is usually solved by dividing the area up into small rectangles, computing the area of each rectangle, and then summing all the individual rectangle areas. Table 1 shows an excerpt from a C program performing this computation on a single processor. The programmer has used a loop to compute the area of each small rectangle, starting with the first rectangle and sequentially adding the rectangles together to form the total area.
Solving the same problem in parallel on a supercomputer with many processors requires some tools not present in conventional programming languages. Although a large number of computer languages have been designed and implemented for this purpose, one of the most popular approaches consists of using a conventional programming language, but extending the language
Table 1: An excerpt from a program written in the C language to compute the area under the curve defined by the function f(x) = 4/(1+x²) in the interval [0,1] using the rectangle (midpoint) method; n is the number of rectangles used and h is the width of each rectangle. The third line is the start of a loop doing the area computation, and i is a variable looping from 1 up to n. The expression i++ means that i is incremented by one each time the loop is executed. The last line computes the final answer, which in this case is the first few digits of the famous number pi.
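Why the answer is pi: the integrand in the caption is the derivative of 4 arctan x, so the exact area under the curve works out as

\[
\int_0^1 \frac{4}{1+x^2}\,dx \;=\; 4\arctan x \Big|_0^1 \;=\; 4\cdot\frac{\pi}{4} \;=\; \pi .
\]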
with a small collection of functions capable of sending and synchronising data or messages between processors, a so-called message-passing interface, commonly referred to as MPI. It turns out that with this simple approach a large number of parallel problems can be programmed efficiently.
The example given in the first table can be parallelised by the computer
n = 1000000;
h = 1.0 / n;
for (i = 1; i <= n; i++) {
    x = h * (i - 0.5);
    sum = sum + 4.0 / (1.0 + x * x);
}
pi = h * sum;