New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
5.4. QAD GRID WORKER
Figure 5.4.9: Time needed for the OS to cycle through the active thread (yield) vs.
number of running threads. This shows that there exists an almost linear relationship.
Data was created on a 2.3 GHz single-core machine (2 GB RAM) running Debian Linux
and Windows Vista, respectively.
5.4.4 Worker Migration
So far, we have presented a system that collects and manages jobs, which are
then requested by workers running on client machines that perform the
computation. At the moment a new job becomes available, it is easy for a
client machine to decide whether to request or to skip it, provided that the
worker knows the current system state (e.g. CPU utilization) of its
host computer. This concept is known as static scheduling; an early example
is the Linda system (Carriero and Gelernter, 1986).
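
The request-or-skip decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `should_request_job` and the threshold `max_cpu` are hypothetical and do not come from the QAD Grid system, and CPU utilization is assumed to be given as a fraction between 0 and 1.

```python
def should_request_job(cpu_utilization: float, max_cpu: float = 0.5) -> bool:
    """Static scheduling: decide once, at the moment a job is announced.

    cpu_utilization: current load of the host as a fraction in [0.0, 1.0].
    max_cpu: assumed threshold (hypothetical, not a value from the text).
    """
    if not 0.0 <= cpu_utilization <= 1.0:
        raise ValueError("cpu_utilization must be in [0.0, 1.0]")
    # Request the job only if the host currently has spare capacity.
    return cpu_utilization < max_cpu
```

Note that the decision uses only the system state at the time the job becomes available; once the job has been requested, the sketch never looks at the host again.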
The problem is that these conditions can (and mostly will) change
over time. Static scheduling may work well within a dedicated
cluster environment, but as soon as users’ workstations are integrated,
the following problem arises. A user, say Alice, integrates her desktop
workstation into the Grid and goes away for some time. Another user, say
Bob, creates a job in the Grid system that is then requested by the worker
running on Alice’s machine. The trouble comes if Alice returns to her workstation
before Bob’s job has finished. With static scheduling, the options are
limited: (a) we can allow Bob’s job to finish, which will make Alice unhappy
because her system will be slow, or (b) we can terminate Bob’s job, which will
make Bob unhappy because he loses the work already done.
In the QAD Grid system it is of primary importance that the owner of a
workstation does not pay a penalty for integrating his or her machine into the
Grid, that is, for running a worker on it. Workers must therefore be able to
vacate a workstation (or at least pause computation) if a user starts working
on that machine and/or wants the worker to quit. In the remainder of this
section we describe a dynamic scheduling approach that satisfies
both Alice and Bob. The key idea is that when Alice returns and starts using
her machine, Bob’s job is paused, transferred to another workstation that is idle
at that moment, and continued from the point at which it was stopped. The same
concept applies to multi-core systems, where the load on one of the cores might
become too high.
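
The pause, transfer, and continue idea can be illustrated with a minimal Python sketch. It is not the QAD Grid implementation: the `SummationJob` class, its `freeze` and `restore` methods, the `run` helper, and the `host_is_idle` callback are hypothetical names chosen for illustration, and the transfer to another workstation is simulated inside a single process by handing over a byte string.

```python
import json
from typing import Callable, Optional


class SummationJob:
    """Toy job (hypothetical): sums the integers 0..n-1, one step at a time."""

    def __init__(self, n: int, i: int = 0, total: int = 0) -> None:
        self.n = n          # number of integers to sum
        self.i = i          # next integer to add
        self.total = total  # partial result so far

    def done(self) -> bool:
        return self.i >= self.n

    def step(self) -> None:
        self.total += self.i
        self.i += 1

    def freeze(self) -> bytes:
        """Serialize the complete job state so it can be sent elsewhere."""
        state = {"n": self.n, "i": self.i, "total": self.total}
        return json.dumps(state).encode("utf-8")

    @classmethod
    def restore(cls, blob: bytes) -> "SummationJob":
        """Rebuild a job from a frozen state, e.g. on another workstation."""
        return cls(**json.loads(blob.decode("utf-8")))


def run(job: SummationJob, host_is_idle: Callable[[], bool]) -> Optional[bytes]:
    """Run the job while the host is idle.

    Returns None if the job finished, otherwise the frozen state.
    """
    while not job.done():
        if not host_is_idle():
            return job.freeze()  # owner is back: pause and hand over the state
        job.step()
    return None


def simulate_migration(n: int, steps_before_owner_returns: int) -> int:
    """Start on a host whose owner returns after a few steps, then migrate."""
    remaining = steps_before_owner_returns

    def first_host_is_idle() -> bool:
        nonlocal remaining
        remaining -= 1
        return remaining >= 0

    job = SummationJob(n)
    frozen = run(job, first_host_is_idle)
    if frozen is not None:
        # Simulated transfer: a second, always idle host restores the state
        # and continues from the point where the job was stopped.
        job = SummationJob.restore(frozen)
        run(job, lambda: True)
    return job.total
```

In this sketch the work done before the interruption is kept: the restored job resumes at the stored counter instead of starting over, which is the property that distinguishes migration from simply terminating the job.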
Worker migration (also known as process migration) is the process of freezing
a worker’s state, sending that state along with the worker to another idle
workstation, starting that worker at that workstation initialized with the restored