New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
CHAPTER 5. COMPUTER SCIENCE GRID STRATEGIES
currently running on this machine before returning to the worker thread 15
(see below for details).
Connection quality: The connection quality is measured with respect to
the RTT (round-trip time), i.e. the time a data packet needs to travel from
the worker to the platform server and back.
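An RTT probe of the kind described above could be sketched as follows. This is only an illustration: the class name, the use of InetAddress.isReachable(), and the timeout value are assumptions, not details of the platform's actual implementation (which may well measure RTT at the application-protocol level instead).

```java
import java.net.InetAddress;

// Illustrative RTT probe (hypothetical class, not the platform's code).
// InetAddress.isReachable() uses an ICMP echo when permitted, otherwise
// a TCP fallback -- so the measured value only approximates the raw RTT.
public class RttProbe {

    /** Round-trip time to the given host in milliseconds,
     *  or -1 if the host did not answer within the timeout. */
    public static long measureRttMillis(String host, int timeoutMillis)
            throws Exception {
        InetAddress addr = InetAddress.getByName(host);
        long start = System.nanoTime();
        boolean reachable = addr.isReachable(timeoutMillis);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return reachable ? elapsedMillis : -1;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("RTT: " + measureRttMillis("localhost", 1000) + " ms");
    }
}
```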
Each time a status update is performed, a timestamp is automatically set on
the corresponding database entry. The time of the latest update can therefore
be determined. If the status has not been sent within the last 300 seconds,
the client is considered lost and is logged out.
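The timeout logic above can be sketched as follows. The class and method names are illustrative (they do not appear in the source); only the 300-second threshold is taken from the text.

```java
// Minimal sketch of the liveness check described above.
// WorkerStatus, touch() and isLost() are hypothetical names.
public class WorkerStatus {

    /** Timeout after which a silent client is considered lost (300 s). */
    static final long TIMEOUT_MILLIS = 300_000;

    private long lastUpdateMillis;

    /** Called whenever the worker sends a status update;
     *  corresponds to the automatic timestamp on the database entry. */
    public void touch() {
        lastUpdateMillis = System.currentTimeMillis();
    }

    /** True if no status has been received within the last 300 seconds. */
    public boolean isLost() {
        return System.currentTimeMillis() - lastUpdateMillis > TIMEOUT_MILLIS;
    }

    public static void main(String[] args) {
        WorkerStatus w = new WorkerStatus();
        w.touch();
        System.out.println(w.isLost()); // just updated -> false
        w.lastUpdateMillis -= 301_000;  // simulate 301 s of silence
        System.out.println(w.isLost()); // stale -> true
    }
}
```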
Workload Determination<br />
As stated above, we define the machine’s local load as the time needed by
the operating system to cycle through all processes currently running on this
machine, excluding the worker process. In our worker reference implementation
we use the Java method Thread.yield() 16, which
“Causes the currently executing thread object to temporarily pause
and allow other threads to execute.” (Sun Microsystems, 2006)
So, if a thread executes yield, it is suspended and the CPU is given to some
other runnable thread. The yielding thread then waits until the CPU becomes
available to it again. Technically put, the executing thread is put back into
the ready queue of the processor and waits for its next turn.
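The yield-based load probe can be sketched as follows. This is a minimal illustration of the measurement idea, not the reference implementation itself; the iteration count and the averaging are assumed parameters.

```java
// Sketch of the yield-based local-load probe described above.
// YieldProbe and measureYieldNanos are hypothetical names.
public class YieldProbe {

    /** Average time (in nanoseconds) it takes Thread.yield() to return,
     *  i.e. the time the OS needs to cycle through all other runnable
     *  threads before rescheduling the caller. */
    public static long measureYieldNanos(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Thread.yield(); // hand the CPU to every other runnable thread
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        // On an otherwise idle machine this is close to zero, even if the
        // worker itself keeps CPU utilization near 100%.
        System.out.println("avg yield time: " + measureYieldNanos(1000) + " ns");
    }
}
```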
This means that if the worker is the only (high-priority) program (thread) on
a machine, the time needed for the yield will be almost zero, because there is
no other program that consumes time. The beauty of this approach is that
the CPU utilization can be close to 100%, yet if it is caused exclusively
by the worker, the local load is about zero, because the worker thread is the
only program running. On the other hand, if there are many CPU-intensive
processes running on the host machine, it will take a long time for the worker
thread to regain control. This time is measured and can be directly related
to the number of other running threads and their CPU consumption, as can
be seen in Figure 5.4.9. Furthermore, this measure reflects only the utilization
of the processor core the worker is actually running on.
Other measures such as CPU utilization (Windows) or workload (Unix)
have the disadvantage that (a) it cannot be distinguished which process causes
the load (is it us or the others?) and (b) it is not entirely clear what is actually
measured. For example, the load average in the Linux world (which is often
mistakenly taken as the CPU utilization by many benchmarks) is actually an
exponentially damped moving average of the total CPU run-queue length (see
e.g. O’Reilly et al., 1997).
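For comparison, the system-wide load average criticized above can be read from Java through the standard management API. This is a sketch for illustration only; as the text argues, this figure aggregates all processes and cores and therefore cannot separate the worker’s own load from everyone else’s.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Reads the OS-level load average (the run-queue based figure discussed
// above). LoadAverage is a hypothetical class name.
public class LoadAverage {

    /** System load average for the last minute,
     *  or -1.0 if the platform does not provide one (e.g. Windows). */
    public static double systemLoadAverage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return os.getSystemLoadAverage();
    }

    public static void main(String[] args) {
        System.out.println("1-min load average: " + systemLoadAverage());
    }
}
```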
15 A thread in computer science is short for a thread of execution. Threads are a way for a
program to fork (or split) itself into two or more simultaneously (or pseudo-simultaneously)
running tasks. Threads and processes differ from one operating system to another but, in
general, a thread is contained inside a process, and different threads of the same process share
some resources while different processes do not. Multiple threads can be executed in parallel
on many computer systems. This multithreading generally occurs by time slicing (similar to
time-division multiplexing), wherein a single processor switches between different threads;
in that case the processing is not literally simultaneous, since the single processor is really
doing only one thing at a time.
16 The yield() method is available in almost all programming languages that provide the
thread concept.