

currently running on this machine before returning to the worker thread 15 (see below for details).

Connection quality: The connection quality is measured with respect to the RTT, the time a data packet needs to travel from the worker to the platform server and back.
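A minimal sketch of how such an RTT probe could look on the worker side is given below. The host name platform.example.org and the use of InetAddress.isReachable() are illustrative assumptions only; the platform's actual measurement protocol is not specified here.

    import java.net.InetAddress;

    // Sketch of an RTT probe; the host name and the probing mechanism are
    // illustrative assumptions, not the platform's actual protocol.
    public class RttProbe {

        /** Returns the round-trip time to the platform server in milliseconds,
         *  or -1 if the server did not answer within the timeout. */
        public static long measureRtt(String platformHost, int timeoutMs) throws Exception {
            InetAddress server = InetAddress.getByName(platformHost);
            long start = System.nanoTime();
            boolean reachable = server.isReachable(timeoutMs); // ICMP or TCP echo, OS dependent
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            return reachable ? elapsedMs : -1;
        }

        public static void main(String[] args) throws Exception {
            System.out.println("RTT: " + measureRtt("platform.example.org", 2000) + " ms");
        }
    }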

Each time a status update is performed, a timestamp is automatically written to the corresponding database entry. Therefore, the time of the latest update can be determined. If no status has been sent within the last 300 seconds, the client is considered lost and is logged out.
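To make the timeout rule concrete, the following sketch shows a server-side liveness check in the spirit of the description above. The in-memory map and the method name are hypothetical stand-ins for the actual database table and platform code.

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Map;

    // Hypothetical sketch: clients whose last status update is older than
    // 300 seconds are treated as lost and removed (logged out). The map
    // stands in for the database table mentioned in the text.
    public class LivenessCheck {

        private static final Duration TIMEOUT = Duration.ofSeconds(300);

        /** lastUpdate maps a client id to the timestamp of its latest status update. */
        public static void logOutLostClients(Map<String, Instant> lastUpdate) {
            Instant now = Instant.now();
            lastUpdate.entrySet().removeIf(entry ->
                    Duration.between(entry.getValue(), now).compareTo(TIMEOUT) > 0);
        }
    }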

Workload Determination

As stated above, we define the machine’s local load as the time needed by the operating system to cycle through all processes currently running on this machine, excluding the worker process. In our worker reference implementation we use the Java method Thread.yield() 16 that

“Causes the currently executing thread object to temporarily pause and allow other threads to execute.” (Sun Microsystems, 2006)

So, if a thread executes yield, it is suspended and the CPU is given to some other runnable thread. The yielding thread then sleeps until the CPU becomes available to it again. Technically put, the executing thread is placed back into the processor’s ready queue and waits for its next turn.

This means, if <strong>the</strong> worker is <strong>the</strong> only (high priority) program (thread) on<br />

a machine <strong>the</strong> time needed <strong>for</strong> <strong>the</strong> yield will be almost zero because <strong>the</strong>re is<br />

no o<strong>the</strong>r program that will consume time. The beauty <strong>of</strong> this approach is<br />

that <strong>the</strong> CPU utilization can be well close to 100% but if caused exclusively<br />

by <strong>the</strong> worker <strong>the</strong> local load is about zero, because <strong>the</strong> worker thread is <strong>the</strong><br />

only program running. On <strong>the</strong> o<strong>the</strong>r hand, if <strong>the</strong>re are many CPU intensive<br />

processes running on <strong>the</strong> host machine it will take a long time <strong>for</strong> <strong>the</strong> worker<br />

thread to get back control. This time is measured and can be directly related<br />

to <strong>the</strong> number <strong>of</strong> running o<strong>the</strong>r threads and <strong>the</strong>ir CPU consumption as can<br />

be seen in Figure 5.4.9. Fur<strong>the</strong>r, this tool gives us only <strong>the</strong> utilization <strong>of</strong> <strong>the</strong><br />

processor core we are really working on.<br />
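To illustrate how this yield-based measurement could look in a Java worker, the sketch below times a batch of Thread.yield() calls and reports the average. The iteration count and the averaging are assumptions introduced only to smooth out scheduling noise; they are not details of the reference implementation.

    // Sketch of the yield-based load probe described above: the time the
    // worker thread spends inside Thread.yield() approximates how long the
    // scheduler needs to cycle through the other runnable threads.
    public class YieldLoadProbe {

        /** Returns the average time (in nanoseconds) one call to Thread.yield() takes. */
        public static long measureLocalLoad(int iterations) {
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                Thread.yield(); // hand the CPU to any other runnable thread
            }
            return (System.nanoTime() - start) / iterations;
        }

        public static void main(String[] args) {
            // On an otherwise idle machine this value stays close to zero,
            // even if the worker itself keeps the CPU at 100%.
            System.out.println("local load: " + measureLocalLoad(10_000) + " ns per yield");
        }
    }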

O<strong>the</strong>r measure such as CPU utilization (Windows) or workload (Unix)<br />

have <strong>the</strong> disadvantage that (a) it cannot be distinguished what process causes<br />

<strong>the</strong> load (is it us or <strong>the</strong> o<strong>the</strong>rs?) and (b) it is not entirely clear what is actually<br />

measured. For example, <strong>the</strong> average load in <strong>the</strong> Linux world (which is <strong>of</strong>ten<br />

mistakenly taken as <strong>the</strong> CPU utilization by many benchmarks) is actually an<br />

exponentially-damped moving average <strong>of</strong> <strong>the</strong> total CPU queue length (see e.g.<br />

(O’Reilly et al., 1997)).<br />

15 A thread in computer science is short for a thread of execution. Threads are a way for a program to fork (or split) itself into two or more simultaneously (or pseudo-simultaneously) running tasks. Threads and processes differ from one operating system to another, but in general a thread is contained inside a process, and different threads of the same process share some resources while different processes do not. Multiple threads can be executed in parallel on many computer systems. This multithreading generally occurs by time slicing (similar to time-division multiplexing), wherein a single processor switches between different threads, in which case the processing is not literally simultaneous, for the single processor is really doing only one thing at a time.

16 The yield() method is available in almost all programming languages that provide the thread concept.
