01.12.2012 Views

Architecture of Computing Systems (Lecture Notes in Computer ...


76 M. Bonn and H. Schmeck

2.2 Node Monitoring

A central assumption during the design of JoSchKa was the heterogeneity of the nodes. They differ not only in their hardware and software configuration, but also in their reliability characteristics. Some nodes process every job successfully, while others break down unpredictably. A node may also change its reliability over time, switching from unreliable to robust and vice versa. Imagine a PC in a student pool that runs nonstop for many hours at night but is rebooted frequently during the day, when students use it.

To assess the behavior of the nodes, a monitoring component was developed. Its goal is to predict the behavior of each node in the near future [5]. Among others, the following values are monitored or calculated for each node:

• $P$: The relative CPU power of the node compared to the others, $P \in \mathbb{R}$.

• $U_{avg}$: The average uptime in wall clock minutes, $U_{avg} \in \mathbb{N}$.

• $U_{cur}$: The current uptime in wall clock minutes, $U_{cur} \in \mathbb{N}$.

• $R$: A synthetic reliability index of the node, $R \in [-1, 1]$.

In the following, let $\mathrm{EWA}(x_1, \dots, x_n)$ be the exponentially weighted average, which processes the single input values $x_k$ according to their age. It is defined recursively: $\mathrm{EWA}(x_1, \dots, x_n) = \bar{x}_n = \alpha x_n + (1 - \alpha)\,\bar{x}_{n-1}$ with $\alpha \in [0, 1]$ and $\bar{x}_1 = x_1$. The $x_k$ are sorted by age; $x_n$ is the youngest. More formally, the values for a single node $i \in \{1, \dots, N\}$ are calculated as follows:

• $P_i = \frac{\sum_{j=1}^{N} b_j}{N \cdot b_i}$, where $b_i$ denotes the time span (in wall clock milliseconds) that node $i$ needs to process a fixed arithmetic-logical benchmark. The node runs this benchmark at periodic intervals.

• $U_{avg,i} = \mathrm{EWA}(u_1, \dots, u_{n_U})$, where the $u_k$ are the node's last $n_U$ uptimes.

• $U_{cur,i} = t_{now} - t_i$, where $t_{now}$ denotes the current wall clock time and $t_i$ denotes the starting wall clock time of the node.

• $R_i = \mathrm{EWA}(r_1, \dots, r_{n_R})$, where the $r_k$ are reliability criteria for the last $n_R$ processed jobs: $r_k = 1$ if a job was completed successfully, and $r_k = -1$ if it failed.
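The four values can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the JoSchKa implementation: the names `ewa`, `relative_power`, and `NodeStats` are hypothetical, the smoothing factor is assumed to weight the youngest value, and the relative CPU power is assumed to be the mean benchmark time of all nodes divided by the node's own benchmark time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


def ewa(values: Sequence[float], alpha: float = 0.25) -> float:
    """Exponentially weighted average; `values` are ordered oldest to youngest."""
    if not values:
        raise ValueError("need at least one value")
    avg = float(values[0])  # base case: the average of one value is that value
    for x in values[1:]:
        # Assumption: alpha weights the newer value, (1 - alpha) the history.
        avg = alpha * x + (1 - alpha) * avg
    return avg


def relative_power(benchmark_ms: Dict[str, float]) -> Dict[str, float]:
    """Mean benchmark time over each node's own time (faster node, larger value)."""
    mean_ms = sum(benchmark_ms.values()) / len(benchmark_ms)
    return {node: mean_ms / ms for node, ms in benchmark_ms.items()}


@dataclass
class NodeStats:
    """Hypothetical per-node record kept by the server."""
    boot_time_min: float                                   # wall clock minute of last boot
    uptimes_min: List[float] = field(default_factory=list)  # past uptimes, oldest first
    job_results: List[int] = field(default_factory=list)    # +1 success, -1 failure

    def average_uptime(self, alpha: float = 0.25, window: int = 10) -> float:
        return ewa(self.uptimes_min[-window:], alpha)

    def current_uptime(self, now_min: float) -> float:
        return now_min - self.boot_time_min

    def reliability(self, alpha: float = 0.25, window: int = 10) -> float:
        return ewa(self.job_results[-window:], alpha)


if __name__ == "__main__":
    node = NodeStats(boot_time_min=100.0, uptimes_min=[60.0, 600.0], job_results=[1, 1, -1])
    print(node.reliability())          # 0.5: one recent failure after two successes
    print(node.average_uptime())       # 195.0
    print(node.current_uptime(130.0))  # 30.0
    print(relative_power({"a": 100.0, "b": 300.0}))
```

Keeping the raw histories and slicing the last `window` entries on demand keeps the sketch short; a server handling many nodes might instead store only the running average and update it incrementally.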

In practice, we chose $n_U = n_R = 10$ and $\alpha = 0.25$. The data collected by the server for each single node can be summarized as follows. For each node, we now know

• how fast it works compared to all other nodes ($P$),

• how long it is available on average before it breaks down ($U_{avg}$),

• how long it has been up since its last boot ($U_{cur}$), and

• how reliably it behaved when processing the last jobs ($R$).

This knowledge is used by the different distribution strategies, which are described in the following section.
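To see what a window of 10 values and a smoothing factor of 0.25 imply, the recursive average can be unrolled into explicit weights. The following is a worked sketch, assuming the recursion $\bar{x}_n = \alpha x_n + (1 - \alpha)\,\bar{x}_{n-1}$ with $\bar{x}_1 = x_1$, where $x_n$ is the youngest value:

```latex
\bar{x}_n = \alpha \sum_{k=0}^{n-2} (1-\alpha)^k \, x_{n-k} + (1-\alpha)^{n-1} x_1

% For \alpha = 0.25 and n = 10:
%   weight of x_{10} (youngest):  0.25
%   weight of x_9:                0.25 \cdot 0.75   = 0.1875
%   weight of x_8:                0.25 \cdot 0.75^2 \approx 0.1406
%   weight of x_1 (oldest):       0.75^9            \approx 0.0751
```

Under this assumption, the three youngest values carry about 58% of the total weight, so a node that recently started failing jobs sees its reliability index drop quickly, while a single old failure has little lasting effect.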
