Architecture of Computing Systems (Lecture Notes in Computer ...

M.F. Dolz et al.

This daemon also updates the waiting time (both per user and per queue) of all enqueued jobs and ensures that the current database size does not exceed the specified maximum size.

3.2 Activation and Deactivation Conditions

Node activation. This operation is performed using the ether-wake command [18], which sends the Wake-on-LAN (WOL) magic packet. Nodes can be turned on if any of the following conditions are met:

– There are not enough appropriate active resources to run a job. That is, as soon as the system detects that a job lacks resources because all the nodes that contain the appropriate type of resource are turned off, nodes are powered on to serve the request.

– The average waiting time of an enqueued job exceeds a given threshold. The administrator must define a maximum average waiting time in queue for the jobs of each group. When the average waiting time of an enqueued job exceeds the maximum value assigned to the corresponding user’s group, the system turns on nodes which contain resources of the same type as those usually requested by that user.

– The number of enqueued jobs for a user exceeds the maximum value for its group. In this case, the daemon selects and switches on nodes which feature the properties required by most of the enqueued jobs.
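The three activation conditions above can be sketched as a single check. This is a minimal illustration, not the authors' implementation: the names `GroupLimits` and `activation_reasons`, the use of seconds as the time unit, and the reason labels are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class GroupLimits:
    """Hypothetical per-group thresholds set by the administrator."""
    max_avg_wait_s: float  # maximum average waiting time in queue (seconds, assumed unit)
    max_queued_jobs: int   # maximum number of enqueued jobs per user


def activation_reasons(active_matching_nodes, needed_nodes,
                       avg_wait_s, queued_jobs, limits):
    """Return the list of conditions that currently justify powering on nodes.

    An empty list means that none of the three conditions is met.
    """
    reasons = []
    # Condition 1: not enough appropriate active resources to run the job.
    if active_matching_nodes < needed_nodes:
        reasons.append("insufficient_resources")
    # Condition 2: average waiting time exceeds the maximum for the user's group.
    if avg_wait_s > limits.max_avg_wait_s:
        reasons.append("wait_time_exceeded")
    # Condition 3: the user's number of enqueued jobs exceeds the group maximum.
    if queued_jobs > limits.max_queued_jobs:
        reasons.append("too_many_queued_jobs")
    return reasons
```

Since any one condition is sufficient, a caller would trigger activation whenever the returned list is non-empty; which nodes to switch on depends on the condition that fired, as described in each item.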

When the magic packet is sent, the daemon responsible for activation/deactivation actions starts a timer. If the daemon has not detected that the node is active by the time the timer expires, the node is automatically marked as unavailable.
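The send-then-wait behaviour can be sketched as follows. The magic packet layout (six 0xFF bytes followed by sixteen repetitions of the target MAC address) is the standard Wake-on-LAN format. Everything else is an assumption made for illustration: the sketch sends the packet as a UDP broadcast rather than invoking ether-wake, and the function names, the `is_active` probe, and the polling interval are hypothetical.

```python
import socket
import time


def build_magic_packet(mac: str) -> bytes:
    """Build a WOL magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must have exactly 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Send the packet as a UDP broadcast (assumed transport for this sketch)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))


def wait_until_active(is_active, timeout_s, poll_s=1.0):
    """Poll the hypothetical is_active() probe until the timer expires.

    Returns "active" if the node responds in time, otherwise "unavailable".
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_active():
            return "active"
        time.sleep(poll_s)
    # One final check once the timer has expired.
    return "active" if is_active() else "unavailable"
```

A daemon would call `send_magic_packet` and then `wait_until_active` with a probe of its choice (for example, a check that the node's execution daemon has registered again).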

The system administrator can also use the following options to select the (candidate) nodes that will be activated:

– Ordered: The list of candidate nodes is sorted in alphabetical order using the name of the node (hostname).

– Randomize: The list of candidate nodes is sorted randomly.

– Balanced: The list of candidate nodes is sorted according to the period that the nodes were active during the last t hours (with t set by the administrator). The nodes selected to be powered on are those which have been inactive for the longest period.

– Prioritized: The list of candidate nodes is ordered using a priority assigned by the system administrator. This priority can be defined, e.g., according to the location of the node with respect to the flow of cool air [19].
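The four selection options can be sketched as orderings over a list of hostnames. This is an illustrative sketch only: the function name `order_candidates`, the policy strings, and the `active_hours` and `priority` mappings are assumptions, as is the convention that a lower priority number means the node is powered on earlier.

```python
import random


def order_candidates(nodes, policy, active_hours=None, priority=None, rng=None):
    """Return candidate hostnames in the order in which they would be powered on.

    active_hours: hostname -> hours the node was active during the last t hours.
    priority:     hostname -> administrator-assigned priority (lower first, assumed).
    rng:          optional random.Random instance, useful for reproducible runs.
    """
    if policy == "ordered":
        # Alphabetical order by hostname.
        return sorted(nodes)
    if policy == "randomize":
        shuffled = list(nodes)
        (rng or random).shuffle(shuffled)
        return shuffled
    if policy == "balanced":
        # Least active time first, i.e. nodes that were inactive longest.
        return sorted(nodes, key=lambda n: active_hours[n])
    if policy == "prioritized":
        return sorted(nodes, key=lambda n: priority[n])
    raise ValueError(f"unknown policy: {policy}")
```

For example, with `active_hours = {"n1": 5, "n2": 0, "n3": 2}` the balanced policy places `n2` first, since it was not active at all during the window.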

In the context of the SGE queue system, the slots of a queue instance for a given node indicate the maximum number of jobs that can be executed concurrently in that node. When exclusive execution is required (as is usual, e.g., in HPC clusters), the number of slots equals the number of processors. The daemon can also specify a strict threshold to power on nodes to serve job requirements:
