21.01.2014 Views

Xiao Liu PhD Thesis.pdf - Faculty of Information and Communication ...


commonly used to handle temporal violations. For example, in a scientific cloud computing environment, the underlying cloud infrastructure can provide unlimited scalable resources on demand at workflow runtime. Therefore, we can add a new resource (i.e. a new virtual machine) to execute the local workflow segment for the violated workflow instances. In that case, these workflow activities can be given higher priorities (to decrease the waiting time) and/or higher execution speed (to decrease the execution time), and hence the existing time deficits can be compensated. To realise that, all the computation tasks and input data of these activities need to be reallocated to the new resources.

Adding a new resource at workflow runtime incurs additional monetary cost and time overheads. In a pay-for-usage cloud computing environment, the monetary cost of adding a new resource is mainly the transfer cost, since setting up new resources is normally free. The total time overhead consists of the transfer time for the data and the set-up time for the new service, which is normally around several minutes, as in Amazon Elastic Compute Cloud (EC2, http://aws.amazon.com/ec2/), depending on the load of the system and network.
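
The cost structure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a model taken from the thesis: the class name, all field names, the linear transfer model and the `compensates` check are assumptions introduced here, and the numbers a caller supplies are placeholders rather than measured values.

```python
from dataclasses import dataclass


@dataclass
class NewResourceEstimate:
    """Rough cost of adding one new resource at workflow runtime.

    All fields are hypothetical inputs chosen for illustration.
    """
    data_size_gb: float           # data to reallocate to the new resource
    bandwidth_gb_per_s: float     # assumed effective transfer rate
    transfer_price_per_gb: float  # assumed data-transfer price
    setup_time_s: float           # set-up time of the new service

    def transfer_time_s(self) -> float:
        # Assumes transfer time grows linearly with data size.
        return self.data_size_gb / self.bandwidth_gb_per_s

    def time_overhead_s(self) -> float:
        # Total time overhead = transfer time + set-up time.
        return self.transfer_time_s() + self.setup_time_s

    def monetary_cost(self) -> float:
        # Set-up is treated as free, so only the transfer is charged.
        return self.data_size_gb * self.transfer_price_per_gb

    def compensates(self, time_deficit_s: float,
                    expected_time_saving_s: float) -> bool:
        # Assumption: the resource is worthwhile only if the expected
        # saving, net of the overhead, covers the existing time deficit.
        return expected_time_saving_s - self.time_overhead_s() >= time_deficit_s
```

A caller would build one estimate per candidate resource and compare `time_overhead_s()` against the time deficit before deciding whether reallocation is likely to pay off.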

Stop and restart: stop and restart consists of two basic steps. The stop step stops the execution at the current checkpoint and stores all the running data. The restart step restarts the application on a new set of resources. Although the strategy is very flexible, the natural stop, migrate and restart approach to rescheduling can be expensive: each migration event may involve large volumes of data transfer. Moreover, restarting the application can incur expensive startup costs, and significant application modifications may be required (with human intervention) for specialised restart code. As demonstrated in [28], the overhead for this strategy includes at least the following aspects: resource selection, performance modelling, computing resource overhead, application start and checkpoint reading. Among them, the time for reading checkpoints dominates the rescheduling overhead, as it involves moving data across the Internet and redistributing data to more processors. According to their experiments, the average overhead for the strategy is around 500 seconds.
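
The two steps of stop and restart can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the mechanism evaluated in [28]: the function names, the dictionary used as application state and the local pickle file standing in for a checkpoint store are all hypothetical, and a real workflow system would move the checkpoint to different machines rather than a temporary directory.

```python
import os
import pickle
import tempfile


def run(state, stop_at=None):
    """Accumulate 0 + 1 + ... + (n - 1), one step per iteration.

    Returns (state, finished). If stop_at is given, execution halts
    when the step counter reaches that checkpoint.
    """
    while state["i"] < state["n"]:
        if stop_at is not None and state["i"] == stop_at:
            return state, False
        state["total"] += state["i"]
        state["i"] += 1
    return state, True


def stop(state, path):
    # Stop step: store all the running data at the current checkpoint.
    with open(path, "wb") as f:
        pickle.dump(state, f)


def restart(path):
    # Restart step: read the checkpoint back (here, from a local file).
    with open(path, "rb") as f:
        return pickle.load(f)


def demo():
    state = {"i": 0, "n": 10, "total": 0}
    state, _ = run(state, stop_at=4)          # interrupted at the checkpoint
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.pkl")
        stop(state, path)
        resumed = restart(path)               # stands in for the new resources
    resumed, _ = run(resumed)                 # continue to completion
    return resumed["total"]
```

Even in this toy form, the restart path has to read the whole saved state before any work resumes, which matches the observation that checkpoint reading dominates the rescheduling overhead.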

Processor swapping: the basic idea of processor swapping is as follows: applications are launched with the reservations of more machines than actually used at workflow build time; and then at runtime, constantly monitor and periodically
