Xiao Liu PhD Thesis.pdf - Faculty of Information and Communication ...
commonly used to handle temporal violations. For example, in a scientific cloud
computing environment, the underlying cloud infrastructure can provide unlimited
scalable resources on demand at workflow runtime. Therefore, we can add a new
resource (i.e. a new virtual machine) to execute the local workflow segment for the
violated workflow instances. In that case, these workflow activities can be given
higher priorities (to decrease the waiting time) and/or higher execution speed (to
decrease the execution time), and hence the existing time deficits can be
compensated. To realise that, all the computation tasks and input data of these
activities need to be reallocated to the new resources.
Additional monetary cost and time overheads are incurred when adding a new
resource at workflow runtime. The monetary cost of adding a new resource in the
pay-for-usage cloud computing environment consists mainly of the data transfer cost,
since setting up new resources is normally free; the total time overhead
consists of the transfer time for the data and the set-up time for a new service, which
is normally around several minutes, as in Amazon Elastic Compute Cloud (EC2,
http://aws.amazon.com/ec2/), depending on the load of the system and network.
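The overhead accounting described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class `NewResourcePlan`, its field names, and all numeric values are assumptions introduced here, not part of the original strategy or of any cloud provider's pricing. It treats the time overhead as transfer time plus set-up time and the monetary cost as the data transfer cost, with set-up assumed free.

```python
from dataclasses import dataclass


@dataclass
class NewResourcePlan:
    """Hypothetical description of adding one resource at workflow runtime."""
    data_size_gb: float           # input data to reallocate to the new resource
    bandwidth_gb_per_s: float     # assumed effective network bandwidth
    setup_time_s: float           # set-up time of the new service
    transfer_price_per_gb: float  # assumed price for moving the data
    setup_price: float = 0.0      # set-up is normally free, per the text


def time_overhead_s(plan: NewResourcePlan) -> float:
    """Total time overhead = data transfer time + set-up time."""
    transfer_time = plan.data_size_gb / plan.bandwidth_gb_per_s
    return transfer_time + plan.setup_time_s


def monetary_cost(plan: NewResourcePlan) -> float:
    """Monetary cost is dominated by the data transfer cost."""
    return plan.data_size_gb * plan.transfer_price_per_gb + plan.setup_price


def net_compensation_s(plan: NewResourcePlan, time_saved_s: float) -> float:
    """Time saved on the new resource minus the overhead of adding it."""
    return time_saved_s - time_overhead_s(plan)


# Illustrative values only: 10 GB of data, 0.125 GB/s, 3-minute set-up.
plan = NewResourcePlan(
    data_size_gb=10.0,
    bandwidth_gb_per_s=0.125,
    setup_time_s=180.0,
    transfer_price_per_gb=0.25,
)
overhead = time_overhead_s(plan)
cost = monetary_cost(plan)
net = net_compensation_s(plan, time_saved_s=400.0)
```

Under these assumed numbers, adding the resource only compensates an existing time deficit if the time saved through higher priority and execution speed exceeds the overhead; a non-positive `net_compensation_s` would suggest the strategy is not worthwhile for that violation.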
Stop and restart: stop and restart consists of two basic steps. The stop step is to
stop the execution at the current checkpoint and store all the running data. The
restart step is to restart the application on a new set of resources. Although the
strategy is very flexible, the natural stop, migrate and restart approach to
rescheduling can be expensive: each migration event may involve large volumes of
data transfer. Moreover, restarting the application can incur expensive startup costs,
and significant application modifications may be required (with human intervention)
for specialised restart code. As demonstrated in [28], the overhead for this strategy
includes at least the following aspects: resource selection, performance modelling,
computing resource overhead, application start and checkpoint reading. Among them,
the time for reading checkpoints dominates the rescheduling overhead, as it involves
moving data across the Internet and redistributing data to more processors.
According to their experiments, the average overhead for the strategy is around 500
seconds.
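The two steps of stop and restart can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the implementation evaluated in [28]: the function names, the dictionary-based state, and the use of a local pickle file as the checkpoint are all hypothetical. In a real system the checkpoint would be moved across the network to the new resources, which is where the dominant overhead arises.

```python
import os
import pickle
import tempfile


def run_steps(state, total_steps, stop_at=None):
    """Advance a toy computation; halt early when stop_at is reached."""
    while state["step"] < total_steps:
        if stop_at is not None and state["step"] == stop_at:
            break  # a temporal violation triggers the stop step here
        state["acc"] += state["step"]
        state["step"] += 1
    return state


def stop(state, path):
    """Stop step: store all running data at the current checkpoint."""
    with open(path, "wb") as f:
        pickle.dump(state, f)


def restart(path):
    """Restart step: read the checkpoint on the (assumed) new resources."""
    with open(path, "rb") as f:
        return pickle.load(f)


def stop_and_restart_demo(total_steps=10, stop_at=4):
    """Run, stop at a checkpoint, then resume from the stored state."""
    state = run_steps({"step": 0, "acc": 0}, total_steps, stop_at)
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.pkl")
        stop(state, path)
        resumed = restart(path)
    return run_steps(resumed, total_steps)


result = stop_and_restart_demo()
uninterrupted = run_steps({"step": 0, "acc": 0}, 10)
```

The resumed run ends in the same state as an uninterrupted one, which is the property a checkpoint must guarantee; the sketch deliberately leaves out resource selection, performance modelling and data redistribution, which the text lists as further overhead components.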
Processor swapping: the basic idea of processor swapping is as follows:
applications are launched with the reservations of more machines than actually used
at workflow build time; and then at runtime, constantly monitor and periodically