21.01.2014 Views

Xiao Liu PhD Thesis.pdf - Faculty of Information and Communication ...


$$\mathrm{Rank}(k) = \frac{\sum_{i=1}^{k-1} \left( \mathrm{Mean}_{i+1} - \mathrm{Mean}_{i} \right)}{k-1} \qquad \text{Formula 4.1}$$

The theoretical basis for MaxSDev is Tchebysheff's theorem [87], which is widely used in statistics. According to the theorem, given a number $d$ greater than or equal to 1 and a set of $n$ samples, at least $\left[ 1 - \left( \frac{1}{d} \right)^{2} \right]$ of the samples will lie within $d$ standard deviations of their mean, no matter what the actual probability distribution is. For example, with $d = 3$, at least 88.89% of the samples will fall into the interval $(\mu - 3\sigma, \mu + 3\sigma)$. The values of $\mu$ and $\sigma$ can be estimated by the sample mean and the sample standard deviation as $\mu = \sum_{i=1}^{k} X_i / k$ and $\sigma = \sqrt{\sum_{i=1}^{k} (X_i - \mu)^{2} / (k-1)}$ respectively. If the distribution happens to be normal, the general bound can be replaced by the tighter "$3\sigma$" rule, under which a sample falls into the interval $(\mu - 3\sigma, \mu + 3\sigma)$ with a probability of 99.73% [49]. Therefore, if we control the deviation to be less than $m\%$ of the mean, then $3\sigma \le m\% \times \mu$ is ensured. We can thus specify MaxSDev by Formula 4.2:

$$\mathrm{MaxSDev} = \frac{m\%}{3} \times \mu \qquad \text{Formula 4.2}$$
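As an illustration of how the Tchebysheff bound and Formula 4.2 fit together, the following minimal Python sketch computes the bound for a given $d$, estimates $\mu$ and $\sigma$ from a sample, and checks the sample standard deviation against MaxSDev. The function names, the duration values and the choice of $m = 15$ are assumptions made only for this example; they do not come from the thesis.

```python
import statistics


def chebyshev_lower_bound(d: float) -> float:
    """Minimum fraction of samples within d standard deviations of the mean."""
    if d < 1:
        raise ValueError("d must be greater than or equal to 1")
    return 1.0 - (1.0 / d) ** 2


def max_sdev(mean: float, m_percent: float) -> float:
    """Formula 4.2: MaxSDev = (m% / 3) * mean."""
    return (m_percent / 100.0) / 3.0 * mean


# Hypothetical activity durations (illustrative values only).
durations = [98.0, 101.0, 100.0, 103.0, 99.0, 99.0]

mu = statistics.mean(durations)      # sample mean
sigma = statistics.stdev(durations)  # sample standard deviation (divides by k - 1)

threshold = max_sdev(mu, m_percent=15.0)  # m = 15 is an assumed setting
within_threshold = sigma <= threshold     # equivalent to 3 * sigma <= m% * mu

print(f"Bound for d = 3: {chebyshev_lower_bound(3):.4f}")
print(f"mean = {mu:.2f}, sdev = {sigma:.4f}, MaxSDev = {threshold:.2f}")
print(f"sdev within MaxSDev: {within_threshold}")
```

With these assumed values, the sample standard deviation stays below MaxSDev, so the deviation of the samples is within the chosen $m\%$ of the mean in the sense of $3\sigma \le m\% \times \mu$.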

K-MaxSDev is the core algorithm for pattern recognition in our strategy, and it is applied to discover the minimal set of potential duration-series patterns.

4.3.4 Forecasting Algorithms<br />

As presented in Section 4.3.2, the interval forecasting strategy for workflow activities in data/computation intensive scientific applications is composed of four major steps: duration series building, duration pattern recognition, duration pattern matching, and duration interval forecasting. In this section, we propose the detailed algorithms for each step. Note that since pattern matching and interval forecasting are always performed together, we illustrate them within an integrated process.
