Xiao Liu PhD Thesis.pdf - Faculty of Information and Communication ...
$$\mathrm{Rank}(k) = \frac{\sum_{i=1}^{k-1} \left( \mathit{Mean}_{i+1} - \mathit{Mean}_{i} \right)}{k-1}$$
Formula 4.1
The theoretical basis for MaxSDev is Tchebysheff’s theorem [87], which is
widely used in statistics. According to the theorem, given a number $d$ greater
than or equal to 1 and a set of $n$ samples, at least $1 - (1/d)^2$ of the
samples will lie within $d$ standard deviations of their mean, no matter what the
actual probability distribution is. For example, at least 88.89% of the samples
will fall into the interval $(\mu - 3\sigma, \mu + 3\sigma)$. The values of $\mu$
and $\sigma$ can be estimated by the sample mean and sample standard deviation as
$\mu = \sum_{i=1}^{k} X_i / k$ and
$\sigma = \sqrt{\sum_{i=1}^{k} (X_i - \mu)^2 / (k-1)}$ respectively. If the
distribution happens to be normal, Tchebysheff’s theorem reduces to one of its
special cases, i.e. the “3σ” rule, which states that a sample falls into the
interval $(\mu - 3\sigma, \mu + 3\sigma)$ with a probability of 99.73% [49].
Therefore, controlling the deviation to be less than m% of the mean amounts to
ensuring $3\sigma \le m\% \times \mu$. We can thus specify MaxSDev by Formula 4.2:
$$\mathit{MaxSDev} = \frac{m\%}{3} \times \mu$$
Formula 4.2
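The relationship between Tchebysheff’s bound, the sample estimates and Formula 4.2 can be illustrated with a minimal Python sketch. It is not the implementation used in this thesis: the function names `chebyshev_lower_bound` and `max_sdev`, the sample durations and the choice of m = 15 are assumptions made purely for illustration.

```python
import statistics


def chebyshev_lower_bound(d: float) -> float:
    """Minimum fraction of samples lying within d standard deviations of the mean."""
    if d < 1:
        raise ValueError("d must be greater than or equal to 1")
    return 1.0 - (1.0 / d) ** 2


def max_sdev(mu: float, m_percent: float) -> float:
    """Formula 4.2: MaxSDev = (m% / 3) * mu, with m given in percent."""
    return (m_percent / 100.0) / 3.0 * mu


# Hypothetical duration samples (illustrative values only, not thesis data).
durations = [10.0, 12.0, 11.0, 13.0, 14.0]

mu = statistics.mean(durations)      # sample mean
sigma = statistics.stdev(durations)  # sample standard deviation (k - 1 denominator)

# Assumed tolerance of m = 15 percent of the mean.
threshold = max_sdev(mu, m_percent=15.0)
within_threshold = sigma <= threshold
```

Here `chebyshev_lower_bound(3)` evaluates to 8/9, i.e. the 88.89% quoted above, and `within_threshold` simply compares the estimated standard deviation with the MaxSDev value derived from the chosen m.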
K-MaxSDev is the core algorithm for pattern recognition in our strategy, and it is
applied to discover the minimal set of potential duration-series patterns.
4.3.4 Forecasting Algorithms<br />
As presented in Section 4.3.2, the interval forecasting strategy for workflow
activities in data/computation intensive scientific applications is composed of four
major steps: duration series building, duration pattern recognition, duration pattern
matching, and duration interval forecasting. In this section, we propose the
detailed algorithms for each step. Note that since pattern matching and interval
forecasting are always performed together, we illustrate them within an integrated
process.