
1.6.4 Bootstrapping

Bootstrapping is a validation technique introduced by Efron; a thorough treatment of the methodology can be found in Efron and Tibshirani (1993). Bootstrapping has proven to be a powerful technique, especially when dealing with relatively small datasets.

Given an initial dataset with n samples, a bootstrap training set is created by sampling n instances from the original data uniformly with replacement; under this scheme, any given sample may appear multiple times within the same bootstrap training set. The probability of any given instance not being selected after n draws is (1 − 1/n)^n, which for large n approaches e^(−1) ≈ 0.368, i.e. approximately 36.8% (Kohavi, 1995; Bauer and Kohavi, 1999). These left-out instances constitute the bootstrap test set. A common approach is to repeat this procedure a great number of times, constructing, for example, 100 or even up to 1000 new bootstraps of the same size. The total number of bootstraps strongly depends on the number of samples in the initial dataset.
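To make the resampling procedure concrete, the following is a minimal sketch in Python using NumPy; the function and variable names are illustrative, not part of the methodology described above:

```python
import numpy as np

def bootstrap_split(n_samples, rng):
    """Draw one bootstrap training set and its out-of-bag test set."""
    # Sample n indices uniformly with replacement: duplicates are expected.
    train_idx = rng.integers(0, n_samples, size=n_samples)
    # Samples never drawn (~36.8% on average) form the bootstrap test set.
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[train_idx] = False
    test_idx = np.flatnonzero(oob_mask)
    return train_idx, test_idx

rng = np.random.default_rng(0)
train_idx, test_idx = bootstrap_split(100, rng)
print(len(test_idx) / 100)  # close to 0.368 on average
```

Repeating this split many times (e.g. 100 or 1000 bootstraps) and averaging the errors measured on the out-of-bag samples yields the bootstrap estimate of prediction error.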

Bootstrapping generates error estimates of lower variance and relatively moderate bias compared to the previously described techniques. Even though bootstrapping is a fairly straightforward method, it constitutes a computationally demanding statistical procedure that may lead to extremely long execution times.

1.6.5 Model Selection, complexity and the bias-variance trade-off

It is common practice to use validation techniques such as cross-validation or bootstrapping as a means of optimising the adjustable parameters of a classifier. In order to maximise the performance of a classification model, it is often tempting to repeatedly train the model until a minimum training prediction error (or maximum training accuracy) is achieved (Suykens et al., 2002; Brereton, 2006; Izenman, 2008).
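As an illustration of this parameter-tuning loop, the sketch below selects a classifier's adjustable parameters by cross-validated rather than training performance; the choice of scikit-learn, an RBF support vector machine, and the particular parameter grid are assumptions made for the example, not taken from the text:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values for the adjustable parameters of the classifier.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}

# 5-fold cross-validation scores each parameter combination on held-out
# folds rather than on the training error alone.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

Scoring candidate models on held-out folds, rather than on the training set itself, avoids selecting the model that merely memorises the training data.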

