A comparison of bootstrap methods and an adjusted bootstrap ...

Ordinary Bootstrap The ordinary bootstrap method [4] has the problem that the learning and test sets overlap. A prediction rule is built on a bootstrap sample and tested on the original sample; averaging the misclassification rates across all bootstrap replications gives the ordinary bootstrap estimate. This method seriously underestimates the prediction error, since a subset of the data is used both in building and in assessing the prediction model.

Bootstrap Cross-Validation The method was proposed by Fu, Carroll and Wang [9] to handle small-sample problems. The procedure generates B bootstrap samples of size n from the observed sample and then calculates a leave-one-out cross-validation estimate on each bootstrap sample. Averaging the B cross-validation estimates gives the bootstrap cross-validation estimate of the prediction error. The paper of Fu et al. [9] did not carefully address the issue of feature selection. When the method is applied to high-dimensional gene expression data, we emphasize that feature selection must be conducted in this method on every leave-one-out learning set derived from every bootstrap sample. Since an original observation can appear more than once in a bootstrap sample, a leave-one-out learning set may overlap with the left-out item when the cross-validation procedure is applied to a bootstrap sample. Consequently, the bootstrap cross-validation method tends to underestimate the true prediction error.

Leave-One-Out Bootstrap The leave-one-out bootstrap procedure (Efron [2]) generates a total of B bootstrap samples of size n. Each observed specimen is predicted repeatedly using the bootstrap samples in which that particular observation does not appear. In this way, the method avoids testing a prediction model on the specimens used for constructing the model. The leave-one-out bootstrap estimate is given by

$$ e_{LOOBS} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{|C_i|} \sum_{b \in C_i} I\{ y_i \neq r(t_i, x^{*,b}) \}, $$

where $C_i$ is the collection of bootstrap samples not containing observation $i$ and $|C_i|$ is the number of such bootstrap samples. Feature selection and class prediction should be performed on each bootstrap sample $x^{*,b}$, $b = 1, \ldots, B$.
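To make the contrast between these three estimators concrete, the following is a minimal sketch in Python. It assumes NumPy and scikit-learn are available; the 3-nearest-neighbour rule, the helper `_fit`, and the default numbers of bootstrap replications are arbitrary illustrations rather than choices made in the paper, and the feature-selection step discussed above is only indicated by a comment.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier  # arbitrary stand-in classifier


def _fit(Xtr, ytr):
    # Placeholder prediction rule; in a gene-expression setting, feature
    # selection would be repeated inside this call on the training data only.
    return KNeighborsClassifier(n_neighbors=3).fit(Xtr, ytr)


def ordinary_bootstrap_error(X, y, B=100, seed=0):
    """Ordinary bootstrap [4]: build the rule on each bootstrap sample,
    test it on the full original sample, average over B replications."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errs = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)          # bootstrap sample x*,b of size n
        model = _fit(X[idx], y[idx])
        errs.append(np.mean(model.predict(X) != y))
    return float(np.mean(errs))


def bootstrap_cv_error(X, y, B=50, seed=0):
    """Bootstrap cross-validation (Fu, Carroll and Wang [9]): leave-one-out
    CV on each bootstrap sample, averaged over the B samples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    cv_errs = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        Xb, yb = X[idx], y[idx]
        wrong = 0
        for j in range(n):                        # leave-one-out CV within x*,b
            keep = np.delete(np.arange(n), j)
            model = _fit(Xb[keep], yb[keep])
            wrong += int(model.predict(Xb[j:j + 1])[0] != yb[j])
        cv_errs.append(wrong / n)
    return float(np.mean(cv_errs))


def loo_bootstrap_error(X, y, B=200, seed=0):
    """Leave-one-out bootstrap (Efron [2]): each observation is predicted
    only by rules built on bootstrap samples that do not contain it."""
    rng = np.random.default_rng(seed)
    n = len(y)
    err_sum = np.zeros(n)                         # sum of I{y_i != r(t_i, x*,b)} over b in C_i
    n_omit = np.zeros(n)                          # |C_i|
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        out = np.setdiff1d(np.arange(n), idx)     # observations left out of x*,b
        if out.size == 0:
            continue
        model = _fit(X[idx], y[idx])
        err_sum[out] += (model.predict(X[out]) != y[out])
        n_omit[out] += 1
    valid = n_omit > 0                            # skip the rare i appearing in every sample
    return float(np.mean(err_sum[valid] / n_omit[valid]))
```

For example, `loo_bootstrap_error(X, y)` averages the per-observation error rates exactly as in the expression for $e_{LOOBS}$ above, while `ordinary_bootstrap_error` and `bootstrap_cv_error` reuse observations in both the learning and test roles, which is the source of their downward bias.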
