GfKl 2008 - Legos
Classification of Paired Data Using Ensemble Methods
Werner Adler¹, Alexander Brenning², and Berthold Lausen¹
¹ Chair for Biometry and Epidemiology, University of Erlangen-Nuremberg, Germany
werner.adler@imbe.imed.uni-erlangen.de, berthold.lausen@rzmail.uni-erlangen.de
² Department of Geography, University of Waterloo, Canada
brenning@fesmail.uwaterloo.ca
Abstract. In glaucoma classification, the underlying data have a paired structure
that is often accounted for by simply using only one eye per subject. Brenning and
Lausen (2008) showed that properly using both eyes in paired cross-validation
decreases the variance of the estimate compared to cross-validation using only
one eye per subject.
We discuss and compare different strategies for generating the bootstrap samples
used to train AdaBoost (Freund and Schapire, 1996), Random Forest (Breiman, 2001),
and Double Bagging (Hothorn and Lausen, 2005). The simplest approach is to ignore
the paired data structure and proceed as usual. Adapting the idea of Brenning and
Lausen, we also perform subject-based sampling. In a first step, subjects are drawn
with replacement. In a second step, one of three rules is applied to each drawn
subject: both eyes are used, one randomly selected eye is used, or two eyes are
drawn with replacement. The subjects not selected for training the base learners
constitute the out-of-bag samples. We compare the error rates of these approaches
in a simulation study.
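The two-step subject-based sampling can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `subject_bootstrap`, the strategy labels, and the representation of an observation as a `(subject, eye)` pair are assumptions made purely for illustration.

```python
import random

EYES = ("left", "right")  # assumed labels for the two paired organs


def subject_bootstrap(subject_ids, strategy, rng):
    """Draw one subject-based bootstrap sample (illustrative sketch).

    Returns (in_bag, out_of_bag), where in_bag is a list of
    (subject, eye) pairs and out_of_bag lists the subjects never drawn.
    """
    # Step 1: draw as many subjects as there are, with replacement.
    drawn = [rng.choice(subject_ids) for _ in subject_ids]

    # Step 2: choose eyes for each drawn subject.
    in_bag = []
    for subject in drawn:
        if strategy == "both":
            # Use both eyes of the subject.
            in_bag.extend((subject, eye) for eye in EYES)
        elif strategy == "one":
            # Use one randomly selected eye.
            in_bag.append((subject, rng.choice(EYES)))
        elif strategy == "two_with_replacement":
            # Draw two eyes with replacement (the same eye may appear twice).
            in_bag.extend((subject, rng.choice(EYES)) for _ in range(2))
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")

    # Subjects never drawn form the out-of-bag sample, so no eye of an
    # in-bag subject can leak into the out-of-bag set.
    drawn_set = set(drawn)
    out_of_bag = [s for s in subject_ids if s not in drawn_set]
    return in_bag, out_of_bag
```

A call such as `subject_bootstrap(list(range(100)), "both", random.Random(1))` yields one training sample and its out-of-bag subjects. Because out-of-bag membership is decided per subject, both eyes of a subject always end up on the same side of the split.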
Key words: Bootstrap, Classification, Glaucoma, Paired Organs<br />
References<br />
Breiman, L. (2001): Random forests. Machine Learning, 45, 5–32.
Brenning, A. and Lausen, B. (2008): Estimating error rates in the classification of paired organs. Statistics in Medicine, submitted.
Freund, Y. and Schapire, R. (1996): Experiments with a new boosting algorithm. Proceedings of the 13th International Conference on Machine Learning, 148–156.
Hothorn, T. and Lausen, B. (2005): Bundling classifiers by bagging trees. Computational Statistics & Data Analysis, 49, 1068–1078.