GfKl 2008 - Legos


Classification of Paired Data Using Ensemble Methods

Werner Adler¹, Alexander Brenning², and Berthold Lausen¹

¹ Chair for Biometry and Epidemiology, University of Erlangen-Nuremberg, Germany
werner.adler@imbe.imed.uni-erlangen.de, berthold.lausen@rzmail.uni-erlangen.de

² Department of Geography, University of Waterloo, Canada
brenning@fesmail.uwaterloo.ca

Abstract. In glaucoma classification, the underlying data have a paired structure that is often accounted for by simply using only one eye per subject. Brenning and Lausen (2008) showed that the proper use of both eyes in paired cross-validation decreases the variance of the error rate estimate compared to cross-validation using only one eye per subject.

We discuss and compare different strategies for generating the bootstrap samples used to train AdaBoost (Freund and Schapire, 1996), Random Forest (Breiman, 2001), and Double Bagging (Hothorn and Lausen, 2005). The simplest approach is to ignore the paired data structure and proceed as usual. Adapting the idea of Brenning and Lausen, we also perform subject-based sampling. In a first step, subjects are drawn with replacement. In a second step, for each drawn subject, either both eyes are chosen, one randomly selected eye is chosen, or two eyes are drawn with replacement. The subjects not selected for training the base learners constitute the out-of-bag samples. We compare the error rates of these approaches in a simulation study.
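The two-step subject-based sampling can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `subject_bootstrap`, the strategy labels, the eye labels "left" and "right", and the choice to draw as many subjects as the data set contains are all assumptions made for illustration.

```python
import random


def subject_bootstrap(subject_ids, strategy, rng):
    """Draw one subject-based bootstrap sample (illustrative sketch).

    Returns (in_bag, out_of_bag): in_bag is a list of (subject, eye)
    pairs, out_of_bag is the sorted list of subjects never drawn.
    """
    eyes = ("left", "right")  # hypothetical eye labels
    # Step 1: draw subjects with replacement. Drawing exactly
    # len(subject_ids) subjects is an assumption, not from the abstract.
    drawn = [rng.choice(subject_ids) for _ in subject_ids]

    in_bag = []
    for subject in drawn:
        # Step 2: choose the eyes of the drawn subject.
        if strategy == "both":
            chosen = list(eyes)
        elif strategy == "one":
            chosen = [rng.choice(eyes)]
        elif strategy == "two_with_replacement":
            # The same eye may appear twice.
            chosen = [rng.choice(eyes), rng.choice(eyes)]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        in_bag.extend((subject, eye) for eye in chosen)

    # Subjects never drawn form the out-of-bag sample.
    out_of_bag = sorted(set(subject_ids) - set(drawn))
    return in_bag, out_of_bag


rng = random.Random(0)
in_bag, out_of_bag = subject_bootstrap(list(range(10)), "both", rng)
```

Because the out-of-bag set is defined at the subject level, no subject contributes one eye to training and the other eye to the out-of-bag sample, whichever second-step strategy is used.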

Key words: Bootstrap, Classification, Glaucoma, Paired Organs<br />

References<br />

Breiman, L. (2001): Random forests. Machine Learning, 45, 5–32.

Brenning, A. and Lausen, B. (2008): Estimating error rates in the classification of paired organs. Statistics in Medicine, submitted.

Freund, Y. and Schapire, R. (1996): Experiments with a new boosting algorithm. Proceedings of the 13th International Conference on Machine Learning, 148–156.

Hothorn, T. and Lausen, B. (2005): Bundling classifiers by bagging trees. Computational Statistics & Data Analysis, 49, 1068–1078.
