13.07.2015 Views

Scalable approaches for analysis of human genome-wide ...


A Supplementary Results for Gene Set Statistics

A.1. Classifiers

In addition to the centroid classifier, we tested the shrunken centroid (Tibshirani et al., 2003) in the R package pamr (Hastie et al., 2009b), our implementation of the classifier from van ’t Veer et al. (2002), and a support vector machine with a linear kernel (kernlab package (Karatzoglou et al., 2004)).

We optimised the shrunken centroid’s threshold, and the SVM’s number of features and its $\ell_2$ penalty, using nested random splits, in which the data was randomly split into three parts: training, validation, and testing. The model was fit to the training data, and its AUC was calculated for its predictions on the validation data. This was repeated over a grid of values appropriate for each model type. The optimal hyperparameters were then chosen as the ones maximising the AUC over the validation set. The model was then refit using the optimal hyperparameters on the training and validation data together, and tested on the remaining test data. Its AUC over the test data is reported. The whole procedure was repeated B times, producing B classifiers (for each classifier type), with different sets of optimal hyperparameters. The procedure was performed separately for each of the five datasets.

There are conflicting descriptions of the exact form of the classifier used in van ’t Veer et al. (2002). In the original paper, the classifier appears to assign each sample using its Pearson correlation with each of the centroids of the positive and negative metastasis classes:

\[
\hat{y}_i = \arg\min_{j \in \{-1,+1\}} \left\{ \operatorname{Corr}(x_i, c_j) \right\}, \tag{A.1}
\]
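The nested random-split procedure described in A.1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis. The split fractions (50% training, 25% validation, 25% testing), the function names `auc`, `fit_ridge`, and `nested_random_splits`, and the synthetic data are all hypothetical. A ridge-penalised least-squares scorer stands in for the shrunken centroid and the SVM, simply because it has a single penalty to tune and needs only NumPy.

```python
import numpy as np

def auc(y_true, scores):
    """AUC via the Mann-Whitney pair count; labels are assumed to be in {-1, +1}."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == -1]
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def fit_ridge(X, y, lam):
    """Stand-in model (NOT the SVM or shrunken centroid from the text):
    ridge-penalised least squares, returning a scoring function."""
    mu = X.mean(axis=0)
    Xc = X - mu
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y.mean()))
    return lambda X_new: (X_new - mu) @ w

def nested_random_splits(X, y, fit, grid, B=10, fractions=(0.5, 0.25), seed=0):
    """Repeat B times: split into train/validation/test, pick the grid value
    with the highest validation AUC, refit on train+validation, score on test."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(fractions[0] * n)  # assumed fraction, not given in the text
    n_val = int(fractions[1] * n)    # assumed fraction, not given in the text
    results = []
    for _ in range(B):
        perm = rng.permutation(n)
        tr = perm[:n_train]
        va = perm[n_train:n_train + n_val]
        te = perm[n_train + n_val:]
        # Validation AUC for each candidate hyperparameter value.
        val_auc = [auc(y[va], fit(X[tr], y[tr], p)(X[va])) for p in grid]
        best = grid[int(np.argmax(val_auc))]
        # Refit on training and validation data together, then test once.
        trva = np.concatenate([tr, va])
        model = fit(X[trva], y[trva], best)
        results.append((best, auc(y[te], model(X[te]))))
    return results

# Toy usage on synthetic data (fabricated purely for illustration).
rng = np.random.default_rng(1)
y = np.repeat([-1, 1], 60)
X = rng.normal(size=(120, 20)) + 0.5 * y[:, None] * (np.arange(20) < 5)
results = nested_random_splits(X, y, fit_ridge, grid=[0.1, 1.0, 10.0, 100.0], B=5)
for lam, test_auc in results:
    print(f"lambda={lam:g}  test AUC={test_auc:.3f}")
```

Each of the B repetitions yields its own selected hyperparameter and test AUC, which matches the text's remark that the B classifiers can end up with different optimal hyperparameters. Because the test part is never used for selection, the reported AUC is not biased by the grid search.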
