Scalable approaches for analysis of human genome-wide ...

B. Supplementary Results for Sparse Linear Models

all cutoff values. The ROC curves can be summarised using the Area Under the receiver operating characteristic Curve (AUC) (Hanley and McNeil, 1982), computed through numerical integration or alternatively estimated as

$$
\widehat{\mathrm{AUC}} = \frac{1}{N_+ N_-} \sum_{i=1}^{N_+} \sum_{j=1}^{N_-} \left\{ I(\hat{y}_i > \hat{y}_j) + \tfrac{1}{2} I(\hat{y}_i = \hat{y}_j) \right\}, \tag{B.1}
$$

where $N_+$ and $N_-$ are the numbers of cases and controls respectively, $\hat{y}_i$ is the prediction for the $i$th case, $\hat{y}_j$ is the prediction for the $j$th control, and $I(\cdot)$ is the indicator function evaluating to 1 if its argument is true and to 0 otherwise. Eq. B.1 shows that the sample AUC is the maximum likelihood estimate of the probability of correctly ranking a randomly-selected causal SNP more highly than a randomly-selected non-causal SNP (with correction for ties). The expected AUC for a classifier producing random predictions is 0.5; perfect predictions have AUC = 1.0, and perfectly-wrong predictions have AUC = 0.0.

Another useful statistic is the Area under the Precision-Recall Curve (APRC, also known as Average Precision), which can be integrated numerically but is usually approximated as

$$
\widehat{\mathrm{APRC}} = \frac{1}{M} \sum_{m=1}^{M} \mathrm{Prec}_m, \tag{B.2}
$$

where $\mathrm{Prec}_m$ is the precision for the $m$th level of recall, out of $M$ levels. The expected APRC for a classifier producing random predictions is the proportion of positive samples. For estimating APRC, we used the program perf (http://kodiak.cs.cornell.edu/kddcup/software.html).

Unlike the APRC, the AUC does not depend on the relative proportions of the classes (the class balance).
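As an illustration, the pairwise estimator of Eq. B.1 can be implemented directly by comparing every case prediction with every control prediction; the function name `auc_pairwise` here is our own, not from the text, and this is a sketch rather than the program actually used:

```python
import numpy as np

def auc_pairwise(y_true, y_score):
    """Estimate AUC via Eq. B.1: the fraction of (case, control) pairs in
    which the case is ranked above the control, counting ties as 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    cases = np.asarray(y_score)[y_true]        # predictions for the N+ cases
    controls = np.asarray(y_score)[~y_true]    # predictions for the N- controls
    # All pairwise differences between case and control predictions
    diff = cases[:, None] - controls[None, :]
    n_pairs = cases.size * controls.size
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / n_pairs

# Perfectly-ranked predictions give AUC = 1.0
print(auc_pairwise([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
```

With all predictions tied, the estimator returns 0.5, matching the expected AUC of a random classifier.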
However, the AUC as commonly used can be misleading for comparing classifiers when the proportion of causal SNPs is very small, as is the case in GWA. Consider our HAPGEN simulations, where only 148 of the 73,832 SNPs are true causal SNPs. To see why AUC is not informative in these settings, consider a thought experiment similar to that used by Sonnenburg et al. (2006), where we have a classifier that at some cutoff correctly classifies 100% of the true causal SNPs (TPR = 1), but also wrongly classifies 1% of the non-causal SNPs (FPR = 0.01). The AUC is the area under the curve induced by the TPR and the FPR, and this curve is monotonically increasing. Therefore, the AUC in this case must be ≥ 0.99, which seems like very good discrimination. However, when there are 73,684 non-causal SNPs, even the low false positive rate of 1% implies 0.01 × 73,684 ≈ 737 false positives on average. In comparison, even assuming a fixed recall (= TPR) of 1, so that the APRC is equal to the precision, the number of false positives must be as low as 148 (the number of causal SNPs) for the precision and APRC to be 0.5; conversely, a false positive rate of just 0.5%, leading to ∼368 false positives on average, drops both the precision and the APRC to 148/(148 + 368) ≈ 0.287. In many real-world settings, such extreme results
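The arithmetic of this thought experiment can be checked in a few lines; this is only a reproduction of the numbers quoted above, using the counts from the HAPGEN simulations:

```python
# Thought experiment: 148 causal SNPs among 73,832 total.
n_causal = 148
n_noncausal = 73_832 - n_causal           # 73,684 non-causal SNPs

# With TPR = 1 and FPR = 0.01, expected false positives:
fp_1pct = 0.01 * n_noncausal              # ≈ 737

# With FPR = 0.005, expected false positives and the resulting
# precision (= APRC, at a fixed recall of 1):
fp_half_pct = 0.005 * n_noncausal         # ≈ 368
precision = n_causal / (n_causal + round(fp_half_pct))

print(round(fp_1pct), round(fp_half_pct), round(precision, 3))  # 737 368 0.287
```

Despite an AUC of at least 0.99 in both cases, the precision collapses, which is exactly why the APRC is the more informative measure at this class balance.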
