12.07.2015 Views

Fundamentals of Clinical Research for Radiologists ROC Analysis



TABLE 3 Fictitious Data Comparing the Accuracy of Two Diagnostic Tests

                                   ROC Curve
                                   X          Y
Estimated AUC                      0.841      0.841
Estimated SE of AUC                0.041      0.045
Estimated PAUC where FPR < 0.20    0.112      0.071
Estimated SE of PAUC               0.019      0.014
Estimated covariance               0.00001
Z test comparing PAUCs             $Z = [0.112 - 0.071] / \sqrt{0.019^2 + 0.014^2 - 0.00002}$
95% CI for difference in PAUCs     $[0.112 - 0.071] \pm 1.96 \times \sqrt{0.019^2 + 0.014^2 - 0.00002}$

Note: AUC = area under the curve, PAUC = partial area under the curve, CI = confidence interval.

binormality. Alternatively, one can use software like ROCKIT [32] that will bin the test results into an optimal number of categories and apply the same maximum likelihood methods as mentioned earlier for rating data like the BI-RADS scores.

More elaborate models for the ROC curve that can take into account covariates (e.g., the patient's age, symptoms) have also been developed in the statistics literature [37–39] and will become more accessible as new software is written.

Estimating the Area Under the ROC Curve

Estimation of the area under the smooth curve, assuming a binormal distribution, is described in Appendix 1. In this subsection, we describe and illustrate estimation of the area under the empiric ROC curve. The process of estimating the area under the empiric ROC curve is nonparametric, meaning that no assumptions are made about the distribution of the test results or about any hypothesized underlying distribution.
The estimation works for tests scored with a rating scale, a 0–100% confidence scale, or a true continuous-scale variable.

The process of estimating the area under the empiric ROC curve involves four simple steps:

First, the test result of a patient with disease is compared with the test result of a patient without disease. If the former test result indicates more suspicion of disease than the latter test result, then a score of 1 is assigned. If the test results are identical, then a score of 1/2 is assigned. If the diseased patient has a test result indicating less suspicion for disease than the test result of the nondiseased patient, then a score of 0 is assigned. It does not matter which diseased and nondiseased patient you begin with. Using the data in Table 1 as an illustration, suppose we start with a diseased patient assigned a test result of "normal" and a nondiseased patient assigned a test result of "normal." Because their test results are the same, this pair is assigned a score of 1/2.

Second, repeat the first step for every possible pair of diseased and nondiseased patients in your sample. In Table 1 there are 100 diseased patients and 100 nondiseased patients, thus 10,000 possible pairs. Because there are only five unique test results, the 10,000 possible pairs can be scored easily, as in Table 2.

Third, sum the scores of all possible pairs. From Table 2, the sum is 8,632.5.

Fourth, divide the sum from step 3 by the number of pairs in the study sample. In our example we have 10,000 pairs. Dividing the sum from step 3 by 10,000 gives us 0.86325, which is our estimate of the area under the empiric ROC curve.
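
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the software cited in the article: the function name `empirical_auc` and the ratings below are hypothetical (they are not the Table 1 data), and the sketch assumes that a higher rating indicates more suspicion of disease.

```python
from itertools import product


def empirical_auc(diseased, nondiseased):
    """Area under the empiric ROC curve from pairwise comparisons.

    Assumes higher values indicate more suspicion of disease.
    """
    total = 0.0
    # Steps 1 and 2: score every (diseased, nondiseased) pair.
    for d, n in product(diseased, nondiseased):
        if d > n:
            total += 1.0   # diseased patient looks more suspicious
        elif d == n:
            total += 0.5   # tied test results
        # d < n contributes a score of 0
    # Steps 3 and 4: the summed scores divided by the number of pairs.
    return total / (len(diseased) * len(nondiseased))


# Hypothetical ratings on a 1-5 scale (illustrative only).
diseased = [5, 4, 4, 3, 1]
nondiseased = [1, 2, 3, 1, 2]
print(empirical_auc(diseased, nondiseased))  # 20.5 / 25 = 0.82
```

With these made-up ratings the 25 pairs sum to 20.5, so the estimate is 0.82. The same function applies unchanged to confidence scores or continuous measurements, because it only compares values within each pair.
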
Note that this method of estimating the area under the empiric ROC curve gives the same result as one would obtain by fitting trapezoids under the curve and summing the areas of the trapezoids (so-called trapezoid method).

The variance of the estimated area under the empiric ROC curve is given by DeLong et al. [40] and can be used for constructing CIs; software programs are available for estimating the nonparametric AUC and its variance [41].

Comparing the AUCs or PAUCs of Two Diagnostic Tests

To test whether the AUC (or PAUC) of one diagnostic test (denoted by $\mathrm{AUC}_1$) equals the AUC (or PAUC) of another diagnostic test ($\mathrm{AUC}_2$), the following test statistic is calculated:

$$Z = \frac{\mathrm{AUC}_1 - \mathrm{AUC}_2}{\sqrt{\mathrm{var}_1 + \mathrm{var}_2 - 2 \times \mathrm{cov}}}, \qquad (4)$$

where $\mathrm{var}_1$ is the estimated variance of $\mathrm{AUC}_1$, $\mathrm{var}_2$ is the estimated variance of $\mathrm{AUC}_2$, and cov is the estimated covariance between $\mathrm{AUC}_1$ and $\mathrm{AUC}_2$. When different samples of patients undergo the two diagnostic tests, the covariance equals zero. When the same sample of patients undergoes both diagnostic tests (i.e., a paired study design), then the covariance is not generally equal to zero and is often positive. The estimated variances and covariances are standard output for most ROC software [32, 41].

Under the null hypothesis that the two areas are equal, the test statistic Z approximately follows a standard normal distribution. For a two-tailed test with significance level of 0.05, the critical values are −1.96 and +1.96.
If Z is less than −1.96, then we conclude that the accuracy of diagnostic test 2 is superior to that of diagnostic test 1; if Z exceeds +1.96, then we conclude that the accuracy of diagnostic test 1 is superior to that of diagnostic test 2.

A two-sided CI for the difference in AUC (or PAUC) between two diagnostic tests can be calculated from

$$\mathrm{LL} = [\mathrm{AUC}_1 - \mathrm{AUC}_2] - z_{\alpha/2} \times \sqrt{\mathrm{var}_1 + \mathrm{var}_2 - 2 \times \mathrm{cov}} \qquad (5)$$

$$\mathrm{UL} = [\mathrm{AUC}_1 - \mathrm{AUC}_2] + z_{\alpha/2} \times \sqrt{\mathrm{var}_1 + \mathrm{var}_2 - 2 \times \mathrm{cov}}, \qquad (6)$$

where LL is the lower limit of the CI, UL is the upper limit, and $z_{\alpha/2}$ is the value from the standard normal distribution corresponding to an upper-tail probability of α/2. For example, to construct a 95% CI, α = 0.05, thus $z_{\alpha/2}$ = 1.96.

Consider the ROC curves in Figure 2A. The estimated areas under the smooth ROC curves of the two tests are the same, 0.841. The PAUCs where the FPR is less than 0.20, however, differ. From the estimated variances and covariance in Table 3, the value of the Z statistic for comparing the PAUCs is 1.77, which is not statistically significant. The 95% CI for the difference in PAUCs is more informative: (−0.004 to 0.086); the CI for the partial area index is (−0.02 to 0.43). The CI contains large positive differences, suggesting that more research is needed to investigate the relative accuracies of these two diagnostic tests for FPRs less than 0.20.

Analysis of MRMC ROC Studies

Multiple methods have been published for the statistical analysis of MRMC studies [13–20]. The methods are used to construct CIs for diagnostic accuracy and statistical tests for assessing differences in accuracy between tests.
A statistical overview of the methods is given elsewhere [10]. Here, we briefly mention some of the key issues of MRMC ROC analyses.

AJR:184, February 2005 369
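
Returning to the comparison of two tests, the Z statistic of equation 4 and the CI limits of equations 5 and 6 can be checked numerically against the Table 3 values. The sketch below is only an illustration: the function name `compare_areas` and its parameter names are hypothetical, and it takes standard errors (as reported in Table 3) and squares them to obtain the variances.

```python
import math


def compare_areas(area1, se1, area2, se2, cov=0.0, z_crit=1.96):
    """Z statistic and two-sided CI for the difference of two (partial) areas.

    se1 and se2 are standard errors; cov is the covariance between the two
    estimates (0 when different patient samples undergo the two tests).
    z_crit = 1.96 corresponds to a 95% CI.
    """
    diff = area1 - area2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * cov)
    z = diff / se_diff                  # equation 4
    lower = diff - z_crit * se_diff     # equation 5
    upper = diff + z_crit * se_diff     # equation 6
    return z, lower, upper


# PAUC estimates, standard errors, and covariance from Table 3.
z, lower, upper = compare_areas(0.112, 0.019, 0.071, 0.014, cov=0.00001)
print(f"Z = {z:.2f}, 95% CI = ({lower:.3f}, {upper:.3f})")
# prints: Z = 1.77, 95% CI = (-0.004, 0.086)
```

The output agrees with the values quoted in the text. Dividing both limits by 0.20, the largest possible partial area for FPRs below 0.20, gives roughly −0.02 and 0.43, which appears consistent with the CI reported for the partial area index.
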
