11.07.2015 Views

2DkcTXceO


A.S. Whittemore 393

form a useful foundation for this work; see, e.g., Brier (1950), Hsu and Murphy (1986), Murphy (1973), and Wilks (1995).

35.3 How do we evaluate a personal risk model?

Risk models for long-term future outcomes are commonly assessed with respect to two attributes. Their calibration reflects how well their assigned risks agree with observed outcome occurrence within subgroups of the population. Their discrimination (also called precision or resolution) reflects how well they distinguish those who ultimately do and do not develop the outcome. Good calibration does not imply good discrimination. For example, if the actual disease risks of a population show little inter-personal variation, discrimination will be poor even for a perfectly calibrated risk model. Conversely, good discrimination does not imply good calibration. Discrimination depends only on the ranks of a model’s assigned risks, so any rank-invariant transformation of a model’s risks will affect its calibration but not its discrimination.

An important task is to quantify how much a model’s calibration and discrimination can be improved by expanding it with additional covariates, such as newly discovered genetic markers. However, the discrimination of a risk model depends on the distribution of risk-associated covariates in the population of interest. As noted in the previous paragraph, no model can discriminate well in a population with a homogeneous covariate distribution. Thus while large discrimination gains from adding covariates to a model are informative (indicating substantial additional risk variation detected by the expanded model), a small precision gain is less so, as it may merely reflect underlying risk homogeneity in the population.

Several metrics have been proposed to assess and compare models with respect to their calibration and discrimination. Their usefulness depends on how they will be used, as shown by the following examples.

Example 1. Risk models are used to determine eligibility for randomized clinical trials involving treatments with serious potential side effects. For instance, the BCRAT model was used to determine eligibility for a randomized trial to determine if tamoxifen can prevent breast cancer (Fisher et al., 1998, 2005). Because tamoxifen increases the risks of stroke, endometrial cancer and deep-vein thrombosis, eligibility was restricted to women whose breast cancer risks were deemed high enough to warrant exposure to these side effects. Thus eligible women were those whose BCRAT-assigned five-year breast cancer risk exceeded 1.67%. For this type of application, a good risk model should yield a decision rule with few false positives, i.e., one that excludes women who truly are at low breast cancer risk. A model without this attribute could in-
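
The distinction drawn in Section 35.3 between calibration and discrimination can be illustrated with a small simulation. The Python sketch below is only an illustration under stated assumptions: the population (true risks spread uniformly between 1% and 30%), the sample size, and the helper names auc and calibration_by_group are hypothetical and do not come from the text. Discrimination is measured here by the concordance statistic (AUC), which is one possible choice among the metrics that have been proposed, and calibration is checked by comparing mean assigned risk with observed outcome frequency within risk-ordered subgroups.

```python
import random


def auc(risks, outcomes):
    """Concordance statistic: the chance that a randomly chosen case was
    assigned a higher risk than a randomly chosen non-case (ties count 1/2).
    It depends only on the ranks of the assigned risks."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def calibration_by_group(risks, outcomes, n_groups=4):
    """Return (mean assigned risk, observed outcome frequency) for each of
    n_groups subgroups formed by sorting people on assigned risk."""
    pairs = sorted(zip(risks, outcomes))
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[start:end]
        mean_risk = sum(r for r, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_risk, observed))
    return rows


rng = random.Random(0)  # fixed seed so the illustration is repeatable
n = 2000                # hypothetical sample size

# Assumed heterogeneous population: true risks between 1% and 30%.
true_risk = [rng.uniform(0.01, 0.30) for _ in range(n)]
outcome = [1 if rng.random() < p else 0 for p in true_risk]

# A rank-preserving transformation: every assigned risk is halved.
halved_risk = [p / 2 for p in true_risk]

auc_true = auc(true_risk, outcome)
auc_halved = auc(halved_risk, outcome)

# Assumed homogeneous population: everyone has the same true risk of 10%,
# and the model assigns exactly that risk to everyone.
flat_risk = [0.10] * n
flat_outcome = [1 if rng.random() < 0.10 else 0 for _ in range(n)]
auc_flat = auc(flat_risk, flat_outcome)

print("AUC, true risks:  ", auc_true)
print("AUC, halved risks:", auc_halved)  # identical: ranks are unchanged
print("AUC, homogeneous: ", auc_flat)    # 0.5: all assigned risks are tied
print("Calibration, true risks:  ", calibration_by_group(true_risk, outcome))
print("Calibration, halved risks:", calibration_by_group(halved_risk, outcome))
```

In this sketch, halving every assigned risk leaves the AUC unchanged while the assigned risks now understate the observed outcome frequencies, and the homogeneous population yields an AUC of 0.5 even though the constant assigned risk equals each person's true risk.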
