View PDF Version - RePub - Erasmus Universiteit Rotterdam
Chapter 1
Prediction models with correction for overfitting
To develop an individual baseline prediction of a dichotomous treatment response,
logistic regression techniques are used. The aim of the final model is to identify
which patient characteristics or disease-specific factors are associated with response,
either independently or in combination with each other. The fitted model then yields
a predicted probability of response for the individual patient.
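
As a sketch of what such a model looks like, a logistic regression with two covariates and their interaction can be written as below. The symbols are illustrative assumptions and do not come from the source: $p_i$ is the probability of response for patient $i$, $x_{1i}$ and $x_{2i}$ are two patient characteristics, and the $\beta$ terms are the regression coefficients. The product term represents factors that act in combination with each other.

```latex
\[
\operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i}
  = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i} x_{2i},
\qquad
p_i = \frac{1}{1 + \exp\bigl(-(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i} x_{2i})\bigr)}
\]
```
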
Designing a good model is a laborious process of comparing different strategies and
combinations of covariates using statistical measures of model fit and performance,
such as Akaike’s Information Criterion (AIC) and the area under the receiver operating
characteristic curve (AUC), in combination with sound statistical knowledge and
common sense, built on knowledge from previous studies and clinical experience [25-26].
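
To make the two measures concrete, the following minimal sketch computes the AIC and the AUC for a set of predicted probabilities. The outcome vector, the predictions, and the number of parameters are made-up illustration values, not data from this thesis. The AUC is computed as the proportion of responder/non-responder pairs in which the responder received the higher prediction, with ties counted as one half.

```python
import math

def log_likelihood(y, p):
    """Binomial log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def aic(y, p, n_params):
    """Akaike's Information Criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood(y, p)

def auc(y, p):
    """Area under the ROC curve via pairwise comparison of predictions."""
    responders = [pi for yi, pi in zip(y, p) if yi == 1]
    non_responders = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pr in responders:
        for pn in non_responders:
            if pr > pn:
                score += 1.0      # concordant pair
            elif pr == pn:
                score += 0.5      # tied pair
    return score / (len(responders) * len(non_responders))

# Hypothetical outcomes (1 = response) and model predictions.
y = [1, 0, 1, 1, 0, 0]
p = [0.8, 0.3, 0.6, 0.4, 0.5, 0.2]

model_aic = aic(y, p, n_params=2)   # assumes intercept + one covariate
model_auc = auc(y, p)
print(f"AIC = {model_aic:.3f}, AUC = {model_auc:.3f}")
```

The AIC is only meaningful when comparing candidate models fitted to the same data, whereas the AUC summarises how well a single model discriminates between responders and non-responders.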
When a well-fitted and stable prediction model has finally been achieved, the fitted
coefficients need to be corrected for overfitting. Because the model has been developed on
a fixed dataset, it will in general fit that dataset better than it fits a new individual or
dataset, so its predictive performance on new data will be worse.
The best option to address this issue is to validate the model in an independent and similar
dataset. When this option is not available, bootstrapping [26] is an established method to
study the degree of overfitting. Penalized likelihood estimation [26] can also be used: a
penalty is estimated that shrinks the coefficients to correct for overfitting.
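
One common way to use the bootstrap for this purpose is to estimate a uniform shrinkage factor (the average calibration slope). The sketch below illustrates that idea and is not necessarily the procedure used in this thesis. It rests on loudly stated assumptions: the data are simulated, the model has a single covariate, and the logistic regression is fitted with a small hand-written Newton-Raphson routine so that the example needs only the standard library.

```python
import math
import random

def fit_logistic(x, y, iterations=15):
    """Fit logit(p) = a + b*x by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. intercept
            g1 += (yi - p) * xi     # gradient w.r.t. slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def bootstrap_shrinkage(x, y, n_boot=100, seed=1):
    """Average calibration slope of bootstrap models evaluated on the original data."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]          # resample with replacement
        a_b, b_b = fit_logistic([x[i] for i in idx], [y[i] for i in idx])
        lp = [a_b + b_b * xi for xi in x]                   # bootstrap model applied to original data
        slopes.append(fit_logistic(lp, y)[1])               # calibration slope
    return sum(slopes) / len(slopes)

# Simulated data (assumption): 200 patients, one standard-normal covariate.
sim = random.Random(42)
x = [sim.gauss(0.0, 1.0) for _ in range(200)]
y = [1 if sim.random() < 1.0 / (1.0 + math.exp(-(-0.5 + 1.0 * xi))) else 0 for xi in x]

intercept, slope = fit_logistic(x, y)
shrinkage = bootstrap_shrinkage(x, y)
shrunken_slope = shrinkage * slope   # the intercept should be re-estimated after shrinking
print(f"slope = {slope:.3f}, shrinkage factor = {shrinkage:.3f}")
```

A shrinkage factor close to 1 suggests little overfitting. Values well below 1 indicate that the fitted coefficients are too extreme and should be pulled towards zero. With many candidate covariates and few events, the factor would be expected to fall further below 1 than in this single-covariate illustration.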
The final step is to present the prediction model; the method of presentation depends on the
audience and the user. The model is in general quite complex, and mathematical formulas
can be difficult for some users to interpret. Instead, nomograms or other
graphical presentations can be used, along with medical decision trees. Another option
is a website that generates a specific predicted probability of
response after the required patient-specific variables are entered.
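
The calculation behind such a website can be very small. The sketch below shows the idea under stated assumptions: the covariate names (`age`, `baseline_marker`) and all coefficient values are hypothetical placeholders, not estimates from this thesis.

```python
import math

# Hypothetical, already shrunken coefficients (assumption, for illustration only).
COEFFICIENTS = {"intercept": -1.0, "age": 0.02, "baseline_marker": 0.5}

def predicted_probability(patient, coefficients=COEFFICIENTS):
    """Return the predicted probability of response for one patient.

    `patient` maps covariate names to values; a missing covariate raises KeyError
    so that an incomplete form is never silently scored.
    """
    linear_predictor = coefficients["intercept"]
    for name, beta in coefficients.items():
        if name != "intercept":
            linear_predictor += beta * patient[name]
    return 1.0 / (1.0 + math.exp(-linear_predictor))

example = predicted_probability({"age": 50, "baseline_marker": 2.0})
print(f"Predicted probability of response: {example:.2f}")
```

Raising an error on missing input is a deliberate choice here, since a calculator used by clinicians should not return a probability based on incomplete data.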
Dynamic prediction models<br />
During treatment patients will be monitored regularly. Dynamic update of an individual’s<br />
prediction of response to treatment based on new information is not routinely implemented<br />
in prediction models, but can be of great importance for the individual subject<br />
and the further choice of treatment.<br />
Two different statistical approaches are considered: (1) directly modelling the prediction<br />
of the outcome variable with the use of logistic regression techniques and (2) indirectly<br />
classifying individuals into an outcome category over time using Bayes’ theorem.<br />
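
The indirect approach (2) can be sketched as a repeated application of Bayes' theorem: the current probability of belonging to the responder category acts as the prior, and each new marker measurement updates it. The sketch below makes several assumptions that are not taken from the source: marker values are normally distributed within each category, the means and standard deviations are invented, and successive measurements are treated as independent given the category.

```python
import math

def normal_pdf(value, mean, sd):
    """Density of a normal distribution at `value`."""
    z = (value - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def update_response_probability(prior, marker_value,
                                responder=(2.0, 1.0),
                                non_responder=(4.0, 1.0)):
    """Posterior probability of response after one marker measurement.

    `responder` and `non_responder` are hypothetical (mean, sd) pairs of the
    marker distribution in each outcome category.
    """
    likelihood_resp = normal_pdf(marker_value, *responder)
    likelihood_non = normal_pdf(marker_value, *non_responder)
    numerator = prior * likelihood_resp
    return numerator / (numerator + (1.0 - prior) * likelihood_non)

# Start from a hypothetical baseline probability and update at each visit.
probability = 0.4
history = []
for marker_value in [2.5, 2.0]:
    probability = update_response_probability(probability, marker_value)
    history.append(probability)

print([round(p, 3) for p in history])
```

In this illustration both measurements lie closer to the assumed responder mean, so the probability of response rises at each visit. In practice the independence assumption would need to be checked, since repeated measurements within one patient are usually correlated.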
For the direct approach (1), either the observed marker value or the subject-specific
pattern of the marker is used as a predictor. A generalized estimating equations (GEE)