23.10.2012 Views

View PDF Version - RePub - Erasmus Universiteit Rotterdam


Chapter 1


Prediction models with correction for overfitting

To develop an individual baseline prediction of a dichotomous response to treatment, logistic regression techniques are used. The aim of the final model is to identify which patient characteristics or disease-specific factors are associated with response, either independently or in combination with each other. As a result, the probability of response can be assessed for the individual patient.
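
For reference, the logistic regression model described above can be written as follows. The notation is assumed here and does not come from the original text: $Y$ is the dichotomous response, $x_1, \dots, x_p$ are the patient characteristics or disease-specific factors, and $\beta_0, \dots, \beta_p$ are the coefficients to be estimated.

```latex
\[
P(Y = 1 \mid x_1, \dots, x_p)
  = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)\bigr)}
\]
```

In this notation, factors acting in combination would enter as product terms such as $\beta_{12} x_1 x_2$.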

The design of a good model is a laborious process of comparing different strategies and combinations of covariates using statistical measures of model fit and performance, such as Akaike’s Information Criterion (AIC) and the Area Under the receiver operating characteristic Curve (AUC), in combination with sound statistical knowledge and common sense, building on knowledge from previous studies and clinical experience [25-26].
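
The two measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the thesis: the function names and the four-patient example data are assumptions made here, and the AUC is computed by simple pairwise comparison of responders and non-responders.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y (0/1) given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def aic(loglik, n_params):
    """Akaike's Information Criterion, 2k - 2 log L; lower values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * loglik

def auc(y, p):
    """Probability that a random responder receives a higher prediction
    than a random non-responder; ties count one half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one responder and one non-responder")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes and predicted probabilities for four patients
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]

print(auc(y, p))                     # 0.75
print(aic(log_likelihood(y, p), 2))  # assuming a model with 2 fitted parameters
```

When two candidate models are fitted on the same data, the one with the lower AIC is preferred on fit, while the AUC summarizes how well predictions separate responders from non-responders.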

When a well-fitted and stable prediction model has finally been achieved, the fitted coefficients need to be corrected for overfitting. Because the model has been developed on a fixed dataset, it will in general tend to overfit, so its predictive performance will be worse when it is applied to a new individual or dataset. The best option to address this issue is to validate the model in an independent and similar dataset. When this option is not available, bootstrapping [26] is an established method to study the degree of overfitting. Penalized likelihood estimation [26] can also be used to fit a penalty that corrects for overfitting.
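
The effect of a penalty on the fitted coefficients can be illustrated with a small sketch. It assumes a single covariate, a ridge-type (squared coefficient) penalty, and plain gradient ascent; the data, the penalty value, and the function names are hypothetical, and the thesis does not state which form of penalized likelihood was used.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(x, y, penalty=0.0, lr=0.1, n_iter=5000):
    """Fit intercept b0 and slope b1 by gradient ascent on the mean
    log-likelihood minus penalty * b1**2 (the intercept is not penalized)."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        resid = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        g0 = sum(resid) / n
        g1 = sum(r * xi for r, xi in zip(resid, x)) / n - 2.0 * penalty * b1
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Hypothetical centred covariate and dichotomous response for eight patients
x = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0_ml, b1_ml = fit_logistic(x, y)                 # ordinary maximum likelihood
b0_pen, b1_pen = fit_logistic(x, y, penalty=0.1)  # penalized fit
print(b1_ml, b1_pen)  # the penalized slope is pulled towards zero
```

Shrinking the slope towards zero makes predictions for new patients less extreme, which is the intended correction for overfitting. In practice the size of the penalty would itself have to be chosen, for example by cross-validation or an information criterion.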

The final step is to present the prediction model; the method will depend on the audience and the user. The model is in general quite complex, and mathematical formulas can be difficult for some users to interpret. Instead, nomograms or graphical presentations can be used, along with medical decision trees. Another option might be a website that generates a specific predicted probability of response after the required patient-specific variables have been entered.
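
The calculation behind such a website could look roughly like the sketch below. The variable names, the intercept, and the coefficients are entirely hypothetical and serve only to show the mechanics; they are not taken from any fitted model.

```python
import math

# Hypothetical intercept and coefficients, for illustration only
INTERCEPT = -1.0
COEFFICIENTS = {"age_per_10_years": -0.2, "baseline_marker_log10": 0.5}

def predicted_probability(patient, intercept=INTERCEPT, coefficients=COEFFICIENTS):
    """Return the predicted probability of response for one patient,
    given a dict mapping variable names to values."""
    missing = [name for name in coefficients if name not in patient]
    if missing:
        raise ValueError(f"missing patient variables: {missing}")
    # Linear predictor of the logistic model
    lp = intercept + sum(beta * patient[name] for name, beta in coefficients.items())
    return 1.0 / (1.0 + math.exp(-lp))

# Hypothetical patient: the linear predictor is -1.0 - 1.0 + 2.0 = 0
patient = {"age_per_10_years": 5.0, "baseline_marker_log10": 4.0}
print(predicted_probability(patient))  # 0.5
```

Rejecting incomplete input, rather than silently substituting a default, keeps the user from reading a probability that was computed without all required variables.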

Dynamic prediction models

During treatment, patients are monitored regularly. Dynamically updating an individual’s prediction of response to treatment based on new information is not routinely implemented in prediction models, but can be of great importance for the individual subject and for the further choice of treatment.

Two different statistical approaches are considered: (1) directly modelling the prediction of the outcome variable using logistic regression techniques and (2) indirectly classifying individuals into an outcome category over time using Bayes’ theorem.
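
The indirect approach (2) can be sketched as a sequential application of Bayes’ theorem. The sketch makes several assumptions that are not in the original text: the marker is normally distributed within responders and non-responders, the same distributions apply at every visit, and successive measurements are conditionally independent given the outcome category. All numbers and function names are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def update_posterior(prior, marker, resp_params, nonresp_params):
    """Bayes' theorem: posterior probability of response after one marker value."""
    like_resp = normal_pdf(marker, *resp_params)
    like_nonresp = normal_pdf(marker, *nonresp_params)
    numerator = prior * like_resp
    return numerator / (numerator + (1.0 - prior) * like_nonresp)

def dynamic_posteriors(prior, markers, resp_params, nonresp_params):
    """Update the probability of response visit by visit; each posterior
    becomes the prior for the next measurement."""
    posteriors = []
    p = prior
    for marker in markers:
        p = update_posterior(p, marker, resp_params, nonresp_params)
        posteriors.append(p)
    return posteriors

# Hypothetical (mean, sd) of the marker in responders and non-responders
RESPONDERS = (4.0, 1.0)
NON_RESPONDERS = (6.0, 1.0)

# Baseline probability of response 0.3, then three marker values during treatment
posteriors = dynamic_posteriors(0.3, [4.5, 4.0, 3.5], RESPONDERS, NON_RESPONDERS)
print(posteriors)  # the probability of response increases at every visit
```

A patient would then be classified into the outcome category with the highest posterior probability at each visit, or according to a clinically chosen threshold.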

For the direct approach (1), either the observed marker value or the subject-specific pattern of the marker is used as a predictor. A generalized estimating equations (GEE)
