08.12.2012 Views

Scientific Concept of the National Cohort (status ... - Nationale Kohorte

A.6 Planned statistical analyses and statistical power considerations

All power calculations in Sects. A.6.4.2 and A.6.4.3 are based on the following assumptions:

• Possible effects of exposure measurement errors are ignored. (Sect. A.6.2.3.1, however, addresses the issue of random errors, particularly for continuous exposure measurements, and presents a strategy for reproducibility/calibration substudies embedded in the National Cohort, to improve quantitative estimation of RRs, adjusting for regression dilution effects due to random measurement errors.)

• The possibility of missing data is ignored. While imputation methods will be used in the analysis when necessary, the effect on power can be approximated from the percentage of missing values for a given covariable [811].

• No confounding is assumed. However, the variance inflation factor (VIF), which is discussed below, translates directly into the factor by which the sample size must be multiplied.

In addition to sample size requirements for detecting RRs, Sect. A.6.4.5 presents estimates of sample size requirements for nested case-control studies aiming to estimate the sensitivity of a continuous diagnostic marker at a prefixed level of specificity.

A.6.4.2 Minimally detectable odds ratios in main effects models

Exposures or risk factors are frequently measured or coded as binary, multiple categorical, or continuous variables. In nested case-control studies, typical examples of binary exposure variables based on laboratory measurements include classifying individuals into carriers or noncarriers of a given infectious agent, or carrier status for a given (polymorphic) genetic risk allele. In cohort studies, a typical example of a binary risk factor classification is a comparison between use versus nonuse of exogenous hormones or medical drugs. Examples of continuously measured risk factors include many biomarkers (e.g., markers of metabolism) measured in blood or urine samples, or quantitative scores based on a larger number of questionnaire responses (e.g., nutrient intakes estimated from a food frequency questionnaire). In a first step, continuous exposure variables may also be broken down into quantile categories and RRs (or odds ratios) estimated for each quantile, taking one of the quantile categories as reference. A statistical instrument for the latter situation is a categorical model, counting events in cells of a multidimensional contingency table and assuming an order between the categories of the new quantile variable. This latter approach can thus be regarded as a general and robust method that should be followed by dose–response analyses [812].
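The quantile approach described above can be illustrated with a minimal Python sketch. It is an illustration only, not the analysis plan of the National Cohort: the function name `quantile_odds_ratios` is hypothetical, cut points are taken from the pooled exposure distribution (a real analysis might derive them from controls only), the lowest category is assumed to be the reference, and the odds ratios are crude, with no adjustment or confidence intervals.

```python
from statistics import quantiles


def quantile_odds_ratios(exposure, is_case, n_groups=4):
    """Crude odds ratio of each quantile category versus the lowest one.

    exposure : list of continuous exposure values
    is_case  : list of 0/1 flags (1 = case, 0 = control), same length
    n_groups : number of quantile categories (4 = quartiles)
    """
    # n_groups - 1 cut points from the pooled exposure distribution
    # (assumption: pooled data, not controls only).
    cuts = quantiles(exposure, n=n_groups)

    cases = [0] * n_groups
    controls = [0] * n_groups
    for x, d in zip(exposure, is_case):
        # Category index = number of cut points lying below the value.
        group = sum(x > c for c in cuts)
        if d:
            cases[group] += 1
        else:
            controls[group] += 1

    # Odds in the reference (lowest) category; this sketch assumes
    # every category contains at least one control and the reference
    # category at least one case.
    ref_odds = cases[0] / controls[0]
    return [(cases[g] / controls[g]) / ref_odds for g in range(n_groups)]
```

The returned list starts with 1.0 for the reference category; a monotone increase across the remaining entries is the pattern that a subsequent dose–response analysis would then test formally.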

For continuous exposure or risk factor measurements, it is often appropriate to test the hypothesis that increasing levels of the exposure are associated with increases in risk on a continuous (log-)linear scale. The statistical power of such tests is based on a standardized difference (A). The definition and interpretation of the standardized difference are given in Annex C.2.2.
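As a rough illustration of how power depends on a standardized difference, the following Python sketch uses a normal approximation for a two-sided comparison of cases and controls. The definition used here (difference in mean exposure between cases and controls, in units of a common standard deviation) is an assumption made for this example; the definition actually used in the document is the one in Annex C.2.2, which is not reproduced in this section. The function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(std_diff, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided z-test.

    std_diff   : assumed standardized difference (mean difference / SD)
    n_cases    : number of cases
    n_controls : number of controls
    alpha      : two-sided significance level
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = abs(std_diff) * sqrt(n_cases * n_controls / (n_cases + n_controls))
    # The negligible rejection probability in the opposite tail is ignored.
    return z.cdf(shift - z_crit)
```

Under these assumptions, `approx_power(0.2, 500, 500)` gives a power of roughly 0.89, while the same standardized difference with 100 cases and 100 controls gives a much lower value.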

For continuous exposure/risk factor variables that follow skewed (e.g., approximately lognormal) probability distributions or more complex distributional types, including a spike-at-zero situation (e.g., cigarette smoking or alcohol consumption), more complex modeling strategies, such as fractional polynomials, may be needed [813]. Rigorous sample size or statistical power calculations for these more complex situations are not considered here.

In multifactorial statistical models that include a number of covariates as adjustment or confounding factors, the sample size requirements will increase compared to those for nonadjusted analyses. A general approximation to this increase in sample size requirements is given by the VIF [814]:

$$\mathrm{VIF} = \frac{1}{1 - \rho_{CE}^{2}}$$
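A short Python sketch can make the role of the VIF as a sample size multiplier concrete. It assumes the common form VIF = 1 / (1 − ρ_CE²), where ρ_CE is read here as the correlation between the confounder(s) C and the exposure E; that reading of the subscript, and the function names `variance_inflation_factor` and `adjusted_sample_size`, are assumptions made for this illustration.

```python
from math import ceil


def variance_inflation_factor(rho_ce):
    """VIF = 1 / (1 - rho_ce^2), for a correlation strictly between -1 and 1."""
    if not -1.0 < rho_ce < 1.0:
        raise ValueError("rho_ce must lie strictly between -1 and 1")
    return 1.0 / (1.0 - rho_ce ** 2)


def adjusted_sample_size(n_unadjusted, rho_ce):
    """Sample size for the adjusted analysis, rounded up to a whole subject."""
    return ceil(n_unadjusted * variance_inflation_factor(rho_ce))
```

For example, with an assumed correlation of 0.5 the VIF is about 1.33, so an unadjusted requirement of 1000 subjects grows to `adjusted_sample_size(1000, 0.5)`, that is 1334 subjects; with zero correlation the requirement is unchanged.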
