09.08.2013 Views

Fundamentals of epidemiology - an evolving text - Are you looking ...


3.3 Expected values

Perhaps the single most important concept to remember is to have an idea of what is expected. This concept has been applied during the editing and cleaning process. Understanding what is expected is a function of both the study design and the values of the parameters in the target population. For example, if randomized allocation has been used, then the randomized groups should be similar. If controls are selected from the general population via random digit dialing methods, then their demographics should reflect the population as a whole. When examining a table, first check the variables, labels, and N's for the total table and the subcategories that are not included, to make sure that you understand the subset of observations represented. Second, examine the marginal distributions to make sure they conform to what you expect. Then examine the internal distribution, particularly with regard to the referent group. Finally, proceed to assess the association or other information in the table.
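The order of checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `review_table`, the variable names `exposed` and `case`, and the six records are all hypothetical and do not come from the text. It reports the total N and the number of observations left out of the table first, then the marginal distributions, then the cell counts.

```python
from collections import Counter


def review_table(rows, row_var, col_var):
    """Summarize a two-way table in the suggested review order.

    Observations with a missing value (None) on either variable are
    excluded from the table and counted separately.
    """
    used = [r for r in rows
            if r.get(row_var) is not None and r.get(col_var) is not None]
    return {
        "n_total": len(rows),                  # step 1: N's
        "n_in_table": len(used),
        "n_excluded": len(rows) - len(used),
        "row_margin": dict(Counter(r[row_var] for r in used)),  # step 2
        "col_margin": dict(Counter(r[col_var] for r in used)),
        "cells": dict(Counter((r[row_var], r[col_var]) for r in used)),  # step 3
    }


# Hypothetical records, invented for illustration only.
rows = [
    {"exposed": "no", "case": "no"},
    {"exposed": "no", "case": "no"},
    {"exposed": "no", "case": "yes"},
    {"exposed": "yes", "case": "yes"},
    {"exposed": "yes", "case": "no"},
    {"exposed": None, "case": "yes"},   # excluded: exposure is missing
]

summary = review_table(rows, "exposed", "case")
for key, value in summary.items():
    print(key, value)
```

Comparing `n_in_table` with `n_total` before looking at any association is what makes clear which subset of observations the table represents.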

3.4 Missing values

The impact of missing data is magnified for analyses involving a large number of variables, since many analytic procedures require omitting any observation that lacks a value for even one of the variables in the analysis. Thus, if there are four variables, each with missing data for 10% of the observations, in a worst-case situation (the missing values for each variable fall on different observations) 40% of the observations could be omitted from the analysis. To assess the extent and nature of missing data for a variable, a complete "missing value" analysis should ideally be done. That means comparing the presence/absence of information for a variable with other key factors, e.g., age, race, gender, exposure status, and/or disease status. The goal is to identify correlates of missing information. Such relationships are indicative, though not conclusive, of selection bias. This analysis may also give insights into how to impute values for those missing (e.g., missing cholesterol could be estimated as a function of sex, age, race, and body mass). Strong relationships between one covariate and missing values for another indicate that imputed values should be stratified by levels of the first covariate.
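The arithmetic behind the four-variable example, and the idea of looking for correlates of missingness, can be made concrete with a small Python sketch. Everything named here is an assumption for illustration: the functions `listwise_loss_bounds` and `missing_rate_by_group`, the variables `sex` and `cholesterol`, and the eight records are hypothetical. The first function gives the fraction of observations lost when any observation with a missing value is omitted, under three scenarios: missing values all fall on the same observations, missingness is independent across variables, and missing values never overlap (the worst case in the text).

```python
import math


def listwise_loss_bounds(missing_fractions):
    """Fraction of observations dropped when any missing value excludes a record.

    Returns (best, independent, worst):
      best        - missing values are nested on the same observations
      independent - missingness is statistically independent across variables
      worst       - missing values never overlap across variables
    """
    best = max(missing_fractions)
    independent = 1.0 - math.prod(1.0 - p for p in missing_fractions)
    worst = min(1.0, sum(missing_fractions))
    return best, independent, worst


def missing_rate_by_group(records, variable, group):
    """Proportion of records with `variable` missing (None) in each level of `group`."""
    counts = {}
    for rec in records:
        n, n_missing = counts.get(rec[group], (0, 0))
        counts[rec[group]] = (n + 1, n_missing + (rec[variable] is None))
    return {g: n_missing / n for g, (n, n_missing) in counts.items()}


# Four variables, each missing for 10% of observations.
best, independent, worst = listwise_loss_bounds([0.10, 0.10, 0.10, 0.10])
print(f"best {best:.1%}, independent {independent:.1%}, worst {worst:.1%}")

# Hypothetical records, invented for illustration only.
records = [
    {"sex": "M", "cholesterol": 190}, {"sex": "M", "cholesterol": None},
    {"sex": "M", "cholesterol": 210}, {"sex": "M", "cholesterol": 175},
    {"sex": "F", "cholesterol": None}, {"sex": "F", "cholesterol": 200},
    {"sex": "F", "cholesterol": None}, {"sex": "F", "cholesterol": 185},
]
rates = missing_rate_by_group(records, "cholesterol", "sex")
print(rates)
```

A marked difference in missingness between levels of a covariate, as in this made-up data, is the kind of relationship that would suggest stratifying imputed values by that covariate.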

Although they receive relatively little attention in introductory treatments of data analysis, missing values are the bane of the analyst. Examination of the data for missing values (e.g., via SAS PROC FREQ or PROC UNIVARIATE) is an essential first step prior to any formal analyses. Special missing value codes (see above) facilitate this examination. Missing values are a serious nuisance or impediment in data analysis and interpretation. One of the best motivations for designing data collection systems that minimize missing values is experience in trying to deal with them during analysis!
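As a rough analogue of the frequency tabulation mentioned above, the following Python sketch tallies, for each variable, how many values are present and how many are missing for each reason. It is not SAS and not the text's own coding scheme: the codes in `MISSING_CODES`, the function `missing_report`, and the sample records are hypothetical, chosen only to show how distinct missing value codes make the tabulation more informative than a single "missing" count.

```python
from collections import Counter

# Hypothetical special missing value codes (assumption for illustration);
# the coding scheme the text refers to is defined elsewhere.
MISSING_CODES = {".R": "refused", ".D": "don't know", ".N": "not applicable"}


def missing_report(records, variables):
    """Tally present values and each kind of missing value, per variable."""
    report = {}
    for var in variables:
        tally = Counter()
        for rec in records:
            value = rec.get(var)
            if value is None:
                tally["blank"] += 1               # no code recorded at all
            elif value in MISSING_CODES:
                tally[MISSING_CODES[value]] += 1  # coded reason for missingness
            else:
                tally["present"] += 1
        report[var] = dict(tally)
    return report


# Hypothetical records, invented for illustration only.
records = [
    {"age": 34, "smoker": "yes"},
    {"age": ".R", "smoker": "no"},
    {"age": None, "smoker": ".D"},
    {"age": 51, "smoker": ".D"},
]

report = missing_report(records, ["age", "smoker"])
for var, tally in report.items():
    print(var, tally)
```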

3.4.1 Effects of missing data

Two kinds of missing data can be distinguished: data-missing and case-missing. In the former case, information is available from a study participant, but some responses are missing. In case-missing, the prospective participant has declined to enroll or has dropped out. This discussion will address the situation of data-missing.

Missing data have a variety of effects. At a minimum, missing data decrease the effective sample size, so that estimates are less precise (have wider confidence intervals) and statistical tests

_____________________________________________________________________________________________

www.sph.unc.edu/EPID168/ © Victor J. Schoenbach 16. Data management and data analysis - 539

rev. 9/27/1999, 10/22/1999, 10/28/1999
