01.06.2013 Views

Statistical Methods in Medical Research 4ed


444 Further regression models

follow-up. This particular pattern is distinguished from other patterns because it has recently been studied quite extensively, and will be discussed below.

With missing values in longitudinal data, the analyst is confronted with the problem of whether or not the values on an individual that are available can be used in an analysis and, if so, how. A naïve approach would simply be to analyse the data that were available. This could present technical problems for some methods; for example, repeated measures analysis of variance cannot deal with unbalanced data in which different individuals are measured different numbers of times. There are techniques which allow the statistician to impute values for the missing observations, thereby 'completing' the data and allowing the analysis that was originally intended to be performed. These methods were widely used in the days before substantial computing power was as readily available as it is now, because many methods for unbalanced data used to present formidable computational obstacles. However, this approach is clearly undesirable if more than a very small proportion of the data is missing. Other methods, for example, a summary measures analysis using the mean of the responses on an individual, have no difficulty with unbalanced data. However, regardless of the ability of any technique to cope with unbalanced data, analyses which ignore the reasons why values are missing can be seriously misleading. For example, if the outcome is a blood chemistry measurement that tends to be high when some chronic condition is particularly active and debilitating, then it may be on just such occasions that the patient feels too unwell to attend the clinic, so the missing values are generally the high values. A summary measures mean of the available values would be biased downwards, as would be the results from the interindividual stratum in a repeated measures analysis of variance.
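
The downward bias described in the blood chemistry example can be illustrated with a small simulation. The sketch below is not taken from the text: the standardized response, the number of visits and the missingness probabilities (`p_miss_low`, `p_miss_high`, `high_cutoff`) are all illustrative assumptions, chosen only so that high values are more likely to be missing. It compares the summary measures mean computed from every scheduled visit with the one computed from the visits actually attended.

```python
import random
import statistics


def simulate_patient(rng, n_visits=10, p_miss_low=0.05, p_miss_high=0.65,
                     high_cutoff=1.0):
    """Simulate one hypothetical patient; return (all values, observed values)."""
    # Assumption: responses are standardized (mean 0, SD 1) and independent.
    all_values = [rng.gauss(0.0, 1.0) for _ in range(n_visits)]
    observed = []
    for y in all_values:
        # Assumption: a visit is far more likely to be missed when the value is high.
        p_miss = p_miss_high if y > high_cutoff else p_miss_low
        if rng.random() >= p_miss:
            observed.append(y)
    return all_values, observed


def compare_summary_means(n_patients=2000, seed=1):
    """Average the per-patient mean, with and without the missed visits."""
    rng = random.Random(seed)
    complete_means, available_means = [], []
    for _ in range(n_patients):
        all_values, observed = simulate_patient(rng)
        if not observed:
            continue  # a patient with no attended visits yields no summary measure
        complete_means.append(statistics.fmean(all_values))
        available_means.append(statistics.fmean(observed))
    return statistics.fmean(complete_means), statistics.fmean(available_means)


complete_mean, available_mean = compare_summary_means()
print(f"mean of per-patient means, complete data:  {complete_mean:.3f}")
print(f"mean of per-patient means, available data: {available_mean:.3f}")
```

Patients end up with different numbers of observed values, which the summary measure handles without difficulty, yet under these assumptions the available-data mean should fall clearly below the complete-data mean. The ability to cope with unbalanced data does not protect against bias.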

It is clear that there are potentially severe difficulties in dealing with missing values. It is important to know if the fact that an observation is missing is related to the value that would have been obtained had it not been missing. It is equally clear that unequivocal statistical evidence is unlikely to be forthcoming on this issue. Nevertheless, a substantial body of statistical research has recently emerged on this topic, stemming from the seminal work of Little (1976) and Rubin (1976), and this will be discussed briefly below. It should also be noted that, although the problem of missing data can be severe, it does not have to be, and it is important to keep matters in perspective. If the number of missing values is small as a proportion of the whole data set, and if the purpose of the analysis is not focused on extreme aspects of the distribution of the response, such as determining the top few percentiles, then it is unlikely that naïve approaches will be seriously misleading.
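
To keep matters in perspective in a concrete way, the following sketch compares a naïve analysis of the available values at two levels of missingness. It is an illustration under stated assumptions, not a method from the text: the responses are standard normal, only values above `high_cutoff` (roughly the top 10%) can go missing, and `p_drop_high` is a hypothetical probability that such a value is lost. The helper `percentile` is a simple nearest-rank version written for this example.

```python
import random
import statistics


def percentile(values, q):
    """Nearest-rank style percentile for 0 <= q <= 1 (no interpolation)."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


def naive_bias(p_drop_high, n=50000, high_cutoff=1.2816, seed=2):
    """Bias of the naive mean and 95th percentile when high values are dropped.

    Assumption: values above high_cutoff are dropped with probability
    p_drop_high; every other value is always observed.
    """
    rng = random.Random(seed)
    full = [rng.gauss(0.0, 1.0) for _ in range(n)]
    observed = [y for y in full
                if not (y > high_cutoff and rng.random() < p_drop_high)]
    return {
        "fraction_missing": 1 - len(observed) / n,
        "mean_bias": statistics.fmean(observed) - statistics.fmean(full),
        "p95_bias": percentile(observed, 0.95) - percentile(full, 0.95),
    }


few_missing = naive_bias(p_drop_high=0.1)   # roughly 1% of all values missing
many_missing = naive_bias(p_drop_high=0.9)  # roughly 9% of all values missing
for label, result in [("few missing", few_missing), ("many missing", many_missing)]:
    print(label, {key: round(value, 3) for key, value in result.items()})
```

Under these assumptions the bias in the mean should be slight when only about 1% of the values are missing, and it grows as the missing fraction rises. The 95th percentile tends to be distorted more than the mean, which is why analyses focused on the extremes of the distribution deserve more caution.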

Almost all of the formal methods discussed for this problem stem from the classification of missing data mechanisms into three groups, described by Little and Rubin (1987). These are: (i) missing completely at random (MCAR); (ii) missing at random (MAR); and (iii) processes not in (i) or (ii). The final group
