11.07.2015 Views


J.P. Shaffer

A simple example is the division into primary and secondary outcomes in clinical research. If the primary outcomes are of major importance, how should that be taken into account? Should error control at a nominal level, for example the usual .05 level, be set separately for each set of outcomes? Should there be a single α level for the whole set, but with different weights on the two different types of outcomes? Should the analysis be treated as hierarchical, with secondary outcomes tested only if one (or more) of the primary outcomes shows significant effects?

A more complex example is the analysis of a multifactor study by ANOVA. The standard analysis considers the main effect of each factor and the interactions of all factors. Should the whole study be evaluated at the single nominal α level? That seems unwise. Should each main effect be evaluated at that level? How should interactions be treated? Some researchers feel that if there is an interaction, main effects shouldn’t be further analyzed. But suppose one high-order interaction is significant at the nominal α level. Does that mean the main-effect tests of the factors involved aren’t meaningful?

Beyond these analyses, if an effect is assumed to be significant, how should the ensuing more detailed analysis (e.g., pairwise comparisons of treatments) be handled, considering the multiplicity issues? There is little literature on this subject, which is clearly very difficult. Westfall and Young (1993, Chapter 7) give examples of such studies and the problems they raise.

Finally, one of the most complex situations is encountered in a large survey, where there are multiple factors of different types, multiple subgroups, perhaps longitudinal comparisons. An example is the National Assessment of Educational Progress (NAEP), now carried out yearly, with many educational subjects, many subgroups (gender, race-ethnicity, geographical area, socio-economic status, etc.), and longitudinal comparisons in all these.

A crucial element in all such ill-structured problems, as noted, is the definitions of families for which error control is desired. In my two years as director of the psychometric and statistical analysis of NAEP at Educational Testing Service, we had more meetings on this subject, trying to decide on family definitions and handling of interactions, than any other. Two examples of difficult problems we faced:

(a) Long term trend analyses were carried out by using the same test at different time points. For example, nine-year-olds were tested in mathematics with an identical test given nine times from 1978 to 2004. At first it was planned to compare each time point with the previous one. In 1982, when the second test was given, there was only one comparison. In 2004, there were eight comparisons (time 2 with time 1, time 3 with time 2, etc.). Treating the whole set of comparisons at any one time as a family, the family size increased with the addition of each new testing time. Thus, to control the FWER, each pairwise test had to reach a stricter level of significance in subsequent analyses. But it would obviously be confusing to call a change significant at one time only to have it declared not significant at a later time point.
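
The shrinking threshold described in (a) can be illustrated with a small sketch. It assumes a Bonferroni correction, in which each of m comparisons is tested at level α/m; the text does not say which FWER-controlling procedure was actually used for NAEP, so this is only one possible instance. The p-value of 0.02 is hypothetical and chosen purely for illustration, as are the function names.

```python
def bonferroni_levels(alpha, max_comparisons):
    """Per-comparison level alpha/m for family sizes m = 1..max_comparisons.

    Assumes a Bonferroni correction; other FWER procedures would give
    different (usually less strict) thresholds.
    """
    return [alpha / m for m in range(1, max_comparisons + 1)]


def significant_by_family_size(p_value, alpha, max_comparisons):
    """For each family size, report whether a fixed p-value passes the threshold."""
    levels = bonferroni_levels(alpha, max_comparisons)
    return [p_value <= level for level in levels]


# Family grows from one comparison (second testing time) to eight
# comparisons (ninth testing time), as in the long term trend example.
levels = bonferroni_levels(0.05, 8)

# Hypothetical p-value for the first comparison (time 2 with time 1).
verdicts = significant_by_family_size(0.02, 0.05, 8)

for m, (level, verdict) in enumerate(zip(levels, verdicts), start=1):
    print(f"family size {m}: level {level:.5f}, significant: {verdict}")
```

Under these assumptions, the same comparison passes while the family has one or two members and fails once a third comparison is added, which is the reversal the text calls confusing.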
