Book Reviews Assessing Educational Measurement: Ovations ...
a modest goal; more radical aims should be contemplated.
For one thing, it seems to me that previous editions of Educational Measurement have uniformly and implicitly defined the universe of testing as consisting nearly exclusively of large-scale, standardized assessments. It is a curious contrast: Although so much educational testing and assessment occur at the level of the individual student and teacher or at the classroom level, the content of each edition of Educational Measurement is terribly tilted toward the technologies of testing programs such as the SAT, ACT, and GRE. It is as if the Federal Aviation Administration were to consider aviation safety with exclusive reference to commercial airlines, ignoring the much greater volume of private aircraft flights each day. Clearly, the evolving technologies of computer adaptive testing, item response theory, generalizability theory, and differential item functioning warrant documentation and dissemination; and it is true that the results of large-scale tests are often consequential. However, it is equally true that these developments pertain to a narrow slice of educational assessment and that classroom testing and grading are consequential in their own right. The inclusion of a chapter on classroom assessment in the fourth edition is commendable and definitely a step in the right direction, but this initiative must be broadened.
Accordingly, it seems appropriate to recommend that educational measurement be (re)considered more broadly, that balkanization of topics be avoided, and that cross-level perspectives be integrated and crosscutting questions be addressed, to the extent possible, in each chapter. For example, how should teachers think about setting standards on classroom tests? What are appropriate ways to consider the reliability of alternate assessments and other tests administered to what are sometimes very small samples? How might coherence between classroom assessments and state-level content standards be promoted? Do appropriate testing accommodations differ for classroom and large-scale tests? What sources of validity evidence are appropriate for tests at different levels, with differing purposes, or with differing consequences?
Although the recommendation for greater integration might seem unrealistic, the current edition of Educational Measurement actually contains a remarkably comprehensive example of the kind of integrated treatment that could serve as a model for chapters in the next edition. The chapter by Lane and Stone (2006) on performance assessment deftly weaves together treatments of reliability, cognitive psychology, scoring, measurement models, classroom assessment concerns, computer-aided testing, validity, test design, fairness, and other concerns in a way that fully covers the identified topic of the chapter but does not duplicate the essential content of other chapters in the volume.
Finally, to inform thinking about the next edition, it may be illuminating to look backward. A historical note appears in the preface to the first edition of Educational Measurement. It refers the reader to a preceding volume, The Construction and Use of Achievement Examinations (Hawkes, Lindquist, & Mann, 1936), which was produced by the same publisher as the subsequent editions and could fairly claim to be the real first edition in the series. That earliest volume contained a chapter by McConn (1936), whose observations would easily be at home in the latest edition:
When one begins to meditate upon [achievement tests], one can hardly fail to be astonished by their multiplicity. . . . We are impelled to ask, why do we give such countless tests? Probably many persons will answer immediately that the obvious and legitimate purpose of practically all this achievement testing is the maintenance of standards; which seems to mean either one or both of two things: the imposition and enforcement of a prescribed curriculum; or the enforcement of some minimum degree of attainment. (p. 446)
McConn (1936) also asks some of the policy questions that are being asked today and identifies a gap in the measurement literature:
What do we accomplish by all this testing . . . anyway? Is it worth all the effort and money it costs? Do we perchance do harm instead of good, or harm as well as good with our examinations, and especially through the uses we make of their results? In short, it seems that we need not only techniques, but also some philosophy . . . dealing with the right uses of such instruments and their wrong uses or abuses. (p. 443)
In conclusion, the fourth edition of Educational Measurement clearly succeeds in capturing the state of the art in the field. However, although this new edition documents substantial advances in the technology of testing, McConn’s observations highlight lingering challenges that remain to be addressed: challenges related to the social, political, and educational contexts in which the science of psychometrics has long been situated. By tradition, the first two chapters of each edition of Educational Measurement are devoted to the essential topics of validity and reliability, respectively. Two additional chapters would be a welcome enhancement to the next edition. The first would be an initial chapter, preceding those on validity and reliability, that would begin to articulate a philosophy (or perhaps multiple philosophies) of educational testing and provide a context for relating those foundational ideas to the technological advances chronicled in each edition. The second would describe various models for how the enterprise of educational measurement can be integrated across levels of a planned educational assessment system; such a chapter would explicitly probe possible structures for effectively melding classroom assessment, large-scale testing in elementary and secondary schools, and postsecondary assessments. The challenge ahead lies in enhancing the utility of each component in the system for consumers of the results while retaining the fidelity of each component to its intended measurement objective.
REFERENCES
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.
Brennan, R. L. (Ed.). (2006a). Educational measurement (4th ed.). Westport, CT: Praeger.
Brennan, R. L. (2006b). Perspectives on the evolution and future of educational measurement. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 1–16). Westport, CT: Praeger.
Cizek, G. J. (2005). Adapting testing technology to serve accountability aims: The case of vertically moderated standard setting. Applied Measurement in Education, 18(1), 1–10.
Cohen, A. S., & Wollack, J. A. (2006). Test administration, security, scoring, and reporting. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 355–386). Westport, CT: Praeger.
MARCH 2008<br />
99