Book Reviews Assessing Educational Measurement: Ovations ...


a modest goal; more radical aims should be contemplated.

For one thing, it seems to me that previous editions of Educational Measurement have uniformly and implicitly defined the universe of testing as consisting nearly exclusively of large-scale, standardized assessments. It is a curious contrast: Although so much educational testing and assessment occur at the level of the individual student and teacher or at the classroom level, the content of each edition of Educational Measurement is terribly tilted toward the technologies of testing programs such as the SAT, ACT, and GRE. It is as if the Federal Aviation Administration were to consider aviation safety with exclusive reference to commercial airlines, ignoring the much greater volume of private aircraft flights each day. Clearly, the evolving technologies of computer adaptive testing, item response theory, generalizability theory, and differential item functioning warrant documentation and dissemination; and it is true that the results of large-scale tests are often consequential. However, it is equally true that these developments pertain to a narrow slice of educational assessment and that classroom testing and grading are consequential in their own right. The inclusion of a chapter on classroom assessment in the fourth edition is commendable and definitely a step in the right direction, but this initiative must be broadened.

Accordingly, it seems appropriate to recommend that educational measurement be (re)considered more broadly, that balkanization of topics be avoided, and that cross-level perspectives be integrated and crosscutting questions be addressed, to the extent possible, in each chapter. For example, how should teachers think about setting standards on classroom tests? What are appropriate ways to consider the reliability of alternate assessments and other tests administered to sometimes very small samples? How might coherence between classroom assessments and state-level content standards be promoted? Are there any differences in appropriate testing accommodations for classroom and large-scale tests? What sources of validity evidence are appropriate for tests at different levels, with differing purposes, or with differing consequences?

Although the recommendation for greater integration might seem unrealistic, the current edition of Educational Measurement actually contains a remarkably comprehensive example of the kind of integrated treatment that could serve as a model for chapters in the next edition. The chapter by Lane and Stone (2006) on performance assessment deftly weaves together treatments of reliability, cognitive psychology, scoring, measurement models, classroom assessment concerns, computer-aided testing, validity, test design, fairness, and other concerns in a way that fully covers the identified topic of the chapter but does not duplicate the essential content of other chapters in the volume.

Finally, to inform thinking about the next edition, it may be illuminating to look backward. A historical note appears in the preface to the first edition of Educational Measurement. It refers the reader to a preceding volume, The Construction and Use of Achievement Examinations (Hawkes, Lindquist, & Mann, 1936), which was produced by the same publisher as the subsequent editions and could fairly claim to be the real first edition in the series. That earliest volume contained a chapter by McConn (1936), whose observations would easily be at home in the latest edition:

When one begins to meditate upon [achievement tests], one can hardly fail to be astonished by their multiplicity. . . . We are impelled to ask, why do we give such countless tests? Probably many persons will answer immediately that the obvious and legitimate purpose of practically all this achievement testing is the maintenance of standards; which seems to mean either one or both of two things: the imposition and enforcement of a prescribed curriculum; or the enforcement of some minimum degree of attainment. (p. 446)

McConn (1936) also asks some of the policy questions that are being asked today and identifies a gap in the measurement literature:

What do we accomplish by all this testing . . . anyway? Is it worth all the effort and money it costs? Do we perchance do harm instead of good, or harm as well as good with our examinations, and especially through the uses we make of their results? In short, it seems that we need not only techniques, but also some philosophy . . . dealing with the right uses of such instruments and their wrong uses or abuses. (p. 443)

In conclusion, the fourth edition of Educational Measurement clearly succeeds in capturing the state of the art in the field. However, although this new edition documents substantial advances in the technology of testing, McConn’s observations highlight the presence of lingering challenges to be addressed: challenges related to the social, political, and educational contexts in which the science of psychometrics has long been situated. By tradition, the first two chapters of each edition of Educational Measurement are devoted to the essential topics of validity and reliability, respectively. Two additional chapters would be a welcome enhancement to the next edition. The first would be an initial chapter, preceding those on validity and reliability, that would begin to articulate a philosophy (or perhaps multiple philosophies) of educational testing and provide a context for relating those foundational ideas to the technological advances chronicled in each edition. The second would describe various models for how the enterprise of educational measurement can be integrated across levels of a planned educational assessment system; such a chapter would explicitly probe possible structures for effectively melding classroom assessment, large-scale testing in elementary and secondary schools, and postsecondary assessments. The challenge ahead lies in enhancing the utility of each component in the system for consumers of the results while retaining the fidelity of each component to its intended measurement objective.

REFERENCES

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.

Brennan, R. L. (Ed.). (2006a). Educational measurement (4th ed.). Westport, CT: Praeger.

Brennan, R. L. (2006b). Perspectives on the evolution and future of educational measurement. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 1–16). Westport, CT: Praeger.

Cizek, G. J. (2005). Adapting testing technology to serve accountability aims: The case of vertically moderated standard setting. Applied Measurement in Education, 18(1), 1–10.

Cohen, A. S., & Wollack, J. A. (2006). Test administration, security, scoring, and reporting. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 355–386). Westport, CT: Praeger.

MARCH 2008

