Book Reviews Assessing Educational Measurement: Ovations ...

a modest goal; more radical aims should be contemplated.

For one thing, it seems to me that previous editions of Educational Measurement have uniformly and implicitly defined the universe of testing as consisting nearly exclusively of large-scale, standardized assessments. It is a curious contrast: Although so much educational testing and assessment occur at the level of the individual student and teacher or at the classroom level, the content of each edition of Educational Measurement is terribly tilted toward the technologies of testing programs such as the SAT, ACT, and GRE. It is as if the Federal Aviation Administration were to consider aviation safety with exclusive reference to commercial airlines, ignoring the much greater volume of private aircraft flights each day. Clearly, the evolving technologies of computer adaptive testing, item response theory, generalizability theory, and differential item functioning warrant documentation and dissemination; and it is true that the results of large-scale tests are often consequential. However, it is equally true that these developments pertain to a narrow slice of educational assessment and that classroom testing and grading are consequential in their own right. The inclusion of a chapter on classroom assessment in the fourth edition is commendable and definitely a step in the right direction, but this initiative must be broadened.

Accordingly, it seems appropriate to recommend that educational measurement be (re)considered more broadly, that balkanization of topics be avoided, and that cross-level perspectives be integrated and crosscutting questions be addressed, to the extent possible, in each chapter. For example, how should teachers think about setting standards on classroom tests? What are appropriate ways to consider the reliability of alternate assessments and other tests administered to sometimes very small samples? How might coherence between classroom assessments and state-level content standards be promoted? Are there any differences in appropriate testing accommodations for classroom and large-scale tests? What sources of validity evidence are appropriate for tests at different levels, with differing purposes, or with differing consequences?

Although the recommendation for greater integration might seem unrealistic, the current edition of Educational Measurement actually contains a remarkably comprehensive example of the kind of integrated treatment that could serve as a model for chapters in the next edition. The chapter by Lane and Stone (2006) on performance assessment deftly weaves together treatments of reliability, cognitive psychology, scoring, measurement models, classroom assessment concerns, computer-aided testing, validity, test design, fairness, and other concerns in a way that fully covers the identified topic of the chapter but does not duplicate the essential content of other chapters in the volume.

Finally, to inform thinking about the next edition, it may be illuminating to look backward. A historical note appears in the preface to the first edition of Educational Measurement. It refers the reader to a preceding volume, The Construction and Use of Achievement Examinations (Hawkes, Lindquist, & Mann, 1936), which was produced by the same publisher as the subsequent editions and could fairly claim to be the real first edition in the series. That earliest volume contained a chapter by McConn (1936), whose observations would easily be at home in the latest edition:

When one begins to meditate upon [achievement tests], one can hardly fail to be astonished by their multiplicity. . . . We are impelled to ask, why do we give such countless tests? Probably many persons will answer immediately that the obvious and legitimate purpose of practically all this achievement testing is the maintenance of standards; which seems to mean either one or both of two things: the imposition and enforcement of a prescribed curriculum; or the enforcement of some minimum degree of attainment. (p. 446)

McConn (1936) also asks some of the policy questions that are being asked today and identifies a gap in the measurement literature:

What do we accomplish by all this testing . . . anyway? Is it worth all the effort and money it costs? Do we perchance do harm instead of good, or harm as well as good with our examinations, and especially through the uses we make of their results? In short, it seems that we need not only techniques, but also some philosophy . . . dealing with the right uses of such instruments and their wrong uses or abuses. (p. 443)

In conclusion, the fourth edition of Educational Measurement clearly succeeds in capturing the state of the art in the field. However, although this new edition documents substantial advances in the technology of testing, McConn’s observations highlight the presence of lingering challenges to be addressed: challenges related to the social, political, and educational contexts in which the science of psychometrics has long been situated. By tradition, the first two chapters of each edition of Educational Measurement are devoted to the essential topics of validity and reliability, respectively. Two additional chapters would be a welcome enhancement to the next edition. The first would be an initial chapter, preceding those on validity and reliability, that would begin to articulate a philosophy (or perhaps multiple philosophies) of educational testing and provide a context for relating those foundational ideas to the technological advances chronicled in each edition. The second would describe various models for how the enterprise of educational measurement can be integrated across levels of a planned educational assessment system; such a chapter would explicitly probe possible structures for effectively melding classroom assessment, large-scale testing in elementary and secondary schools, and postsecondary assessments. The challenge ahead lies in enhancing the utility of each component in the system for consumers of the results while retaining the fidelity of each component to its intended measurement objective.

REFERENCES 

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.

Brennan, R. L. (Ed.). (2006a). Educational measurement (4th ed.). Westport, CT: Praeger.

Brennan, R. L. (2006b). Perspectives on the evolution and future of educational measurement. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 1–16). Westport, CT: Praeger.

Cizek, G. J. (2005). Adapting testing technology to serve accountability aims: The case of vertically moderated standard setting. Applied Measurement in Education, 18(1), 1–10.

Cohen, A. S., & Wollack, J. A. (2006). Test administration, security, scoring, and reporting. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 355–386). Westport, CT: Praeger.

MARCH 2008 
