What is the Proper Role of Assessment in Education
What is the Proper Role of Assessment in Education
What is the Proper Role of Assessment in Education
- No tags were found...
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Are <strong>Education</strong>al Tests Inherently Evil?Stephen G. SireciUniversity <strong>of</strong> MassachusettsPresidential Column for February 2007 Issue <strong>of</strong> <strong>the</strong> NERA Researcher(see http://www.nera-education.org/)Tests are given for many reasons <strong>in</strong> <strong>the</strong> educational system. Many <strong>of</strong> <strong>the</strong>se reasons arehated. For example, tests are major components <strong>in</strong> accountability systems that may haveundesirable consequences for teachers, schools, or d<strong>is</strong>tricts. Tests are also sometimes used as arequirement for someth<strong>in</strong>g, such as high school graduation, scholarship, or eligibility toparticipate <strong>in</strong> collegiate athletics. In <strong>the</strong>se <strong>in</strong>stances, tests are <strong>of</strong>ten seen as a hurdle to overcomeor as an unnecessary roadblock to an <strong>in</strong>herent right. Tests are also commonly used to assigngrades to students, particularly beyond elementary school. Given <strong>the</strong>se purposes, who couldpossibly like tests? The answer <strong>is</strong> hardly anyone, perhaps only <strong>the</strong> relatively few high achieverswho enjoy a challenge or <strong>the</strong> opportunity to show what <strong>the</strong>y (we?) can do.Why are tests so widespread if <strong>the</strong>y are so hated? Is it <strong>the</strong> same reason we have<strong>in</strong>telligent design and global warm<strong>in</strong>g? Of course not. The reason <strong>is</strong> that educational tests, ifdeveloped carefully, used properly, and <strong>in</strong>terpreted appropriately, have enormous utility. Assoon as all sides <strong>of</strong> <strong>the</strong> educational community acknowledge that fact, we can make progresstoward a common goal <strong>of</strong> us<strong>in</strong>g assessments to improve student learn<strong>in</strong>g.To properly understand educational tests, particularly <strong>the</strong>ir benefits and limitations, wemust consider <strong>the</strong>ir use <strong>in</strong> specific situations. In th<strong>is</strong> column, I d<strong>is</strong>cuss common perceptions andm<strong>is</strong>conceptions <strong>of</strong> educational tests and <strong>the</strong> role tests currently play <strong>in</strong> federal and state educationreform efforts. A primary goal <strong>of</strong> th<strong>is</strong> d<strong>is</strong>cussion <strong>is</strong> to bridge <strong>the</strong> gap between proponents andopponents <strong>of</strong> standardized test<strong>in</strong>g so that we can work toge<strong>the</strong>r to improve student learn<strong>in</strong>g.Why Are Tests So Ubiquitous <strong>in</strong> <strong>Education</strong>?A popular, but <strong>in</strong>correct, myth <strong>is</strong> that educational are pushed by <strong>the</strong> extreme right <strong>of</strong> <strong>the</strong>political spectrum. Th<strong>is</strong> perception <strong>is</strong> simply false. Although <strong>the</strong> No Child Left Beh<strong>in</strong>d Act(NCLB) was proposed and signed by “he-who-must-not-be-named,” it was really an extension <strong>of</strong>Cl<strong>in</strong>ton’s Goals 2000: Educate America leg<strong>is</strong>lation, which was an extension <strong>of</strong> he-who-mustnot-be-named’sfa<strong>the</strong>r’s America 2000 leg<strong>is</strong>lation. Thus, educational reform and accountabilitymovements <strong>in</strong>volv<strong>in</strong>g test<strong>in</strong>g are one <strong>of</strong> <strong>the</strong> few bipart<strong>is</strong>an areas <strong>of</strong> leg<strong>is</strong>lation we have seen over<strong>the</strong> past several decades. There are, <strong>of</strong> course, strong differences <strong>in</strong> educational policy betweenDemocrats and Republicans, such as <strong>the</strong> f<strong>in</strong>anc<strong>in</strong>g <strong>of</strong> education, but it <strong>is</strong> important to note that<strong>the</strong> NCLB Act was sponsored by Democrat Ted Kennedy and Republican Judd Gregg <strong>in</strong> <strong>the</strong>Senate and Democrat George Miller and Republican John Boehner <strong>in</strong> <strong>the</strong> House. It passedoverwhelm<strong>in</strong>gly <strong>in</strong> both (87-10 <strong>in</strong> <strong>the</strong> Senate and 381-41 <strong>in</strong> <strong>the</strong> House).
Why do federal leg<strong>is</strong>lators agree that mandated test<strong>in</strong>g <strong>is</strong> an important part <strong>of</strong> educationreform? There are several reasons. First, assessment <strong>is</strong> seen as a critical component <strong>in</strong> <strong>the</strong>educational process. In fact, quality education requires cont<strong>in</strong>uous <strong>in</strong>teraction among <strong>in</strong>struction,curriculum, and assessment. Good <strong>in</strong>struction starts with good curricula and both <strong>in</strong>fluence eacho<strong>the</strong>r. The development <strong>of</strong> curricula at <strong>the</strong> d<strong>is</strong>trict and state levels <strong>is</strong> certa<strong>in</strong>ly <strong>in</strong>fluenced bywhat teachers teach <strong>in</strong> <strong>the</strong>ir classrooms. As <strong>the</strong> curricula are developed, teach<strong>in</strong>g practiceschange accord<strong>in</strong>gly. <strong>Assessment</strong>s are needed to d<strong>is</strong>cover what students are learn<strong>in</strong>g. Based onthat <strong>in</strong>formation, changes to <strong>in</strong>struction and curricula occur. My colleague Ron Hambletonrefers to th<strong>is</strong> dynamic <strong>in</strong>teraction as <strong>the</strong> curriculum-assessment-<strong>in</strong>struction cycle, which <strong>is</strong>d<strong>is</strong>played <strong>in</strong> Figure 1. Alignment <strong>of</strong> <strong>the</strong>se three components <strong>of</strong> <strong>the</strong> educational process <strong>is</strong>necessary for quality <strong>in</strong>struction.Figure 1The Curriculum-Instruction-<strong>Assessment</strong> CycleCurriculumInstruction<strong>Assessment</strong>A second reason tests play a prom<strong>in</strong>ent role <strong>in</strong> federal and state education reformmovements <strong>is</strong> that <strong>the</strong>y are an effective means for quickly chang<strong>in</strong>g <strong>in</strong>structional practices. AsMcDonnell (2004) described “although standardized tests are primarily measurement tools toobta<strong>in</strong> <strong>in</strong>formation about student and school performance, <strong>the</strong>y are also strategies for pursu<strong>in</strong>g avariety <strong>of</strong> political goals” (p. 2). McDonnell also po<strong>in</strong>ts out that <strong>the</strong>re are few alternativesavailable to policy makers to enforce <strong>the</strong>ir educational policies. As she put it “Test<strong>in</strong>g’s strongappeal <strong>is</strong> largely attributable to <strong>the</strong> lack <strong>of</strong> alternative policy strategies that fit <strong>the</strong> uniquecircumstances <strong>of</strong> public school<strong>in</strong>g…Standardized tests are one <strong>of</strong> <strong>the</strong> few, albeit <strong>in</strong>complete,ways to measure outcomes <strong>of</strong> teach<strong>in</strong>g” (p. 9).A third reason mandated test<strong>in</strong>g <strong>is</strong> a key component <strong>of</strong> education reform <strong>is</strong> that it forceseducators to align <strong>the</strong>ir <strong>in</strong>struction with state curriculum frameworks. No teacher likes to beoverly constra<strong>in</strong>ed regard<strong>in</strong>g what she or he should teach. However, no one wants teachersspend<strong>in</strong>g large amounts <strong>of</strong> <strong>in</strong>structional time teach<strong>in</strong>g knowledge and skills that most wouldconsider unimportant, relative to o<strong>the</strong>r skills. Thus, education <strong>in</strong>volves consensus about whatshould be taught. The development <strong>of</strong> curriculum frameworks without a means for assess<strong>in</strong>ghow well students master <strong>the</strong> objectives with<strong>in</strong> <strong>the</strong>m would create a situation <strong>in</strong> which <strong>the</strong> goodwork done <strong>in</strong> develop<strong>in</strong>g <strong>the</strong> frameworks could be simply ignored.Critics <strong>of</strong> state-mandated test<strong>in</strong>g argue that <strong>the</strong>se tests narrow <strong>the</strong> curriculum and forceteach<strong>in</strong>g-to-<strong>the</strong> test. Proponents counter that <strong>the</strong> tests are aligned with curriculum frameworks,which were developed through a consensus process, and so teach<strong>in</strong>g to <strong>the</strong> test <strong>is</strong> teach<strong>in</strong>g to <strong>the</strong>frameworks. As <strong>in</strong> most debates, <strong>the</strong> truth probably lies somewhere <strong>in</strong> <strong>the</strong> middle. Never<strong>the</strong>less,
it <strong>is</strong> important to bear <strong>in</strong> m<strong>in</strong>d that <strong>the</strong> idea beh<strong>in</strong>d consensus statewide curriculum frameworksand tests designed to measure <strong>the</strong>m <strong>is</strong> a noble one, because its goal <strong>is</strong> to improve <strong>in</strong>struction. Asa parent, I can understand th<strong>is</strong> position. After all, I want to know that my sons’ teachers areteach<strong>in</strong>g <strong>the</strong> important knowledge and skills <strong>the</strong>y will need to succeed personally andacademically. State curriculum frameworks and tests designed to measure <strong>the</strong>m attempt toensure that what <strong>is</strong> taught <strong>is</strong> important. Thus, <strong>the</strong>y aim toward facilitat<strong>in</strong>g quality <strong>in</strong>struction, asdepicted <strong>in</strong> Figure 1.Focus<strong>in</strong>g on Test UseOne <strong>of</strong> <strong>the</strong> greatest challenges we experience as educational researchers <strong>is</strong> ask<strong>in</strong>g <strong>the</strong>right research questions. With respect to educational test<strong>in</strong>g, <strong>the</strong> questions we ask should bespecific to us<strong>in</strong>g a test for a particular purpose. Thus, questions motivat<strong>in</strong>g research <strong>in</strong> th<strong>is</strong> areashould not be “Is <strong>the</strong> test bad?” or “Is <strong>the</strong> test fair?” but ra<strong>the</strong>r “Will <strong>the</strong> results <strong>of</strong> th<strong>is</strong> testprovide <strong>the</strong> <strong>in</strong>formation it <strong>is</strong> designed to produce?” Thus, evaluat<strong>in</strong>g a test means evaluat<strong>in</strong>g <strong>the</strong>use <strong>of</strong> a test for a particular purpose. Tests are not <strong>in</strong>herently “good” or <strong>in</strong>herently “bad,” butus<strong>in</strong>g a test for some purpose could be ei<strong>the</strong>r, depend<strong>in</strong>g on what <strong>the</strong> test was designed to doversus what it <strong>is</strong> used for. Th<strong>is</strong> notion <strong>is</strong> clear <strong>in</strong> <strong>the</strong> def<strong>in</strong>ition <strong>of</strong> validity presented <strong>in</strong> <strong>the</strong>Standards for <strong>Education</strong>al and Psychological Test<strong>in</strong>g (American <strong>Education</strong>al ResearchAssociation, American Psychological Association, & National Council on Measurement <strong>in</strong><strong>Education</strong>, 1999):Validity refers to <strong>the</strong> degree to which evidence and <strong>the</strong>ory support <strong>the</strong> <strong>in</strong>terpretations <strong>of</strong>test scores entailed by proposed uses <strong>of</strong> tests. (p. 9)As <strong>is</strong> evident <strong>in</strong> th<strong>is</strong> def<strong>in</strong>ition, it <strong>is</strong> not a test that <strong>is</strong> validated per se, but <strong>the</strong> use <strong>of</strong> a testfor a particular purpose. Defend<strong>in</strong>g <strong>in</strong>ferences derived from test scores <strong>in</strong>volves both qualitativeevidence based on <strong>the</strong>ories <strong>of</strong> what <strong>is</strong> be<strong>in</strong>g measured and quantitative evidence <strong>in</strong>dicat<strong>in</strong>g <strong>the</strong>scores reflect <strong>the</strong> measured attribute. Thus, all educational researchers can contribute to researchon test<strong>in</strong>g, regardless <strong>of</strong> <strong>the</strong>ir particular research orientation. I will not d<strong>is</strong>cuss specific meansfor validat<strong>in</strong>g <strong>in</strong>ferences <strong>in</strong> th<strong>is</strong> column (see Kane, 1992 or Sireci, 2005 for examples). Instead, Ifocus on test use <strong>in</strong> educational test<strong>in</strong>g and how it can help or hurt <strong>the</strong> educational process.How are tests used <strong>in</strong> education? Teachers use tests to measure how well students grasp<strong>the</strong> material taught (e.g., classroom tests). Counselors use tests to diagnose students’ strengthsand weaknesses and make referrals for remediation, advanced courses, or o<strong>the</strong>r placementdec<strong>is</strong>ions. Policy makers use tests to evaluate teachers, schools, d<strong>is</strong>tricts, states, countries, andvarious educational programs. Tests are also used as one criterion for high school graduationand for o<strong>the</strong>r types <strong>of</strong> certification such as an honors diploma, and for adm<strong>is</strong>sions <strong>in</strong>topostsecondary and graduate education. As <strong>the</strong> stakes associated with educational tests <strong>in</strong>crease,such as <strong>in</strong> <strong>the</strong> cases <strong>of</strong> grant<strong>in</strong>g a high school diploma or evaluat<strong>in</strong>g <strong>the</strong> performance <strong>of</strong> particularteachers, <strong>the</strong> critic<strong>is</strong>ms also <strong>in</strong>crease. And <strong>the</strong>y should <strong>in</strong>crease. If a test <strong>is</strong> used to make a “big”dec<strong>is</strong>ion, <strong>the</strong> use <strong>of</strong> <strong>the</strong> test for that purpose should be supported by “big” evidence. Thus, aseducational researchers, our assessment research activities should be focused on ask<strong>in</strong>g <strong>the</strong> rightquestions about test use (e.g., Is <strong>the</strong>re evidence to support use <strong>of</strong> th<strong>is</strong> test as a high schoolgraduation requirement?).
I am a psychometrician work<strong>in</strong>g <strong>in</strong> educational measurement and so it <strong>is</strong> pretty obviousthat I must believe <strong>in</strong> <strong>the</strong> usefulness <strong>of</strong> educational tests. However, my strong belief <strong>in</strong> <strong>the</strong>utility <strong>of</strong> educational tests stems not from my psychometric tra<strong>in</strong><strong>in</strong>g, but from my experience as aparent. How do I know if my sons are receiv<strong>in</strong>g a good education? The class work,assignments, and report cards that come home give me some <strong>in</strong>dication, but <strong>the</strong> norm-referencedand criterion-referenced test score reports give me a lot more to go on. The Iowa Tests <strong>of</strong> BasicSkills that our local school d<strong>is</strong>trict uses allows me to compare my sons’ performance to nationalnorms. The Massachusetts Comprehensive <strong>Assessment</strong> System (MCAS) tests allow me to seehow my sons are do<strong>in</strong>g with respect to <strong>the</strong> performance standards establ<strong>is</strong>hed by <strong>the</strong> State. Now,when my wife and I speak with <strong>the</strong>ir teachers or <strong>the</strong> Pr<strong>in</strong>cipal, we can talk about <strong>the</strong>se<strong>in</strong>dependent assessments, and how th<strong>is</strong> <strong>in</strong>formation can be used to improve <strong>the</strong>ir <strong>in</strong>struction.Look<strong>in</strong>g Forward: Collaborat<strong>in</strong>g on <strong>Education</strong>al <strong>Assessment</strong> ResearchIn th<strong>is</strong> column, I merely touched on a few <strong>of</strong> <strong>the</strong> important <strong>is</strong>sues that concerneducational assessment policy and <strong>the</strong> proper use <strong>of</strong> tests <strong>in</strong> our nation’s schools. I know <strong>the</strong>reare many people who will never acknowledge <strong>the</strong> utility <strong>of</strong> a standardized test, but I also know<strong>the</strong>re are many more who hate someth<strong>in</strong>g else—bad teach<strong>in</strong>g! Teachers who are not help<strong>in</strong>g ourstudents reach <strong>the</strong>ir academic potential are much more dangerous to our children than any test. Ido not advocate us<strong>in</strong>g tests to “police” teachers or us<strong>in</strong>g test results to provide sanctions andrewards for teachers (a very bad idea, given <strong>the</strong> different types <strong>of</strong> students taught by differentteachers). However, I like <strong>the</strong> idea <strong>of</strong> measur<strong>in</strong>g students’ achievement with respect to standardsdeveloped through a consensus process, and I like <strong>the</strong> idea <strong>of</strong> provid<strong>in</strong>g as much <strong>in</strong>formation aspossible to parents and o<strong>the</strong>rs about <strong>the</strong> academic achievement and progress <strong>of</strong> <strong>the</strong>ir children.Are <strong>the</strong>re problems with our current educational assessment policies? I th<strong>in</strong>k so. Thereare several valid critic<strong>is</strong>ms about current educational tests. Personally, I am concerned about <strong>the</strong>amount our students are tested, and I am very concerned about <strong>the</strong> pressure that <strong>is</strong> put onstudents before <strong>the</strong>y take a test. So, <strong>the</strong>re <strong>is</strong> much room for improvement, which <strong>is</strong> where we, aseducational researchers, come <strong>in</strong>. Let us not throw <strong>the</strong> baby out with <strong>the</strong> bathwater and simplyd<strong>is</strong>m<strong>is</strong>s tests as useless. Instead, let us research what seems to be work<strong>in</strong>g, what seems to beharmful, and what needs to be improved. By work<strong>in</strong>g toge<strong>the</strong>r, we can improve educationalassessment and provide advice to educational policy makers that <strong>is</strong> based on solid research. Ifwe can do that, we will improve curriculum development, <strong>in</strong>struction, and assessment, with <strong>the</strong>happy consequence <strong>of</strong> improv<strong>in</strong>g student learn<strong>in</strong>g.Clos<strong>in</strong>g RemarksI hope I have <strong>in</strong>spired everyone who read th<strong>is</strong> column to focus on test use when th<strong>in</strong>k<strong>in</strong>gabout educational assessments. Some <strong>of</strong> you may vehemently d<strong>is</strong>agree with one or more <strong>of</strong> <strong>the</strong>po<strong>in</strong>ts I ra<strong>is</strong>ed. O<strong>the</strong>rs may agree, but have additional <strong>is</strong>sues to br<strong>in</strong>g to <strong>the</strong> d<strong>is</strong>cussion. In ei<strong>the</strong>rcase, I encourage you to write a short response for a future edition <strong>of</strong> <strong>the</strong> NERA Researcher, orwrite to me directly at Sireci@acad.umass.edu. I seriously believe that everyone on both sides <strong>of</strong><strong>the</strong> test<strong>in</strong>g debate needs to work toge<strong>the</strong>r to improve <strong>the</strong> <strong>in</strong>struction and learn<strong>in</strong>g experiences <strong>of</strong>our children. Please also contact me if <strong>the</strong>re <strong>is</strong> an assessment <strong>is</strong>sue you would like me to address
<strong>in</strong> a future edition <strong>of</strong> th<strong>is</strong> column. And one o<strong>the</strong>r th<strong>in</strong>g—don’t forget to get your proposals <strong>in</strong> forNERA 2007!ReferencesAmerican <strong>Education</strong>al Research Association, American Psychological Association, & NationalCouncil on Measurement <strong>in</strong> <strong>Education</strong>. (1999). Standards for educational andpsychological test<strong>in</strong>g. Wash<strong>in</strong>gton, D.C.: American <strong>Education</strong>al Research Association.Kane, M.T. (1992). An argument-based approach to validity. Psychological Bullet<strong>in</strong>, 112,527-535.McDonnell, L. M. (2004). Politics, persuasion, and educational test<strong>in</strong>g. Cambridge, MA:Harvard University Press.Sireci, S. G. (2005). Validity <strong>the</strong>ory and applications. Encyclopedia <strong>of</strong> stat<strong>is</strong>tics <strong>in</strong> <strong>the</strong>behavioral sciences (Volume 4, pp. 2103-2107). West Sussex, UK: John Wiley & Sons.