
Assessing Local Dependence in Educational Performance ...


Assessing Local Dependence in Educational Performance Assessments with Clustered Free-Response Items

Steven Ferrara
Maryland State Department of Education

Huynh Huynh
University of South Carolina

Heibatollah Baghi
Delaware Department of Public Instruction

The purposes of this study are to (a) extend to polychotomous items a procedure for identifying dependence in binary items that does not require assumptions of IRT-based approaches, (b) explore the magnitude of local dependence caused by clustering of free-response items around a topical task or authentic reading passages in a large-scale performance assessment, and (c) probe contextual factors which may contribute to cluster dependence. Item clustering is common in performance-based or "authentic" assessments. Data for the study come from the 1991 administration of the Maryland School Performance Assessment Program. The results indicate that although there is very little correlation across clusters, levels of sometimes substantial positive dependence exist for items within clusters. Hypothesized explanations are made for the high levels of local dependence where it exists.

Key words: Authentic assessment, item dependence, free response, multi-step math problem, open-ended items, performance-based assessment, reading passage

Authors' notes: An earlier version of this paper was presented at the annual meeting of the National Council on Measurement in Education, San Francisco, April 1992. We wish to thank Susan Ciotta, Trudy Collier, and June Danaher for their invaluable help with this manuscript.


A current trend in educational assessment is to move away from the exclusive use of forced-choice (e.g., multiple-choice) items and to rely more and more on free-response or open-ended items. Such assessments are often referred to as "performance assessments" or "authentic assessments" (e.g., Wiggins, 1989). Authentic assessments are intended to mirror exemplary strategic instruction and are believed by proponents to measure higher-order thinking processes and knowledge integration (e.g., Frederiksen, 1984; Shepard, 1991). In exemplary instruction, teachers often present background information and promote connections among steps in the learning process to make the learning more fluid and less arduous. Similarly, in authentic assessments the examinee may be presented with a context or purpose for completing the assessment activities. In addition, the set of items presented may have been selected and sequenced to lead the examinee to a culmination such as an explanation, decision, or recommendation and final expression of understanding which reveals complex thinking. In such assessments, open-ended items are presented together and may be organized either around one or more reading passages or in a multi-step task, in what we refer to here as "clusters."

Performance assessments that contain such clusters of items may better assess higher-order thinking and knowledge integration than do performance assessments which are non-clustered collections of open-ended items. For example, in a reading cluster the examinee may be asked to read one or more complete "authentic" (i.e., previously published) texts and then respond to a series of assessment activities pertaining to both texts. In a mathematics cluster, a complex problem may be posed that requires multiple


steps to reach a solution. In all likelihood, the clustering of such free-response items would create some level of dependence in the responses in each cluster.

This likelihood of item dependence can be problematic. Local independence must be achieved in order to meet the assumptions of item response theory (IRT) measurement models and to optimize test score reliability. Often, the preferred solution to item dependence in test clusters is to sum the scores from dependent items into independent "testlets" (e.g., Wainer & Kiely, 1987) rather than to delete or revise items until independence is achieved. Little is currently known about how much dependence may exist in item clusters in performance assessments or about the item characteristics that may cause dependence (cf. Yen, in press).

Three purposes of this study are to (a) extend to polychotomous items a procedure for identifying dependence in binary items that does not require assumptions of IRT-based approaches, (b) explore the magnitude of local dependence caused by the clustering of free-response items around a topical task or authentic reading passages in a large-scale performance assessment, and (c) probe contextual factors which may contribute to cluster dependence.

Related Studies

Several researchers have noted passage dependence of standardized reading test items requiring complex inferences, including Hanna and Oaster (1980), Nicholas and Brookshire (1987), and Scherich and Hanna (1977). Green and Langhorst (1986) noted that some degree of both passage dependence and independence is to be expected in any test of reading comprehension that covers a broad range of objectives.

Ackerman (1987) and Ackerman and Spray (1986) used a model for item dependence to simulate the effects of violating local independence on item calibration and


ability estimation. They suggested that calibrated dependent item parameters tended to over-estimate the original item parameters. Ability estimates were even less accurate as the degree of dependence increased. Bell, Pattison, and Withers (1988) described a loglinear modeling approach to examine local dependence of items within and between clusters. They found that item dependence was more marked within than between clusters, that dependence was stronger in items based on mathematical than on verbal material, and that dependence increased with examinee ability.

Wilson (1988) provides a procedure to assess local dependence of binary items when the responses fit a Rasch model and provides illustrations based on 20 science items. First, he calibrated binary items under the assumption of local independence. Next, he grouped the binary items into clusters (i.e., what he called subtests or superitems) and then calibrated via Masters' one-parameter Partial Credit model for polychotomous items (Masters, 1982; Wright & Masters, 1982). Finally, he asserted local dependence when there were substantial discrepancies between the results of the two calibrations. Huynh (1993) provides a partial theoretical justification of the procedure proposed by Wilson.

Finally, Yen (1984) proposed using the Q3 statistic to assess local dependence under assumptions of latent trait models. For an examinee with ability $\theta$ on item $j$, let $X_j$ be the raw score, $E_j$ be the expected score from a latent trait model, and let $D_j = X_j - E_j$ be the residual score. Then the index $Q_{3jj'}$ for items $j$ and $j'$ is the correlation between the two residuals $D_j$ and $D_{j'}$ taken over all examinees. Yen noted that since item $j$ is included explicitly in the raw score $X_j$ and implicitly in the (estimated) expected score $E_j$, a negative bias is built into Q3. When local independence holds, Q3 is expected to be approximately $-1/(n-1)$, where $n$ is the number of binary items. Yen (in press) applied the Q3 statistic to study binary and polychotomous items and found that the Q3 statistic performs adequately for polychotomous items. However, the use of Q3 is limited to data which adequately fit a latent trait model.
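As an illustration of the Q3 definition above, the following is a minimal sketch in Python with NumPy. It assumes the expected scores have already been obtained from some fitted latent trait model, which is not shown here; the function names `q3_matrix` and `expected_q3_under_independence` are hypothetical and are not taken from Yen (1984).

```python
import numpy as np

def q3_matrix(observed, expected):
    """Correlate item residuals D_j = X_j - E_j over examinees.

    observed, expected: arrays of shape (n_examinees, n_items).
    The expected scores are assumed to come from an already
    fitted latent trait model (the fitting step is not shown).
    Returns an (n_items, n_items) matrix whose (j, j') entry is
    the Q3 index for items j and j'.
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    residuals = observed - expected
    # rowvar=False: columns are items, rows are examinees
    return np.corrcoef(residuals, rowvar=False)

def expected_q3_under_independence(n_items):
    """Approximate value of Q3 under local independence: -1/(n-1)."""
    return -1.0 / (n_items - 1)
```

For a test of 21 binary items, for example, the benchmark is -1/20 = -0.05, so small negative Q3 values are what local independence would be expected to produce.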


For a complex assessment program involving authentic activities such as the Maryland School Performance Assessment Program (MSPAP, described below), it is conceivable that complex, multidimensional latent traits may be needed to fully capture the nature of these items. Hence, it would be helpful to be able to study local dependence of free-response items without having to impose a particular latent trait model on the data. The methodology in this paper for assessing local dependence in free-response items is based on previous work for multiple-choice items cited in Hambleton (1988). Hambleton suggests investigating local independence by checking correlation matrices for examinees at different intervals on the ability or test score scale. This study is based on real examinee data from the 1991 administration of the Maryland School Performance Assessment Program.

The Maryland School Performance Assessment Program (MSPAP)

The 1991 administration of MSPAP included several parallel test forms with clustered free-response items. Each cluster of reading items is contained in a task (i.e., a cluster of assessment activities) and pertains to one or more complete authentic texts, often of several printed pages (e.g., from the children's magazines "Cricket," "Cobblestones," and "Ranger Rick"). Each mathematics item cluster is a self-contained problem-solving task containing 3-5 assessment activities which are organized around a theme (e.g., "recycling aluminum cans to make money") and lead to a final recommendation, decision, or other culmination. Efforts were made to write items so that examinees could respond to each item whether or not they had responded to other items in a cluster. However, there was no guarantee that the results of this effort would satisfy the condition of local independence. Because of these complex contexts surrounding free-response


items, the level of dependence may be more visible in a free-response mode than in the more commonly used forced-choice mode.

MSPAP is part of a larger school reform effort initiated by the Maryland State Department of Education. Performance on the MSPAP is used to evaluate school performance and to provide information to guide school improvement efforts. The 1991 MSPAP included assessments of widely distributed learning outcomes in reading, writing, language usage, and mathematics in grades 3, 5, and 8. All tasks in the MSPAP require brief and extended open-ended responses to performance tasks: Examinees write, diagram, and sketch responses to tasks that focus on their ability to construct and extend meaning from what they read, construct and extend meaning through writing, and solve multi-step mathematics problems. Tasks are designed to elicit examinees' thoughtful application of knowledge, skills, and thinking processes. Examinee responses are scored by trained teachers using activity-specific scoring keys for brief correct and incorrect responses; generic rules for longer responses which are scored as correct/adequate, partially correct/adequate, or incorrect/inadequate; and rubrics for essays and other extended responses. Although each 1991 MSPAP test form was administered to a randomly assigned group of at least 3,000 examinees, samples of approximately 1,400 examinees were used in this study.

Task developers were trained to minimize response dependence in the assessment activities they wrote for MSPAP assessment tasks. After the development phase, tasks were submitted to both judgmental and statistical analyses. One such judgmental review focused on response dependence. The goals of this review were to (a) identify clusters of interdependent responses, and (b) treating interdependent clusters as testlets, determine if an entire assessment form contained adequate numbers of independent responses to allow for IRT scaling. (Details on scaling procedures are available in CTB Macmillan/McGraw-Hill, 1992.) To identify interdependent responses within item clusters, reviewers read each item that was designed to elicit an examinee response and answered the questions, "Can


examinees respond to this item whether or not they have (a) responded to previous or subsequent items, and (b) answered correctly or adequately to previous or subsequent items?" An answer of "No" to either of these questions indicated a likely interdependent cluster. For MSPAP scaling purposes, judgmentally determined interdependent clusters were scaled as testlets only if Q3 statistics (Yen, 1984) indicated strong response interdependence and if creation of testlets did not result in substantial item misfit. (Details are available in CTB Macmillan/McGraw-Hill, 1992.)

Method

Data Sets Used in this Study

Seven data sets were systematically extracted from the MSPAP file of approximately 150,000 cases and served as the data base for this study. The first three data sets (RD36, RD51, and RD81) consist of the responses to the reading examinations for grades 3, 5, and 8. The next two data sets (MC31 and MC83) consist of the responses to the mathematics content (MC) examinations for grades 3 and 8. Finally, the last two data sets (MP31 and MP81) contain the responses to the mathematics process (MP) examinations for grades 3 and 8.

Each reading examination consists of clusters of items based on four reading passages. Each mathematics examination consists of clusters of items centered on nine tasks, each task containing both mathematics content and mathematics process items. Table 1 lists the number of items for each examination as well as the number of items in each cluster.


Insert Table 1 about here

The upper panel of Table 2 provides the summary statistics for each examination. The data indicate that the reading examinations are slightly easy for an examinee of average ability and that the distributions of the raw scores are negatively skewed. The mean percentages of maximum score are 51% for RD36, 60% for RD51, and 64% for RD81. On the other hand, the mathematics examinations are quite difficult for an examinee of average ability and the distributions of the raw scores are positively skewed. The mean percentages of maximum score are 39% for MC31, 34% for MC83, 22% for MP31, and 29% for MP81.

Insert Table 2 about here

Procedures

Sorting subjects into homogeneous score groups. The first step in the data analysis for an examination was to use the raw score on the examination to sort subjects into five homogeneous score groups of approximately the same size.

Descriptive statistics for the score groups are found in the lower panel of Table 2. It may be seen that among the five score groups constructed from each examination, there is more variability in the group at the skewed tail of the score range. In order to make the inter-item correlations in each examination more compatible across score groups, the


extreme groups (Group 1 for the reading examinations and Group 5 for the mathematics examinations) were deleted from all subsequent analyses.

Overall procedure for dependence analysis. For each score group, all inter-item correlations were computed. The average within-cluster correlation was then computed for all inter-item correlations within a given common cluster. A frequency distribution was subsequently constructed to give an overall picture of these correlations. Then the average between-cluster correlations were computed and tabulated for all inter-item correlations between any two given clusters. Finally, all clusters with moderate or substantial within-cluster average correlations were identified and hypothesized explanations made on the probable causes for this dependence.

Results

Between-Cluster Correlation Analyses

Table 3 reports frequency distributions of the average between-cluster correlations for the nine data sets. Although the negative correlations outnumber the positive ones, practically all the correlations are between -.07 and +.07. Given the similarity between the average inter-item correlations used in this study and the Q3 statistic (Yen, in press), the negative bias of Q3 is expected to hold true for these average correlations as well. Most of the average correlations for each of the nine test forms reported in the last row of Table 3 are close to the quantity -1/(n-1) (the expected value of Q3 under local independence, described earlier), where n is the maximum raw score. Thus, these data provide strong evidence that the vectors of responses to the clusters can be assumed to be locally independent.
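The procedure described in this section (sorting examinees into score groups, then averaging inter-item correlations within and between clusters) can be sketched as follows. This is an illustrative reconstruction in Python with NumPy, not the authors' code. The function names are hypothetical, and the rank-based grouping is a simplifying assumption: it can place examinees with tied raw scores in adjacent groups, whereas the study may have used raw-score cut points.

```python
import numpy as np

def score_groups(raw_scores, n_groups=5):
    """Assign examinees to n_groups of nearly equal size by raw-score rank.

    Assumption: groups are formed from ranks, so tied scores at a
    boundary may fall into adjacent groups.
    """
    raw_scores = np.asarray(raw_scores)
    order = np.argsort(raw_scores, kind="stable")
    groups = np.empty(len(raw_scores), dtype=int)
    # Split the sorted examinee indices into nearly equal chunks.
    for g, idx in enumerate(np.array_split(order, n_groups), start=1):
        groups[idx] = g
    return groups

def average_cluster_correlations(responses, cluster_of_item):
    """Average inter-item correlations within and between clusters.

    responses: (n_examinees, n_items) item scores for ONE score group.
    cluster_of_item: length n_items sequence of cluster labels.
    Returns (within, between): within[c] is the mean correlation over
    item pairs inside cluster c; between[(a, b)] is the mean over all
    pairs with one item in cluster a and the other in cluster b.
    Items with zero variance in the group would yield NaN entries.
    """
    r = np.corrcoef(np.asarray(responses, dtype=float), rowvar=False)
    labels = np.asarray(cluster_of_item)
    clusters = list(dict.fromkeys(labels.tolist()))  # keep first-seen order
    within, between = {}, {}
    for pos, a in enumerate(clusters):
        ia = np.flatnonzero(labels == a)
        if len(ia) > 1:
            upper = np.triu_indices(len(ia), k=1)  # distinct pairs only
            within[a] = r[np.ix_(ia, ia)][upper].mean()
        for b in clusters[pos + 1:]:
            ib = np.flatnonzero(labels == b)
            between[(a, b)] = r[np.ix_(ia, ib)].mean()
    return within, between
```

In use, one would call `score_groups` on the raw scores of an examination, drop the extreme group, and then call `average_cluster_correlations` once per remaining score group, comparing the resulting averages with the -1/(n-1) benchmark.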


Insert Table 3 about here

Within-Cluster Correlation Analyses

Table 4 reports frequency distributions for the average within-cluster correlations. Since almost all the correlations are above zero and several are above +.07, the data clearly point to positive dependence among the items within several clusters.

Insert Table 4 about here

Tables 5 and 6 report the average within-cluster correlations for each cluster and score group along with a qualitative description of the degree of local dependence. The local dependence in Table 4 is described in Tables 5 and 6 as "moderate" if two or more score groups have correlations in a cluster of at least .08. The description changes to "substantial" if there are at least two correlations of at least .10.

Insert Tables 5 and 6 about here
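The labeling rule for Tables 5 and 6 can be written as a small function. This is a sketch under stated assumptions: the function name `describe_dependence` is hypothetical, and the label "low" for clusters that meet neither criterion is a placeholder, since the text does not say what such clusters are called.

```python
def describe_dependence(group_correlations,
                        moderate_cut=0.08,
                        substantial_cut=0.10,
                        min_groups=2):
    """Label a cluster from its average within-cluster correlations.

    group_correlations: one average correlation per retained score group.
    "substantial": at least min_groups correlations >= substantial_cut.
    "moderate":    at least min_groups correlations >= moderate_cut.
    "low":         placeholder label (an assumption) for everything else.
    """
    n_substantial = sum(r >= substantial_cut for r in group_correlations)
    n_moderate = sum(r >= moderate_cut for r in group_correlations)
    if n_substantial >= min_groups:
        return "substantial"
    if n_moderate >= min_groups:
        return "moderate"
    return "low"
```

The substantial criterion is checked first because any cluster that meets it also meets the moderate criterion.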


Content Descriptions for Clusters with Substantial Dependence

According to these criteria, dependence is identified as moderate for five clusters and as substantial for seven clusters. For the illustrative purposes of this paper, discussion is provided only for the seven clusters with substantial dependence: RD51, cluster 4 (RD51-4); RD81, cluster 3 (RD81-3); MC31, cluster 3 (MC31-3); MC83, clusters 4 and 6 (MC83-4, MC83-6); and MP31, clusters 3 and 8 (MP31-3, MP31-8). Table 7 provides brief content descriptions of the items in these clusters. In this section of the paper we provide hypothesized explanations of the causes of dependence in these clusters.

Insert Table 7 about here

Reading clusters. RD51-4 requires examinees to fill out an enrollment form for a gymnastics class after having read a complete story about family relationships whose main character is a young girl. These five items focus on examinee ability to read to perform a task. While it appears that examinees can answer each of these items independent of the other items, the information necessary to answer the items can be found only in the story. Items on the 1991 MSPAP intended to assess reading to perform a task were intentionally developed to be text dependent. Neither examinee prior knowledge nor conjecture would help examinees form responses to these items. This dependence on the story to answer these questions may be the source of the dependence among these items. The source of dependency in these items appears to be similar to sources of dependency in earlier analyses of reading comprehension items in multiple-choice tests (e.g., Green &


Langhorst, 1986; Hanna & Oaster, 1980; Nicholas & Brookshire, 1987; Scherich & Hanna, 1977).

RD81-3 requires examinees to answer one comparison and one contrast question about the main characters from an informational article and a short story, both with the theme "meeting challenges." The interdependence of the two items in this cluster may have two possible causes. First, both items require examinees to think about the characters in both reading selections. As with RD51-4, responding to these two items is highly dependent on the readings. Second, it is likely that examinees would not be able to contrast the two characters without first having drawn comparisons between them. Thus, examinees would not be able to answer the contrast question without first having responded somewhat successfully to the comparison question. In subsequent editions of the MSPAP, compare and contrast questions have been developed as single, rather than separate, items.

Mathematics content clusters. In MC31-3, examinees must read data regarding numbers of aluminum cans collected over several weekends, re-arrange the data in a table, perform two different calculations using data from the re-arranged table, determine the total amount of money made from collecting the cans, and estimate how many more weekends of collecting cans are necessary to earn $4.00. Examinees would not be able to complete subsequent calculations in this cluster without completing previous calculations, nor make the required estimate without completing all calculations. All questions in the cluster are dependent on the data given in the original table.

All five responses in MC83-4 require examinees to understand the relationship between the number of cuts made on a rectangular-shaped cake and the number of pieces of cake that result. First, examinees describe the pattern in a given table (i.e., 1 cut in the cake = 2 pieces of cake, 2 cuts = 4 pieces, 3 cuts = 8 pieces). Then examinees extend the table for 4 and more cuts, identify from the extended table the fewest number of cuts necessary to produce 32 pieces of cake, write an equation that represents the relationship between cuts


and pieces of cake, predict the probability of finding an odd number of pieces of cake using the equation, and explain the prediction. Responses to each item in this cluster are built sequentially on previous responses and on understanding the pattern in the original table.

MC83-6 follows almost the same pattern as MC83-4, but requires examinees to understand the relationship between a series of hexagons with a shared side and the number of toothpicks required to create these hexagons. As above, examinees are given the pattern, extend the pattern in a table, construct responses to questions from the new table, and write an equation. Also as above, responses to each item in this cluster are built sequentially on previous responses and on understanding the original pattern.

Mathematics process clusters. Cluster MP31-3, a mathematics process cluster, is located in the same mathematics task as the mathematics content items in cluster MC31-3. This task focuses on recycling aluminum cans. The two mathematics process questions in this cluster are built on the previous three content questions. The process questions require examinees to solve a problem (i.e., estimate the number of weekends of collecting cans required to reach a dollar goal, based on the previous calculations) and reason about the estimate (i.e., explain and justify the reasonableness of the estimate). The dependence of the two process items is probably a result of this estimation-explanation relationship and may be magnified by their dependence on the three previous content items.

The two process items in task 8 form the cluster MP31-8. All content and process items in this task are based on given survey information on numbers of examinees who would buy various types of items in a school store. However, only the process items are statistically dependent. As with MP31-3 above, examinees are required to solve a problem (i.e., decide, based on the given survey data, which items to eliminate from the school store) and reason (i.e., explain and justify the decision). And also as above, the interdependence of these two items may be magnified by their likely dependence on the content items in the task.


Discussion

Our description of the contents of the seven item clusters with substantial local dependence points to some likely causes of item interdependence and suggests ways to avoid it. For example, test developers probably should expect compare-contrast items to be locally dependent because contrasting two or more objects probably requires attention to their similarities. In addition, items that are intentionally interdependent, like the mathematics content items here in which responses to items are built on each other sequentially, are likely to produce statistical local dependence. Also, reasoning items which require examinees to explain and justify previous responses are likely to be locally dependent. Finally, any items that require attention to given information (for example, reading passages and the data tables in the mathematics tasks in this study) are likely to be locally dependent to some degree.

Yen (in press), in a comprehensive examination of local item dependence using data from the Comprehensive Tests of Basic Skills (CTBS/4) and the 1991 edition of the MSPAP, the same performance assessment used in this study, presents a compendium of possible causes of item dependence. She refers to these causes as external assistance or interference; speededness; fatigue; practice; item or response format; passage dependence; item chaining; explanation of previous answer; scoring rubric or raters; and content, knowledge, and abilities. Several of the possible causes in her list appear consistent with those hypothesized in this study (i.e., passage dependence, item chaining, and explanation of previous answer). Other possible causes in her list (i.e., external assistance or interference, speededness, fatigue, practice, scoring rubric or raters) would not be identified by examining items, as was done in this study. The other two causes in her list (item or response format; and content, knowledge, and abilities) were not hypothesized in this study.


Hanna, G.S., & Oaster, T.R. (1980). Studies of the seriousness of three threats to passage dependence. Educational and Psychological Measurement, 40, 583-96.

Huynh, H. (1993). On equivalence between a partial credit item and a set of independent Rasch binary items. Manuscript submitted for publication.

Masters, G.N. (1982). A Rasch model for partial credit scoring. Psychometrika, 49, 269-272.

Nicholas, L.E., & Brookshire, R.H. (1987). Error analysis and passage dependence of test items from a standardized test of multiple-sentence reading comprehension for aphasic and non-brain damaged adults. Journal of Speech and Hearing Disorders, 52, 358-66.

Scherich, H.H., & Hanna, G.S. (1977). Passage-dependence in the selection of reading comprehension test items. Educational and Psychological Measurement, 37, 991-7.

Shepard, L.A. (1991). Psychometricians' beliefs about learning. Educational Researcher, 20, 2-16.

Wainer, H., & Kiely, G.L. (1987). Item clusters and computer adaptive testing: A case for testlets. Journal of Educational Measurement, 24 (3), 185-201.

Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 70, 703-713.

Wilson, M. (1988). Detecting and interpreting local item dependence using a family of Rasch models. Applied Psychological Measurement, 12, 353-64.

Wright, B.D., & Masters, G.N. (1982). Rating scale analysis. Chicago: The University of Chicago Mesa Press.

Yen, W.M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125-145.

Yen, W.M. (in press). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement.


Footnotes

1 First introduced in the context of computer-adaptive testing, the term is used here to refer to any group of items that is treated together as a single unit or super item.

2 In the MSPAP, stimuli that elicit examinee responses are referred to as assessment "activities" to distinguish them from multiple-choice items. To facilitate discussion in this paper, such assessment activities will be referred to as "items." Similarly, "clusters" in this paper are referred to as "tasks" in the MSPAP.

3 It should be noted that all items were used to group examinees.

4 It may be noted that, for a sample size in the range of 1,600, the probability of obtaining correlations of .08 and .10 or higher is less than .001 and therefore almost negligible.

5 This explanation is not dependent on the prediction because it is scaled as a process item while the prediction is scaled as a content item.

6 The process items would probably also be statistically dependent on the content items in this and many other mathematics clusters. MSPAP mathematics content and mathematics process items are scaled separately to avoid this expected interdependence.
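The claim in footnote 4 can be checked with a short calculation. The following is a minimal sketch, not the authors' computation: it assumes a one-tailed test of a zero population correlation and uses the Fisher z approximation, neither of which is stated in the footnote. The function name is hypothetical.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate one-tailed p-value for H0: rho = 0 (Fisher z approximation).

    Assumption: the footnote does not name a test; this is one common choice.
    """
    # Fisher transform of r is approximately normal with SD 1 / sqrt(n - 3).
    z = math.atanh(r) * math.sqrt(n - 3)
    # Upper-tail area of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    for r in (0.08, 0.10):
        p = correlation_p_value(r, 1600)
        print(f"r = {r:.2f}, n = 1600: one-tailed p is about {p:.5f}")
```

Under these assumptions the one-tailed probability for a correlation of .08 with a sample of 1,600 falls below .001; a two-tailed probability would be twice as large.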


Table 1
Distribution of Items in the Seven Data Sets

Variable             RD36   RD51   RD81   MC31   MC83   MP31   MP81
Total N of items       30     31     27     38     47     28     29
N of clusters           4      4      4      9      9      9      9
N of items in
  Cluster 1             8      8      9      3      3      2      6
  Cluster 2            10     10     10      4      5      3      3
  Cluster 3             8      8      2      3      5      2      3
  Cluster 4             4      5      8      4      5      3      3
  Cluster 5                                  5      7      4      1
  Cluster 6                                  3      5      2      4
  Cluster 7                                  5      5      3      4
  Cluster 8                                  4      5      3      2
  Cluster 9                                  7      7      6      3


Table 2
Raw Score Descriptive Statistics for Score Groups

Group / Statistic      RD36     RD51     RD81     MC31     MC83     MP31     MP81

All subjects
  N of students        1473     1347     1410     1461     1613     1461     1613
  Range                0-50     0-49     0-50     0-48     0-59     0-41     0-32
  Mean                25.57    29.56    31.87    18.65    19.78     9.05     9.14
  S.D.                11.55     9.24     9.84     9.65    10.27     7.56     5.74
  Pct. of max.        51.14    60.33    63.74    38.85    33.53    22.07    28.56
  Skewness            -0.33    -0.61    -0.83     0.43     0.67     1.12     0.74

Score group 1
  Range                0-14     0-21     0-23      0-9     0-11      0-1      0-3
  N of Ss               298      287      285      347      334      189      322
Score group 2
  Range               15-23    22-27    24-30    10-14    12-19      2-5      4-6
  N of Ss               319      251      278      258      355      374      389
Score group 3
  Range               24-29    28-32    31-35    15-19    20-27      6-8      7-9
  N of Ss               275      289      327      286      335      217      299
Score group 4
  Range               30-35    33-37    36-39    20-27    28-37     9-14    10-13
  N of Ss               277      306      236      307      328      294      269
Score group 5
  Range               36-50    38-49    40-50    28-48    38-59    15-41    14-32
  N of Ss               286      213      278      261      315      278      296

Note. Pct. of max. = mean raw score divided by maximum possible raw score (see top of range in row 2 of upper panel above).


Table 3
Frequency Distributions for Between-Cluster Correlations Pooled Across Four Score Groups

                           RD36     RD51     RD81     MC31     MC83     MP31     MP81
Number of correlations       24       24       24      144      144      144      144
Average                   -.017    -.020    -.029    -.006    -.003    -.019    -.020

[Frequency counts by correlation value, from -0.14 to 0.07, are not legible in the source.]


Table 4
Frequency Distributions for Within-Cluster Correlations Pooled Across Four Score Groups

                           RD36     RD51     RD81     MC31     MC83     MP31     MP81
Number of correlations       16       16       16       36       36       34       31
Average                   0.025    0.050    0.076    0.036    0.070    0.051    0.025

[Frequency counts by correlation value, with values up to 0.45, are not legible in the source.]


Table 6
Average Within-Cluster Correlations in Each Score Group for the Mathematics Examinations

                               Score Group
Examination  Cluster      2       3       4       5     Degree of dependence
MC31            1       -0.02    0.02    0.00    0.04
                2        0.01    0.00    0.00    0.07
                3        0.17    0.06    0.12    0.07    Substantial
                4        0.06    0.03    0.01    0.05
                5        0.01   -0.01    0.08    0.06
                6        0.07    0.09    0.03    0.05
                7        0.03   -0.01   -0.03    0.03
                8       -0.01    0.01   -0.04   -0.01
                9        0.05    0.03    0.03    0.06
MC83            1        0.00    0.01    0.01   -0.04
                2        0.04    0.03    0.06    0.01
                3        0.04    0.08    0.08    0.07    Moderate
                4        0.17    0.22    0.23    0.18    Substantial
                5        0.02    0.07    0.07    0.06
                6        0.08    0.09    0.13    0.13    Substantial
                7        0.10    0.05    0.06    0.05
                8        0.15    0.07    0.06    0.03
                9        0.02   -0.01    0.03    0.03
MP31            1         *      0.07    0.08    0.08    Moderate
                2        0.01    0.00   -0.02    0.11
                3         *     -0.03    0.27    0.45    Substantial
                4       -0.02    0.00    0.09    0.12    Moderate
                5       -0.01   -0.01   -0.01    0.10
                6       -0.03    0.03   -0.02   -0.10
                7       -0.03    0.03   -0.01    0.01
                8        0.02    0.08    0.13    0.20    Substantial
                9       -0.04    0.01   -0.01    0.04
MP81            1        0.01    0.01    0.01    0.03
                2       -0.02    0.05    0.04    0.00
                3       -0.01    0.04    0.01    0.07
                4        0.08    0.04    0.02    0.06
                6         *      0.03    0.13    0.11    Moderate
                7        0.00    0.00   -0.01    0.05
                8       -0.01    0.03   -0.08   -0.01
                9       -0.03    0.04    0.03    0.05

Note. Degree of dependence determined by number of correlations greater than .08 and .10. See text for further details. "*" indicates insufficient data to compute correlation.


Table 5
Average Within-Cluster Correlations in Each Score Group for the Reading Examinations

                               Score Group
Examination  Cluster      2       3       4       5     Degree of dependence
RD36            1        0.05    0.13    0.01    0.04
                2        0.01    0.00   -0.01    0.01
                3        0.04    0.01    0.03    0.03
                4        0.12    0.04    0.09    0.01    Moderate
RD51            1        0.02    0.01   -0.02    0.03
                2        0.00   -0.01    0.01    0.03
                3        0.04    0.02    0.05    0.04
                4        0.33    0.22    0.11    0.34    Substantial
RD81            1        0.05    0.00   -0.02    0.01
                2        0.07    0.02   -0.03    0.00
                3        0.32    0.32    0.32    0.30    Substantial
                4        0.10    0.03    0.02    0.03

Note. Degree of dependence determined by number of correlations greater than .08 and .10. See text for further details.
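Tables 5 and 6 report the average correlation among items within a cluster, computed separately in each score group. The sketch below illustrates one way such averages could be computed. It is an illustration under stated assumptions, not the authors' procedure, whose details are given in the text. It assumes that examinees are grouped by total raw score using upper cut points (as the ranges in Table 2 suggest), that Pearson correlations are computed for every item pair within a cluster, and that pairs with no variance in a group are skipped. All function names and the toy data are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def score_group(total, cut_points):
    """Return the 1-based score group for a total score, given ascending upper bounds."""
    for group, upper in enumerate(cut_points, start=1):
        if total <= upper:
            return group
    return len(cut_points) + 1


def average_within_cluster_r(responses, clusters, cut_points):
    """Average pairwise item correlation for each (cluster, score group).

    responses: one list of item scores per examinee.
    clusters: mapping from cluster name to the item indices it contains.
    """
    by_group = defaultdict(list)
    for row in responses:
        by_group[score_group(sum(row), cut_points)].append(row)

    result = {}
    for group, rows in by_group.items():
        for name, items in clusters.items():
            rs = []
            for i, j in itertools.combinations(items, 2):
                r = pearson([row[i] for row in rows], [row[j] for row in rows])
                if r is not None:
                    rs.append(r)
            # None plays the role of the "*" entries: no correlation could be computed.
            result[(name, group)] = sum(rs) / len(rs) if rs else None
    return result


if __name__ == "__main__":
    # Toy data: four items in two clusters, three examinees, a single score group.
    toy = [[0, 0, 2, 0], [1, 1, 1, 1], [2, 2, 0, 2]]
    print(average_within_cluster_r(toy, {"A": [0, 1], "B": [2, 3]}, cut_points=[]))
```

Conditioning on score group is what separates local dependence from the correlation that all items share through overall proficiency, which is why the averages are computed within groups and not over the full sample.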


Table 7
Content Descriptions for Clusters with Substantial Local Dependence

RD51, Cluster 4
  Assessed outcome(s): Read to perform a task
  Theme: Family relationships
  Response requirement(s): Fill in enrollment form

RD81, Cluster 3
  Assessed outcome(s): Read for information and for literary experience
  Theme: Meeting challenges
  Response requirement(s): Short open-ended written response: compare and contrast

MC31, Cluster 3
  Assessed outcome(s): Estimation, arithmetic operations, number relations
  Theme: Recycling aluminum cans
  Response requirement(s): Arrange data in table; perform calculations; estimate

MC83, Cluster 4
  Assessed outcome(s): Statistics, probability, patterns & relations, algebra
  Theme: Cutting a cake
  Response requirement(s): Describe pattern in table and write equation; calculate probability

MC83, Cluster 6
  Assessed outcome(s): Arithmetic operations, geometry, measurement, patterns and relations, algebra
  Theme: Exploring patterns
  Response requirement(s): Short open-ended written response, then tabulated; describe relationship and write an equation; draw hexagon to scale

MP31, Cluster 3
  Assessed outcome(s): Problem solving, communication, reasoning
  Theme: Recycling aluminum cans
  Response requirement(s): Estimate and explain estimate in words

MP31, Cluster 8
  Assessed outcome(s): Communication, reasoning, connections
  Theme: School store
  Response requirement(s): Draw conclusion from bar graph; explain conclusion
