Technical Manual - Renaissance Learning


Reliability and Measurement Precision

Reliability is a measure of the degree to which test scores are consistent across repeated administrations of the same or similar tests to the same group or population. To the extent that a test is reliable, its scores are free from errors of measurement. In educational assessment, however, some degree of measurement error is inevitable. One reason for this is that a student’s performance may vary from one occasion to another. Another reason is that variation in the content of the test from one occasion to another may cause scores to vary.

In a computer-adaptive test such as STAR Early Literacy Enterprise, content varies from one administration to another, and also varies according to the level of each student’s performance. Another feature of computer-adaptive tests based on IRT (Item Response Theory) is that the degree of measurement error can be expressed for each student’s test individually.

STAR Early Literacy Enterprise provides two ways to evaluate the reliability of its scores: reliability coefficients, which indicate the overall precision of a set of test scores; and standard errors of measurement, which provide an index of the degree of error in an individual test score. A reliability coefficient is a summary statistic that reflects the average amount of measurement precision in a specific examinee group or population as a whole. In STAR Early Literacy Enterprise, the conditional standard error of measurement (CSEM) is an estimate of the unreliability of each individual test score. A reliability coefficient is a single value that applies to the overall test; in contrast, the magnitude of the CSEM may vary substantially from one person’s test score to another.

This section presents reliability coefficients of three different kinds: generic reliability, split-half, and test-retest, followed by statistics on the standard error of measurement of STAR Early Literacy Enterprise test scores.
Both generic reliability and split-half reliability are estimates of the internal consistency reliability of a test.

Generic Reliability

Test reliability is generally defined as the proportion of test score variance that is attributable to true variation in the trait the test measures. This can be expressed analytically as:

$$\text{reliability} = 1 - \frac{\sigma^2_{\text{error}}}{\sigma^2_{\text{total}}}$$

STAR Early Literacy Technical Manual 44
where $\sigma^2_{\text{error}}$ is the variance of the errors of measurement, and $\sigma^2_{\text{total}}$ is the variance of the test scores. In STAR Early Literacy the variance of the test scores is easily calculated from Scaled Score data. The variance of the errors of measurement may be estimated from the conditional standard error of measurement (CSEM) statistics that accompany each of the IRT-based test scores, including the Scaled Scores, as depicted below.

$$\sigma^2_{\text{error}} = \frac{1}{n} \sum_{i=1}^{n} \text{CSEM}_i^2$$

where the summation is over the squared values of the reported CSEM for students i = 1 to n. In each STAR Early Literacy 3.x and higher test, CSEM is calculated along with the IRT ability estimate and Scaled Score. Squaring and summing the CSEM values yields an estimate of total squared error; dividing by the number of observations yields an estimate of mean squared error, which in this case is tantamount to error variance. “Generic” reliability is then estimated by calculating the ratio of error variance to Scaled Score variance, and subtracting that ratio from 1.

Using this technique with the STAR Early Literacy 2.0 norming data resulted in the generic reliability estimates shown in the rightmost column of Table 13 on page 49. Because this method is not susceptible to error variance introduced by repeated testing, multiple occasions, and alternate forms, the resulting estimates of reliability are generally higher than the more conservative alternate forms reliability coefficients. These generic reliability coefficients are, therefore, plausible upper bound estimates of the internal consistency reliability of the STAR Early Literacy versions prior to STAR Early Literacy Enterprise.

While generic reliability does provide a plausible estimate of measurement precision, it is a theoretical estimate, as opposed to traditional reliability coefficients, which are more firmly based on item response data.
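The generic reliability calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the publisher's implementation: the function name `generic_reliability`, the use of the population form of the variance, and the sample numbers are all assumptions made for the example.

```python
from statistics import pvariance


def generic_reliability(scaled_scores, csems):
    """Estimate generic reliability as 1 - (error variance / score variance).

    scaled_scores: one Scaled Score per student.
    csems: the CSEM reported with each of those scores, in the same order.
    """
    if len(scaled_scores) != len(csems) or len(scaled_scores) < 2:
        raise ValueError("need matching lists with at least two students")

    # Error variance: square each CSEM, sum, and divide by the number of students.
    error_variance = sum(c ** 2 for c in csems) / len(csems)

    # Total variance of the observed Scaled Scores.
    # Assumption: the population form (divide by n) is used here.
    total_variance = pvariance(scaled_scores)
    if total_variance == 0:
        raise ValueError("Scaled Scores have zero variance")

    return 1.0 - error_variance / total_variance


# Made-up illustrative values, not data from the manual.
scores = [400, 500, 600, 700, 800]
csems = [30, 40, 50, 40, 30]
print(generic_reliability(scores, csems))
```

With these invented values the error variance is 1500 and the Scaled Score variance is 20000, so the estimate is 1 - 1500/20000 = 0.925.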
Traditional internal consistency reliability coefficients such as Cronbach’s alpha and Kuder-Richardson Formula 20 (KR-20) cannot be calculated for adaptive tests. However, another estimate of internal consistency reliability can be calculated using the split-half method. This is discussed in the next section.

Split-Half Reliability

In classical test theory, before digital computers automated the calculation of internal consistency reliability measures such as Cronbach’s alpha, approximations such as the split-half method were sometimes used. A split-half reliability coefficient is calculated in three steps. First, the test is divided into two halves, and scores are calculated for each half. Second, the correlation between the two resulting sets of scores is calculated; this correlation is an estimate of the reliability of a half-length test. Third, the resulting reliability value is adjusted,
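The three split-half steps can be sketched in Python as follows. This is a hedged illustration only: the text here does not state which adjustment is applied in the third step, so the sketch assumes the Spearman-Brown formula, the customary full-length correction in classical test theory. The function names and the sample half-test scores are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(half_a_scores, half_b_scores):
    """Split-half estimate from per-student scores on each half of the test."""
    # Step 1 is assumed done by the caller: one score per student per half.
    # Step 2: the correlation estimates the reliability of a half-length test.
    r_half = pearson(half_a_scores, half_b_scores)
    # Step 3 (assumption): Spearman-Brown adjustment to full test length.
    return 2 * r_half / (1 + r_half)


# Made-up half-test scores for five students, not data from the manual.
half_a = [1, 2, 3, 4, 5]
half_b = [2, 1, 4, 3, 5]
print(split_half_reliability(half_a, half_b))
```

For these invented scores the half-test correlation is 0.8, which the assumed adjustment raises to 1.6/1.8, or about 0.889.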
