13.07.2015 Views

Technical Manual - Renaissance Learning



Reliability and Measurement Precision

using the Spearman-Brown formula,⁴ to estimate the reliability of the full-length test.

In internal simulation studies, the split-half method provided accurate estimates of the internal consistency reliability of adaptive tests, and so it has been used to provide estimates of STAR Early Literacy reliability. These split-half reliability coefficients are independent of the generic reliability approach discussed below and more firmly grounded in the item response data.

The third column of Table 13 on page 49 contains split-half reliability estimates for STAR Early Literacy, calculated from the Validation Study data. Split-half scores were based on the first 24 items of the test; scores based on the odd- and the even-numbered items were calculated. The correlations between the two sets of scores were corrected to a length of 25 items, yielding the split-half reliability estimates displayed in Table 13 on page 49.

Test-Retest Reliability

Another method of evaluating the reliability of a test is to administer the test twice to the same examinees. Next, a reliability coefficient is obtained by calculating the correlation between the two sets of test scores. This is called a retest reliability coefficient. This approach was used for STAR Early Literacy in both the Calibration Study and the Validation Study. In the Calibration Study, the participating schools were asked to administer two forms of the calibration tests, each on a different day, to a small fraction of the overall sample. This resulted in a test-retest reliability subsample of about 14,000 students who took different forms of the 40-item calibration test. In the Validation Study, the schools were asked to administer computer-adaptive STAR Early Literacy tests twice to every student. Over 90 percent of the Validation Study sample took two such tests over an interval of several days.

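The two calculations described in this section can be illustrated with a short sketch. It is not the procedure used in the studies: the function names are hypothetical, half-test scores are taken to be simple number-correct sums (the manual does not say how the odd- and even-item scores were computed), and the correction to 25 items is assumed to use the general Spearman-Brown formula with a length factor of 25/12, since each half of a 24-item split holds 12 items.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_brown(r, length_factor):
    """Project reliability r onto a test length_factor times as long."""
    return length_factor * r / (1 + (length_factor - 1) * r)


def split_half_reliability(responses, items_used=24, full_length=25):
    """Odd/even split-half reliability, corrected to full_length items.

    responses: one list of 0/1 item scores per examinee, in the order the
    items were administered. Sum scores are an assumption of this sketch.
    """
    odd_scores = [sum(r[0:items_used:2]) for r in responses]   # items 1, 3, 5, ...
    even_scores = [sum(r[1:items_used:2]) for r in responses]  # items 2, 4, 6, ...
    half_length = items_used / 2
    r_half = pearson(odd_scores, even_scores)
    return spearman_brown(r_half, full_length / half_length)
```

Under the same assumptions, a retest reliability coefficient is simply `pearson(first_scores, second_scores)`, where the two lists hold each examinee's scores from the first and second administrations.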
From the two studies, we have two different sets of estimates of STAR Early Literacy retest reliability: one derived from two administrations of the 40-item non-adaptive Calibration Study tests, and one derived from two administrations of the 25-item adaptive Validation Study tests.

The retest reliability data from the Calibration Study provide an approximate measure of the reliability of tests constructed from items of the kind developed for use in the STAR Early Literacy item bank.

The retest reliability data from the Validation Study provide a more definitive measure of STAR Early Literacy reliability, because the tests were adaptively

4. See Lord, F. M. and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley, pp. 112–113.

STAR Early Literacy Technical Manual, p. 46
