05.01.2013 Views

April 2012 Volume 15 Number 2 - Educational Technology & Society

April 2012 Volume 15 Number 2 - Educational Technology & Society

April 2012 Volume 15 Number 2 - Educational Technology & Society

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Figures 9 to 18 present the prediction accuracy and the use of test items of different adaptive testing algorithms with<br />

different training sample (10, 50, 100, and 200). The scale of the horizontal axis of IRS (thresholds 0.02, 0.04, . . .<br />

0.98) is different from those of Diagnosys and OT (thresholds 0.01, 0.02, . . . 0.50), so it is not displayed in the same<br />

graph. The horizontal axis represents the threshold, � , and the vertical axis represents the prediction accuracy, PA �<br />

(Figures 9, 11, 13, <strong>15</strong>, and 17) and the use of test items, UI � (Figures 10, 12, 14, 16 and 18). For example, in Figures<br />

9 and 10, if the knowledge structure of Diagnosys with � � 0.<br />

08 was applied, then ( PA 0.<br />

08,<br />

UI 0.<br />

08 ) � ( 0.<br />

821,<br />

1.<br />

20),<br />

under the training samples, (TS) = 10. In Figures 17 and 18, if the knowledge structure of IRS with � � 0.<br />

58 was<br />

applied, then ( PA 0.<br />

58,<br />

UI 0.<br />

58 ) � ( 0.<br />

896,<br />

6.<br />

35),<br />

( 0.<br />

952,<br />

14.<br />

33),<br />

( 0.<br />

970,<br />

19.<br />

12),<br />

( 0.<br />

984,<br />

23.<br />

77)<br />

under the training samples<br />

(TS) = 10, 50, 100, and 200, respectively. The prediction accuracy and the use of test items of the algorithm based on<br />

domain experts’ structure are 0.917 and 18, respectively. Since constructing the domain experts’ structure does not<br />

need thresholds, it does not vary by thresholds.<br />

Those figures show that:<br />

1. Overall, the prediction accuracy and use of test items of Diagnosys and OT increase as the threshold decreases.<br />

The prediction accuracy and use of test items of IRS increase as the threshold increases.<br />

2. Compared with the results from domain expert’s structure ( ( PA , UI)<br />

� ( 0.<br />

917,<br />

18)<br />

), IRS<br />

3.<br />

( ( PA 0.<br />

32,<br />

UI0.<br />

32)<br />

� ( 0.<br />

923,<br />

8.<br />

58)<br />

), and OT ( ( PA 0.<br />

08,<br />

UI0.<br />

08)<br />

� ( 0.<br />

922,<br />

8.<br />

37)<br />

) are able to achieve higher prediction<br />

accuracies with less use of test items.<br />

For three test adaptive algorithms, Diagnosys, OT and IRS, the Diagnosys requires more training samples and<br />

higher use of test items to achieve the same or almost the same prediction accuracy in comparison to OT and<br />

IRS.<br />

4. The performance of OT is less sensitive to the training sample size than that of IRS and Diagnosys.<br />

For reducing the paper length without loss the generality, only three cases (case 1, case 2, and case 3) of means and<br />

standard deviations of prediction accuracies and their corresponding use of test items under the training sample size,<br />

200 are displayed in Table 5. Case 1, case 2, and case 3 mean the prediction accuracy, 0.90, 0.92, and 0.94,<br />

respectively. The reason for choosing these cases in range of 0.90 to 0.94 is that this range is around the prediction<br />

accuracy of domain experts’ structure and 0.94 is the maximum prediction accuracy that Diagnosys can achieve.<br />

For example, in Diagnosys, when the average of prediction accuracy and its corresponding use of test items are 0.90<br />

and 25.68, respectively, the standard deviations are 0.023 and 5.67, respectively. The range of standard deviations for<br />

prediction accuracy is 0.004 to 0.023, indicating that this simulation model is reliable. The lowest standard<br />

deviations of the prediction accuracy and the use of test items are all for the OT, so the OT has better performance on<br />

stability.<br />

According to the results of the experiment, Diagnosys requires a large sample size and a larger use of test items to<br />

obtain better prediction accuracy, so it is not suggested for use. OT can obtain better prediction accuracy with less<br />

use of test items and training samples; hence OT is implemented into the KSAT system.<br />

Prediction Accuracy<br />

1<br />

0.98<br />

0.96<br />

0.94<br />

0.92<br />

0.9<br />

0.88<br />

0.86<br />

0.84<br />

0.82<br />

0.8<br />

0.49<br />

0.45<br />

0.41<br />

0.37<br />

0.33<br />

0.29<br />

0.25<br />

0.21<br />

Threshold<br />

0.17<br />

0.13<br />

0.09<br />

0.05<br />

0.01<br />

Diagnosys<br />

Figure 9. The prediction accuracy of Diagnosys, OT, and expert for training samples 10<br />

OT<br />

Expert<br />

81

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!