
I__. - International Military Testing Association



classification tables and CAs were also computed for each set of weights. The CAs using weights from the cross-validation samples ranged from 70% (908X0) to 92% (121X0); and for the validation samples, from 68% (231X0) to 92% (121X0). Of the 4,104 tasks classified within the 26 AFSs, only four "D" tasks were classified as "A" tasks and only four "A" tasks were classified as "D" tasks. Although it is desirable to have zero A to D or D to A misclassifications (because the test development team is being advised incorrectly to write or not write an item), infrequent misclassifications of this type should not adversely affect the construction of a valid SKT. The team can rectify these discrepancies with the permission of the group facilitator.

CAs computed for the combined data ranged from 71% (908X0) to 90% (112X0). In general, the predictive accuracies using combined samples were higher than those for the validation samples referred to earlier, but all differences were small (less than 6%). Only two "D" tasks were classified as "A" tasks and two "A" tasks were classified as "D" tasks. Squared and interactive predictor terms were added to the model for each AFS in an attempt to increase classification accuracy, but only small increases in accuracy were observed. In fact, for some AFSs, classification accuracy decreased.
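The quantities discussed above (a classification table, its CA, and the count of A to D and D to A errors) can be illustrated with a minimal sketch. The category labels "A" through "D", the function names, and the ten sample tasks are assumptions made for illustration only; none of the data comes from the study.

```python
from collections import Counter

def classification_table(actual, predicted):
    """Count tasks for each (actual, predicted) category pair."""
    return Counter(zip(actual, predicted))

def classification_accuracy(table):
    """Percentage of tasks whose predicted category matches the actual one."""
    total = sum(table.values())
    correct = sum(n for (a, p), n in table.items() if a == p)
    return 100.0 * correct / total

def extreme_misclassifications(table):
    """Return the counts of A-to-D and D-to-A errors (actual -> predicted)."""
    return table[("A", "D")], table[("D", "A")]

# Hypothetical task categories, for illustration only (not from the study).
actual    = ["A", "A", "B", "C", "D", "D", "B", "C", "A", "D"]
predicted = ["A", "B", "B", "C", "D", "A", "B", "D", "A", "D"]

table = classification_table(actual, predicted)
ca = classification_accuracy(table)
a_to_d, d_to_a = extreme_misclassifications(table)
print(f"CA = {ca:.0f}%, A->D = {a_to_d}, D->A = {d_to_a}")
```

In this toy example 7 of the 10 tasks fall on the diagonal of the table, so the CA is 70%, with one D task misclassified as A.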

What is adequate classification accuracy in the context of generating an ATO? The table having the lowest CA value (68%) is shown in Figure 2. It was generated by applying the 112X0 weights from the validation sample to the cross-validation sample. The impact of the misclassifications in the table is probably not too severe when it is recalled that the ATO is a guide for SMEs to use in developing an SKT, and they are free to select tasks from any of the importance categories within the restrictions delineated above.

[Figure 2 image not recovered; surviving axis label: Predicted Classification]

Figure 2. Classification Table with Lowest CA Value.

The r values for the combined data for each AFS ranged from .51 (908X0) to .91 (112X0). A hierarchical clustering of the regression equations for all 26 AFSs showed small decreases in r throughout most of the clustering process. For example, the overall r dropped from .84 at the 26-group stage (i.e., a separate regression equation for each of the 26 AFSs) to .79 at the 5-group stage. Thereafter, the drops in r to the 1-group stage were .02, .02, .04, and .12, respectively.
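The reported drops can be turned into an overall r at each remaining stage with simple arithmetic. The sketch below assumes that "respectively" refers to the successive merges 5 to 4, 4 to 3, 3 to 2, and 2 to 1 groups; the variable names are illustrative.

```python
from itertools import accumulate

r_at_5_groups = 0.79                # overall r reported at the 5-group stage
drops = [0.02, 0.02, 0.04, 0.12]    # reported drops, assumed order: 5->4, 4->3, 3->2, 2->1

# Subtract each drop in turn, starting from the 5-group value.
running = accumulate(drops, lambda r, d: r - d, initial=r_at_5_groups)
r_values = [round(r, 2) for r in running]

stages = [5, 4, 3, 2, 1]
for groups, r in zip(stages, r_values):
    print(f"{groups}-group stage: r = {r:.2f}")
```

Under that reading, the overall r falls to .59 when a single equation is used for all 26 AFSs, and the final merge accounts for .12 of the .20 lost after the 5-group stage.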

The gradual drop in r values until the clustering at the 1-group stage makes identification of an "optimal clustering stage" difficult. Therefore, classification was also examined at various stages. The CAs of equations at the 1-group stage

