
I__. - International Military Testing Association



Air Force enlisted specialties. The first objective was to determine the<br />

extent to which SME ratings on the CVR forms impact subsequent identification<br />

of items as acceptable or unacceptable for reuse. The second was to determine<br />

how SMEs and project psychologists perceive the value and usefulness of the<br />

forms. Ninety-four SMEs, representing 25 AFSs, assigned to USAFOMS for SKT<br />

rewrite duties were asked to rate test items from their respective E-5 and<br />

E-6/7 grade-level SKTs using the CVR forms. USAFOMS test development<br />

procedures require the completion of this step prior to the SMEs’ designation of<br />

an item as either acceptable for continued use on subsequent SKTs or as<br />

unacceptable for reuse. Once again, these test items were rated using Lawshe’s<br />

3-point scale. A rating of 2 was given to items whose content was essential.<br />

A rating of 1 was assigned to those items whose content was useful, but not<br />

essential, and a rating of 0 was assigned to those items whose content was<br />

not necessary for successful performance in the AFS. In all, 19,700 ratings<br />

were obtained from 94 raters for 2 SKT levels (E-5 and E-6/7).<br />

Results<br />

Intraclass correlations for each of the 25 AFSs were computed to determine<br />

the interrater reliabilities for the group of SMEs from each specialty. All<br />

but two of the calculated values had p < .05. (The higher reliability values<br />

obtained seemed to be associated with the more technologically specialized<br />

fields where there is little room for variance of procedures across the<br />

Air Force, thus leading to more agreement among SMEs on items which test essential<br />

knowledge. Lower values seemed to be associated with broader specialties<br />

where there is more variance in day-to-day jobs performed and hence,<br />

less agreement and lower values of reliability.)<br />
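
The intraclass correlation described above can be illustrated with a minimal Python sketch. The text does not say which form of the intraclass correlation was computed, so this example assumes a one-way random-effects model in which every item is rated by the same number of SMEs. The function name `one_way_icc` and the sample ratings are hypothetical and are not taken from the study.

```python
def one_way_icc(ratings):
    """Return (single-rater ICC, average-of-raters ICC) for a one-way model.

    ratings: list of rows, one row per test item; each row holds the
    ratings (0, 1, or 2) given to that item by the same number of SMEs.
    The one-way random-effects form is an assumption of this sketch.
    """
    n = len(ratings)          # number of items
    k = len(ratings[0])       # number of raters per item
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 items and the same >= 2 raters per item")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-item and within-item sums of squares from a one-way ANOVA.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means)
                    for x in row)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    if ms_between == 0:
        raise ValueError("items do not differ; ICC is undefined")

    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average


# Hypothetical ratings: 4 items, each rated by 3 SMEs.
example = [[2, 2, 1], [1, 1, 1], [0, 1, 0], [2, 2, 2]]
single_icc, average_icc = one_way_icc(example)
print(round(single_icc, 3), round(average_icc, 3))
```

The second value returned is the reliability of the mean rating across the group of raters, which may be closer to what the phrase "for the group of SMEs" refers to, though that reading is also an assumption.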

The average CVRs for items chosen as acceptable and for those designated for<br />

deactivation were also calculated for each test project. The mean CVR value<br />

for all deactivated items was 1.28 and the mean CVR value for all acceptable<br />

items was 1.43. These results conformed to our expectations that on the<br />

whole, items selected for deactivation would have lower content validity ratings<br />

than those chosen as acceptable. The average CVR value for all 19,700<br />

ratings obtained was 1.40. This average reflects the fact that on the whole,<br />

Air Force SKTs are viewed as being relatively high in content validity.<br />
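
As a small illustration of the averages reported above, the following Python sketch groups ratings by an item's eventual status and takes the mean of each group. It assumes that the reported "CVR value" is a simple mean of the 0 to 2 ratings, which is consistent with the reported values lying between 0 and 2 but is not stated explicitly. The function name and the sample data are hypothetical.

```python
from statistics import mean


def mean_rating_by_status(ratings):
    """Average the 0/1/2 ratings separately for each item status.

    ratings: iterable of (status, rating) pairs, where status is a label
    such as "acceptable" or "deactivated" and rating is 0, 1, or 2.
    """
    groups = {}
    for status, rating in ratings:
        if rating not in (0, 1, 2):
            raise ValueError(f"rating must be 0, 1, or 2, got {rating!r}")
        groups.setdefault(status, []).append(rating)
    return {status: mean(values) for status, values in groups.items()}


# Hypothetical ratings, not data from the study.
sample = [
    ("acceptable", 2), ("acceptable", 2), ("acceptable", 1),
    ("deactivated", 1), ("deactivated", 0), ("deactivated", 2),
]
print(mean_rating_by_status(sample))
```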

To determine the actual impact, if any, of the content validity ratings on<br />

the subsequent identification of an item for reuse, a chi-square test of statistical<br />

significance was computed. The null hypothesis (H0) for this test<br />

states that there is no difference between the proportion of items selected<br />

as acceptable and unacceptable in each rating category. The alternative hypothesis<br />

(H1) is that the distribution of items in each rating category differs<br />

from the hypothesized one. The results, as shown in Table 1, indicate<br />

significant differences between expected and observed values for acceptable<br />

and deactivated items in each rating category, with the largest differences<br />

occurring between ratings of 2 and 0. As shown, 203 more ratings of 0 were<br />

observed for deactivated items than were expected, while 234 more ratings of 2<br />

were observed for acceptable items than were expected. A chi-square value of<br />

199.7 (df=2) was obtained, indicating significance at the .01 level. On<br />

this basis, the null hypothesis was rejected, indicating a disproportional<br />

representation of items selected as acceptable and unacceptable in each rating<br />

category. This shows that item content validity did impact subsequent<br />

identification of item acceptability.<br />
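
The chi-square test described above can be sketched in Python as a test of independence on a 2 x 3 table (item status by rating category). The counts below are hypothetical placeholders and are not the values from Table 1. The function names are also illustrative. The p-value helper relies on the fact that, for 2 degrees of freedom, the upper-tail probability of the chi-square distribution is exp(-x/2), so it applies only to that case.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic for a rows x columns table of counts.

    Returns (chi2, df, expected), where expected holds the counts implied
    by the null hypothesis of no association between rows and columns.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


def p_value_df2(chi2):
    """Upper-tail p-value, valid only when df == 2."""
    return math.exp(-chi2 / 2)


# Hypothetical counts: rows are item status, columns are ratings 0, 1, 2.
observed = [
    [100, 300, 600],   # acceptable items (made-up counts)
    [60, 90, 100],     # deactivated items (made-up counts)
]
chi2, df, expected = chi_square_independence(observed)
print(f"chi-square = {chi2:.1f}, df = {df}, p = {p_value_df2(chi2):.3g}")

# Observed minus expected, the kind of difference the text reports per cell.
for obs_row, exp_row in zip(observed, expected):
    print([round(o - e, 1) for o, e in zip(obs_row, exp_row)])
```

With two status rows and three rating columns, the degrees of freedom are (2 - 1)(3 - 1) = 2, matching the df reported in the text.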

A point-biserial correlation coefficient relating identification of an item<br />

as either acceptable or deactivated with the item's average content validity<br />

