
might influence task performance. TAs were trained on how to perform each task they were to evaluate and practiced them under the supervision of active duty subject matter experts. This included role playing and deliberate errors on the part of the "examinee" to check TA consistency and develop standardized scoring of irregular responses. Once test administration was begun, there were periodic reviews of steps with low interrater reliabilities, with retraining where necessary.

Rotation of TAs Across Tasks

TAs were trained in multiple tasks to allow them to rotate among test stations. This lessened the effect of boredom, provided a cross check on the standardization of scoring in each task, and reduced the impact of TA differences on scoring.

Shadow Scoring

Perhaps the most important quality control procedure, shadow scoring involved independent evaluation of an individual's task performance by two TAs simultaneously. Shadow scorers were used to monitor TA performance and test reliability, and were systematically scheduled to capture interactions among testing order and individual TA characteristics.
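As a rough illustration of the underlying agreement check (the data layout and function below are hypothetical sketches, not part of the original procedure), step-level agreement between a primary and a shadow scorer could be computed as follows:

    # Hypothetical sketch: step-level agreement between primary and shadow
    # scorers. GO/NO-GO step ratings are assumed; names are illustrative only.
    def shadow_agreement(primary, shadow):
        """Proportion of steps on which both TAs recorded the same rating."""
        if len(primary) != len(shadow):
            raise ValueError("both scorers must rate the same steps")
        return sum(p == s for p, s in zip(primary, shadow)) / len(primary)

    # Example: one examinee, ten task steps scored independently by two TAs.
    primary = ["GO", "GO", "NO-GO", "GO", "GO", "GO", "NO-GO", "GO", "GO", "GO"]
    shadow  = ["GO", "GO", "NO-GO", "GO", "NO-GO", "GO", "NO-GO", "GO", "GO", "GO"]
    print("Agreement: {:.0%}".format(shadow_agreement(primary, shadow)))  # 90%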

On-Site Data Entry Trend Analysis

A Hands-On Score Entry System (HOSES) was developed to enter, verify, and report analyses of collected data. Daily on-site data entry enhanced completeness of data and allowed for early identification of problems with the tests, TA consistency, and score drift over time. HOSES generated three reports which were used by site hands-on managers to improve scoring reliability.

1. Data Entry Report. All data were entered twice. This report verified that there were no discrepancies between the two entries. It also reported any missing data so the information could be tracked down on the day of original testing. This greatly reduced the amount of missing data.

2. The Detailed Discrepancy Report listed all steps where primary and shadow scorers disagreed. It also gave percent disagreement for each task, and an overall daily total by TA.

3. The Summary Report presented cumulative historical summaries by TA and task. TA summaries showed leniency and reliability information for each task administered by the TA. Leniency was measured as a deviation from the mean percentage of "GO"s for all TAs on each task. Reliability indicated disagreement with all other TAs on each task. These were valuable in identifying individual TA problems. Since this report could be broken out by time, it also provided trend information. Task summaries showed percent "GO" and disagreement for each step. This helped focus on test effect problems, i.e., those common across all TAs.
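A minimal computational sketch of the Summary Report statistics described above, under the assumption that each record carries a TA identifier and a GO/NO-GO step score (the record layout and function names are hypothetical, not taken from HOSES):

    # Hypothetical sketch: leniency as deviation from the all-TA mean percent
    # "GO" on a task, and percent disagreement with shadow scorers.
    from collections import defaultdict

    def leniency_by_ta(records):
        """records: iterable of (ta_id, step_score) pairs for one task."""
        counts = defaultdict(lambda: [0, 0])          # ta -> [go_count, total]
        for ta, score in records:
            counts[ta][0] += score == "GO"
            counts[ta][1] += 1
        pct_go = {ta: go / total for ta, (go, total) in counts.items()}
        mean_pct = sum(pct_go.values()) / len(pct_go)
        return {ta: p - mean_pct for ta, p in pct_go.items()}

    def percent_disagreement(pairs):
        """pairs: iterable of (primary_score, shadow_score) for one TA or task."""
        pairs = list(pairs)
        return sum(p != s for p, s in pairs) / len(pairs)

    # Example: three TAs scoring steps of the same task.
    records = [("TA1", "GO"), ("TA1", "GO"), ("TA1", "NO-GO"),
               ("TA2", "GO"), ("TA2", "GO"), ("TA2", "GO"),
               ("TA3", "NO-GO"), ("TA3", "GO"), ("TA3", "NO-GO")]
    print(leniency_by_ta(records))  # positive values flag relatively lenient TAs

In such a scheme, a positive deviation would flag a TA scoring "GO" more often than the group average (possible leniency), while a high disagreement rate against shadow scorers would flag a reliability problem for follow-up.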
