
For each administration, the tests were generated so that the total test and each domain had a mean difficulty index (p-value) of .60. The tests are essentially power tests, with three hours allowed. The cutting score for each test is based on 62.5% of the number of test questions (a score of 75) or the group mean, whichever is higher. The cutting score was 75 for each of the tests for each administration. The test items were selected in accordance with the following parameters: p-values between .25 and .90 and biserials between .15 and .99.
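The cut-score rule and the item-selection window above can be written out directly. The Python sketch below is illustrative only; the function names are not part of the Navy's test processing system, and the 120-item test length is inferred from the fact that 62.5% of the items equals a score of 75.

    def cutting_score(num_items, group_mean):
        """Cut score: 62.5% of the number of items or the group mean, whichever is higher."""
        return max(0.625 * num_items, group_mean)

    def item_is_eligible(p_value, biserial):
        """Selection window: p-values between .25 and .90, biserials between .15 and .99."""
        return 0.25 <= p_value <= 0.90 and 0.15 <= biserial <= 0.99

    # With the roughly 120-item forms implied above and a group mean below 75,
    # the rule yields the fixed cutting score of 75 reported for every administration.
    print(cutting_score(120, 71.3))  # 75.0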

The tests are administered twice yearly, in the spring and fall, to enlisted Navy personnel in pay grades E-5 through E-9 with a minimum of nine months' experience in an IMA activity. BM-0110 was developed in the summer of 1987 and placed into operational use in the fall of 1987. EM-4613 was developed in the fall of 1987 and placed into operational use in the spring of 1988. All tests in this program were developed by subject-matter experts from each trade under the tutelage of a testing specialist. All of the tests are computer generated by an automated test processing system (TPS) that includes item banking, scoring, and analysis and updating of all test and item data.
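The paper does not document the TPS internals, but a minimal item-banking record, with hypothetical field names, might look like the following. It only illustrates the kind of per-item data (current statistics and their history) such a system would carry for selection, scoring, and updating.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class BankedItem:
        item_id: str
        domain: str                  # content area the item is keyed to
        p_value: float               # proportion answering correctly, latest administration
        biserial: float              # item-total biserial correlation, latest administration
        history: List[Tuple[float, float]] = field(default_factory=list)

        def update(self, new_p: float, new_biserial: float) -> None:
            """Bank the prior statistics, then record the latest administration's values."""
            self.history.append((self.p_value, self.biserial))
            self.p_value, self.biserial = new_p, new_biserial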

Procedure

Three different administrations -- Spring 1989 (1-89), Fall 1989 (2-89), and Spring 1990 (1-90) -- were used for this study for both the BM-0110 and EM-4613 tests. Both the 1-89 and 1-90 tests were constructed under normal procedures, i.e., with items grouped by domain and presented from easiest to most difficult within each domain. For the 2-89 administrations, the test items were randomized without regard for content area or difficulty level.
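The two assembly orders can be sketched as follows. The item records and domain labels are hypothetical; only the ordering logic reflects the procedures described above (domain-grouped, easiest to most difficult, versus fully randomized).

    import random

    items = [
        {"id": "A01", "domain": "Domain 1", "p_value": 0.81},
        {"id": "A02", "domain": "Domain 1", "p_value": 0.55},
        {"id": "B01", "domain": "Domain 2", "p_value": 0.72},
        {"id": "B02", "domain": "Domain 2", "p_value": 0.64},
    ]

    def ordered_by_domain(items):
        """1-89 and 1-90 forms: group by domain, easiest (highest p-value) first within each."""
        return sorted(items, key=lambda it: (it["domain"], -it["p_value"]))

    def randomized(items, seed=None):
        """2-89 forms: shuffle without regard for content area or difficulty."""
        shuffled = list(items)
        random.Random(seed).shuffle(shuffled)
        return shuffled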

The items for each administration were generated by the TPS from the total item pool available for each test, and therefore the items were not identical across administrations. Table II presents the number of items common to each pair of test administrations.

Table II
Common Items Between Administrations

            1-89 - 2-89    1-89 - 1-90    2-89 - 1-90
BM-0110          71             77             89
EM-4613          66             67             67
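Because the TPS drew each form from the same total item pool, the overlap counts in Table II amount to intersections of the item sets on each pair of forms. A trivial sketch, with placeholder item identifiers rather than the actual form contents:

    form_1_89 = {"A01", "A02", "B01", "B02"}   # placeholder item identifiers
    form_2_89 = {"A01", "B01", "B02", "C01"}

    common = form_1_89 & form_2_89
    # len(common) is the analogue of a Table II cell
    # (e.g., 71 for the actual BM-0110 1-89 and 2-89 forms).
    print(len(common))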

Under ideal conditions, the research design would have used the same items for each administration, and both forms of the test would have been administered at the same time. However, due to a number of factors, including fairly small Ns and numerous repeat candidates from one test administration to another, the ideal design was not possible. The test populations do, however, tend to be quite stable from one administration to another in terms of trade experience and numbers from each paygrade.

The test results and item statistics from each administration of each test were compared with the other administrations from four different perspectives -- total test results, part test scores, common item comparisons, and individual item statistics. As previously stated, the objectives were to determine whether randomizing the items would have any effect on total test performance, part (domain) test performance, and individual item statistics. A variety of statistical procedures were employed to analyze the data, including z-tests, two-tailed t-tests, and ANOVAs.
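A minimal sketch of the three kinds of tests named above, assuming SciPy is available and using placeholder data rather than the study's actual scores:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    total_1_89 = rng.normal(78, 8, size=60)   # placeholder total-test scores
    total_2_89 = rng.normal(77, 8, size=55)

    # Two-tailed t-test on total-test means between two administrations.
    t_stat, t_p = stats.ttest_ind(total_1_89, total_2_89)

    # z-test on a common item's p-values (proportions correct) in two administrations.
    def z_test_proportions(p1, n1, p2, n2):
        pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
        se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
        return (p1 - p2) / se

    z = z_test_proportions(0.62, 60, 0.58, 55)

    # One-way ANOVA on a part (domain) score across the three administrations.
    f_stat, f_p = stats.f_oneway(rng.normal(20, 3, 60),
                                 rng.normal(19, 3, 55),
                                 rng.normal(21, 3, 58))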

