32nd ANNUAL CONFERENCE<br />
OF THE<br />
MILITARY TESTING ASSOCIATION<br />
Orange Beach, Alabama<br />
5 - 9 November 1990<br />
Proceedings<br />
Hosted by the<br />
Naval Education and Training<br />
Program Management Support Activity<br />
32nd Annual Conference of the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong><br />
Hosted by the<br />
Naval Education and Training Program<br />
Management Support Activity<br />
Orange Beach, Alabama<br />
5 - 9 November 1990<br />
Conference Committee<br />
Chairperson: Commander Mary A. Adams<br />
Conference Coordinator: Mr. Robert King<br />
Chair, Program and Publications Subcommittee: Mr. Donald Lupone<br />
Chair, Facilities Subcommittee: Mr. William Adams<br />
Chair, Registration Subcommittee: Mr. Richard Lopez<br />
Chair, Social Subcommittee: Mr. Robert Pallme<br />
Chair, Public Relations Subcommittee: Mr. David Slover<br />
Chair, Memento Subcommittee: Mr. Dean McCallum<br />
Chair, Finance Subcommittee: LT Gary L. Waters<br />
Site Coordinator: Dr. Charles Hesse
Acknowledgements<br />
The success of the MTA Conference can be attributed to the<br />
dedication of individuals who worked many hours. The MTA<br />
Conference Committee members express their appreciation to the<br />
following people for their contributions to the Conference:<br />
Facilities<br />
Mr. William Adams (Chair)<br />
DMC Charles Alvare<br />
Ms. Sharon Benton<br />
CMCS Thomas A. Browning<br />
LICM Robert Carr<br />
Mr. Dale Eckard<br />
Mr. Al Farr<br />
PHC Carl Hinkle<br />
Ms. Jackie Hufman<br />
CECS Billy F. Johnson<br />
CECS John A. Lanclos<br />
Ms. Fay Landrum<br />
Mr. Frank Strayer<br />
Finance<br />
LT Gary L. Waters<br />
Memento<br />
Mr. Dean McCallum (Chair)<br />
Ms. Catherine Warfield<br />
Presentation Facilitators<br />
Mr. Gerald Murphy (Chair)<br />
Ms. Sharon Benton<br />
FTCS Robert Bloomquist<br />
AWCS David M. Devarney<br />
ETCS R. Elliott<br />
OTACS Robert H. Howe<br />
RPC Jeffery L. Kringle<br />
GSCS Robert Kuzirian<br />
RPC Frank Logan<br />
CWO Camilo D. Lomibao<br />
OTAC Mark A. Lowe<br />
JOC George Markfelder<br />
VNC Gail M. Ravy<br />
MRC Kenneth Shaw<br />
AKCS William Sims<br />
Program and Publications<br />
Mr. Donald Lupone (Chair)<br />
Mr. W. N. Presley Jr.<br />
Ms. Wilma Scofield<br />
Ms. Joanne Vendetti<br />
Public Relations<br />
Mr. David Slover (Chair)<br />
ETC Steve Anderson<br />
DMC Charles Alvare<br />
SMC Vic Barera<br />
Mr. Dave Bodin<br />
Maxwell Buchanan<br />
Mr. Norman Champagne<br />
Code 05 Department<br />
ATCS Joel Garner<br />
Mr. Frank Harwood<br />
MUCM David Johnson<br />
Mr. Don Phillips<br />
YN3 Mark Shinkle<br />
AXCS Gary Spoon<br />
Mr. Donald Wiggins<br />
Mr. Emery Williams<br />
Ms. Mary Wing<br />
Registration<br />
Mr. Richard Lopez (Chair)<br />
Mr. Earl F. Roe<br />
Mr. Michael Abney<br />
ISC P. Buchan<br />
STGCS P. D. Craig<br />
Mr. Ronald Dougherty<br />
Ms. Brenda Frederick<br />
Ms. Susan Godwin<br />
Mr. Larry Golding<br />
STGC J. M. Griffin<br />
Ms. Debbie Halberg<br />
RMC C. I. Hannah<br />
FTCS R. Langley
RMC M. McKay<br />
AWC M. A. Morris<br />
OTMC W. E. Parsons<br />
AWC T. T. Pearson<br />
Ms. Jane Reich<br />
Ms. Laura Roberts<br />
Ms. Anne Sayers<br />
ISCM T. Schroeder<br />
STGC E. C. Smith<br />
AWCM J. R. Thompson<br />
Ms. Marjorie Warsing<br />
STSC J. C. Whitaker<br />
Ms. Jo Ellen Wolf<br />
OTMCS R. A. Wood<br />
FTCM M. Young<br />
Site Coordinator<br />
Dr. Charles Hesse<br />
Social<br />
Mr. Robert Pallme (Chair)<br />
GMCS Ricardo Andres<br />
Ms. Ginger Andrews<br />
Ms. Nora Matos<br />
Mr. Joseph Neidig<br />
Mr. Charles Warner
FOREWORD<br />
These Proceedings of the 32nd Annual Conference of the <strong>Military</strong><br />
<strong>Testing</strong> <strong>Association</strong> document the presentations given at paper and<br />
panel sessions during the conference. The papers represent a<br />
broad range of topics by contributors from the military,<br />
industrial, and educational communities, both foreign and<br />
domestic. It should be noted that the papers reflect the<br />
opinions of the authors and do not necessarily reflect the<br />
official policy of any institution, government, or armed service.<br />
TABLE OF CONTENTS<br />
1990 CONFERENCE COMMITTEE .................................. iii<br />
ACKNOWLEDGEMENTS ........................................... iv<br />
FOREWORD ................................................... vi<br />
TABLE OF CONTENTS .......................................... vii<br />
OPENING SESSION ............................................ xvi<br />
PAPER PRESENTATIONS - MANPOWER<br />
101. TRUSCOTT, S., The Canadian Reserves: Current and<br />
Future Manpower...................................... 1<br />
102. MARTELL, LTC Kenneth A. and WINN, LTC Dennis H.,<br />
Accession Dynamics................................... 6<br />
103. Not Presented.<br />
104. REEVES, Lt(N) D. T., Ethnic Participation in the<br />
Canadian Forces: Demographic Trends................. 12<br />
105. ELIG, Timothy W., 1990 Army Career Satisfaction<br />
Survey............................................... 19<br />
106. DEMPSEY, J. R., HARRIS, D. A., and WATERS, A. K., The<br />
Use of Artificial Neural Networks in <strong>Military</strong><br />
Manpower Modeling.................................... 25<br />
107. EDWARDS, Jack E., ROSENFELD, Paul, and THOMAS,<br />
Patricia J., Hispanics in Navy's Blue-Collar Civilian<br />
Workforce: A Pilot Study............................ 31<br />
108. Not Presented.<br />
PAPER PRESENTATIONS - OCCUPATIONAL ANALYSIS<br />
201. WALKER, C. L., Descriptors of Job Specialization<br />
Based on Job Knowledge Tests......................... 37<br />
202. RHEINSTEIN, Julie, O'LEARY, Brian S., and MCCAULEY,<br />
Jr., Donald E., Addressing the Issues of<br />
"Quantitative Overkill" in Job Analysis.............. 51<br />
203. O'LEARY, Brian S., RHEINSTEIN, Julie, and MCCAULEY,<br />
Jr., Donald E., Developing Job Families Using<br />
Generalized Work Behaviors........................... 58<br />
204. O'LEARY, Brian S., RHEINSTEIN, Julie, and MCCAULEY,<br />
Jr., Donald E., A Comparison of Holistic and<br />
Traditional Job-Analytic Methods..................... 64<br />
205. HUDSPETH, Dr. DeLayne R., FAYFICH, Paul R., and<br />
PRICE, John S., Squadron Leader, Automating the<br />
Administration of USAF Occupational Surveys.......... 70<br />
206. MENCHACA, Capt Jose, Jr., GUTHALS, 2Lt Jody A.,<br />
OLIVIER, Lou, and PFEIFFER, Glenda, MPT Enhancements<br />
to the Occupational Research Data Bank............... 76<br />
207. PHALEN, William J., MITCHELL, Jimmy L., and HAND,<br />
Darryl K., ASCII CODAP: Progress Report on<br />
Applications of Advanced Analysis Software........... 82<br />
208. KLEIN, Paul, Professional Success of Former Officers<br />
in Civilian Occupations.............................. 88<br />
209. FINLEY, Dorothy L. and YORK, William J., Jr., A<br />
<strong>Military</strong> Occupational Specialty (MOS) Research and<br />
Development Program: Goals and Status............... 94<br />
210. YORK, William J., Jr. and FINLEY, Dorothy L.,<br />
Application of the Job Ability Assessment System to<br />
Communication Systems Operators...................... 99<br />
211. ARNDT, K., Preferences for <strong>Military</strong> Assignments in<br />
German Conscripts.................................... 104<br />
212. SCHAMBACH, S. B., Aptitude-Oriented Replacement of<br />
Conscript Manpower in the German Bundeswehr.......... 110<br />
213. VAUGHAN, David S., MITCHELL, Jimmy L., KNIGHT, J. R.,<br />
BENNETT, Winston R., and BUCKENMYER, David V.,<br />
Developing a Training Time and Proficiency Model for<br />
Estimating Air Force Specialty Training Requirements<br />
of New Weapon Systems................................ 116<br />
214. Not Presented.<br />
PAPER PRESENTATIONS - TRAINING<br />
301. MCCORMICK, D. L. and JONES, P. L., Evaluating<br />
Training Program Modifications....................... 122<br />
302. Not Presented.<br />
303. Not Presented.<br />
304. DIEHL, Grover E., The Effect of Reading Difficulty on<br />
Correspondence Course Performance.................... 128<br />
305. PARCHMAN, Steve W., ELLIS, John A., and MONTAGUE,<br />
William E., Navy Basic Electricity Theory Training:<br />
Past, Present, and Future............................ 132<br />
306. Not Presented.<br />
307. Not Presented.<br />
308. STEPHENSON, S. D. and STEPHENSON, J. A., Using Event<br />
History Techniques to Analyze Task Perishability: A<br />
Simulation........................................... 138<br />
309. STEPHENSON, S. D., A First Look at the Effect of<br />
Instructor Behavior in a Computer-Based Training<br />
Environment.......................................... 144<br />
310. BESSEMER, D. W., Transfer of Training with Networked<br />
Simulators........................................... 150<br />
311. Not Presented.<br />
312. DART, 1Lt Todd S., GUTHALS, 2Lt Jody A., and<br />
BERGQUIST, Maj Timothy M., Contingency Task Training<br />
Scenario Generator................................... 156<br />
313. MIRABELLA, Angelo, Cooperative Learning in the Army:<br />
Research and Application............................. 162<br />
314. EGGENBERGER, J. C., PhD, and CRAWFORD, R. L., PhD,<br />
Battle-Task/Battleboard Training Application Paradigm<br />
and Research Design.................................. 168<br />
315. LICKTEIG, Carl W., KOGER, Major Milton E., and<br />
HESLIN, Captain Thomas F., Combat Vehicle Commander's<br />
Situational Awareness: Assessment Techniques........ 174<br />
316. FEHLER, F., An Aviation Psychological System for<br />
Helicopter Pilot Selection and Training.............. 180<br />
317. SPECTOR, J. M. and MURAIDA, D. J., Analyzing User<br />
Interaction with Instructional Design Software....... 185<br />
318. PFEIFFER, M. G. and EVANS, R. M., Forecasting<br />
Training Effectiveness (FORTE)....................... 191<br />
319. PHELPS, Dr. Ruth H. and ASHWORTH, MAJ Robert L., Jr.,<br />
Cost-Effectiveness of Home Study Using Asynchronous<br />
Computer Conferencing for Reserve Component<br />
Training............................................. 199<br />
PAPER PRESENTATIONS - TESTING<br />
401. RUDOLPH, Sandra A., Test Design and Minimum Cutoff<br />
Scores............................................... 204<br />
402. KOBRICK, J. L., JOHNSON, R. F., and MCMENEMY, D. J.,<br />
Subjective and Cognitive Reactions to Atropine/2-PAM,<br />
Heat, and BDU/MOPP-IV................................ 210<br />
403. LESCREVE, F. and SLOWACK, W., Guts: A Belgian Gunner<br />
<strong>Testing</strong> System....................................... 216<br />
404. Not Presented.<br />
405. Not Presented.<br />
406. KENNEDY, R. S., DUNLAP, W. P., FOWLKES, J. E., and<br />
TURNAGE, J. J., Characterizing Responses to Stress<br />
Utilizing Dose Equivalency Methodology............... 220<br />
407. Not Presented.<br />
408. ARABIAN, Jane M. and SCHWARTZ, Amy C., Job Sets for<br />
Efficiency in Recruiting and Training (JSERT)........ 226<br />
409. THAIN, John W., Development of a New Language<br />
Aptitude Battery..................................... 231<br />
410. WILLIAMS, J. E., STANLEY, P. P., and PERRY, C. M.,<br />
Implementation of Content Validity Ratings in Air<br />
Force Promotion Test Construction.................... 235<br />
411. JEZIOR, B. A., POPPER, R., LESHER, L. L., GREENE, C.<br />
A., and INCE, V., Interpreting Rating Scale Results:<br />
What Does a Mean Mean?............................... 241<br />
412. SANDS, W. A., Joint-Service Computerized Aptitude<br />
Testing.............................................. 245<br />
413. Not Presented.<br />
414. O'BRIEN, L. H., Assessment of Aptitude Requirements<br />
for New or Modified Systems.......................... 251<br />
415. Presented in Symposium 803D.<br />
416. SCHWARTZ, Amy C. and SILVA, Jay M., The Practical<br />
Impact of Selecting TOW Gunners with a Psychomotor<br />
Test................................................. 256<br />
417. BRADLEY, Capt. J. P., Validation of a Naval Officer<br />
Selection Board...................................... 262<br />
418. Not Presented.<br />
419. HANSON, Mary Ann, and BORMAN, Walter C.,<br />
A Situational Judgment Test of Supervisory<br />
Knowledge in the U.S. Army........................... 268<br />
420. Presented in Symposium 803B.<br />
421. Not Presented.<br />
422. BUCK, Lawrence S., Context Effects on Multiple-<br />
Choice Test Performance.............................. 274<br />
423. SALTER, MAJ Charles A., LESTER, Laurie S., LUTHER,<br />
Susan M., and LUISI, Theresa A., Dietary Effects on<br />
Test Performance..................................... 280<br />
424. MAEL, F. A., What Makes Biodata Biodata?............. 286<br />
425. VAN HEMEL, S., ALLEY, F., BAKER, H., and SWIRSKI, L.,<br />
Job Sample Test for Navy Fire Controlman............. 292<br />
426. BAKER, H., SANDS, M., and SPOKANE, A., ASVIP: An<br />
Interest Inventory Using Combined Armed Services<br />
Jobs................................................. 298<br />
427. SPIER, M., DHAMMANUNGUNE, S., BAKER, H., and SWIRSKI,<br />
L., Predicting Performance with Biodata.............. 304<br />
428. ALBERT, W. G. and PHALEN, W. J., Development of<br />
Equations for Predicting <strong>Testing</strong> Importance of<br />
Tasks................................................ 310<br />
429. DITTMAR, Martin J., HAND, Darryl K., PHALEN, William<br />
J., and ALBERT, W. G., Estimating <strong>Testing</strong> Importance<br />
of Tasks by Direct Task Factor Weighting............. 316<br />
430. Not Presented.<br />
431. BRADY, Elizabeth J. and RUMSEY, Michael G., Upper<br />
Body Strength and Performance in Army Enlisted MOS... 322<br />
432. PALMER, D. R., WHITE, L. A., and YOUNG, M. C.,<br />
Response Distortion on the Adaptability Screening<br />
Profile (ASP)........................................ 328<br />
433. BANDERET, L. E., SHUKITT-HALE, B. L., LIEBERMAN, H.<br />
R., SIMPSON, LTC R. L., and PEREZ, CPT P. J.,<br />
Psychometric Properties of a Number Comparison Task:<br />
Medium and Format Effects............................ 334<br />
434. BANDERET, L. E., O'MARA, M., PIMENTAL, N. A., RILEY,<br />
SGT R. H., DAUPHINEE, SSG D. T., WITT, SSG C. E., and<br />
TOYOTA, SGT R. M., Subjective States Questionnaire:<br />
Perceived Well-Being and Functional Capacity......... 339<br />
435. ROMAGLIA, CIC Diane L. and SKINNER, Jacobina,<br />
Validity of Grade Point Average: Does the College<br />
Make a Difference?................................... 345<br />
436. Not Presented.<br />
437. HANSEN, H. D., Flight Psychological Selection<br />
System - FPS-80: A New Approach to the Selection<br />
of Aircrew Personnel................................. 351<br />
438. MELTER, A. H. and MENTGES, W., Leadership in Aptitude<br />
Tests and in Real-Life Situations.................... 357<br />
439. PUTZ-OSTERLOH, W., Computer-based Assessment of<br />
Strategies in Dynamic Decision Making................ 362<br />
440. RODEL, G., The "Information and Counseling Action"<br />
(IBA) of the German Navy............................. 368<br />
441. CONNER, Dr. Harry B., Troubleshooting Assessment and<br />
Enhancement (TAE) Program: Test and Evaluation<br />
Results.............................................. 372<br />
442. BUSCIGLIO, Henry H., Incrementing ASVAB Validity with<br />
Spatial and Perceptual-Psychomotor Tests............. 380<br />
443. RUSHANO, T. M., Item Content Validity: Its<br />
Relationship with Item Discrimination and<br />
Difficulty........................................... 386<br />
444. FIEDLER, E., The Air Force Medical Evaluation Test,<br />
Basic <strong>Military</strong> Training, and Character of<br />
Separation........................................... 392<br />
445. TRENT, T., QUENETTE, M. A., and LAMBS, G. J.,<br />
Implementation of the Adaptability Screening Profile<br />
(ASP)................................................ 398<br />
446. MCGEE, Steve D., Utilization of Word<br />
Processors/Computers vs Typewriter for U.S. Navy<br />
Typing Performance Tests............................. 404<br />
PAPER PRESENTATIONS - HUMAN FACTORS<br />
501. THARION, W. J., MARLOWE, B. E., KITTREDGE, R., HOYT,<br />
R., and CYMERMAN, A., Acute High Altitude Exposure<br />
and Exercise Decrease Marksmanship Accuracy.......... 408<br />
502. Not Presented.<br />
503. COLLINS, Dennis D., Human Performance Data for Combat<br />
Models............................................... 414<br />
504. TURNAGE, Janet J., KENNEDY, Robert S., and JONES,<br />
Marshall B., Trading Off Performance, Training, and<br />
Equipment Factors to Achieve Similar Performance..... 419<br />
505. BAYES, Andrew H., Final Report, Computer Assisted<br />
Guidance Information Systems......................... 425<br />
PAPER PRESENTATIONS - LEADERSHIP<br />
601. Not Presented.<br />
602. ALDERKS, Cathie E., Vertical Cohesion Patterns in<br />
Light Infantry Units................................. 432<br />
603. LINDSAY, Twila J. and SIEBOLD, Guy L., The Use of<br />
Incentives in Light Infantry Units................... 438<br />
604. SIEBOLD, Guy L., Cohesion in Context................. 444<br />
605. WALDKOETTER, R. O., WHITE, W. R., Sr., and VANDIVIER,<br />
P. L., Evaluation of the Army's Finance Support<br />
Command Organizational Concept....................... 450<br />
606. STEINBERG, Alma G. and LEAMAN, Julia A., Leader<br />
Initiative: From Doctrine to Practice............... 455<br />
607. Not Presented.<br />
608. Not Presented.<br />
609. Not Presented.<br />
610. CLARK, Herbert J., Starting a TQM Program in an R&D<br />
Organization......................................... 460<br />
PAPER PRESENTATIONS - MISCELLANEOUS TOPICS<br />
701. Not Presented.<br />
702. ROOZENDAAL, Col. G. J. C., An Officer, a Social<br />
Scientist, (and possibly a gentleman) in the Royal<br />
Netherlands Army (RNLA).............................. 466<br />
703. GOLDBERG, Edith Lynne, SHEPOSH, John P., and<br />
SHETTEL-NEUBER, Joyce, Acceptance of Change: An<br />
Empirical Test of a Causal Model..................... 474<br />
PAPER PRESENTATIONS - SYMPOSIA (ALL CATEGORIES)<br />
801. TWEEDDALE, J. W., Symposium: The Naval Reserve<br />
Officers Training Corps (NROTC) Scholarship<br />
Selection System.................................... 480<br />
801A. TWEEDDALE, J. W., Research Needs for Naval Reserve<br />
Officers Training Corps Scholarship Selection....... 480<br />
801B. HAWKINS, R. B., Gathering and Using Naval Reserve<br />
Officers Training Corps Scholarship Information..... 481<br />
801C. EDWARDS, Jack E., BURCH, Regina L., and ABRAHAMS,<br />
Norman M., Validation of the Naval Reserve Officers<br />
Training Corps Quality Index........................ 486<br />
801D. BORMAN, Walter C., OWENS-KURTZ, C. K., and RUSSELL,<br />
T. L., Development and Implementation of a<br />
Structured Interview Program for NROTC Selection.... 492<br />
801E. HANSON, Mary Ann, PAULLIN, Cheryl, and BORMAN,<br />
Walter C., Development of an Experimental<br />
Biodata/Temperament Inventory for NROTC Selection... 498<br />
802. 802 through 8025 Not Presented.<br />
803. BORMAN, W., BOSSHARDT, M., DUBOIS, D., HOUSTON, J.,<br />
CRAWFORD, K., WISKOFF, M., ZIMMERMAN, R., and<br />
SHERMAN, F., Psychological Applications to Ensuring<br />
Personnel Security: A Symposium.................... 504<br />
803A. DUBOIS, D., BOSSHARDT, M., and WISKOFF, M., The<br />
Investigative Interview: A Review of Practice and<br />
Research............................................ 505<br />
803B. ZIMMERMAN, R. A. and WISKOFF, M. F., Utility of a<br />
Screening Questionnaire for Sensitive <strong>Military</strong><br />
Occupations......................................... 511<br />
803C. BOSSHARDT, M., DUBOIS, D., and CRAWFORD, K.,<br />
Continuing Assessment of Cleared Personnel in the<br />
<strong>Military</strong> Services................................... 516<br />
803D. HOUSTON, J., WISKOFF, M., and SHERMAN, F.,<br />
A Measure of Behavioral Reliability for Marine<br />
Security Guards..................................... 522<br />
804. HARRIS, J. H., CAMPBELL, Charlotte H., and CAMPBELL,<br />
Roy C., Symposium: Job Performance <strong>Testing</strong> for<br />
Enlisted Personnel.................................. 528<br />
804A. DOYLE, Earl L. and CAMPBELL, R. C., Navy: Hands-On<br />
and Knowledge Tests for the Navy Radioman........... 529<br />
804B. EXNER, Maj P. J., CRAFTS, J. L., FELKER, D. B.,<br />
BOWLER, E. C., and MAYBERRY, P. W., Interrater<br />
Reliability as an Indicator of HOPT Quality Control<br />
Effectiveness....................................... 535<br />
804C. Not Presented.<br />
804D. CAMPBELL, Charlotte H. and CAMPBELL, Roy C., Army:<br />
Job Performance Measures for Non-Commissioned<br />
Officers............................................ 541<br />
805. BROOKS, J. T., CALE, W. J., HARRIS, J. C., STANLEY<br />
II, P. P., and TARTELL, J. S., The USAF Occupational<br />
Measurement Squadron: Its Organization, Products,<br />
and Impact.......................................... 547<br />
PAPER PRESENTATIONS - VENDOR PRESENTATIONS<br />
901. BROWN, Gary C., The Examiner........................ 553<br />
CONFERENCE INFORMATION<br />
MINUTES OF THE STEERING COMMITTEE MEETING.................. 560<br />
LIST OF STEERING COMMITTEE MEETING ATTENDEES............... 562<br />
AGENCIES REPRESENTED BY MEMBERSHIP ON THE<br />
MTA STEERING COMMITTEE................................. 563<br />
BY-LAWS OF THE MILITARY TESTING ASSOCIATION................ 568<br />
LIST OF CONFERENCE REGISTRANTS............................. 573<br />
INDEX OF AUTHORS........................................... 585<br />
32nd Annual Conference of the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong><br />
Orange Beach, Alabama<br />
5 November 1990<br />
OPENING SESSION<br />
Opening Remarks: Commander Mary A. Adams, Head, Naval<br />
Advancement Center Department, Naval Education and Training<br />
Program Management Support Activity; Pensacola, Florida<br />
Welcome: Mr. George W. Tate, Executive Vice President, Orange<br />
Beach Chamber of Commerce; Orange Beach, Alabama<br />
Keynote Address: Lt General Donald W. Jones, Deputy Assistant<br />
Secretary of Defense (<strong>Military</strong> Manpower and Personnel Policy)<br />
THE CANADIAN RESERVES: CURRENT AND FUTURE MANPOWER*<br />
Susan R. Truscott<br />
Directorate of Social and Economic Analysis<br />
Operational Research and Analysis Establishment<br />
Department of National Defence<br />
Ottawa, Canada<br />
BACKGROUND<br />
In 1987, the Canadian White Paper on Defence outlined<br />
numerous policy changes for the Canadian Forces. One of these<br />
was the Total Force Concept. In brief, it stated that the<br />
distinction between the regular force and the reserves is to be<br />
reduced and the responsibility for national defence is to be<br />
shared. To fulfil its commitments, Canada must look to a<br />
peacetime structure that can be rapidly and effectively<br />
augmented by a trained reserve force composed of part-time<br />
members. A mixed operational force is to be formed, where<br />
regular and reserve force personnel are integrated in units.<br />
The ratio of full-time to part-time personnel will be dependent<br />
on the nature and requirements of the unit.<br />
Currently, regular force members outnumber reservists by a<br />
ratio of more than three to one. To assume a greater role in<br />
the defence of Canada, the reserves are to be revitalized and<br />
expanded. The recruitment of a large number of reservists, and<br />
perhaps different types of reservists, over the next decade<br />
will present a challenge to the reserves, in light of current<br />
socio-demographic and economic trends such as a declining youth<br />
population and broader employment opportunities. Recruiting the<br />
required number of reservists may necessitate new initiatives -<br />
for example, the widening of the traditional recruiting<br />
population and the engagement of new recruiting and advertising<br />
strategies.<br />
Several studies have been undertaken to provide data on a<br />
force that has, at least from a research point of view, been<br />
largely ignored in recent years. The focus of this paper is on<br />
a three phase study of the Primary Reserves, conducted by the<br />
Directorate of Social and Economic Analysis. During Phase One,<br />
qualitative information was collected through interviews with<br />
key reserve personnel. In Phase Two, a survey was administered<br />
to a random sample of reservists to identify the<br />
characteristics, attitudes and values of reservists. The study<br />
also focused on retention and the internal organization of the<br />
reserves. A national attitude survey of 6000 Canadians was<br />
conducted, in Phase Three, to assess knowledge of the reserves,<br />
attitudes toward the reserves and the propensity of Canadians<br />
to join the reserves. Preliminary results of this study, and<br />
* The views and opinions expressed in this paper are<br />
those of the author and not necessarily those of the<br />
Department of National Defence.<br />
their implications in light of socio-demographic trends in the<br />
Canadian population and organizational changes planned for the<br />
reserves, are highlighted in this paper. A profile of<br />
reservists is presented first. This is followed by data on the<br />
Canadian public's knowledge of, and attitudes toward, the<br />
reserves.<br />
FINDINGS<br />
A. SURVEY OF RESERVISTS<br />
The reserves are dominated by young, single males. At the<br />
time of the survey, thirty-one percent of the reservists were<br />
students and 18% were unemployed. Together, these two groups<br />
comprise almost one-half of the reserves. Of the remaining 51%<br />
who were employed, about 24% were Class B or C Reservists, and<br />
thus in continuous full-time employment with the military. In<br />
comparison to 1976, there has been only a modest change in the<br />
percentage who are employed. However, there has been a<br />
substantial increase in the percentage who are unemployed and a<br />
decrease in the percentage who are students. This is an<br />
indication of how closely tied reserve recruitment and<br />
retention is to the employment situation in the Canadian<br />
economy, and in particular the regional economy - relationships<br />
well documented in regular force research.<br />
The reserves are attracting and/or retaining more personnel<br />
who have or are achieving post-secondary education. Of the<br />
reservists who were attending school in 1976, 66% were in high<br />
school, 17% were in college and 16% were enrolled in university.<br />
The recent survey indicated that 50% of the students were in<br />
high school, 20% were in college and 30% were in university.<br />
The increase in reservists with, or attaining post-secondary<br />
education reflects the greater emphasis on education in<br />
society, the greater technical demands in some areas of the<br />
forces, and the use of reserve activity to subsidize<br />
post-secondary education costs.<br />
Many reservists have prior experience with the military -<br />
forty percent had been members of the cadets and 20% had<br />
previously been in another reserve unit. Ten percent of primary<br />
reservists had served in the regular force. Ex-service members<br />
provide expertise and training that is difficult, if not<br />
impossible, to recruit from the civilian work force. There are<br />
some 65,000 ex-regular force members who would be suitable for,<br />
but are not members of the reserves (Bossenmaier, 1987).<br />
Our study indicated that word of mouth was the most common<br />
first source of information on the reserves. Only 7% of<br />
reservists reported that formal advertising had provided their<br />
first information on the reserves. National advertising<br />
campaigns have not been the focus of reserve recruiting in the<br />
past; however, they are an effective means of directing a<br />
specific message to a target population. Indeed they may<br />
provide a very functional mechanism to enhance public awareness<br />
of the reserves.<br />
proportion of the population benefit from receiving some<br />
military training and experience and be of use to the military<br />
if mobilization is required. In addition, they contribute to<br />
the "Defence Community" in Canada; that is, sub-groups of<br />
Canadians with military knowledge and experience and an<br />
understanding of the Defence mandate. The reserves have both<br />
organizational and societal responsibilities; thus, public<br />
relations campaigns should be designed to appeal to those who<br />
view the reserves as a part-time job and to those who view it<br />
as a professional calling.<br />
Based on estimates from Statistics Canada, the general<br />
population will continue to age due to birth rates below<br />
replacement levels. It is expected, however, that the growth<br />
rate in the country will be maintained through increased<br />
immigration. These projections carry major implications for the<br />
reserves. A decline in the youth population means that the<br />
traditional recruiting base will shrink and that the reserves<br />
may increasingly have to rely on women, older persons, the<br />
employed, ex-regular force members and first generation<br />
Canadians to fill its ranks. These are all subgroups of the<br />
population currently under-represented in the reserves. By<br />
extending its age restrictions, the reserves may encourage<br />
older members in certain trades to stay.<br />
B. NATIONAL ATTITUDE SURVEY<br />
The National Attitude Survey was administered to 6000<br />
Canadians between the ages of 15 and 50 to assess the level of<br />
awareness of the reserves, attitudes toward the reserves, and<br />
the propensity of various sub-groups of Canadians to join the<br />
reserves. Eighty percent of those interviewed had heard of the<br />
reserves, but few admitted to having a great deal of awareness of<br />
the reserves or their activities. In fact, just over 40% of<br />
those interviewed said they were not at all or not very aware<br />
of the reserves or their activities.<br />
Many Canadians reported that word of mouth was their most<br />
significant source of information on the reserves. Forty-five<br />
percent of Canadians reported that friends, family members,<br />
relatives or teachers were their main source for information<br />
about the reserves. The media were reported as the most<br />
significant source for 42% of those interviewed; the media were thus far more<br />
important for the general population than for reserve members.<br />
Twenty percent of those interviewed, and 25% of those 15-<br />
24 years of age, had considered joining the reserves within the<br />
previous year. Addressing the future, about 5% of the sample<br />
said that they were somewhat to very likely to consider<br />
joining the reserves. This was the case for 10% of those aged<br />
15-24. Interest in joining the reserves was highest among 15-24<br />
year olds in the Atlantic provinces and Quebec. Those who<br />
indicated a willingness to join the reserves most frequently<br />
cited patriotic reasons. Monetary/work experience<br />
reasons, followed by social reasons, were also common responses.<br />
Older persons were more likely to report patriotic reasons for<br />
interest in the reserves, while pragmatic reasons were more<br />
common among the young. A lack of interest, family, school and<br />
work responsibilities, and age were the most common reasons<br />
provided by those not interested in joining the reserves.<br />
SUMMARY<br />
In summary, the reserves are still very dependent on young<br />
Canadians to fill their ranks. While there are benefits in<br />
recruiting from this sub-group of the population, there are<br />
also considerable drawbacks. The historical exclusion of<br />
students from mobilization, high attrition rates, and the<br />
resulting continual training requirements are examples. With<br />
little doubt, attrition rates will remain high<br />
among the reserves. Principally, the factors that draw<br />
reservists out of the reserves are related to their age and<br />
stage of life. This suggests that attrition may be<br />
reduced by attracting a different type of individual to the<br />
reserves, such as older persons, ex-regular force members or<br />
the civilian employed. Further efforts will be made to explore<br />
attitudes toward the reserves, and the propensity to join,<br />
across sub-groups of the population currently under-represented<br />
in the reserves, as well as factors which may currently limit<br />
or restrict participation. Reserve manpower and related manning<br />
issues should be reviewed in light of the new policy role for<br />
the reserves.<br />
REFERENCES<br />
Bossenmaier, G. (1987). Potential Manpower Resources For<br />
Mobilization Part 1 (ORAE Project Report No. PR434).<br />
Ottawa, Ontario: Directorate of Manpower Analysis.<br />
Goodfellow, T.H. (1976). Reserve Force Survey (ORAE Project<br />
Report No. PR62). Ottawa, Ontario: Directorate of Social<br />
and Economic Analysis.<br />
Popoff, T. and Truscott, S. (1987). A Sociological Study of<br />
the Reserves: Phase Two Trends and Implications for the<br />
Future (ORAE Project Report No. PR440). Ottawa, Ontario:<br />
Directorate of Social and Economic Analysis.<br />
Sinaiko, W. H. (1985). Part-Time Soldiers, Sailors and Airmen,<br />
Reserve Force Manpower in Australia, Canada, New Zealand,<br />
the U. K. and the U.S. (Technical Panel 3 Report (UTP-3)).<br />
Washington, DC: The Technical Cooperation Program,<br />
Subgroup U.<br />
Truscott, S. (1987). A Sociological Study of the Reserves:<br />
Phase Two Summary of Research Findings (DSEA Staff Note<br />
No. 4/88). Ottawa, Ontario: Directorate of Social and<br />
Economic Analysis.<br />
Martell and LTC Dennis Winn<br />
Department of the Army Headquarters,<br />
Office of the Deputy Chief of Staff for Personnel<br />
Pentagon, Arlington VA<br />
In order for the recruiting command (USAREC) to achieve its aggregate accession<br />
mission, there must also be specific MOS requirements to match the accession mission.<br />
The personnel command (PERSCOM) develops these MOS requirements. This paper<br />
will briefly describe the process and interactions among the various systems currently used<br />
to get the right number and mix of soldiers to support the Army’s end strength<br />
requirements. This paper will define the challenges, exacerbated by the current changing<br />
environment, faced by the models and programs at the MOS and aggregate levels of detail.<br />
Timing (planning/forecasting), structure versus MOS program reductions, training capacity,<br />
and the effects of abrupt execution year accession changes will be covered. A brief<br />
description of some specific accession policies will also be addressed. In addition, potential<br />
accommodations by the system to changing demands will be presented.<br />
MOS Level of Detail<br />
Under current procedures, all Army enlistees are assigned a job or a <strong>Military</strong><br />
Occupational Specialty (MOS) upon initially contracting. Thus, for Army Recruiting<br />
Command (USAREC) to achieve its aggregate accession mission, there must also be<br />
specific MOS requirements to match the accession mission. These MOS requirements by<br />
grade and quantity are developed centrally at the Army’s Personnel Command (PERSCOM)<br />
and in the aggregate are MOS programs.<br />
MOS requirements are identified using a planning model called MOSLS (MOS Level<br />
System). Inputs to MOSLS include: AAMMP (Active Army <strong>Military</strong> Manpower Program)<br />
developed from the ELIM-COMPLIP (Enlisted Loss Inventory-Computation of Manpower<br />
Using Linear Programming), projected authorization data from the PMAD (Personnel<br />
Management Authorization Document) or UAD (Updated Authorization Data), and<br />
inventory data from the EMF (Enlisted Master File).<br />
MOSLS then determines the recommended MOS and grade mix for the MOS<br />
inventories, the gains to the MOS required to meet those inventories, and the training<br />
needed to support those gains. While some of the gains to the MOS will come through<br />
reenlistments and reclassifications, the majority will come through USAREC’s accession<br />
mission. These accessions are referred to as the MOS programs.<br />
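The inventory accounting MOSLS performs can be illustrated with a toy calculation. This is only a sketch of the relationship described above (gains needed to reach a target inventory, less gains from reenlistment and reclassification); the function name and sample numbers are assumptions, and the real model also optimizes grade mix and the training pipeline:

```python
# Illustrative sketch of the MOSLS accounting described above: gains needed
# to reach a target MOS inventory, with the accession share left over after
# reenlistments and reclassifications are counted. Names and numbers are
# hypothetical; the actual MOSLS model is far more detailed.
def accession_requirement(target_inventory: int,
                          projected_inventory: int,
                          reenlistments: int,
                          reclassifications_in: int) -> int:
    """Accessions needed for one MOS after other gain sources are counted."""
    gains_needed = max(0, target_inventory - projected_inventory)
    other_gains = reenlistments + reclassifications_in
    return max(0, gains_needed - other_gains)

# Hypothetical MOS: need 1,200, project 1,000 on hand,
# 60 reenlist and 40 reclassify in.
print(accession_requirement(1_200, 1_000, 60, 40))  # 100
```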
Paper presented at the 32nd Annual Conference of the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>, November 1990.<br />
Training to support these programs is obtained through the Structure Manning Decision<br />
Review (SMDR) process. The SMDR is held annually and allows each of the Army<br />
components (Active Army, Reserves, and National Guard) to express the training needed<br />
to support its MOS programs. These program requirements are evaluated against the<br />
training capacity in the Training and Doctrine Command (TRADOC) and, once approved,<br />
become TRADOC’s training mission. The approved training requirements are referred to<br />
as the Army Program for Individual Training (ARPRINT), which identifies for the individual<br />
TRADOC schools what their training mission is for the fiscal year.<br />
The TRADOC schools in turn develop individual class schedules to support their<br />
training mission. These class schedules are placed into the Army Training Requirements<br />
and Resources System (ATRRS) and ultimately into the automated accessioning system,<br />
REQUEST. The total of the “seats” in the classes for a particular MOS for the year is equal<br />
to the MOS programs developed through MOSLS and approved in the SMDR. USAREC<br />
recruits against these classes and by filling the individual class “seats” also fills the annual<br />
MOS programs.<br />
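The seat accounting described above amounts to a simple consistency rule: a year's class seats for an MOS should sum to its annual program. A minimal sketch (the function name and class sizes are hypothetical):

```python
# Consistency rule stated above: the total of the "seats" in a year's
# classes for an MOS equals the annual MOS program. Class sizes here
# are hypothetical.
def seats_match_program(class_seats: list[int], annual_program: int) -> bool:
    """True when scheduled class seats exactly cover the annual MOS program."""
    return sum(class_seats) == annual_program

print(seats_match_program([20] * 6, 120))  # True
```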
The above process works well in a stable, predictable environment; however, as seen<br />
in recent years and especially now with the uncertainties of reducing the manpower in the<br />
Army or “downsizing”, the environment is anything but stable or certain. Discussed below<br />
are some of the problems encountered in managing accessions during these unique times.<br />
Timing. The SMDR works in the future. For example, the SMDR held in April and<br />
May 1990 built the FY93 training programs. Although FY92 was revalidated and FY94 was<br />
given a first look, the major work was on FY93 and it is that year’s training which will be<br />
approved in the ARPRINT in the summer of 1990. Projections in the best of circumstances<br />
are chancy; in a downsizing environment, the training that is “bought” and approved in<br />
FY90 may no longer reflect the requirements when FY93 finally arrives. Critical to the<br />
MOSLS process are the known and projected authorizations (PMAD) based on projected<br />
force structure. If the structure changes then the MOS requirements change, and thus the<br />
training requirements. While there are mechanisms to make adjustments to the training<br />
programs, because the SMDR is so closely tied to the budget and resourcing process,<br />
significant changes may not be satisfied in a timely manner.<br />
Structure Reductions. PERSCOM can adjust its MOS programs throughout the year<br />
to match the accession mission changes. Generally, these changes have been reductions in<br />
USAREC’s mission. While the Deputy Chief of Staff for Personnel (DCSPER) Accession<br />
Division can easily reduce the aggregate requirement, PERSCOM cannot reduce the<br />
supporting MOS programs without knowing what changes are being made in structure.<br />
Experience in the past and currently is that decisions on structure reductions lag behind<br />
decisions to reduce the accession missions. PERSCOM then is left with a couple of<br />
alternatives: make a best guess, in coordination with the Office of Deputy Chief of Staff<br />
for Operations (ODCSOPS), on what structure is coming out and adjust accordingly, [but<br />
the risk is that the guess will be wrong and irreversible decisions on MOS level accessions<br />
will have been made]; or leave the MOS programs untouched, with the result being more<br />
available program and training than there is accession mission to support. If the accession<br />
mission is 1,000 but there are 2,000 MOS programs available, only 1,000 will be recruited.<br />
With that excess, we allow USAREC and the applicant to dictate what MOS programs are<br />
filled. The risk is that the wrong MOS programs will be filled.<br />
Training Capacity. The training that was approved in the SMDR was at the annual<br />
level. The individual school or installation must convert that requirement to class schedules<br />
and spread the requirement across the year. Generally, that spread will be made on a<br />
straight line, consistent with the capacity in each course. For example, if the requirement<br />
for an MOS is 120 with a class optimum size of 20, the TRADOC school will likely<br />
schedule 6 classes conducted every other month. The concept is the same for basic training.<br />
While TRADOC does have some surge capacity, this straight line scheduling is a reflection<br />
of the fact that TRADOC is budgeted and manned on an annual basis. The physical plant<br />
(e.g. billets) and training equipment (e.g. simulators, tanks) may also dictate a straight line<br />
schedule with limited surge capability.<br />
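The straight-line spread can be sketched as follows, reproducing the text's example of a 120-soldier annual requirement and 20-seat classes; the function name and the month-rounding rule are assumptions:

```python
# Straight-line class scheduling as in the text's example: an annual MOS
# requirement of 120 with an optimum class size of 20 yields 6 classes
# spread evenly across the year, i.e. every other month. A minimal sketch;
# the even-spacing rule is an assumption.
import math

def straight_line_schedule(annual_requirement: int, class_size: int) -> list[int]:
    """Return the fiscal months (1-12) in which classes start, spread evenly."""
    n_classes = math.ceil(annual_requirement / class_size)
    spacing = 12 / n_classes
    return [round(i * spacing) + 1 for i in range(n_classes)]

print(straight_line_schedule(120, 20))  # [1, 3, 5, 7, 9, 11]
```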
TRADOC’s capability in recent years has been stretched to the limit. Faced with<br />
structure and budget cuts itself, TRADOC has recently indicated that it can no longer surge.<br />
They have requested HQDA’s support in effecting a more even flow into the training base<br />
with about 35% of the annual training capacity in the 4th quarter. While this should allow<br />
all three components (Active Army, Reserves, and National Guard) to still meet their<br />
missions while taking advantage of the prime summer recruiting months, the ability of the<br />
Army to slide the accession mission into the 4th Quarter to save <strong>Military</strong> Personnel Account<br />
(MPA) dollars will be restricted.<br />
Execution Year Changes. When the mission is slid to the 4th Quarter to save dollars,<br />
that shift can cause critical training seats to be missed which perhaps cannot be made up<br />
in the future. Also, because of the way MOS programs are counted, shifting the mission<br />
into August or September can take that mission “across the training year” line into the next<br />
training year. This is because MOS programs are based on the start date of the MOS<br />
producing course. Thus, an accession in an Advanced Individual Training (AIT) course for<br />
an MOS in August 1990 counts against the FY91 program because the MOS course will<br />
start in October, 8 weeks after the soldier accessed and entered the 8 week Basic Training<br />
(BT) course; for MOS with One Station Unit Training (OSUT), BT and AIT are merged<br />
so that when the soldier accesses and enters OSUT, he or she is starting the MOS producing<br />
course. The result is too low a mission to support the MOS programs in one year and<br />
excess mission for the available training in the subsequent “training program year”.<br />
PERSCOM must then take action to align the aggregate programs with the new mission by<br />
reducing the program level. These MOS program reductions can have the same adverse<br />
effect discussed above in Structure Reductions.<br />
Aggregate Level of Detail<br />
Input to the ELIM is made during the budget process and includes expected<br />
requirements for each of the months of the year/years being developed. Key considerations<br />
in this process are: recruiting capability, budget, and training capability.<br />
Recruiting capability is defined as what USAREC believes it can handle for each<br />
month, quarter, and annual mission. The aggregate numbers for each of these periods are<br />
developed from contract recruiting history and not necessarily accession history. The<br />
contract capability is developed by looking at what USAREC expects it can achieve in the<br />
various specific mission categories: combinations of gender, nonprior service/prior service,<br />
high school graduation status, and AFQT (the Armed Forces Qualification Test, part of the<br />
Armed Services Vocational Aptitude Battery or ASVAB). Contracting missions to the<br />
USAREC commanders in the field takes a 6 month lead time and is, in the aggregate,<br />
influenced by the accession mission. However, the accession flow to the training base is<br />
controlled by USAREC headquarters and at the Department of the Army level. In other<br />
words, the recruiting commanders are not greatly influenced by gyrations in the monthly<br />
accession requirements.<br />
The market, incentives, recruiter tools, and the economic environment, but<br />
especially recent recruiting history, drive the aggregate numbers. USAREC deals with<br />
Reception Station Months (RSM) vice Calendar months. The RSM allows flexibility for<br />
shipping recruits to the 8 (7 after closing of FT. Bliss Reception Battalion) reception<br />
battalions. Conversion of RSM to calendar numbers is not an exact science since it is<br />
based on the projected rate of shipping during each RSM week. The conversion is important<br />
to identify the costs of each calendar month, which are determined by manyear costs prorated<br />
by months of the year. The present dollar amount for one full manyear is $17,657. Thus if 5,000<br />
recruits are shipped (now referred to as accessions) in March, the sixth month of the fiscal<br />
year, this number equates to 5,000 x 6/12, or 2,500 manyears; at $17,657 each, this is $44,142,500<br />
in MPA cost. On the whole the conversion has been relatively accurate.<br />
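The proration above can be checked with a short calculation. This sketch reads the 6/12 factor in the March example as the fraction of the fiscal year remaining after the shipping month, an interpretation consistent with the later point that shifting accessions to the 4th quarter saves MPA dollars; the function name is an assumption:

```python
MANYEAR_COST = 17_657  # dollars per full manyear, as quoted in the text

def mpa_cost(accessions: int, fiscal_month: int) -> int:
    """Prorated MPA cost for recruits shipped in fiscal month 1-12 (Oct = 1).

    The 6/12 proration in the text's March example is read here as the
    fraction of the fiscal year remaining after the shipping month.
    """
    months_remaining = 12 - fiscal_month
    return round(accessions * months_remaining / 12 * MANYEAR_COST)

# March, the sixth month of the fiscal year: 5,000 x 6/12 x $17,657
print(mpa_cost(5_000, 6))  # 44142500
```

Note that under this reading, the later a recruit ships in the fiscal year, the smaller the current-year MPA charge, which is exactly the savings mechanism discussed below.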
Budgetary requirements to save <strong>Military</strong> Personnel Account (MPA) dollars tend to<br />
influence greatly the enlisted accession requirements. For instance, large amounts of MPA<br />
dollars can be saved simply by shifting accessions from the beginning of a year to the end<br />
of the year. Movement of accessions to the 4th quarter has been done for the last few<br />
years and is already included in the FY91 accession requirements by calendar month.<br />
Ironically, this shift to the 4th quarter does not cause USAREC as much concern as it does<br />
TRADOC. The major cuts to the Army budget in the form of dollars and endstrength have<br />
a significant impact on the adjustments to the accession programs for the current and future<br />
years.<br />
USAREC preference is for the following quarterly RSM breakdown:<br />
Quarter Percentage<br />
FIRST 21-23<br />
SECOND 22<br />
THIRD 19<br />
FOURTH 37<br />
First and fourth quarters are the best when considering the market. Some dynamics to<br />
consider here, which are hidden in the numbers, include the fact that high quality soldiers<br />
are easier to enlist with shorter Terms of Service (TOS), the size of next year’s mission<br />
influences recruit entry into the DEP (Delayed Entry Program), and the higher the<br />
aggregate quality goals the tougher the recruiting (unless resources for recruiting are<br />
commensurately increased).<br />
TRADOC prefers an even flow into training at the rate of 8.3% per month of input<br />
from USAREC. They have stated that the surge of training requirements in the fourth<br />
quarters is near impossible to handle. This contention is based on the 4th quarter surge, which<br />
includes Active, Reserve and National Guard input. Even acknowledging that a perfectly even flow is impossible,<br />
fourth quarter surges, closing on 40%, are not resource supportable, especially with the<br />
first three quarters under capacity. It should be noted that the Army is looking at the feasibility<br />
of clustering or combining certain MOS, eliminating MOS, consolidating training, and<br />
eliminating BT/AIT/Reception Battalions (e.g., FT. Bliss); these measures, although long term, will<br />
have a positive effect on some of the problems mentioned.<br />
Specific Accessions<br />
Females. The female enlisted accession floor was initially set in FY88. The most<br />
recent floor is based on slight growth, but growth nevertheless, of 0.2% to 0.5%<br />
each year in the female enlisted end strength compared to the total enlisted endstrength.<br />
This allows female endstrength to be reduced as the Army downsizes but, perhaps, at a<br />
lower rate than males. The final FY96 level will be 13.3% of the enlisted endstrength.<br />
Recruiting females is actually more difficult than recruiting males. There are over 240,000<br />
available MOS slots for females in which to enlist. However, females have a lower<br />
propensity to enlist (11% versus 17% for males, YATS89) and gravitate to the more<br />
attractive MOS such as medical (91), administrative (71), supply (76), and communication<br />
(31), usually the top four MOS for females annually. In FY89, 63% of the females were in<br />
AFQT CAT I-IIIA, all but a handful were high school graduates, 69% took the four-year<br />
terms, 54% were white, 40.5% were black. The female accession floor of 15,500 was<br />
exceeded by 4%.<br />
Establishing a gender neutral accession mission in the Army will likely lower, not raise,<br />
the number of females who enlist. Money for college and shorter terms of service are the<br />
two most important considerations for females and without exceptional resources and<br />
attention to these attractions, and without proper “goaling” of the recruiter, female<br />
accessions would probably be significantly lower, perhaps by as much as 60%. Contract<br />
goals for females in FY88 were 13.4% with 13.7% achieved; FY89 had 16.3% as a goal and<br />
18.1% of total contracts achieved.<br />
Prior Service (PS). With the changes in Army structure there may be more need for PS<br />
to fill resulting holes in the structure, more as a result of unexpected losses of personnel<br />
than from structured reductions in MOS. However, the present PS requirement of 3,000<br />
in FY91, and 2,000 for FY92 appears intact. USAREC and the CMF 18 initiative to<br />
identify specific requirements for special forces NCOs to reenlist is an example of what<br />
should be developed for MOS fill. Essentially, almost half (42% in FY87 and 49% in<br />
FY88) required retraining. All PS are AFQT CAT I-IIIA and the majority (90%) take the<br />
four year term; for FY89, 10.5% were female, 71.6% were white, and 23.9% were black.<br />
Quality. Relevant, empirical research has clearly shown the need for quality (high<br />
school graduates scoring in the fiftieth percentile or above on the AFQT) in the Army. Quality is<br />
a valid predictor of persistence or likelihood to finish one’s term of service and of ability<br />
to train in the first term. As the body of research on the performance of high quality<br />
soldiers (Army Soldier Performance Research Project (SPRP) etc.) is disseminated, there<br />
will be greater understanding of the value of quality soldiers and the interest in job<br />
performance and aptitude and ability testing is likely to increase; second term and later<br />
performance has yet to be rigorously analyzed. Nevertheless, the amount of quality<br />
required is and will continue to be questioned by Army leadership, OSD, and Congress.<br />
Another issue is the logical leap required to go from individual soldier quality to unit<br />
performance and readiness; although the connections were articulated in the 7th annual report<br />
to Congress on linking enlistment standards to job performance, the argument will be<br />
viewed skeptically for some time. The accession quality improvements to date must not be<br />
lost; but asking for more quality, i.e., 67% I-IIIA, 95% HSDG and 4% or less CAT IV,<br />
is stretching the previously well-founded research arguments to the breaking point. The<br />
marginal performance benefit is difficult to quantify, and the cost effectiveness of the<br />
increased quality may not stand up to any close scrutiny.<br />
The Future<br />
Since the endstrength reductions mandated by Congress for the 1991-95 downsizing<br />
precede USAREC structure cuts, USAREC is now recruiting fewer individuals with the<br />
same number of recruiters that had been dictated by the higher annual qualitative and<br />
quantitative objectives of the past. The decrease in recruiting difficulty is therefore only temporary;<br />
the need for models to predict accession requirements based on endstrength goals is critical.<br />
Several accession and force structure models are in various stages of completion.<br />
ALENO (Alternate Enlistment Options) which is being developed by the Concepts Analysis<br />
Agency has the potential to provide future skill level one and two structure requirements<br />
to the MOS level of detail with the input of such variables as term of service, quality and<br />
accessions. ALENO also will translate endstrength requirements into accession inputs by<br />
quality and term of service. In addition, SRA Corporation has developed a prototype of<br />
its Army Force Structure Planning Model (AFSPM) which is aimed at determining<br />
accession requirements in the future considering quality and term of service inputs as well<br />
as retention/attrition rates. The ALENO and AFSPM models should be available within<br />
the next six months. Other models, perhaps less sophisticated, are being developed to<br />
answer the key concerns about accession missions for the future.<br />
In a rather straightforward manner the steady state accession mission can be determined<br />
by past accession ratios. Even considering the variables of term of service mix, gender, and quality<br />
with its collateral attrition/retention rates, and accepting that the new endstrengths<br />
place manpower management in completely foreign territory, the past ratio of enlisted<br />
accessions to endstrength has remained relatively static for many years. Applying ratios of<br />
mission to endstrength for the past six years to the 488,969 enlisted endstrength results in<br />
high and low estimates for the end state mission of 103,200 and 85,600 respectively. A<br />
reasonable estimate for the accession floor to support a 580,000 end state is therefore<br />
85,000. However, with higher quality projected in the outyears (less first term attrition)<br />
and lower average TOS mix, from the FY89 high of 3.88 years to the present average TOS<br />
for FY90 of 3.7 years (and dropping, mostly as the result of offering shorter terms to<br />
attract higher quality), the accession floor is expected to be closer to 90,000 by FY96. The<br />
relationships and effects of these variables to one another and to the accession mission are<br />
considerable. Establishing a Term of Service average objective for USAREC in the annual<br />
mission letters could be used to better align the force for the future downsizing. Although<br />
the recruiting market dictates what can be sold in contracts, an overall TOS mix average<br />
set in aggregate from the MOS requirements/TOS mix would foster more control over the<br />
longevity (and experience) of the force.<br />
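The ratio-based estimate described at the start of this section can be sketched as follows. The high and low ratios are back-computed from the figures quoted above rather than taken from the actual six-year series, so the block merely illustrates the arithmetic:

```python
# Back-of-envelope steady-state accession estimate from historical
# mission-to-endstrength ratios, as described above. The ratio bounds are
# back-computed from the text's own figures (103,200 high and 85,600 low
# against a 488,969 enlisted endstrength), so this is illustrative only.
ENLISTED_ENDSTRENGTH = 488_969

RATIO_LOW = 85_600 / ENLISTED_ENDSTRENGTH    # ~0.175 accessions per soldier
RATIO_HIGH = 103_200 / ENLISTED_ENDSTRENGTH  # ~0.211

def mission_estimate(endstrength: int, ratio: float) -> int:
    """Accession mission implied by a mission-to-endstrength ratio."""
    return round(endstrength * ratio)

print(mission_estimate(ENLISTED_ENDSTRENGTH, RATIO_LOW))   # 85600
print(mission_estimate(ENLISTED_ENDSTRENGTH, RATIO_HIGH))  # 103200
```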
Overall, the systems for establishing and monitoring accessions are in place and have been<br />
effective. The accession objectives, although highly dynamic, can be achieved. The<br />
downsizing, however, leads the Army into completely unmapped terrain which will greatly<br />
test the systems, the personnel managers, and, most notably, the soldiers presently in the<br />
Army.<br />
Reviewed by:<br />
D.S. Crooks<br />
Lieutenant-Commander<br />
Research Coordinator<br />
ETHNIC PARTICIPATION<br />
IN THE CANADIAN FORCES: DEMOGRAPHIC TRENDS<br />
Lieutenant (Naval) D.T. Reeves<br />
Paper presented at the 32nd Annual Conference of the<br />
<strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>, Orange Beach,<br />
Alabama, U.S.A., November 5-9, 1990.<br />
Canadian Forces Personnel Applied Research Unit<br />
Suite 600, 4900 Yonge Street<br />
Willowdale, Ontario<br />
M2N 6B7<br />
Approved by:<br />
F.P. Wilson<br />
Commander<br />
Commanding Officer
Background<br />
ETHNIC PARTICIPATION<br />
IN THE CANADIAN FORCES: DEMOGRAPHIC TRENDS<br />
Lieutenant (Naval) D.T. Reeves<br />
Canadian Forces Personnel Applied Research Unit<br />
Willowdale, Ontario, Canada<br />
INTRODUCTION<br />
Given the current Human Rights climate and the dwindling labour<br />
force, the perceived lack of ethnic minority representation in the<br />
Canadian Forces (CF) is of concern. Socio-demographic trends portend a<br />
Canadian population marked by cultural diversity and an aging, dwindling<br />
labour force. In keeping with the multicultural policy of the Canadian<br />
government, and in response to the proposed expansion of the Primary<br />
Reserves, the CF is reviewing its representation of ethnic minorities.<br />
Currently there is a dearth of most ethnic minorities in the CF<br />
compared to their representation in the general population (Febbraro &<br />
Reeves, 1990). This under-representation is of concern to National<br />
Defence in its efforts to ensure that the cultural diversity of the<br />
Canadian population is reflected in the composition of the CF.<br />
Purpose<br />
The purpose of this paper is to review the present ethnic<br />
composition of both the Canadian population and the CF, and provide a<br />
preliminary examination of immigrant and visible minority recruitment in<br />
one large urban area.<br />
Definitions<br />
CANADIAN ETHNIC DIVERSITY<br />
In order to ensure concept clarity, the definitions of ethnic,<br />
immigrant and visible minority are as follows (Multiculturalism and<br />
Citizenship Canada, 1990):<br />
a. Ethnic. The culture or country of origin of an individual or<br />
one's ancestors.<br />
b. Immigrant. Anyone who is not a Canadian citizen by birth.<br />
c. Visible Minorities. Generally, persons other than Aboriginal<br />
peoples, who are non-Caucasian in race or non-white in colour.<br />
There are in excess of 100 ethnic groups in Canada and, excluding<br />
persons of English and French origins, this represents 25% of the Canadian<br />
population (75% of the Canadian population are of English, French or<br />
multiple English and French origins). Ethnic group concentrations vary<br />
from province to province and from city to city, and many ethnic groups<br />
record populations of fewer than 10,000 members, with a considerable number<br />
of groups registering 3,000 or less. In terms of large groups, there are<br />
only 11 which register more than 250,000 members excluding persons of<br />
British or French origins (Multiculturalism and Citizenship Canada, 1990).<br />
Canada's ethnic composition has changed substantially since the end<br />
of World War II. During the earliest period of Canadian immigration<br />
history, immigrants arrived largely from Britain and France. After 1945,<br />
they came increasingly from other countries in Western and Eastern Europe<br />
and from the United States. More recently, immigrants to Canada have come<br />
primarily from Asia, Africa, the Caribbean, and Central and South America<br />
- although between 1973 and 1980, Europe was still the single largest<br />
source of immigrants. Immigration levels have fluctuated from 30,000<br />
in 1945 to a peak of 282,000 in 1957. Current immigration projections for<br />
the 1990s are between 150,000 and 175,000 per year.<br />
The composition of ethnic populations varies from province to<br />
province. While people with British origins make up the largest<br />
proportion of the population in all provinces except Quebec, the size of<br />
this proportion varies from 90% in Newfoundland to 30% in Manitoba and<br />
Saskatchewan. Persons born outside of Canada currently comprise a larger<br />
part of the Canadian population than at any other time. This foreign born<br />
group, the majority (80%) of whom are Canadian citizens, now represents<br />
approximately 15% of the Canadian population. Most immigrants (53%) live<br />
in three cities: Toronto (32%), Montreal (12%) and Vancouver (10%),<br />
although the specific ethnic mix in each of these cities is different<br />
(Multiculturalism and Citizenship Canada, 1990).<br />
The main source of information about ethnic populations has been<br />
the Canadian Census. In the past, census data have used such narrow<br />
indicators of ethnic origin as language spoken at home, mother tongue,<br />
paternal ancestry, and country of origin. The 1986 Statistics Canada<br />
definition of ethnicity is based upon ethnic origin as it refers to one's<br />
cultural ancestral roots, and may therefore reflect ancestry, nationality,<br />
race, language or religion, but should not be confused with citizenship or<br />
nationality in the strictest sense (Statistics Canada, 1988). In 1986,<br />
for the first time, the census recorded both single and multiple ethnic<br />
origins in order to establish a more accurate picture of the ethnic<br />
make-up of Canada's population. As a result, a substantial proportion<br />
(28%) of Canadians indicated multiple origins in their ancestry. Given<br />
the multiplicity and changing nature of the definition of ethnicity and<br />
the increasing numbers of multiple origin members, caution should be<br />
exercised not to use Canadian ethnic origins in any absolute way.<br />
Canada's largest ethnic origin groups are: 8.4 million British<br />
only, 6.1 million exclusively French origins, and 1.2 million both British<br />
and French. Almost 9.4 million Canadians indicated at least one ethnic<br />
origin other than British or French, and more than 6 million Canadians<br />
reported having non-British and non-French ethnic roots (Multiculturalism<br />
and Citizenship Canada, 1990). Table 1 is based on 1986 census data and<br />
shows the ten largest ethnic groups in Canada. The CF is dominated by<br />
members of British and French origin. The under-representation of most<br />
other ethnic groups is apparent from the Table 2 census data<br />
Table 1<br />
The Ten Largest Ethnic Groups in Canada (1986 Census)<br />
                                 Single      Multiple<br />
                                 Origins     Origins        Total      %a     %b<br />
 1. English                    4,742,040   4,561,910    9,303,950    36.8   18.7<br />
 2. French                     6,087,310   2,027,945    8,115,255    32.1   24.1<br />
 3. Scottish                     865,450   3,052,605    3,918,055    15.5    3.4<br />
 4. Irish                        699,685   2,922,605    3,622,290    14.3    2.8<br />
 5. German                       896,715   1,570,340    2,467,055     9.7    3.5<br />
 6. Italian                      709,590     297,325    1,006,915     3.9    2.8<br />
 7. Ukrainian                    420,210     541,100      961,310     3.8    1.7<br />
 8. Dutch (Netherlands)          351,760     530,170      881,930     3.5    1.4<br />
 9. Polish                       222,260     389,485      611,745     2.4    0.9<br />
10. North American Indian        286,230     262,730      548,960     2.2    1.1<br />
Note. In all calculations, the figure used to represent the total Canadian<br />
population is 25,309,331.<br />
aIndicating single and multiple origins (based upon total response data).<br />
bPercent of Canadian population indicating a single origin.<br />
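The percentage columns in Table 1 are simple ratios against the total population figure given in the note. As a check on the arithmetic, a minimal Python sketch (counts copied from the table; the script is illustrative only, not part of the original report) reproduces the English and French rows:<br />

```python
# Reproduce the Table 1 percentage columns from the raw 1986 census counts.
TOTAL_POPULATION = 25_309_331  # denominator used in all Table 1 calculations

# (group, single-origin count, total count including multiple origins)
rows = [
    ("English", 4_742_040, 9_303_950),
    ("French", 6_087_310, 8_115_255),
]

for group, single, total in rows:
    pct_total = round(100 * total / TOTAL_POPULATION, 1)    # the %a column
    pct_single = round(100 * single / TOTAL_POPULATION, 1)  # the %b column
    print(f"{group}: %a={pct_total}, %b={pct_single}")
    # English: %a=36.8, %b=18.7; French: %a=32.1, %b=24.1
```

The printed values match the table's %a and %b columns to one decimal place.<br />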
Table 2<br />
Representation of Selected Ethnic Groups in the CF (1986 Census)<br />
                        % of Canadian   % of CF Officer   % of CF Non-Officer<br />
Ethnic Origin             Population       Population          Population<br />
British                      25.0             29.0                29.3<br />
French                       24.1             19.2                26.8<br />
German                        3.5              2.9                 2.3<br />
Italian                       2.8               .4<br />
Ukrainian                     1.7              1.3<br />
Dutch                         1.4              1.6                 1.0<br />
Chinese                       1.4               .3                  .1<br />
South Asian                    .9               .2                  .1<br />
Blacka                       1.0               .7                  .2<br />
Aboriginalsb                 2.8              1.4                 2.6<br />
Visible Minoritiesb          6.4              1.8                 1.8<br />
aBlack representation estimates based upon personal communication with the<br />
Director of Personnel Information Systems, 1990. bBased upon Employment<br />
and Immigration statistics (1989).<br />
(Statistics Canada, 1988). Almost all of these figures are below their<br />
corresponding statistics in the general Canadian population, with the<br />
Italian, Chinese, South Asian and Black groups being the most under-represented.<br />
IMMIGRANT AND VISIBLE MINORITY RECRUITMENT<br />
Non-Commissioned Members - Regular Force<br />
A recent review of regular force non-commissioned member (NCM) recruit<br />
applications (conducted at Canadian Forces Recruiting Centre (CFRC)<br />
Toronto in August 1990) indicated that 91.4% of applicants were Canadian<br />
born. Foreign born (immigrant) applications were only 8.6% of the total<br />
of those applying, and well below their Census Metropolitan Area Toronto<br />
representation level of 36% of the population. A further breakdown of<br />
foreign borns revealed that 55% of this applicant group (or 4.7% of the<br />
total applicant population) were members of a visible minority. This<br />
compares with a national visible minority representation of approximately<br />
6% and a Census Metropolitan Area Toronto representation of 13% (visible<br />
minority status of the Canadian born group could not be established using<br />
file information). Typically, the proportion of regular force NCM<br />
applicants who go on to become enrolled is approximately 30%. In this<br />
most recent review, however, only 4% of visible minority applicants and 8%<br />
of foreign born applicants were enrolled. Although these figures are<br />
based on active files, and therefore more will probably enrol before the<br />
end of 1990, they will still remain below the foreign born and visible<br />
minority representation levels for Census Metropolitan Area Toronto and<br />
the nation as a whole. Out of a total of 51 foreign born regular force<br />
NCM applicants, there were four enrolments; and while 28 of these foreign<br />
born applicants were visible minorities, only one was enrolled.<br />
Officers - Regular Force<br />
Regular force officer applicants were 80.1% Canadian born and 19.9%<br />
foreign born, with 34.3% (or 6.8% of the total population) of the foreign<br />
born group being visible minorities. As was noted with the NCM regular<br />
force applicants, both the foreign born and visible minority groups were<br />
well below their respective representation figures for Census Metropolitan<br />
Area Toronto. The pattern of low enrolment seen with regular force NCM<br />
candidates, however, was ameliorated for regular force officer applicants,<br />
of whom 17.1% of foreign borns, 20.8% of visible minorities, and 16.7% of<br />
Canadian borns enrolled. Out of 70 foreign born regular force officer<br />
applicants, there were 12 enrolments; and while 24 of these foreign born<br />
applicants were visible minorities, five were enrolled.<br />
Non-Commissioned Members - Reserve Force<br />
In contrast to the regular force NCM recruiting, a review of NCM<br />
reserve force files indicated a much more positive picture, with Canadian<br />
born applicants representing 56%, foreign borns 44% (8% above Census<br />
Metropolitan Area statistics), and visible minorities 68.8% of the foreign<br />
born group (or 30.1% of the total applicant population). NCM enrolment<br />
percentages for Canadian borns (45.9%) were highest, with enrolments of<br />
33.8% and 32.1% for foreign borns and visible minorities, respectively.<br />
It was also noteworthy that 12.6% of reserve force NCM applicants (22.7% of whom were enrolled),<br />
or 25.6% of foreign borns, were actually non-Canadians, i.e., new arrivals<br />
to Canada. Out of 154 foreign born reserve force NCM applicants, there<br />
were 52 enrolments; and while 106 of these foreign born applicants were<br />
visible minorities, 34 were enrolled.<br />
Officers - Reserve Force<br />
Reserve force officer applications roughly parallel those of regular<br />
force officer applications, with 76.8% being Canadian born, 23.2% foreign<br />
born and 38.5% of the foreign born group members of a visible minority (or<br />
8.9% of the total applicant population). Enrolments for the reserve force<br />
officers were similar for both Canadian borns (37.2%) and foreign borns<br />
(38.5%), and much higher for visible minority members (60%). Out of 13<br />
foreign born reserve force officer applicants, five were enrolled; and<br />
while five of these foreign born applicants were visible minorities, three<br />
were enrolled.<br />
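The enrolment percentages quoted in the four subsections above are simple enrolled-to-applicant ratios. A brief Python sketch (foreign born applicant and enrolment counts taken from the text; the script itself is illustrative and not part of the original study) recomputes the quoted rates:<br />

```python
# Recompute foreign born enrolment rates from the applicant counts in the text.
# (group, foreign born applicants, foreign born enrolments)
groups = [
    ("Regular force NCM", 51, 4),        # text quotes ~8%
    ("Regular force officer", 70, 12),   # text quotes 17.1%
    ("Reserve force NCM", 154, 52),      # text quotes 33.8%
    ("Reserve force officer", 13, 5),    # text quotes 38.5%
]

for name, applicants, enrolled in groups:
    rate = 100 * enrolled / applicants
    print(f"{name}: {rate:.1f}% of foreign born applicants enrolled")
```

Each computed rate matches the percentage quoted in the corresponding subsection.<br />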
DISCUSSION<br />
The present review indicates that the CF regular force does not<br />
reflect the Canadian cultural mosaic. Amongst the under-represented<br />
groups, census figures indicate that Italian, Chinese, Black and South<br />
Asian origins tend to be the lowest. This under-representation, combined<br />
with substantial Canadian populations, makes these groups of special<br />
interest for more focussed research. In terms of recruiting initiatives,<br />
all four groups represent a notable, and as yet untapped, source of<br />
personnel (Chinese, Blacks and South Asians make up the largest and fastest<br />
growing visible minority groups in Canada).<br />
Although immigrant recruitment at CFRC Toronto does not necessarily<br />
reflect national recruiting norms, this preliminary review suggests that<br />
the NCM regular force does not attract immigrants and visible minority<br />
members at a representative rate. This situation is somewhat improved for<br />
regular force and reserve force officers: although they remain significantly<br />
below those levels required for representativeness in Census<br />
Metropolitan Area Toronto, they are above both the 1986 census and national<br />
representation levels for CF visible minority representation. In contrast,<br />
findings for the reserve force NCM applicant group in Toronto suggest that<br />
this group is actually over-represented by both immigrants and visible<br />
minority members.<br />
The willingness of individuals from immigrant groups to apply for<br />
NCM reserve force service in relatively large numbers while, at the same<br />
time, avoiding regular force application, suggests that the attitudes held<br />
by these groups regarding regular force employment may be substantially<br />
different. Since immigrant and visible minority members have not shown an<br />
antipathy to applying for military duty per se, it is important to determine<br />
the specific attitudes held by these groups which may be acting<br />
as barriers to regular force enrolment. Knowledge gained about these<br />
groups in terms of distinct ethnic attitudes toward the CF may be used to<br />
modify the recruiting approach to other under-represented groups for which<br />
study may be problematic (smaller numbers and wider geographic<br />
dispersion). These findings will have important consequences for future<br />
effective ethnic recruiting initiatives.<br />
REFERENCES<br />
Employment and Immigration Canada. (1989). Employment equity availability<br />
data report on designated groups. Technical Services, Employment<br />
Equity Branch. Ottawa: Minister of Supply and Services.<br />
Febbraro, A., & Reeves, D. T. (1990). A literature review of ethnic attitude<br />
formation: Implications for Canadian Forces recruitment (Working<br />
Paper 90-2). Willowdale, Ontario: Canadian Forces Personnel<br />
Applied Research Unit.<br />
Multiculturalism and Citizenship Canada. (1990). Multicultural Canada: A<br />
graphic overview. Policy and Research, Multiculturalism Sector,<br />
Multiculturalism and Citizenship Canada. Ottawa: Minister of<br />
Supply and Services.<br />
Statistics Canada. (1988). Census handbook. Ottawa: Minister of Supply<br />
and Services.<br />
1990 ARMY CAREER SATISFACTION SURVEY<br />
Timothy W. Elig’<br />
U.S. Army Research Institute<br />
To help personnel officials prepare for the eventual downsizing of the Army, the Chief of Staff,<br />
Army (CSA) directed that a survey of soldiers be conducted rapidly. “The downsizing of the U.S. Army is<br />
inevitable,” BG Stroup wrote in a memorandum requesting the Army Research Institute (ARI) to conduct a<br />
survey “. . . to determine the attitudes and concerns of our soldiers about the changes that will take place.”<br />
Even as events in Southwest Asia and Operation Desert Shield have dominated the news headlines,<br />
other important events have continued. Discussions about federal budget deficits, the end of the Cold War<br />
era, increased cooperation between the U.S. and the U.S.S.R, and German reunification are also front page<br />
news that lead to speculation about a reduction in the size of U.S. military forces.<br />
Soldiers may feel that their careers are being victimized by their contributions to the successful<br />
conclusion of the Cold War even as they are asked to risk their lives for their country. Many of the soldiers’<br />
concerns about their career and prospects for downsizing may in fact be made worse by recent events that<br />
have fostered even more uncertainty and curtailed the flow of information on the future make-up of the<br />
Army. Thus it is important to understand the morale of the force as it was just prior to Operation Desert<br />
Shield, in order to understand how soldiers are likely to respond to continuing career uncertainties.<br />
About this Survey<br />
The 1990 Army Career Satisfaction Survey (ACSS) was designed by ARI to answer several questions<br />
raised by the CSA and by DA personnel policy makers and analysts. Administration costs were paid by<br />
HQDA through the Army Research Office’s Scientific Services Program.<br />
This survey was designed to provide an overview of soldiers’ attitudes, perceptions, and intentions<br />
concerning Army downsizing. While not all of these topics are discussed here, the survey included items on:<br />
career plans and intentions; advice to others on joining the Army; the Army experience as preparation for<br />
civilian jobs; organizational commitment and trust; reactions to European thaw in cold war and to<br />
downsizing; expectations about what a smaller Army would mean and what the Army would be like over the<br />
next five years; soldiers’ sources of information on downsizing and their trust in the sources; specific personal<br />
and family concerns about involuntary separation and resources needed to cope with unexpected separation;<br />
financial and emotional resources for separation; reactions to specific personnel management policies that<br />
could be implemented for downsizing; and propensity to accept “early-outs.”<br />
Thirty thousand soldiers (15,000 in enlisted, 10,000 in commissioned, and 5,000 in warrant ranks)<br />
were surveyed in June and July 1990. The main sample of 28,071 represents soldiers at all ranks countable<br />
toward the active strength of the Army on 31 March 1990, with the following exclusions: a) general officers,<br />
b) soldiers with less than 12 months of service, and c) soldiers in the process of separation or retirement.<br />
Another 1,929 soldiers who had been surveyed in previous efforts were also sent this survey in order to<br />
measure attitude changes over the last four years.<br />
Preliminary results from partial returns were provided to HQ, Department of the Army, in late July<br />
and early August. The final results presented here are based on 17,326 returned surveys from 6,997<br />
‘The findings in this report are not to be construed as an official Department of the Army position, unless so designated<br />
by other authorized documents.<br />
commissioned officers, 3,596 warrant officers, and 6,733 enlisted soldiers in the main sample. These data<br />
have been weighted to be representative of the Army.<br />
On the basis of both response rates and margins of error, this survey provides accurate attitude<br />
estimates for the entire Army and for relatively small subgroups. Response rates for the survey were<br />
extremely good. Completed surveys were returned by 58% of the main sample. When adjusted for postal<br />
non-delivery and late returns of completed surveys the overall response rate is 65% (80% of warrant officers,<br />
76% of commissioned officers, and 51% of enlisted).<br />
The overall margin of error is less than 1.3% indicating that 95% of the time a sample estimate of<br />
50% is within 1.3% of how the entire population would respond if surveyed. Margins of error are also quite<br />
small for each of the three main groups (1.3 for commissioned officers, 1.7 for warrant officers, and 1.6 for<br />
enlisted) and for subgroups of soldiers defined by categories such as gender or rank.<br />
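The quoted margins of error are consistent with the standard 95% confidence half-width for a proportion near 50%, 1.96 × sqrt(p(1 − p)/n). A minimal Python sketch (sample sizes from the return counts above; this simple-random-sampling formula ignores any weighting or design-effect adjustments the survey analysts may have applied, so it gives only an approximate lower bound):<br />

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% half-width, in percentage points, for a sample proportion."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Returned surveys in the main sample (counts from the text)
for group, n in [("all returns", 17_326), ("commissioned", 6_997),
                 ("warrant", 3_596), ("enlisted", 6_733)]:
    print(f"{group}: about ±{margin_of_error(n):.1f} points")
```

For these sample sizes the formula yields half-widths under the reported 1.3% overall figure and close to the per-group figures, with the remaining gap attributable to weighting.<br />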
Soldiers Are Positive About Themselves, <strong>Military</strong> Service, and Their Skills<br />
Soldiers are positive about military service for themselves. There is a strong core of committed<br />
soldiers (57% of commissioned officers, 63% of warrant officers, 45% of enlisted) who want to serve for 20<br />
or more years even if they could retire earlier. For many of these soldiers, the kind of work they most enjoy<br />
is available only or primarily in the military. This is most strongly characteristic of commissioned officers.<br />
Soldiers are confident of their own job performance; over three quarters of them said they were well<br />
prepared or very well prepared to perform the tasks in their wartime jobs. Three-quarters of soldiers also<br />
rated their units as combat ready. Soldiers’ confidence in their job performance and military skills is also<br />
reflected in their evaluation of civilian-relevant skills. When asked if they agreed or disagreed with the<br />
statement “I have been taught valuable skills in the Army that I can use later in civilian jobs” 70% expressed<br />
agreement. Soldiers were even more positive about the effects of their Army experiences on skills and<br />
characteristics that would help them obtain civilian jobs; 80% felt that the Army had a positive effect on<br />
specific job knowledge, skills, and abilities, while 86% felt that the Army had a positive effect on personal<br />
characteristics and attitudes. In another recent ARI survey, Benedict (1990) found that even first-term<br />
soldiers recognized the value of their Army experience with 64% to 77% rating the Army as having a positive<br />
effect.<br />
Despite the unsettling times of the first half of 1990, soldiers were positive about recommending<br />
military service to others. When asked what they would tell a good friend who asked for advice on seeing a<br />
military recruiter, soldiers were nearly eight times as likely to tell them that it was a good idea (46%) as to tell<br />
them that it was a waste of time (6%). The rest (47%) would tell their friend that it was up to him or her,<br />
apparently recognizing that military service is not for everyone. When asked specifically about enlistment in<br />
the Army, soldiers were twice as likely to recommend Army enlistment (60%) as enlistment in another<br />
service (27%). Only 13% would recommend not enlisting in any military service. Even on the very personal<br />
issue of their own children joining the military, soldiers were also fairly positive. Although less than one-third<br />
would like to see their daughter join the military, over two-thirds would like to see their son join the<br />
military at some point.<br />
Army Downsizing and Career Opportunities<br />
As we would expect, most officers (61% of commissioned and 55% of warrant) and many enlisted<br />
(40%) said that the chances of war with the Soviet Union were reduced by recent changes in East Germany,<br />
Hungary, Poland, and Czechoslovakia. However, there is still a perceived threat of war because of internal<br />
problems in the Soviet Union (economic problems, Lithuanian independence movement, ethnic unrest and<br />
clashes, etc.). It may be that the 61% of officers and 49% of enlisted who said yes to an increased chance of<br />
war with the Soviet Union were in fact just responding to this item as an increased chance of war, perhaps<br />
civil and not with the U.S.<br />
Although some soldiers said recent world events would probably affect what they do in the Army,<br />
the most likely impacts were seen in force size and promotion potential. As a result of recent world events<br />
48% said it was likely that demands on their time would increase. These soldiers could worry that mission<br />
statements will not be scaled back as resources and structure are cut, or that work details may replace<br />
training time for troops as many experienced during the draw-down of the early 1970’s. Further, two-thirds of<br />
soldiers (79% of commissioned officers, 69% of warrant officers, and 65% of enlisted) said it is likely that<br />
promotion opportunities will decrease as a result of recent world events. As one officer commented, “How<br />
ironic that the very soldiers who brought about this peace dividend are the ones who have to suffer.”<br />
Reductions in force requirements have decreased soldiers’ confidence in their ability to be promoted<br />
and to have the opportunity to complete at least 20 years of Army service. Only 46% of commissioned<br />
officers, 59% of warrant officers, and 52% of enlisted were confident that as the Army becomes smaller they<br />
would be able to stay in the Army and be promoted on or ahead of schedule.<br />
Concerning the size of the future Army, soldiers were asked to predict the likelihood of several<br />
percentage reductions, voting for as many size-cuts as they felt were likely. Three-fourths of soldiers<br />
believe today’s Army will be cut by up to 10%; over one-half believe the cuts will be about 20%; and nearly<br />
one-third believe the cuts will be at least 30%. Commissioned officers voted the reduction as likely to be<br />
considerably larger than did the enlisted and the warrant officers.<br />
For the majority of soldiers, interest in<br />
serving in the Army is not strongly influenced by<br />
the size of force reductions that may be imposed.<br />
However, of the 40% of officers whose interest in serving is influenced by<br />
the size of the force, three-fourths are less interested in serving in a<br />
pared-down Army and only one-fourth are more interested (Figure 1). This<br />
may well be related to fears about quality and career opportunities. Less<br />
than half of officer, warrant and enlisted soldiers are confident that the<br />
best officers, NCOs, and junior (skill level 1) enlisted will stay as the<br />
Army becomes smaller.<br />
While the question was not asked directly,<br />
it is possible that opportunities to exercise<br />
leadership may also be seen as decreasing in a<br />
smaller Army, especially by those less interested in<br />
serving in a smaller Army. It may be important to<br />
point out to those interested in developing their<br />
leadership skills that requirements for creative,<br />
effective leadership are likely to increase during a<br />
transition; and that opportunities to learn these<br />
difficult leadership skills will remain high even in a<br />
smaller Army. It is also likely that these skills will<br />
be in much greater demand in a civilian sector<br />
facing its own pressures for streamlining and<br />
efficiency.<br />
Although less than 10% say they are leaving due to potential changes and cuts, many more are<br />
concerned, and as many as 20% think it was a mistake to stay beyond their original obligation. Further, 30%<br />
say it would take a lot to keep them beyond their current obligation. While only 12% have applied for a job<br />
in the last year, 41% have sought information about civilian jobs in case they leave the Army.<br />
One-fourth expect to be RIFed. Even<br />
more expect to be offered an early out (34% of<br />
commissioned officers and 44% of enlisted). See<br />
Figure 2. At least one-half were more concerned<br />
than a year ago about their long-term opportunities<br />
in the Army (62%), the kind of work they will go<br />
into when they leave the Army (56%), whether or<br />
not they would be able to quickly get a civilian job<br />
if needed (62%), and financial burden on self and<br />
family should they have to leave the Army<br />
unexpectedly (69%). Debt exceeds available<br />
savings for enlisted and warrant officers. One-fourth<br />
would also lose other family member income<br />
because of relocation if separated unexpectedly.<br />
Over three-fourths reported that it would be<br />
difficult or very difficult financially to be<br />
unemployed for two or three months.<br />
Soldiers are also pessimistic about what the<br />
future holds. Compared to how satisfied they said<br />
they are today, fewer soldiers expect to be satisfied<br />
with the Army of 5 years from now in respect to<br />
job security (57% vs 38%), benefits (57% vs 43%),<br />
overall quality of life (49% vs 39%), and<br />
opportunities to do work liked (49% vs 41%).<br />
Officers also expect to be less satisfied with pay<br />
and allowances (55% vs 44% for commissioned<br />
officers and 43% vs 38% for warrant officers). The same percentage (38%) of enlisted are satisfied with pay<br />
and allowances now as expect to be satisfied with pay and allowances five years from now. Beliefs about the<br />
future may determine interest in remaining in the Army. Expected satisfaction with future pay, benefits, job<br />
security, quality of life, and opportunities to do work one likes are each correlated with being more<br />
interested in serving in a smaller Army.<br />
Further, soldiers are even more likely to see the Army as suffering from a rapid draw-down than to<br />
see themselves as suffering. More soldiers agreed that the Army will cut strength so quickly that readiness<br />
(62%) and morale (68%) will suffer than agreed that they (36%) or their family (41%) will suffer.<br />
Information Flow<br />
Three-fourths of soldiers said they are not getting the right amount of information on future<br />
personnel reductions in the Army; in fact 15% of soldiers said they are getting no information. They tend to<br />
credit the Army Times or other media with providing what information they do obtain. One soldier<br />
commented that “Our main source of information for issues on RIF, closure of bases, etc. is the mass<br />
media.” Only about one-half of soldiers think information on cuts in Army strength is reliable when obtained<br />
from the chain of command; one-third said they did not get information on cuts from the chain of command.<br />
Overall, 57% think information on the future of the Army that they receive from the Army itself (chain of<br />
command, post newspapers, etc) is accurate while 40% think it is timely. Roughly five percent of the<br />
respondents were so concerned about this lack of information that they wrote comments about it on the<br />
questionnaire.<br />
Although 3% felt they were getting too much information on future personnel reductions, the<br />
overwhelming majority of soldiers want more information from the chain of command and asked for it in<br />
their comments: “Please keep us informed! Do not keep us in suspense” and “I think families of soldiers<br />
should have more information about their spouses’ careers and pay raises, early outs, pay cuts etc.”<br />
Of course, what they really want is for the dust to settle and for all the decisions to have been made.<br />
As one soldier put it: “I feel that the Army has hurt morale by coming out and saying the Army must<br />
decrease, way before it is time.” Another soldier expressed it in this way: “I feel that the military is moving<br />
kind of fast and who knows what the future holds.” Other comments reflect a perception by some that the<br />
cuts are being made already: “Forget the go slow method . . . Make the cuts/RIFs in one year and get it over<br />
with . . . using promotion boards in lieu of RIFs is having a terrible effect on morale.”<br />
Concerns and Needs if<br />
Involuntarily Separated<br />
While most of the questionnaire<br />
dealt with the current attitudes of Army<br />
soldiers, it also contained questions on<br />
what soldiers’ concerns would be if<br />
involuntarily separated as well as what help<br />
they would need in transitioning to a new<br />
career. Overall, if involuntarily separated,<br />
more than one-half of personnel would be<br />
very concerned or extremely concerned<br />
about separation pay (70%), health and<br />
dental care (63%), securing a job (61%),<br />
unemployment compensation (60%), and<br />
health insurance (58%). Further, more<br />
than one-third were very or extremely<br />
concerned about advancing their education<br />
(48%), finding a place to live (46%), child<br />
care and schools (37%), and spouse<br />
employment (36%). Because some<br />
concerns may not be widespread, but may<br />
be vitally important to those who do have<br />
the concern, soldiers were also asked what<br />
were their three most important concerns.<br />
These most important concerns (adding<br />
together the three selections) are securing<br />
a job (over 80%), finding a place to live<br />
(45%), separation pay (over 30%), and<br />
health and dental care (about 30%). Also,<br />
while only 37% and 36% of soldiers<br />
overall are very or extremely concerned<br />
about child care/schools and spouse employment, the percentages jump to 57% and 50%, respectively, if we<br />
consider only those to whom these questions apply. And while only 28% of all enlisted are very or extremely<br />
concerned about enrollment in GI Bill by paying $1200, 46% of those who are not already eligible are very<br />
or extremely concerned about this. (Note that officers were not asked about Montgomery GI Bill benefits.)<br />
If they were to be involuntarily separated, soldiers saw a variety of job search tools as important<br />
including: labor market information and job banks, time-off (not charged to leave) for interviews and<br />
relocation planning, training and counseling. Specific needs as well as preferences for where services are<br />
provided will become part of the information base used in planning transition services.<br />
Personnel Policies and Other Issues<br />
A major section of each form of the questionnaire (enlisted, commissioned officer, and warrant<br />
officer) dealt with specific personnel policies and concerns. These issues are being examined by the<br />
appropriate divisions of the Office of the Deputy Chief of Staff for Personnel, with continuing support from<br />
ARI.<br />
Work is continuing on demographic differences and analyses of such issues as soldiers’ career intentions<br />
and perceived vulnerability to involuntary separation. We are also examining the issue of where soldiers<br />
would move to if involuntarily separated. This affects how much the Army would have to pay for<br />
unemployment compensation and could affect recruiting markets as well. The data are also being made<br />
available to the Army War College and to Army officers at the Naval Postgraduate School for student<br />
research.<br />
Several of the survey questions were previously used in ARI research efforts. Most importantly,<br />
many of the career intention and commitment items were contributed by ARI’s Longitudinal Research on<br />
Officer Careers (LROC) project (Carney, In Preparation). ARI’s Army Family Research Program (AFRP)<br />
contributed items on readiness, morale, and family situations (Bell, In Preparation). These research groups<br />
at ARI are currently including these items in their analyses.<br />
BIBLIOGRAPHY<br />
Baker, T. (In Preparation). Potential Geodemographic Effects of Army Force Reduction: Where Soldiers Plan<br />
to Move if Separated. Alexandria, VA: U.S. Army Research Institute.<br />
Bell, B. (In Preparation). The Army Family Research Program (AFRP) Survey. Alexandria, VA: U.S. Army<br />
Research Institute.<br />
Benedict, M. E. (1990). The 1989 ARI Recruit Experience Tracking Survey: Descriptive Statistics of NPS<br />
(Active) Army Soldiers (ARI Research Product 90-16). Alexandria, VA: U.S. Army Research Institute.<br />
Carney, C. (In Preparation). Longitudinal Research on Officer Careers. Alexandria, VA: U.S. Army Research<br />
Institute.<br />
Elig, T. W., & Martell, K. A. (1990, October). The 1990 Army Career Satisfaction Survey (ARI Special<br />
Report). Alexandria, VA: U.S. Army Research Institute.<br />
Elig, T. W. (In Preparation). The 1990 Army Career Satisfaction Survey: Descriptive Statistics for<br />
Commissioned Officer, Warrant Officer, and Enlisted Soldiers. Alexandria, VA: U.S. Army Research<br />
Institute.<br />
Elig, T. W. (In Preparation). The 1990 Army Career Satisfaction Survey Technical Manual. Alexandria, VA:<br />
U.S. Army Research Institute.<br />
Elig, T. W., Benedict, M. E., & Gilroy, C. L. (1990, June). ARI 1990 Employer Survey Summary Report (ARI<br />
Special Report). Alexandria, VA: U.S. Army Research Institute.<br />
Hay, M. S., & Middlestead, C. G. (In Preparation). Army Force Reductions, Soldiers’ Career Intentions, and<br />
Perceptions of Vulnerability. Alexandria, VA: U.S. Army Research Institute.<br />
The Use of Artificial Neural Networks<br />
in Military Manpower Modeling
Jack R. Dempsey, D.A. Harris, and Brian K. Waters<br />
Human Resources Research Organization<br />
A new idea is delicate. It can be killed by a sneer or a yawn; it can be stabbed to death by a
quip, and worried to death by a frown on the right man's brow.
--Charlie Brower
The military has been a trailblazer in the realm of manpower modeling and personnel measurement. According
to an old saying, "Necessity is the mother of invention." Due to the formidable recruiting and selection tasks
facing the Services, pioneering efforts have been made and continue to push the military to or past the state of the
art. There are again innovative techniques which the Services are (or should be) considering to aid military selection
strategies.
Military selection policies are a topic of high-level interest and scrutiny. Each of the Services sets standards
for selection on the basis of citizenship, age, moral character, physical fitness, aptitude, and education credential.
The latter two entry criteria are the most visible screening mechanisms and the ones which the Department of
Defense (DOD) uses to define and report recruit quality levels to Congress and other interested parties. Aptitude, as
measured by composite scores from the Armed Services Vocational Aptitude Battery (ASVAB), is used to predict
military technical school performance. Education credentials are used for adaptability screening; that is, they assess
the likelihood of attrition, or, stated positively, the likelihood that a recruit will complete an obligated term of service.
Both aptitude and education credential standards have lately been called into question by Congressional watchdogs.
The flurry of interest in aptitude standards dates back to 1980, when Congress learned that between 1976 and 1980 the
ASVAB norms were incorrect. This resulted in accepting hundreds of thousands of recruits who did not meet the
intended minimum aptitude standards. Furthermore, Congress learned, much to its dismay, that enlistment standards
were validated against training performance, not actual job performance. Congress continues to inquire: What is the
relationship between aptitude and job performance? And how much quality is needed to ensure adequate job
performance? A Herculean, ongoing, multi-year job performance measurement (JPM) project has provided answers
to the first question, while an answer to the second is in progress.
More recently, education standards have come under attack by Congress and educational lobbying groups.
Currently the plethora of credentials is categorized into one of three tiers based upon attrition rates. Each tier has
differential aptitude standards and recruiting preferences. While education credential is the single best predictor of
attrition, objections to this policy revolve around the fact that many individual members of the non-preferred tiers
are successful in service and are therefore wrongfully denied enlistment on the basis of group membership.
The dual problems of linking quality requirements to job performance and implementing more equitable
adaptability screening methods require innovation. Classical statistical techniques may not provide the answer.
These military selection questions require more sophisticated and less familiar modeling techniques. Just how do
techniques such as neural networks complement the more common modeling procedures? The performance prediction
and attrition screening applications described below provide at least a little food for thought and may suggest that
a more in-depth look is required.
Linking Standards to Job Performance<br />
This project's purpose is to bring the Joint-Service Job Performance Measurement/Enlistment Standards (JPM)
Project to fruition. This will be accomplished through four lines of endeavor. First, the military's recruit selection
measures (e.g., ASVAB) must be related to job performance in virtually all occupations. Second, a methodology
must be developed so that empirical data can inform the setting of enlistment standards; that is, the expected job
performance of recruits, over their first term of enlistment, should match total job performance requirements. Third,
improved trade-off model(s) must be developed so that force quality requirements--based on empirically grounded
job performance requirements--are considered along with related costs in the determination of enlistment standards.
Finally, the Services' personnel allocation systems must be made responsive to empirical information about the
performance requirements of particular jobs.
The data used for the Linkage Project consisted of 8,464 individual service members in 24 different occupations
who had been administered hands-on performance tests as a part of the JPM Project. Each record contained ASVAB
subtest scores, time in Service, and education credential or diploma status.
Job characteristics were obtained from an existing Department of Labor data base and represent an assortment
of information about civilian jobs. The data base contained ratings on work complexity; training times; worker
aptitude, temperament, and interest requirements; physical demands; and environmental conditions. Over 12,000 jobs
were rated as part of a massive job analysis project culminating in the publication of the Dictionary of Occupational
Titles (DOT) (U.S. Department of Labor, 1977).
Direct ratings of the occupational characteristics of military jobs were not available; however, these ratings
were estimated by matching equivalent civilian jobs. The results of the Military Occupational Crosscode Project
(Lancaster, 1984; Wright, 1984) were used to determine military-civilian equivalence. After civilian job
characteristics were ascribed to the population of military jobs, the characteristics were factor analyzed. A five-factor
orthogonal rotation was adopted as the most appropriate, interpretable, and parsimonious solution. (These factors are
referred to as PC1-5 in the network that follows.)
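This reduction step can be sketched as follows, approximating the five-factor orthogonal solution with a principal-components decomposition (a simplification; the paper does not detail the exact extraction method), and with simulated ratings standing in for the DOT data:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated job-characteristic ratings: rows are jobs, columns are
# DOT-style ratings (complexity, training time, demands, etc.).
jobs, ratings = 200, 12
X = rng.normal(size=(jobs, ratings))

Xz = (X - X.mean(0)) / X.std(0)        # standardize each rating
corr = (Xz.T @ Xz) / jobs              # correlation matrix of ratings
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1][:5]  # keep the five largest components
loadings = eigvecs[:, order]           # orthogonal loading vectors
factor_scores = Xz @ loadings          # "PC1-5" score per job
print(factor_scores.shape)
```

Each job then carries a five-element vector of factor scores (the M_j used in the models below).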
Regression Approach
Linking standards to job performance requires a performance prediction model. Using performance scores for
24 jobs from the JPM project, individual characteristics, and AFQT scores, we examined a model in which each job
was allowed to have its own intercept. This model was the baseline against which other models and techniques were
compared. This model gives the performance P_ij of individual i in job j as:

P_ij = α_j + β_j T_ij + γ_j E_ij + δ_j X_ij + ε_ij

where:
P_ij = hands-on performance test score,
T_ij = ASVAB technical composite score,
E_ij = education,
X_ij = experience.

Note that the subscript j for job on α_j, β_j, γ_j, and δ_j implies that there is a different coefficient for each job; these
coefficients are treated for the moment as fixed.
The coefficients in this model were estimated with ordinary least squares by using a vector of dummy variables
D for jobs and entering D, T, D x T, E, D x E, X, and D x X. The overall R² for this model is .595, with
93 degrees of freedom for the model and 8,370 residual degrees of freedom. Though a substantial amount of
variance was accounted for, this model is not generalizable to jobs outside of the particular ones included in the
model.
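The dummy-variable estimation just described can be sketched with simulated data standing in for the JPM records (sample sizes, number of jobs, and coefficients below are illustrative assumptions, not the study's values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: n people spread over J jobs (the study used
# 8,464 members in 24 jobs; these sizes are illustrative only).
n, J = 600, 6
job = rng.integers(0, J, size=n)                 # job index j per person
T = rng.normal(50, 10, size=n)                   # ASVAB technical composite
E = rng.integers(0, 2, size=n).astype(float)     # education credential
X = rng.normal(3, 1, size=n)                     # experience
P = 20 + 0.5 * T + 2 * E + 1.5 * X + rng.normal(0, 5, size=n)

# Design matrix: a dummy column per job plus job-specific slopes
# (D, D*T, D*E, D*X) so each job gets its own intercept and coefficients.
D = np.eye(J)[job]                               # n x J dummy matrix
Z = np.hstack([D, D * T[:, None], D * E[:, None], D * X[:, None]])

beta, *_ = np.linalg.lstsq(Z, P, rcond=None)     # OLS fit
P_hat = Z @ beta
r2 = 1 - np.sum((P - P_hat) ** 2) / np.sum((P - P.mean()) ** 2)
print(round(r2, 3))
```

Because the dummy columns span the intercept, the fitted R² is guaranteed non-negative; the job-specific coefficients, however, carry no information about jobs outside the sample, which is the generalizability problem the text raises.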
To ensure generalizability, we examined a model in which job characteristics were used to predict the various
job-specific effects. This is a fixed-effects two-level model which uses job characteristics, expressed as a vector M_j,
to predict the job-specific intercepts and coefficients of the individual characteristics. The two-level form of the model
was expressed as:

P_ij = α_j + β_j T_ij + γ_j E_ij + δ_j X_ij + ε_ij

α_j = α + π_α M_j,  β_j = β + π_β M_j,  γ_j = γ + π_γ M_j,  δ_j = δ + π_δ M_j

where π_α, π_β, π_γ, and π_δ are row vectors of regression coefficients. The fixed-effects model actually estimated
was:

P_ij = α + β T_ij + γ E_ij + δ X_ij + A M_j + B M_j T_ij + Γ M_j E_ij + Δ M_j X_ij + ε_ij

where:

A = (A_1, ..., A_5), B = (B_1, ..., B_5), Γ = (Γ_1, ..., Γ_5), and Δ = (Δ_1, ..., Δ_5)

are vectors of regression coefficients. The R² for this model is .350. This is considerably smaller than the R² of
.595 achieved when intercepts were completely unconstrained. This suggests that the job characteristic factor scores
explain a portion, but by no means all, of the variability in the job-specific intercepts.
The Neural Network Approach<br />
Having witnessed the rather large degradation in variance explained between Model I and Model II (R² = .595
to R² = .350), a neural network paradigm was investigated. Once the candidate explanatory variables were
determined from the second model, the next step was to construct a neural network capable of analyzing the problem.
Actual construction involved five steps, which specified:
o network type
o number of neurodes in the output and hidden layers
o training and cross-validation samples
o transfer function at each layer and global error function
o scaling, learning, momentum, and epoch size parameters
Network Architecture<br />
Since the problem involved a (hetero-associative) mapping of continuous, dichotomous, and polytomous
explanatory variables to a bounded continuous criterion measure of hands-on performance, a feedforward backward-error-propagation
network was chosen.
Ostensibly, the single output "neurode" was the hands-on performance test score. Because the number of neurodes
in the hidden layer of a feedforward network determines the complexity of the function the network is capable of
mapping, 26 was determined to yield a sufficiently complex network. Notably, it has been shown that any
continuous function or "...mapping can be approximately realized by Rumelhart-Hinton-Williams' multilayer neural
network with at least one hidden layer whose output functions are sigmoid functions" (Funahashi, 1989; Hornik,
1989).
The data were randomly split 60/40 into two sets. The first (N = 5,078) was used to train the network and the
second (N = 3,386) was used to validate the network. The transfer function for the output neurode was logistic, while
the transfer functions for the hidden neurodes were hyperbolic. Although any error function which is continuously
differentiable could have been used, we selected the squared deviation between the observed and predicted output
values. Graphically, the network is shown in Figure 1 below.
Figure 1. Project Linkage Cumulative Back-Propagation Network with Hyperbolic Transfer.
The data were scaled to network values between -0.85 and +0.85. Scaling ensured that the neurodes would not
become saturated at the transfer function extremes. When saturation occurs, learning ceases because the gradient of the
error function approaches zero asymptotically. To guard against this, a nominal offset of 0.005 was added to each
derivative. Finally, the learning coefficient was initially set at 0.9 and gradually reduced as learning progressed.
A momentum term of 0.5 was initially used and also gradually reduced. The learning rule chosen was the
normalized cumulative delta rule with an epoch size of five hundred.
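The training recipe above (tanh hidden neurodes, logistic output, squared-error loss, cumulative delta rule with momentum, a derivative offset against saturation, and a decaying learning coefficient) can be sketched in NumPy roughly as follows. The data, the number of inputs, and the squeezing of targets into (0, 1) for the logistic output are illustrative assumptions, not the study's values; only the 26 hidden neurodes and the 0.9/0.5/0.005/500 parameters come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def scale(v, lo=-0.85, hi=0.85):
    """Linearly rescale a 1-D array to [lo, hi]."""
    return lo + (hi - lo) * (v - v.min()) / (v.max() - v.min())

n, d, H = 2000, 8, 26                            # 26 hidden neurodes
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.3 * rng.normal(size=n)
Xs = np.apply_along_axis(scale, 0, X)            # inputs in [-0.85, +0.85]
ys = scale(y, 0.15, 0.85)[:, None]               # targets inside (0, 1)

W1 = rng.normal(0, 0.3, size=(d, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.3, size=(H, 1)); b2 = np.zeros(1)
lr, momentum, offset, epoch = 0.9, 0.5, 0.005, 500
vel = [np.zeros_like(p) for p in (W1, b1, W2, b2)]

for _ in range(200):
    batch = rng.integers(0, n, size=epoch)       # one epoch of presentations
    x, t = Xs[batch], ys[batch]
    h = np.tanh(x @ W1 + b1)                     # hyperbolic hidden layer
    o = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))     # logistic output neurode
    err = o - t                                  # from squared-error loss
    do = err * (o * (1 - o) + offset)            # offset fights saturation
    dh = (do @ W2.T) * (1 - h ** 2 + offset)
    grads = (x.T @ dh / epoch, dh.mean(0), h.T @ do / epoch, do.mean(0))
    for p, g, v in zip((W1, b1, W2, b2), grads, vel):
        v *= momentum
        v -= lr * g
        p += v                                   # in-place parameter update
    lr *= 0.995                                  # gradually reduce learning

mse = float(np.mean((o - t) ** 2))
```

Accumulating the gradient over the whole 500-presentation epoch before updating is what "cumulative delta rule" refers to; the momentum term reuses a fraction of the previous update to smooth the descent.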
Results<br />
After the network was trained on approximately one million random presentations of observations, it was
evaluated against the cross-validation sample. The results are presented below.
                   R²     R
Model I           .595   .77
Model II          .350   .59
Neural Network    .574   .76
A chi-square goodness-of-fit test was then performed, and the hypothesis that the predicted and observed scores came
from different populations could be rejected at the .95 level of confidence. Notably, the neural network cross-validated
coefficient practically matches the unvalidated coefficient for Model I.
The results achieved using a neural network to predict job performance were far superior to regression-based
approaches when generalizability is considered. The above results provide an impetus to expand the investigation
of neural networks to the Adaptability Screening Project.
Adaptability Screening Project<br />
The Adaptability Screening Profile (ASP) project has been described in detail in previous work (Sellman, 1989;
Trent, 1987). Succinctly, the purpose of the project is to: (1) develop a biographic instrument capable of assessing
an individual's propensity to adapt to military life; (2) determine its operational utility in predicting an individual's
likelihood of successful completion of an initial term of enlistment; and (3) utilize the instrument as part of an
enlistment screening procedure. Because biodata instruments are assumed to be fakable and/or coachable, any
ultimate implementation must include a mechanism to detect and correct for response pattern distortion.
Certainly, this is a difficult task. As reported by Walker (1989), the Army's previous attempt at large-scale biodata
implementation, the Military Applicant Profile (MAP), failed. The failure resulted from several factors, including the
lack of an ongoing score monitoring system capable of detecting response pattern distortion. Whatever the reason,
the MAP validity for predicting attrition fell to zero; that is, it became useless for decision-making about individual
applicants. To prevent a full-scale implementation of ASP from suffering a similar fate, NPRDC directed HumRRO
to develop a score monitoring system. The purpose of the ASP score monitoring system is to: (1) deter faking and
coaching; (2) detect response pattern distortion if and when it occurs; and (3) estimate the effects of such
distortion so that statistical adjustments can be made to counter them. Because a more
complete discussion of the score monitoring system is contained in Waters (1989), the following discussion will
concentrate on attacking the distortion problem using neural networks.
Armed Services Applicant Profile
The ASAP data base consists of 120,175 applicants to the four Services. Administration occurred during the
three-month period commencing December 1985 and ending February 1986. Of the applicants, 55,675 were
accessed. These records form a cohort file which will be appended with additional demographic data elements from
the Military Enlistment Processing Reporting System (MEPRS) and the Defense Manpower Data Center Edited
Enlisted Active Duty Master file. Each record will be updated with the inter-Service separation code (ISC), which
will form the basis for 48-month criterion development. Faking/coaching will be simulated by intentionally
distorting response patterns to varying degrees.
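The planned faking/coaching simulation can be sketched as below. The dichotomous items, the per-item "keyed" (score-raising) response, and the flip rates standing in for "varying degrees" of distortion are all illustrative assumptions; the paper does not specify the distortion mechanism.

```python
import numpy as np

rng = np.random.default_rng(2)

def distort(responses, keyed, rate, rng):
    """Flip each non-keyed response to the keyed one with probability `rate`."""
    flip = (responses != keyed) & (rng.random(responses.shape) < rate)
    return np.where(flip, keyed, responses)

# 50 dichotomous items; keys and honest answers are randomly
# generated here purely for illustration.
n_applicants, n_items = 1000, 50
keyed = rng.integers(0, 2, size=n_items)
honest = rng.integers(0, 2, size=(n_applicants, n_items))

for rate in (0.1, 0.3, 0.6):                     # increasing severity of faking
    faked = distort(honest, keyed, rate, rng)
    agreement = float((faked == keyed).mean())
    print(rate, round(agreement, 3))             # agreement with key rises with rate
```

Distorted files generated this way give the monitoring system a known "truth" against which detection rates can be scored.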
Classical Approach to Response Pattern Distortion
One approach to detecting response distortion is to develop a regression-based prediction system which relates
background characteristics of applicants with point estimates of ASP score means, variances, skew, and kurtosis
indices. Demographic information on race, gender, education, home of record, age, number of dependents, and many
other variables is available as predictors of ASP score. Accurate prediction would permit analysis of how well
operational ASP data behaved as compared with "norming" group data, for the total group as well as subgroups.
In attempting to relate ASP score to demographic characteristics, an ordinary least-squares regression was run.
The results yielded an R² = .213 and a root mean square error of 9.198. The large standard error provides the
motivation to determine whether these results can be improved upon using a neural network approach that attempts
to map responses as opposed to total scores.
Neural Network Approach to Distortion Detection<br />
The network paradigm that is currently being investigated is the cumulative backward error propagation
network. The network has fifty outputs representing individual responses. The hidden layer includes 120 neurodes,
each using a hyperbolic transfer function. The inputs include the same demographic information that was
hypothesized to be related to the ASP score in the earlier regressions. The network is shown graphically in Figure 2.
Figure 2. Response Pattern Distortion Detection Network.
Due to the number of calculations involved in training the above network to recognize response pattern
distortion, a mainframe version of the cumulative backward error propagation neural network has been written for
the IBM 4381 and implemented at the Navy Personnel Research and Development Center (NPRDC). The current
implementation is written in FORTRAN 77. Other network paradigms, such as Grossberg's Outstar and counterpropagation,
are in the process of being added. Although results are extremely encouraging, it is
premature to report them at this time.
Summary<br />
Certainly neural network technology is still in its youth; nevertheless, it has experienced significant growth in
recent years and the momentum shows no signs of slowing. Initially, the technology had a "black box" image, but
recent articles such as those by Hornik and Funahashi demonstrate that the neural network approach is well founded in
mathematical theory and has statistical roots. That is to say, a simple ordinary least-squares regression can be
expressed as a neural network, albeit a simple one. Neural networks have the potential for providing unique
approaches and insights into heretofore intractable problems. In the context of military manpower research, the
jury is not yet deliberating, because all the evidence has not been presented. But when it is, we may find we
have new answers to old problems.
REFERENCES<br />
Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural
Networks, 2(3), 183-192.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators.
Neural Networks, 2(5), 359-366.
Hispanics in Navy's Blue-Collar Civilian Workforce: A Pilot Study¹
Jack E. Edwards, Paul Rosenfeld, Patricia J. Thomas<br />
Navy Personnel Research and Development Center<br />
San Diego, CA<br />
The 1964 Civil Rights Act, Title VII, mandated equal employment opportunity (EEO) for all persons regardless
of race, color, creed, national origin, or gender. Congress amended the Civil Rights Act in 1972 to require most
federal agencies to have programs that would help implement EEO policies. During the quarter of a century since
the passage of the Civil Rights Act, Blacks, as a group, have made significant inroads into both previously
segregated organizations and segregated jobs within integrated organizations. Hispanics, however, have not been as
successful in attaining employment opportunities.
The Department of the Navy has been unable to attract Hispanics in proportion to their representation in the
U.S. labor force. In 1980, Hispanic representation in the civilian Navy work force was 3.2% compared to 6.4% in
the total U.S. civilian labor force (CLF). Since 1980, the Navy's civilian Hispanic representation has increased by
only 0.3 percentage points to 3.5%, while Hispanics in the CLF have increased 1.8 percentage points to 8.2%.
Moreover, the Navy's 3.5% rate of Hispanic employment in civilian positions lags behind the Hispanic representation
rates of the Air Force (9.5%), Army (5.0%), and other federal agencies (5.2%) (Secretary of the Navy,
memorandum of 16 May 1989). Given projections that by the year 2000 Hispanics will constitute nearly 11% of the
total U.S. population (Koretz, 1989), it is clear that the Navy needs to "intensify efforts to increase the number of
Hispanics in the civilian work force" (Secretary of the Navy, memorandum of 16 May 1989).
The underutilization of Hispanics, the projections of dramatic Hispanic population growth, and the potential
benefits to the Navy of greater Hispanic representation attest to the need for focused research on the Hispanic
underrepresentation problem. An initial step toward the better utilization of this valuable human resource is to identify
the barriers that have prevented Hispanics from obtaining parity in the work place. Toward this end, the Navy
instituted a four-year EEO Enhancement Research Project to increase Hispanics' opportunities for employment
parity. Previous project work has focused on the difficulties of accurately defining the Hispanic underrepresentation
problem (Edwards & Thomas, 1989; Thomas, 1987), a literature review on the relationships of attitudes and
demographics to work outcomes (Edwards, 1988), and the geographic mobility of Hispanics for employment
(Edwards, Thomas, Rosenfeld, & Bowers, 1989).
Although Navy-related studies of Hispanics have been rare, one previous intensive research effort was
concerned with the barriers faced by Hispanic Navy recruits (cf. Triandis, 1985). In a summary report of their
Navy-funded studies, Triandis (1985) noted that he and his colleagues had found more similarities than differences
in comparisons among Hispanic, Black, and Anglo recruits. Triandis suggested that Hispanic Navy recruits of the
early 1980s were not typical of Hispanics in the general population. In several reports, Triandis and colleagues
argued that their research participants were so acculturated as to be indistinguishable from the mainstream of
American culture. An important job-related component of acculturation is the ability to communicate in English.
The National Commission on Employment Policy (1982) noted that poor English skills and lack of education are
two major reasons for Hispanic labor-market difficulties.
Acculturation should be considered when determining whether Hispanic employees are different from their
Anglo peers. Consideration of acculturation is also important in determining whether an organization is recruiting
from the full Hispanic population or only from an acculturated portion, as Triandis (1985) suggested. A need exists
to determine whether there are differences among the Navy's acculturated Hispanics, less acculturated Hispanics,
and the Anglo majority group in its civilian workforce.
¹The opinions expressed in this manuscript are those of the authors. They are not official and do not represent the
views of the Navy Department. The authors gratefully acknowledge the assistance of Luis Joseph, Jerome Bower,
and Walt Peterson.
Method
Sample
Recruits. The sample was selected from newly hired men in semi-skilled or journey-person jobs as
Department of the Navy craftsmen, mechanics, operatives, or service workers at 14 Navy activities in the continental
United States. Each Hispanic male who entered one of the jobs was asked to voluntarily complete a questionnaire
during his first week of work. A comparison Anglo male was also surveyed whenever his entry into a similar job at
the same activity followed the entry of a surveyed Hispanic male.
Respondents. Six of the 160 completed questionnaires were discarded because the respondents who identified
themselves as Hispanic indicated either (a) that their primary language was something other than English or Spanish
or (b) that their country of origin (e.g., Lebanon) was not such that findings from those individuals would generalize to
persons from more commonly identified Hispanic lands. The surveys for three additional Hispanics could not be
used because the participants did not supply responses to the acculturation index. As a result, 76 Hispanic and 75
Anglo surveys were analyzed.
Survey Instrument<br />
The questionnaire contained 111 items, some of which were included as part of a longitudinal study. Results
pertaining to only four of the categories (demographics, acculturation, need for clarity, and potential factors
considered when taking a job) are reviewed in this paper. A pre-test of the survey determined that it could be
completed in less than 30 minutes. The average readability of the questionnaire was below the sixth-grade reading
level.
Acculturation. The four-item acculturation scale was patterned after Kuvlesky and Patella's (1971) five-item,
ethnic-identification scale. Respondents indicated how frequently they used a language other than English when
they talked to family members, talked to friends, read a newspaper, or listened to a radio or TV. The anchors for the
rating scale were never (1), almost never (2), sometimes (3), usually (4), and always (5).
Need for clarity. Lyon's (1971) four-item, need-for-clarity index asked respondents how important it was to
know in detail: what is to be done, how the job is supposed to be done, the limits of the respondent's authority, and
how well the respondent is doing. Respondents completed the need-for-clarity items using the following rating
format: not important (1), neither unimportant nor important (2), somewhat important (3), important (4), and very
important (5). Respondents were also given the option of indicating that an item was not true (0); such answers
were treated as missing data.
Potential factors considered when taking a job. Four types of factors were investigated: importance of
job-related factors, work-group composition, sources of recruitment, and job-search activities.
Procedure<br />
Defining Hispanic acculturation groups. The Hispanic respondents were grouped into high (n = 35) and low (n
= 41) acculturation groups based upon their responses to the four-item scale. For all analyses, respondents whose
mean acculturation scores were 2.00 or less (i.e., the respondents who never or almost never used Spanish) were
classified as high-acculturation Hispanics (HAHs); the remainder of the Hispanic respondents were classified as low-acculturation
Hispanics (LAHs).
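The grouping rule just described can be sketched as follows; the four-item response vectors below are invented for illustration, not drawn from the survey data.

```python
import numpy as np

# Mean of the four acculturation items at or below 2.00 ("never"/
# "almost never" used a language other than English) -> HAH; else LAH.
scores = np.array([
    [1, 2, 1, 2],   # mean 1.50 -> HAH
    [3, 4, 3, 5],   # mean 3.75 -> LAH
    [2, 2, 2, 2],   # mean 2.00 -> HAH (the boundary is inclusive)
    [1, 3, 4, 2],   # mean 2.50 -> LAH
])
group = np.where(scores.mean(axis=1) <= 2.00, "HAH", "LAH")
print(group.tolist())   # -> ['HAH', 'LAH', 'HAH', 'LAH']
```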
Analyses. Whenever percentages are shown in a table, a chi-square test of independence was conducted to
examine whether a relationship existed between group membership (Anglo, HAH, and LAH) and responses to an
item or a composite. Whenever means are shown, a one-way analysis of variance (ANOVA) was performed with
group membership as the independent variable and an item response or a composite as the dependent variable. A
significant ANOVA result was followed by a Scheffé post hoc test to determine the source(s) of the difference. For
all primary and secondary analyses, the probability level was set at .01. This significance level was chosen as a
balance among three considerations: the exploratory nature of the research, the large number of contrasts performed,
and the already low statistical power caused by the sample sizes.
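The two analysis steps just described (chi-square tests of independence for the percentage items and one-way ANOVAs for the mean items) can be sketched in plain NumPy; the contingency counts and group samples below are invented for illustration, not the study's data.

```python
import numpy as np

def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    table = np.asarray(table, float)
    expected = np.outer(table.sum(1), table.sum(0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

def anova_f(groups):
    """One-way ANOVA F statistic across a list of 1-D samples."""
    all_x = np.concatenate(groups)
    k, n = len(groups), all_x.size
    ss_between = sum(g.size * (g.mean() - all_x.mean()) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))

rng = np.random.default_rng(4)
table = [[30, 45], [12, 23], [20, 21]]        # group x yes/no counts (invented)
samples = [rng.normal(m, 1.0, 40) for m in (4.33, 4.49, 4.72)]
chi2 = chi_square_stat(table)
f_stat = anova_f(samples)
print(round(chi2, 2), round(f_stat, 2))
```

The statistics would then be compared against the chi-square and F critical values at the .01 level chosen above.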
Results and Discussion<br />
Demographics
In general, the Anglo and Hispanic groups were very similar (see Table 1). All three groups averaged about 34<br />
years of age, more than 12 years of education, and approximately 17 years of working for pay. Almost all of the<br />
respondents reported that they had been employed previously on a full-time basis and that they were not currently<br />
members of a union. The members of each group averaged similar amounts of time (between 4.50 and 6.75 years)<br />
in their last full-time job.<br />
Table 1
Demographics

Item                                                                  Anglo   HAH     LAH
4.  Age (mean number of years)                                        34.81   33.60   34.00
5.  What is the highest grade you completed in school or college?
    (Count a GED as 12 years.)                                        12.60   12.54   12.28
6.  Since you became 16, how many years have you worked for pay?      17.92   16.69   17.16
56. Is this your first full-time job? (Answered "Yes")                 1.4%    2.9%   10.0%
    If "No," how long were you employed full time in your last
    job? (years)                                                       6.64    4.59    6.66
10. Are you a veteran? (Answered "No")                                40.0%   34.3%   48.8%
11. Are you a member of a union? (Answered "Yes")                      9.1%   16.7%   16.7%
12. Have you worked for the Navy in some other civilian jobs?
    (Answered "Yes")                                                  20.0%   22.9%   37.5%
Two interesting but non-significant differences were observed. First, compared to both Anglos and HAHs, a larger
proportion of the LAHs reported having worked in other civilian Navy jobs. Second, 65.7% of the HAHs were
veterans; that proportion is higher than either the 60.0% for Anglos or the 51.2% for the LAHs.
The overall similarity of the three groups with regard to demographics both clarifies and cautions the
interpretation of subsequent findings. The similarity weakens any argument that demographic differences were at
least partially responsible for any subsequent difference among the groups. For example, the similarity with regard
to veteran status lessens the possibility that the additional points awarded to veterans would differentially affect the
time between application and employment for one or more groups. Still, caution must be exercised in the
interpretation of these and subsequent findings. One reason for caution is the atypicality of the Hispanics in this
sample with regard to education. The Census Bureau (U.S. Department of Commerce, September 7, 1988) reported
that 51% of all Hispanics aged 25 and above had completed high school and/or college during 1987 and 1988.
Although this is an all-time high for Hispanics, it is still markedly lower than the 78% completion rate for non-Hispanics.
Therefore, even though the three groups in this study are similar in terms of education, this study's
Hispanic sample is different from the Hispanic population. Second, conclusions are tenuous because of the small
sample and low statistical power.
Need for Clarity<br />
All three groups indicated a very high need for clarity, with LAHs reporting the highest need for clarity. The<br />
need-for-clarity scale mean for LAHs (4.72) was significantly higher than the mean for Anglos (4.33) and<br />
nonsignificantly higher than that of HAHs (4.49). The situation in the Hispanic population may be more extreme<br />
than implied by that small difference. The lower education level of the Hispanic population, in comparison to the<br />
sample participating in the present study, may result in yet more need for clarity by less-educated Hispanics.<br />
Gould (1982, p. 97) cited several studies that have shown that "Mexican-Americans do not tolerate ambiguity
and uncertainty well." The strong authoritarian role of fathers and the emphases on sex roles and discipline in such
families were suggested as possible reasons for Gould's findings. The significant need-for-clarity difference found
in this study also supports Ash, Levine, and Edgell's (1979) finding that, when given a chance to choose tasks,
Hispanic (more so than Black or Anglo) job applicants disproportionately indicated a preference for jobs in which
others would tell them what to do next.
Potential Factors To Be Considered When Taking a Job<br />
Importance of job-related factors. Table 2 shows the mean ratings for each group for each of the 10 factors. In addition to all three groups evaluating each factor at essentially the same level of importance, the average ratings for the factors showed the same pattern across the three groups. The 10 Anglo means correlated .93 (p &lt; .001) with the 10 corresponding HAH means and .94 (p &lt; .001) with the 10 LAH means. The HAH and LAH means correlated .84 (p &lt; .001). The most important factor for Anglos and HAHs, and nearly the most important factor for LAHs, was<br />
the job security provided by the government. These findings show that all three groups valued the same rewards<br />
and outcomes and that the average value placed on any factor did not vary by group when ethnicity and<br />
acculturation were examined.<br />
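This pattern agreement can be checked directly from the Table 2 means. The sketch below recomputes the Anglo-LAH correlation in plain Python (no statistics package assumed); the two vectors are the Anglo and LAH columns of Table 2:<br />

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Mean importance ratings for the 10 job-related factors (Table 2 columns).
anglo = [4.00, 3.98, 3.97, 3.93, 3.83, 3.75, 3.74, 3.65, 2.93, 2.33]
lah = [4.33, 4.12, 4.37, 4.38, 4.22, 4.28, 4.23, 4.17, 3.05, 3.04]

r = pearson_r(anglo, lah)  # comes out close to the .94 reported in the text
```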
Table 2<br />
Potential Factors to Be Considered When Taking a Job<br />
Anglo HAH LAH<br />
Importance of Job-Related Factors<br />
4.00 4.48 4.33 41. Working for the government provides a lot of job security.<br />
3.98 4.00 4.12 48. I think the job will be interesting or challenging.<br />
3.97 4.23 4.37 46. The government provides EEO for promotions, training, etc.<br />
3.93 4.29 4.38 45. Benefits (time off, health ins., etc.) are good.<br />
3.83 4.20 4.22 42. The pay is good.<br />
3.75 3.65 4.28 43. The hours of my work schedule are good.<br />
3.74 4.12 4.23 40. I badly need a job.<br />
3.65 4.03 4.17 47. I can learn a new skill.<br />
2.93 3.48 3.05 44. I don't have to drive too far or can take a bus.<br />
2.33 2.24 3.04 49. I have friends or relatives working here.<br />
Work-Group Size Preferences<br />
13.78 12.51 14.24 65. What size group would you like to work in? That is, how many people, counting yourself, would you like your boss to supervise?<br />
4.64 3.15 4.08 66. Imagine you were working with 10 other people every day. How many of those people would you like to be of your race and ethnic group?<br />
Recruitment: How did you find out about this job?<br />
% Indicating Sources@ (Place an "X" by as many answers as apply and write in the information asked.)<br />
48.6% 42.9% 56.1% 17. From a friend or relative<br />
21.6% 22.9% 12.2% 16. Federal job listing<br />
12.2% 11.4% 14.6% 15. Newspaper ad<br />
10.8% 11.4% 14.6% 22. Employment office or program<br />
10.8% 17.1% 12.2% 23. Other<br />
2.7% 0.0% 0.0% 21. School counselor or training program<br />
2.7% 0.0% 0.0% 19. I was a trainee or intern for this job.<br />
1.4% 0.0% 7.3% 18. From the union<br />
1.4% 2.9% 12.2% 20. EEO office<br />
Job Search<br />
3.22 2.21 3.31 57. How many months passed between the final day of work on your last full-time job and your first day at work on this Navy job?<br />
5.02 3.60 4.23 58. How many months did it take from the time you filed your application for this job to your first day of work?<br />
2.44 4.00 3.89 59. How many times during the last 3 months did you check the Federal government job listings?<br />
1.35 1.45 1.97 60. During the last 12 months, how many Federal government jobs did you apply for?<br />
4.47 3.26 4.02 61. During the last 12 months, how many other jobs did you apply for?<br />
Note: @ The totals for the Recruitment columns are greater than 100% because respondents could indicate more than one source.<br />
Work-group composition. The average desired number of persons sharing the respondent's race/ethnicity was the same across the three groups (see Table 2). On average, Anglos desired to work in groups that were 46.4% Anglos; HAHs, 31.5% Hispanics; and LAHs, 40.8% Hispanics.<br />
Given that less than 10% of the current U.S. population is Hispanic, the average desirable composition of the<br />
work groups for Hispanics may be unobtainable (even in locations such as those in this study that exceeded the<br />
current national average). Furthermore, assigning a disproportionately high number of Hispanics to the same work<br />
group could result in segregated work groups and open an organization to discrimination complaints.<br />
Sources of recruitment. Nine chi-square tests of independence found no significant relationship between group<br />
membership and method of recruitment (see Table 2). Nearly half of all the respondents indicated that they found<br />
their jobs through a friend or relative. Because there are proportionally a great many more Anglos than members of<br />
other ethnic/racial groups working for the Navy and because the Navy already suffers from Hispanic<br />
underrepresentation, continued reliance on this recruitment method may perpetuate the current representation<br />
problems. Also noteworthy is the fact that so few persons were recruited by employment and EEO offices.<br />
Affirmative action recruitment apparently was not being done or at least was not being done effectively.<br />
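For illustration, one of these nine tests can be sketched as follows. The counts are reconstructed from the Table 2 percentages for the "friend or relative" source, using group sizes of 74 Anglos, 35 HAHs, and 41 LAHs; those sample sizes are assumptions back-calculated from the percentages, not figures reported in the text:<br />

```python
def chi_square_2x3(row_yes, row_no):
    """Pearson chi-square statistic for a 2 x 3 contingency table given as
    two rows of counts (df = 2; critical value 5.99 at alpha = .05)."""
    total = sum(row_yes) + sum(row_no)
    row_totals = (sum(row_yes), sum(row_no))
    stat = 0.0
    for col in zip(row_yes, row_no):
        col_total = sum(col)
        for row_total, obs in zip(row_totals, col):
            expected = row_total * col_total / total
            stat += (obs - expected) ** 2 / expected
    return stat

# "Found job through a friend or relative": yes/no counts per group
# (Anglo, HAH, LAH), reconstructed from the Table 2 percentages.
indicated = [36, 15, 23]
not_indicated = [38, 20, 18]

stat = chi_square_2x3(indicated, not_indicated)  # well below 5.99, i.e., non-significant
```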
Job search. Group means for the months spent getting the current job and the activeness with which the newly hired employees were previously pursuing employment opportunities are shown in Table 2. The short time between leaving a previous full-time job and obtaining employment with the Navy suggests that many of the newly hired employees from all three groups were working elsewhere until the time that they were hired by the Navy. For the other time-related variable, the difference was also non-significant: both Hispanic groups were, on average, marginally faster than Anglos in obtaining their new jobs. Together, these time-based questions seem to indicate that Hispanics and Anglos are being treated equally during the hiring phase whenever they have similar job-related demographic characteristics such as education and veteran's preference.<br />
No ethnic or acculturation difference was detected for the three items measuring how actively the respondents<br />
were seeking their jobs. During the year prior to completion of the survey, the average number of jobs applied for<br />
was 6.00 or less for all three groups.<br />
Conclusions and Recommendations<br />
A goal of the present study was to identify factors among newly hired personnel that might help to explain the reasons for Hispanic underrepresentation in the Navy's blue-collar civilian work force. Overall, the results indicate that both high- and low-acculturated Hispanics were more similar to Anglos than they were different. These similarities were obtained for both demographic variables and factors potentially influencing decisions to take a new position. Echoing Triandis' (1985) findings with Hispanic Navy recruits, the results of the present study indicate that the Navy is attracting Hispanics into its blue-collar workforce who are indistinguishable on a variety of dimensions from the majority (Anglo) group. As research on Hispanics in work settings continues to grow (e.g., Knouse, Rosenfeld, &amp; Culbertson, in preparation), it will be of interest to see whether Hispanics entering other government and private-sector organizational settings are likewise similar to Anglos on key psychological and organizational dimensions. If indeed these Hispanics are, then organizations may need to refocus their efforts to attract those individuals whose characteristics are more reflective of the Hispanic population rather than a subgroup who are indistinguishable from Anglos.<br />
This investigation did, however, reveal one organizational practice (recruitment) and one individual-difference variable (need for clarity) that could be contributing to the lack of parity for Hispanics. The following interventions are suggested for dealing with those issues.<br />
1. Use more formal recruitment methods. An investment in formal recruitment (e.g., advertisements and job fairs designed especially for Hispanic communities) could ease future recruitment costs as Hispanic numbers continue to increase. If no change in recruitment procedure occurs, these findings suggest that the Navy will continue to experience non-parity for Hispanics. The Office of Personnel Management's recently formed "Project Partnership", an alliance with the Hispanic Association of Colleges and Universities and National Image, Inc., may prove useful as a means of increasing the number of Hispanics recruited (Weekly Federal Employees News Digest, March 19, 1990).<br />
2. Train supervisors in providing role clarity. The Navy already has the required vehicle for implementing such training in the form of supervisory EEO training sessions. Supervisors could be presented with (a) methods for structuring tasks and duties and (b) the<br />
processes used in mentoring. While these interventions may be specifically designed to aid less acculturated Hispanics, they also can help employees from other ethnic and racial groups.<br />
References<br />
Ash, R. A., Levine, E. L., &amp; Edgell, S. L. (1979). Exploratory study of a matching approach to personnel selection: The impact of ethnicity. Journal of Applied Psychology, 64, 35-41.<br />
Burnam, M. A., Telles, C. A., Karno, M., Hough, R. L., &amp; Escobar, J. I. (1987). Measurement of acculturation in a community population of Mexican Americans. Hispanic Journal of Behavioral Sciences, 9, 105-130.<br />
Edwards, J. E. (1988). Work outcomes as predicted by attitudes and demographics of Hispanics and non-Hispanics: A literature review (NPRDC Tech. Note 88-23). San Diego, CA: Navy Personnel Research and Development Center.<br />
Edwards, J. E., &amp; Thomas, P. J. (1989). Hispanics: When has equal employment been achieved? Personnel Journal, 68, 144, 147-149.<br />
Edwards, J. E., Thomas, P. J., Rosenfeld, P., &amp; Bower, J. L. (1989, August). Moving for employment: Are Hispanics less geographically mobile than Anglos and Blacks? Paper presented at the meeting of the Academy of Management, Washington, DC.<br />
Gould, S. (1982). Correlates of career progression among Mexican-American college graduates. Journal of Vocational Behavior, 21, 93-110.<br />
Knouse, S. B., Rosenfeld, P., &amp; Culbertson, A. (Eds.). (in preparation). Hispanics and work. Newbury Park, CA: Sage.<br />
Koretz, G. (1989, February 20). How the Hispanic population boom will hit the work force. Business Week, 21.<br />
Kuvlesky, W. P., &amp; Patella, V. M. (1971). Degree of ethnicity and aspirations for upward social mobility among Mexican American youth. Journal of Vocational Behavior, 1, 231-244.<br />
Lyons, T. F. (1971). Role clarity, need for clarity, satisfaction, tension, and withdrawal. Organizational Behavior and Human Performance, 6, 99-110.<br />
Marin, G., Sabogal, F., Marin, B. V., Otero-Sabogal, R., &amp; Perez-Stable, E. J. (1987). Development of a short acculturation scale for Hispanics. Hispanic Journal of Behavioral Sciences, 9, 183-205.<br />
National Commission on Employment Policy. (1982). Hispanics and jobs: Barriers to programs. Washington, DC: Author.<br />
Rojas, L. A. (1982). Salient mainstream and Hispanic values in a Navy training environment: An anthropological description (Tech. Rep. No. ONR-22). Champaign, IL: University of Illinois, Department of Psychology.<br />
Secretary of the Navy (1989, May 16). Memorandum on Hispanic Employment.<br />
Thomas, P. J. (1987). Hispanic underrepresentation in the Navy's civilian work force: Defining the problem (Tech. Note No. TN 87-31). San Diego, CA: Navy Personnel Research and Development Center.<br />
Triandis, H. C. (1985). An examination of Hispanic and general population perceptions of organizational environments: Final report to the Office of Naval Research. Champaign, IL: University of Illinois, Department of Psychology.<br />
U. S. Department of Commerce, Bureau of the Census. (1988, September 7). Hispanic educational attainment highest ever, Census Bureau reports. Press release from United States Department of Commerce News, Bureau of the Census.<br />
U. S. Department of Commerce, Bureau of the Census. (1985). Persons of Spanish origin in the United States: March 1985 (Advance report). Washington, DC: U.S. Government Printing Office.<br />
Weekly Federal Employees News Digest. (1990, March 19). p. 4.<br />
DESCRIPTORS OF JOB SPECIALIZATION<br />
BASED ON JOB KNOWLEDGE TESTS<br />
by<br />
C. Lee Walker, Omnibus Technical Services<br />
Jeffery A. Cantor, Lehman College, CUNY<br />
1. INTRODUCTION<br />
This study was undertaken to determine if job knowledge test and training history data could be used to define a billet substructure reflecting specialization on certain equipments for which a Navy Enlisted Classification (NEC) was responsible. It was hypothesized that such specialization, if existing, could be recognized by score patterns in System Achievement Tests (SATs) and by related patterns in the use of advanced training. Specialization, thus identified, could be confirmed by limited fleet surveying. The investigation produced data which suggested specialization, but more particularly it produced course use patterns and course/SAT score relationships which provide insight into the way ships use advanced training to support readiness. This paper presents information on the methodology thus derived with the hope that it will provide a point of departure for other persons faced with developing training analysis methodologies.<br />
2. APPROACH<br />
The Poseidon Fire Control Technician (NEC 3303) was chosen as the subject for the investigation because with six members per crew it provided an adequate population for developing a methodology that could then be applied to larger populations. Two basic methods of investigation were used: (1) reviews of Personnel and Training Program Evaluation Program (PTEP) Personnel Data System (PDS) data and (2) discussions with training petty officers. The PDS review data was used as a guide in the discussions with fleet personnel.<br />
2.1 PERSONNEL DATA SYSTEM INFORMATION. Scores, course attendance, and duty stations were extracted from the PDS for all FTB 3303 personnel.<br />
2.1.1 Study Population. The study hypothesis required that the records of personnel be reviewed for events or changes occurring over the course of a person's career in order to determine at what point in service specialization on an equipment or groups of equipments took place. For the study, this specialization or substructure was of interest for personnel in submarine crews. These comprise the bulk of the NEC. Because of the continuing evolution of equipment, training and measurement, some time limit needed to be applied to the data used so that analysis results would be relevant. Building on these general requirements, three criteria were established for selection of records for analysis. For each record selected the person had to have:<br />
a. Graduated from "C" school after the prescribed start date.<br />
b. Reported to the crew of an operating SSBN directly after "C" school.<br />
c. Taken four or more SATs.<br />
Review of data extracted indicated that few personnel in paygrade E-4 had sufficient SAT records to be useful for analysis. Personnel in paygrade E-6 had duty station histories which made a "cause-effect" analysis of school records not very useful. It was therefore determined to use a survey population of persons serving in paygrade E-5. Ninety-six persons in paygrade E-5 met the criteria and all were used in the study. Preliminary analysis was done by randomly dividing the population into two equal groups to look for consistency of results. The study methodology was also applied to 16 persons in paygrade E-6 meeting similar criteria to ensure that no discontinuities were introduced by limiting the population to paygrade E-5.<br />
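The three selection criteria in section 2.1.1 amount to a simple record filter. The sketch below is illustrative only; the field names, dates, and example records are hypothetical, since the actual PDS record layout is not described in the paper:<br />

```python
from dataclasses import dataclass, field
from datetime import date

# Field names here are illustrative; the actual PDS layout is not given.
@dataclass
class ServiceRecord:
    c_school_grad: date
    first_duty_ssbn: bool            # reported directly to an operating SSBN crew
    sat_scores: list = field(default_factory=list)

START_DATE = date(1980, 1, 1)        # placeholder for the prescribed start date

def eligible(rec: ServiceRecord) -> bool:
    """The three selection criteria from section 2.1.1."""
    return (rec.c_school_grad >= START_DATE   # (a) graduated after start date
            and rec.first_duty_ssbn           # (b) straight to an SSBN crew
            and len(rec.sat_scores) >= 4)     # (c) four or more SATs

records = [
    ServiceRecord(date(1982, 6, 1), True, [55, 61, 58, 63]),
    ServiceRecord(date(1979, 3, 1), True, [60, 62, 64, 59]),  # fails (a)
    ServiceRecord(date(1983, 1, 1), False, [50, 52, 55, 57]), # fails (b)
    ServiceRecord(date(1984, 5, 1), True, [48, 51]),          # fails (c)
]
survey_population = [r for r in records if eligible(r)]       # keeps 1 record
```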
In order to analyze for the point in a person's career at which responsibility for or expertise on a particular equipment was achieved, events of interest were assigned to relative Time Groups. The Time Groups were based on patrol cycles after graduation from "C" School. Schools or SAT scores were recorded for each individual in the patrol cycle sequence to which they belonged rather than being assigned on a calendar year basis.<br />
Since some records reflected more patrol cycles than others, the part of the survey population still present diminished in the later Time Groups. Table 1 shows the Time Groups used and the records with information for each Time Group.<br />
Table 1. Time Group Population<br />
Time Group: 1 2 3 4 5 6 7 8<br />
Population: 96 96 96 96 67 49 29 11<br />
The events, i.e., scores and schools, were then summarized based on the Time Group into which they fell.<br />
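The Time Group assignment described above can be sketched as a bucketing step; the per-person patrol-cycle boundaries below are hypothetical inputs (integer day offsets stand in for dates):<br />

```python
from bisect import bisect_right

def time_group(event_day, cycle_start_days):
    """Return the 1-based Time Group (patrol cycle) containing an event,
    where cycle_start_days is the sorted list of days on which each of the
    individual's patrol cycles began, the first being "C" School graduation.
    An event before the first boundary would return 0."""
    return bisect_right(cycle_start_days, event_day)

cycles = [1, 100, 200, 300]       # hypothetical cycle boundaries for one person
group = time_group(150, cycles)   # an SAT on day 150 falls in Time Group 2
```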
2.1.3. SAT Score Analysis. The SATs used for this analysis were administered upon completion of "C" School and during each SSBN "Off Crew" period. The tests are broken into several equipment-dependent areas. Test versions are changed every five months with occasional changes in the number and size of areas between test versions. The score analysis was based on test areas with scores recorded in their appropriate Time Groups. Since individuals entered the system at different times, results from several test versions were included in each Time Group. To dampen the difference between test versions, normalized scores were used as the basis of analysis. It was hypothesized that specialization related to a billet substructure should be reflected in higher SAT scores in an area. This<br />
"specialization" score elevation should occur in a way that can be distinguished from the normal increase in scores associated with time. Both overall SAT scores and scores one standard deviation above the mean were examined as possible indicators. Scores of 60 or greater, combined with other training-related data, were selected as the most serviceable indicators of specialization.<br />
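The paper does not give the normalization formula, but treating a score of 60 as one standard deviation above the mean is consistent with a T-score transformation (mean 50, SD 10) applied within each test version. A minimal sketch under that assumption:<br />

```python
from statistics import mean, stdev

def t_scores(scores):
    """Convert one test version's raw area scores to T-scores (mean 50,
    SD 10); on this scale 60 marks one standard deviation above the
    version mean, so scores from different versions can be pooled."""
    m, s = mean(scores), stdev(scores)
    return [50 + 10 * (x - m) / s for x in scores]

# Hypothetical raw scores from two different five-month test versions;
# normalizing within version lets them share one Time Group.
pooled = t_scores([48, 52, 55, 61]) + t_scores([40, 45, 50, 57])
```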
2.1.4. Course Analysis. Attendance at advanced FTB rating training courses was analyzed. Course attendance was recorded in its appropriate Time Group and also identified in relation to specific SAT administrations.<br />
2.1.5. Score, Course, and Experience Indicators. Many relationships between scores, courses, and time were examined as possible indicators of specialization. It was felt that the indicators used should be straightforward to derive and easy to interpret. The following paragraphs detail the indicators chosen and their role in interpretation.<br />
2.1.5.1. Scores Equal to or Greater than Sixty (60s). This is a count, for either groups or test areas, of the number of scores of 60 or greater occurring. Scores over 60 in any Time Group which differ greatly from those expected in a normal distribution suggest that specialization is occurring at that point. Conversely, a less than expected number of 60s suggests limited employment in that area during the Time Group in question.<br />
2.1.5.2. Persons Receiving Scores Equal to or Greater Than Sixty (60P). This is a count of the persons receiving scores equal to or greater than 60. Each person is counted only the first time he receives a 60 or greater. This is a means of determining if the same or different people are getting the high scores.<br />
2.1.5.3. Number of Sixties per Person (60s/60P). This is a ratio of the number of scores of 60 or above to the number of people getting the scores. The number must always be one or greater, with higher numbers indicating more repetition of 60 scores. Repetition suggests continuing on-the-job reinforcement.<br />
40
2.1.5.4. Percentage of Persons Receiving a Score of Sixty or Greater (60PS). This shows for each Time Group the percentage of the survey population receiving a score of 60 or greater. It is used in addition to the actual number of persons (60P) receiving a score of 60 or greater to enable comparison in the higher Time Groups where the population drops off.<br />
2.1.5.5. Persons Receiving a Score of 60 or Greater on the First SAT Taken after Advanced Training. Two indicators were developed based on the test performance of personnel on the first SAT following advanced training. These relate the high scores to the number of people being trained and the scores of trained people to the overall high scores. Figure 2 is a modified Venn diagram depicting the relationship of the indicators.<br />
2.1.5.5.1. Number of Persons Receiving a Score of Sixty or Greater on the First SAT Taken after Advanced Training Relative to the Number of Persons Attending Advanced Training (Sc60/ScP). This is an indication of the relationship between the advanced course and the SAT area. A large number suggests a close content relationship between the course and the SAT.<br />
2.1.5.5.2. Number of Persons Receiving a Score of Sixty or Greater on the First SAT Taken after Attending Advanced Training Relative to the Number of Persons Receiving a Score of Sixty or Greater (Sc60/60P). This is an indication of the effect of schools on performance as measured by SATs. When using 60s as an indicator of specialization it is important to know if the high scores are strongly school influenced or if they reflect primarily work experience.<br />
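Once events are bucketed, the six indicators above reduce to simple counts and ratios. The sketch below assumes a hypothetical per-person record format (a list of SAT area scores plus the index of the first SAT taken after advanced training, None if untrained) and derives the indicators for one test area:<br />

```python
def indicators(people):
    """Compute the section 2.1.5 indicators for one test area."""
    sixties = sum(1 for p in people for s in p["scores"] if s >= 60)        # 60s
    sixty_p = sum(1 for p in people if any(s >= 60 for s in p["scores"]))   # 60P
    trained = [p for p in people if p["first_sat_after_school"] is not None]
    sc60 = sum(1 for p in trained
               if p["scores"][p["first_sat_after_school"]] >= 60)           # Sc60
    return {
        "60s": sixties,
        "60P": sixty_p,
        "60s/60P": sixties / sixty_p if sixty_p else 0.0,
        "60PS": 100.0 * sixty_p / len(people),
        "Sc60/ScP": sc60 / len(trained) if trained else 0.0,
        "Sc60/60P": sc60 / sixty_p if sixty_p else 0.0,
    }

people = [
    {"scores": [55, 62, 64], "first_sat_after_school": 1},  # 60+ right after school
    {"scores": [58, 59, 61], "first_sat_after_school": None},
    {"scores": [50, 52, 54], "first_sat_after_school": 2},
]
ind = indicators(people)  # e.g. ind["60s/60P"] is the repetition rate, here 1.5
```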
3. FINDINGS<br />
The initial focus of this study was on the identification of equipment specialization within an NEC. The research, in addition, yielded data on school utilization and personnel performance which, although not directly related to specialization, provide important insights on the training system. Findings in all three areas have been included in the report.<br />
3.1 SPECIALIZATION. Specialists are persons specifically responsible, or looked to, for operation or maintenance of some limited part of the entire<br />
[Figure 2 is a rectangular Venn diagram; its annotations give the ratios C/A = Sc60/60P and C/B = Sc60/ScP.]<br />
Where A = The number of persons receiving a score of 60 or greater (60P)<br />
B = The number of persons attending advanced training (ScP)<br />
C = The number of persons who both attend advanced training and receive a score of 60 or greater. In deriving C the further qualification was placed that the score of 60 be on the first SAT following advanced training (Sc60)<br />
Figure 2. Relationship of Indicators Based on Scores After School<br />
NEC responsibility. The existence of specialization was identified from personnel data and was confirmed by personnel survey. Specialization may relate to either time or ability.<br />
3.1.1. Specialization with Respect to Time. Specialization with respect to time means that certain equipments or areas become the responsibility of technicians primarily based on experience level. In this type of specialization most people with similar experience levels can be expected to be assigned responsibility for a specific equipment. In general, time specialization breaks down into equipments assigned to newly reported personnel and some reserved for highly experienced personnel.<br />
3.1.2. Specialization with Respect to Ability. This type of specialization reflects some special aptitude: a specific type of ability that leads a person to become involved in most corrective maintenance in an area. Specialization of this nature will begin as soon as the ability is recognized and continue throughout a person's tenure with the command.<br />
3.1.3. No Specialization. Areas with no specialization may be worked by any technician without recourse to specialists. These are areas that are either simple enough or frequently enough worked that a sufficient degree of competence can be expected and utilized in all technicians.<br />
3.2. COURSE ATTENDANCE. There were thirteen advanced courses applicable to the NEC. Very few people attend all courses, and attendance is concentrated in the first part of a technician's first sea duty tour. Three distinct patterns of attendance may be derived from the various courses.<br />
3.2.1 Early Attendance. These courses showed an attendance pattern with very heavy usage prior to and following the first patrol and then diminishing rapidly to little or no use.<br />
43
.<br />
DE.,,RIPTORS<br />
cr c c .J?<br />
3.2.2. Normal Attendance. This attendance pattern begins with little attendance prior to the first patrol, peaks during the second or third off crew, and then tapers off slowly.<br />
3.2.3. Level Attendance. This attendance pattern shows relatively steady attendance over five or six off crews. Each of the courses that had level attendance also had below-average attendance.<br />
3.3. PERSONNEL PERFORMANCE. Personnel performance as measured by the chosen indicators varied greatly between equipments.<br />
3.3.1 Percentage of Persons Receiving a Score of Sixty or Greater on the Test Following Advanced Training. This relationship varied from a low of 17% to a high of 40%. This largely reflects the relationship between the SAT and the course.<br />
3.3.2. Percentage of Persons Receiving a SAT Score of Sixty or Greater Who Did So Following Advanced Training. This relationship goes from a low of 21% to a high of 57%. The average value was 34%, which implies that most persons achieving high scores have not been directly influenced by advanced training.<br />
4. EXAMPLES<br />
Thirteen courses were analyzed as part of the study. Diagrams and discussion are provided on five courses. Two courses show specialization with respect to time and more particularly use early in an assignment. Two courses show no strong evidence of specialization. One course suggests specialization with respect to ability. For each, a rectangular Venn diagram and tabular data are provided on the "sixties"-related indicators. The tabular data shows the average value in each category for all thirteen courses in the study and the value for the particular course or area. In addition there is a graphical presentation of school attendance (C) and "sixty" scores (60) with respect to time periods. The diagrams and accompanying discussion are each presented on a separate page.<br />
Course 1<br />
4.1 TIME SPECIALIZATION, COURSE 1. Specialization with respect to time is indicated by a high initial schooling rate dropping off rapidly and the number of high scores occurring during the 2nd, 3rd, and 4th time periods. The percentage of people getting high scores following schooling, the relative shapes of the "C" and "60" curves, and the low repetition rate of high scores all indicate that experience is of greater importance to high scores than school. The low score repetition rate suggests that people are rotated through responsibility for this equipment and receive little reinforcement on the equipment when not specifically assigned. Interviews with fleet-experienced personnel confirmed that this equipment is generally assigned to new personnel as a place to get them gently started.<br />
[Graph and indicator legend (ScP%, Sc60, 60P%) for Course 2; population by Time Group 1-8: 96 96 96 96 67 49 29 11. Axis: patrol/training periods.]<br />
4.2 TIME SPECIALIZATION, COURSE 2. This is an area of specialization for newly reported personnel. This is shown by the high initial schooling rate and the number of "60" scores obtained during the 2nd time period. At 72% this is one of the most used courses by the NEC; however, high scores following the course occur at little better than the chance rate. Most of the high scores reflect experience, not schooling. The high score repetition rate of 1.50 is the lowest of any of the thirteen areas in the study and suggests this specialization is short lived and that continuing reinforcement in later patrols is not present.<br />
[Graph: school attendance (C) and "60" scores by Time Group; population 1-8: 96 96 96 96 67 49 29 11. Axis: patrol/training periods.]<br />
ScP% Sc60/ScP 60s/60P Sc60/60P 60P%<br />
Avg: 59 .28 1.78 .34 .48<br />
Area: 51 .18 2.05 .21 .15<br />
4.3 NO SPECIALIZATION, COURSE 3. The score pattern shown on the graph is what might be expected with the normal progress of a maturing population. (It must be remembered that the tapering off to the right of the graphs reflects decreasing population rather than a lower percentage of high scores.) School use peaks in the second time period and tapers off rapidly. The number of people getting "sixties" after the course is barely above the chance level of a normal distribution, indicating that the course and test were not well aligned. The relatively high "60s" repetition rate (2.05) suggests a good relationship between the test and the actual work being performed.<br />
[Graph and indicator table for Course 4; population by Time Group 1-8: 96 96 96 96 67 49 29 11.]<br />
Area: 62 .36 1.87 .57 .41<br />
Course 4<br />
4.4 NO SPECIALIZATION, COURSE 4. Tests and personnel data provide no indication of specialization in platform positioning within the survey population. The course appears to be closely related to test content. The very high percentage of persons achieving a 60 who do so following school (57%) suggests a stronger relationship between school and the area than is present for most other courses. Discussions with senior petty officers suggest that this area may actually represent an area of time specialization with specialization occurring outside the study population.<br />
[Graph for Course 5: school attendance and "60" scores by Time Group; population 1-8: 96 96 96 96 67 49 29 11.]<br />
Course 5<br />
4.5 SPECIALIZATION BY ABILITY, COURSE 5. Specialization in this area is suggested by the high percentage of persons who get good scores following the schooling and the very high "60" repetition rate (2.30, highest of the thirteen study courses). This repetition rate suggests continuing on-the-job reinforcement of school material. The course, although very productive, is used on only a limited basis, suggesting it is employed principally when a replacement is wanted for the current specialist.<br />
The indicators derived in this study to analyze th* r?lationshigs k~iec::?2:n<br />
training, _ j& _ knowledge testing and job performance provide analytical<br />
conclusions which are consistent with survey responses. That is, if you form a hypothesis from the indicators, it can consistently be confirmed by survey. The indicator values used here were manually derived from lists of computer data but, once proven, lend themselves to computer analysis on a regular basis. The rectangular Venn diagram was adopted as a solution to the puzzling problem of how to easily show accurate overlap of circles of different sizes. It takes a little practice, but they are easy to use. Interpretation of the various indicators was facilitated by the divergence of patterns. Proposed interpretations could be confirmed by the changes of the indicators with different use patterns. One could say, "if the relationship changes in way X then the indicator will change in way Y" and confirm the hypothesis with another pattern from the study. This use of test and course usage data permits an objective, in-depth analysis of the training relationships that can usually be achieved only by extensive data analysis and survey. While not eliminating all need for on-site survey in training evaluation, it supports less and better focussed survey time.
ADDRESSING THE ISSUES OF "QUANTITATIVE OVERKILL" IN JOB ANALYSIS

Julie Rheinstein
Brian S. O'Leary
Donald E. McCauley, Jr.

U.S. Office of Personnel Management
Washington, D.C.

Paper presented at the 32nd Annual Conference of the Military Testing Association, November 5-9, 1990, Orange Beach, Alabama.
Schmidt, Hunter and Pearlman (1981) have indicated that molecular job analyses are unnecessary in selection research involving traditional aptitude tests. Fine-grained, detailed job analyses tend to create the appearance of large differences in jobs, whereas, in fact, the differences are of no practical significance in selection. Our recent job analysis research has focussed on how job analysis projects can be less detailed and less cumbersome while still allowing one to obtain the necessary information for test development.
O'Leary, Rheinstein and McCauley (1989, 1990) discussed several "holistic" job-analytic approaches used in forming job families. Their research suggests that the traditional fine-grained job-analytic approach may not always be necessary, especially when one is in a fast reaction situation.
In the first phase of a project for the development of an examination for Federal professional and administrative career occupations, job families were formed using a procedure developed by Rosse, Borman, Campbell and Osburn (1985) (see O'Leary, Rheinstein, and McCauley, 1990, for a detailed explanation of the formation of job families). Once the families had been established it was necessary to determine the importance of various abilities for job performance, and which abilities to measure by a written test.
The "inferential leap" (i.e., the inferring of human abilities important for job performance) is traditionally performed by a panel of "subject matter experts." However, there is little guidance in the literature concerning the composition of this panel of experts. As Landy (1988) has so ably indicated, incumbents are the ones most familiar with the job itself but are often unfamiliar with the conceptual or operational characteristics of the abilities. On the other hand, job analysts (often psychologists) are familiar with the characteristics of the abilities but are often not very familiar with the job itself.
The recent work of Butler and Harvey (1988) and Harvey (1989), showing that different kinds of experts (e.g., incumbents versus supervisors) provide different views of a job, and often conflicting information, would seem to suggest that one might get different results in job-ability linkage studies depending upon the composition of the panel of experts. We were able to address this issue by comparing the job-ability linkage ratings made by personnel research psychologists to the same ratings made by job incumbents.
When one conducts a traditional job analysis, the question becomes how much information should be collected. Often raters are asked to rate tasks on several scales, such as importance, time spent, difficulty, or physical demands. Weismuller, Staley and West's (1989) research indicates that ratings on one scale are contaminated by ratings on other scales. Anecdotal findings from job analysts indicate that obtaining ratings on importance, time spent, etc., is unnecessary in most cases because the ratings are highly correlated across scales.
This paper will look at several aspects of job analysis and how the traditional, fine-grained methods may result in "quantitative overkill." We will present data on several techniques for determining the importance of
abilities for test development (Studies 1 and 2), as well as look at the relationship between relative importance and relative time spent ratings (Study 3).
STUDY 1
Data Collection: Ninety-four professional and administrative occupations in the civilian Federal work force were studied. A list of major duties was developed for each occupation. Then a list of abilities was developed by reviewing the construct literature (Northrop, 1989; French, Ekstrom and Price, 1963; and Peterson and Bowans, 1982). Abilities that could not be assessed through a written test were not included. The resulting list contained seven abilities: verbal comprehension, general reasoning, number facility, logical reasoning, perceptual speed, spatial orientation, and visualization.
Using the job-specific duty lists that were developed for each occupation, five research psychologists rated the seven abilities for their importance to each overall job using a five-point scale. The scale ranged from "1 - Unimportant" to "5 - Crucial." It should be stressed that rather than rate each ability against each duty for every job, the psychologists were asked to read each duty list in its entirety and make a "holistic" judgment concerning the importance of each of the abilities for the overall job. The psychologists made holistic ratings for the occupations. Based on each psychologist's overall ratings for each job, averages for each ability were computed for each of the six job families.
Approximately 6,000 job incumbents completed and returned the inventory. As part of this inventory, job incumbents were asked to rate each of the seven abilities using the same five-point scale used by the psychologists. The inventory stressed that the incumbents rate each ability as it related to their overall job. That is, they were asked to make a "holistic" judgment about the importance of each ability. Based on each incumbent's overall rating for their job, averages were computed for each job family for each ability.
RESULTS
The mean overall ability ratings of the psychologists and the job incumbents, for two of the largest job families, can be found in Table 1.
Table 1. Comparisons of ability ratings for psychologists (N=5) and job incumbents, by job family.

                                  Psychologists       Incumbents
                                  Mean    S.D.        Mean    S.D.       N

Business, Finance & Management Occupations
Verbal Comprehension              4.56    (.41)       4.40    (1.10)     2306
General Reasoning                 4.60    (.41)       4.13    (1.13)     2306
Number Facility                   3.99    (.71)       3.54    (1.27)     2306
Logical Reasoning                 4.38    (.48)       3.59    (1.29)     2305
Perceptual Speed                  1.82    (1.12)      2.72    (1.33)     2306
Spatial Orientation               1.68    (.62)       1.57    (1.29)     2306
Visualization                     1.55    (.38)       1.95    (1.46)     2306

Personnel, Administration & Computer Occupations
Verbal Comprehension              4.64    (.44)       4.51    (1.03)     1197
General Reasoning                 4.59    (.40)       4.25    (1.07)     1197
Number Facility                   3.18    (.56)       3.07    (1.31)     1197
Logical Reasoning                 4.44    (.50)       3.59    (1.29)     1197
Perceptual Speed                  1.66    (.92)       2.72    (1.33)     1197
Spatial Orientation               1.34    (.49)       1.57    (1.29)     1197
Visualization                     1.26    (.46)       1.95    (1.46)     1197
The average estimate of reliability (Cronbach's alpha) of the ratings across the six job families was .99 for the psychologists and .84 for the incumbents.

There was very high agreement between the psychologists and the job incumbents in terms of the relative importance of the abilities to the jobs in each job family. The product-moment correlations among the mean ability ratings for the two groups of raters ranged from .96 to .98 and rank order correlations ranged from .89 to .96.
To investigate whether or not the psychologists and the job incumbents agreed in terms of the absolute importance ratings given to the abilities, tests of the significance of the difference between the means for the two groups of raters were performed. In the majority of cases, the pairs of means were found to be significantly different. It should be borne in mind, however, that due to the large numbers in the incumbent group, even very small absolute differences will be statistically significant.
When the mean ratings for both groups were dichotomized into those determined to be important (equal to, or greater than, 3.0--"Important" on the five-point scale) and those determined not to be important (less than 3.0 on the five-point scale), the two groups of raters were found to be in perfect agreement.
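The agreement checks just described (a product-moment correlation, a rank-order correlation, and a dichotomization at 3.0) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the two rating vectors are the Table 1 means for the Business, Finance & Management family, ordered as in the table.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    return num / den

def ranks(x):
    """1-based average ranks, with ties sharing their mean rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

# Mean ability ratings for the Business, Finance & Management family (Table 1)
psych     = [4.56, 4.60, 3.99, 4.38, 1.82, 1.68, 1.55]
incumbent = [4.40, 4.13, 3.54, 3.59, 2.72, 1.57, 1.95]

r = pearson(psych, incumbent)                  # product-moment correlation
rho = pearson(ranks(psych), ranks(incumbent))  # rank-order (Spearman) correlation

# Dichotomize at 3.0 ("Important") and check agreement on which abilities matter
agree = all((a >= 3.0) == (b >= 3.0) for a, b in zip(psych, incumbent))
print(round(r, 2), round(rho, 2), agree)
```

For this single family both correlations come out high and the dichotomized classifications agree on every ability, mirroring the pattern reported across the six families.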
STUDY 2<br />
Wernimont (1988) has indicated that governmental guidelines on employee<br />
selection still emphasize the necessity of focussing on job tasks and duties in<br />
job analysis, followed by documentation and justification for the inferences made<br />
about needed abilities. Perhaps this is a function of the fact that, as Schmidt<br />
(1988) points out, the empirical data upon which these governmental guidelines<br />
are based are inadequate in many areas, particularly job analysis.<br />
Our job-analytic research provided an opportunity to add to the empirical database by determining if job-ability linkage results obtained in a "holistic" manner (i.e., by having job incumbents rate the importance of an ability to overall job success) were comparable to the job-ability linkage results obtained by requiring incumbents to rate the importance of abilities for each duty they perform. If the results obtained from the two methods were found to be similar, significant reductions in the cost, as well as the intrusiveness, of the job analysis process for test development could be possible.
Data Collection: As indicated earlier, in the job analysis inventory the incumbents were asked to rate each of the seven abilities, using a five-point rating scale, for their importance to overall job performance (i.e., the holistic approach). Average ability importance ratings were then computed for each job family.
After rating the importance of the ability to the overall job, these same incumbents were asked to rate the importance of each of the seven abilities to the performance of each individual job duty they had previously indicated they performed (i.e., the traditional fine-grained, duty-ability linkage approach). Using this traditional approach, a mean ability rating was determined by summing each incumbent's ability rating for each duty performed and then dividing by the number of duties that the incumbent performed. Averages were then computed for each job family for each ability.
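The duty-level averaging described above can be sketched as follows; the duty identifiers and ratings are hypothetical, and each incumbent rates only the duties he or she actually performs.

```python
def incumbent_mean(duty_ratings):
    """Mean rating of one ability across the duties an incumbent performs."""
    return sum(duty_ratings.values()) / len(duty_ratings)

# Hypothetical per-duty importance ratings of a single ability (1-5 scale)
# for three incumbents; keys are duty identifiers.
incumbents = [
    {"d01": 5, "d02": 4, "d03": 4, "d07": 3},
    {"d02": 5, "d05": 5},
    {"d01": 3, "d03": 4, "d04": 2},
]

means = [incumbent_mean(d) for d in incumbents]

# Job-family average for this ability under the traditional approach
family_mean = sum(means) / len(means)
print(means, family_mean)  # → [4.0, 5.0, 3.0] 4.0
```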
RESULTS
The average ratings for each ability across duties performed, for the same job families, are presented in Table 2. For ease of comparison, the mean overall ratings for the incumbents given in Table 1 are repeated in Table 2.
Table 2. Comparison of incumbents' mean ability ratings across job-specific duties and mean overall (holistic) ability ratings, by job family.

                                  Job Specific                  Holistic
                                  Mean    S.D.     N            Mean    S.D.      N

Business, Finance & Management Occupations
Verbal Comprehension              4.01    (.77)    2288         4.40    (1.10)    2306
General Reasoning                 3.99    (.65)    2287         4.13    (1.13)    2306
Number Facility                   3.22    (.85)    2250         3.54    (1.27)    2306
Logical Reasoning                 3.59    (.80)    2246         3.59    (1.29)    2305
Perceptual Speed                  2.57    (.91)    2101         2.72    (1.33)    2306
Spatial Orientation               1.96    (.96)    1737         1.57    (1.29)    2306
Visualization                     2.38    (1.08)   1897         1.95    (1.46)    2306

Personnel, Administration & Computer Occupations
Verbal Comprehension              4.12    (.75)    1197         4.51    (1.03)    1197
General Reasoning                 4.11    (.62)    1194         4.25    (1.07)    1197
Number Facility                   2.66    (.94)    1155         3.07    (1.31)    1197
Logical Reasoning                 3.74    (.77)    1174         3.59    (1.29)    1197
Perceptual Speed                  2.20    (.96)    1048         2.72    (1.33)    1197
Spatial Orientation               1.73    (.88)    899          1.57    (1.29)    1197
Visualization                     2.08    (1.05)   938          1.95    (1.46)    1197
The average estimate of reliability (Cronbach's alpha) of the ratings across the six job families was .80 for the incumbents' ratings across duties and .84 for the incumbents' holistic ratings.
There was very high agreement between the incumbents' holistic ratings and the average job-specific duty ratings in terms of relative importance. The product-moment correlations among the mean ability ratings from the two types of ratings ranged from .98 to .99 and rank order correlations ranged from .85 to 1.00.
In terms of the absolute importance ratings given to the abilities, tests of the significance of the difference between the means for the two types of ratings revealed that, in all but one case, the pairs of means were statistically different.
When the mean ratings for both types of ratings were dichotomized into those determined to be important (again, equal to or greater than 3.0--"Important" on the five-point scale) and those determined not to be important (less than 3.0 on the five-point scale), the two types of ratings were found to be in agreement in all but three instances.
STUDY 3
Data Collection: As part of the five-section inventory, incumbents were asked to rate 57 generalized work behaviors (GWB's) developed specifically for the 113 professional and administrative occupations (see O'Leary, Rheinstein, and McCauley, 1990, for a detailed discussion of the development of the GWB's). The GWB's were rated for relative importance and relative time spent. Incumbents were first asked to check the GWB's they perform. Then, they rated the ones they checked using a 5-point relative importance scale ranging from "1 - Unimportant" to "5 - Crucial" and a 5-point relative time spent scale ranging from "1 - Very much below average time" to "5 - Very much above average time." The two ratings for each of the 57 GWB's were correlated across occupations, yielding 57 correlations.
RESULTS
Correlations between the ratings on the two scales ranged from .77 to .93 for the 57 GWB's, with a mean r of .89, indicating a strong relationship between relative importance and relative time spent ratings.
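The Study 3 computation, one importance/time-spent correlation per GWB across occupations, can be sketched like this. All ratings shown are hypothetical, and three GWB's stand in for the 57.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical mean ratings across five occupations for three GWB's,
# given as (importance ratings, time-spent ratings). The study computed
# one such correlation for each of the 57 GWB's.
gwb_ratings = {
    "writes correspondence": ([4.2, 3.1, 2.5, 4.8, 1.9], [4.0, 3.3, 2.2, 4.5, 2.1]),
    "keeps records":         ([3.0, 4.1, 2.2, 3.6, 2.8], [2.9, 3.8, 2.5, 3.3, 3.0]),
    "interviews persons":    ([1.8, 2.9, 4.4, 2.1, 3.5], [2.0, 2.6, 4.1, 2.4, 3.2]),
}

# One importance/time-spent correlation per GWB, plus their mean
rs = {g: pearson(imp, ts) for g, (imp, ts) in gwb_ratings.items()}
mean_r = sum(rs.values()) / len(rs)
```

With real data, a high mean of these per-GWB correlations is what justifies dropping one of the two scales.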
DISCUSSION
Sackett, Cornelius, and Carron (1981), Cornelius, Schmidt, and Carron (1984), and others have shown in a classification setting that holistic judgments compared favorably with those made on the basis of large-scale job analyses. Study 2, described above, showed similar results in that the relative importance of abilities as measured through linkages with job-specific duties was nearly identical to that obtained from linkages with the job as a whole. The results obtained in Study 1 suggest that similar ratings of the importance of abilities to job performance can be obtained from holistic ratings made by two types of raters--psychologists and job incumbents. The determination of which abilities were important to job performance was identical for the two groups of raters.
At first glance, these findings would appear to make a strong case for saying there is overkill in the job analysis process and that it is possible to streamline job analysis procedures for test development. In situations where one needs results in a hurry, holistic methods can be used. In addition, in this study it was found that it is not necessary to have incumbents rate both importance and time spent, unless occupations such as police officer or fireman are being studied. It is well known that it is important for police officers to be able to use a gun properly, even though they may not spend a lot of time doing it.
However, the equivalence of the results obtained from the three sources and, thus, the interchangeability of the sources, ultimately depends upon the use to which the information will be put. As was mentioned above, if job analysts want to determine which abilities are important for job performance, the three sources of data produce virtually equivalent results. If other types of decisions are to be made (e.g., weighting the parts of an ability test battery to achieve a composite score), the absolute differences among the mean ratings could produce different results. While one could not claim that the results obtained by the three different methods were equivalent in the terms outlined by Gulliksen (1968), it would seem that they could be used interchangeably in some circumstances.
REFERENCES
Butler, S.K. and Harvey, R.J. (1988). A comparison of holistic versus decomposed rating of Position Analysis Questionnaire work dimensions. Personnel Psychology, 41, 761-771.

Cornelius, E.T., Schmidt, F.L., and Carron, T.J. (1984). Job classification approaches and the implementation of validity generalization results. Personnel Psychology, 37, 247-260.

French, J.W., Ekstrom, R.B., and Price, L.A. (1963). Kit of reference tests for cognitive factors. Princeton, N.J.: Educational Testing Service.

Gulliksen, H. (1968). Methods for determining equivalence of measures. Psychological Bulletin, 70(6), 534-544.

Harvey, R.J. (1989). Incumbent versus supervisor ratings of task inventories: Overrating, underrating, contamination, and deficiency. In press.

Landy, F.J. (1988). Selection procedure development and usage. In S. Gael (Ed.), The job analysis handbook for business, industry and government. New York: John Wiley and Sons, Inc.

Northrop, L.C. (1989). The psychometric history of selected ability constructs. U.S. Office of Personnel Management.

O'Leary, B.S., Rheinstein, J., and McCauley, D.E. (1990). Developing job families using generalized work behaviors. Proceedings of the Annual MTA Conference, Orange Beach, AL.

Peterson, N.G. and Bowans, D.A. (1982). Skill, task structure, and performance acquisition. In Dunnette, M.D. and Fleishman, E.A. (Eds.), Human performance and productivity: Human capability assessment. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Rosse, R.L., Borman, W.C., Campbell, C.H., and Osburn, W.C. (1985). Grouping Army occupational specialties by judged similarity. Unpublished paper, 1984.

Sackett, P.R., Cornelius, E.T., and Carron, T.J. (1981). A comparison of global judgment versus task-oriented approaches to job classification. Personnel Psychology, 34, 791-804.

Schmidt, F.L., Hunter, J.E., and Pearlman, K. (1981). Task differences as moderators of aptitude test validity in selection: A red herring. Journal of Applied Psychology, 66, 166-185.

Schmitt, N. (1987). Principles III: Research issues. Paper presented at the second annual conference of the Society for Industrial and Organizational Psychology.

U.S. Equal Employment Opportunity Commission, U.S. Civil Service Commission, U.S. Department of Labor, and U.S. Department of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43(166), 38290-38309.

Weismuller, J.J., Staley, M.R. & West, S. (1989). CODAP: A comparison of single versus multi-factor task inventories. Proceedings of the Annual Military Testing Association Conference, San Antonio, TX.

Wernimont, P.F. (1988). Recruitment, selection and placement. In S. Gael (Ed.), The job analysis handbook for business, industry, and government. New York: John Wiley and Sons, Inc.
Developing Job Families Using Generalized Work Behaviors

Brian S. O'Leary
Julie Rheinstein
Donald E. McCauley, Jr.

U.S. Office of Personnel Management

Introduction
This paper describes one phase of a large-scale research project aimed at developing and refining a list of work behaviors common to approximately 100 different Federal professional and administrative occupations. In this phase of our research, we were attempting to form job families and to describe how these job families differ in terms of the relative time spent on general work behaviors.
Traditional systems for describing jobs have usually focussed on describing a single job rather than attempting to determine the similarity among jobs. Thus, many of the traditional means of describing jobs, such as task analysis, are somewhat limited when one tries to compare across jobs.
Since one of the ultimate uses for our research was the development and documentation of selection tests, we needed a method of comparing jobs using some form of work behavior as a unit of measurement. Our goal was to develop a method of comparing jobs in such a way as to be consistent with provisions
of the Uniform Guidelines. The Guidelines define "work behavior" in the following manner: "an activity performed to achieve the objectives of the job. Work behaviors involve observable (physical) components and unobservable (mental) components. A work behavior consists of the performance of one or more tasks. Knowledges, skills, and abilities are not behaviors although they may be applied in work behaviors" (Section 16, 43FR38308).
Development of the Generalized Work Behaviors (GWB's)

First it was proposed that a list of occupation-specific duties be constructed. A list of generalizable work behaviors could then be generated by grouping the occupationally specific duties in terms of common underlying work behaviors.
We extended the work of Outerbridge (1987) in the development of our GWB's. Outerbridge had developed a list of 32 GWB's. She used duty statements contained in the occupational definitions in the Dictionary of Occupational Titles (DOT) for 24 populous Federal professional and administrative occupations.
A list of 223 duty statements was extracted from the DOT. Each one was placed on a separate card. These duty statements were then sorted into categories describing similar work behaviors, first by a group of 10 personnel psychologists and later by a group of 10 occupational specialists. Nineteen sorters provided usable data. The 19 separate sorts were summarized and compared after transformation into matrix form. Matrix representation allowed the development of final work behavior categories using cluster analysis to discover the structure within the summarized data and also allowed the quantitative comparison of the sorter categorizations. A list of 32 defined work behaviors was developed. Final categories were named and definitions were added to suggest the commonality among duty statements making up each category.
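One plausible way to realize the matrix summary of the sorts described above is a pairwise co-occurrence count over duty statements. The actual study used formal cluster analysis on the summarized matrix, so the simple majority-link grouping below is only a stand-in for that step, and the sorts themselves are hypothetical.

```python
from itertools import combinations

# Hypothetical sorts: each sorter assigns every duty statement to one of
# his or her own categories (labels need not match across sorters).
sorts = [
    {"d1": "A", "d2": "A", "d3": "B", "d4": "B"},
    {"d1": "x", "d2": "x", "d3": "x", "d4": "y"},
    {"d1": "1", "d2": "1", "d3": "2", "d4": "2"},
]

duties = sorted(sorts[0])

# Matrix summary: for each pair of duties, count the sorters who placed
# both duties in the same category.
co = {pair: sum(s[pair[0]] == s[pair[1]] for s in sorts)
      for pair in combinations(duties, 2)}

# Stand-in for the cluster analysis: link any pair grouped together by a
# majority of sorters, then take connected components as categories.
parent = {d: d for d in duties}

def find(d):
    while parent[d] != d:
        d = parent[d]
    return d

for (a, b), n in co.items():
    if n > len(sorts) / 2:
        parent[find(a)] = find(b)

clusters = {}
for d in duties:
    clusters.setdefault(find(d), []).append(d)

print(sorted(sorted(c) for c in clusters.values()))  # → [['d1', 'd2'], ['d3', 'd4']]
```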
In the present study we began by reviewing OPM's Classification and Qualification Standards for each of 113 professional and administrative occupations. For each occupation, the major job-specific duty statements were extracted from the Standards. Approximately 10 to 15 major duty statements were obtained for each occupation. In total, over 1,400 job-specific duty statements were developed.
Using the 32 GWB's developed by Outerbridge, we had four psychologists sort each of the 1,400+ job-specific duties into the 32 GWB's, if applicable. Sorters were instructed to sort the duties on the basis of work behaviors. Job-specific duties that could not be sorted into the 32 GWB's were placed into a miscellaneous category. Sorters were advised to put job-specific duties into the miscellaneous category if they had reservations about placing them in any one of the generalized work behavior categories. Sorters were also instructed to develop new generalized work behavior categories if they found that several job-specific duties did not fit into any of the GWB categories but seemed to describe a common underlying work behavior.
For the group of sorters, the average time required for the sorting task was approximately 8 hours. Sorters generally broke up the task into 2 half-day segments. If three out of the four sorters classified a specific job duty into a GWB category, we considered it to be a match. Using this criterion, about 75% of the 1,400+ job-specific duties were able to be classified into the 32 GWB's.
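The three-of-four match criterion can be sketched as a simple majority vote over the sorters' category assignments; the duty statements and GWB labels below are hypothetical.

```python
from collections import Counter

# Hypothetical data: four psychologists each assign a duty statement to a
# GWB category, or to "misc" when unsure.
duty_sorts = {
    "analyzes budget requests": ["GWB-05", "GWB-05", "GWB-05", "GWB-12"],
    "drafts press releases":    ["GWB-21", "GWB-21", "misc", "GWB-09"],
}

def match(assignments, needed=3):
    """Return the agreed GWB when at least `needed` sorters concur, else None."""
    category, votes = Counter(assignments).most_common(1)[0]
    return category if votes >= needed and category != "misc" else None

matched = {duty: match(votes) for duty, votes in duty_sorts.items()}
print(matched)  # the second duty fails the 3-of-4 criterion
```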
The original sorters also developed 18 additional GWB categories. Using the 332 job-specific duties that could not be sorted into the original 32 GWB's, another group of 4 psychologists sorted these job-specific duties into the 18 additional GWB's and were also told to develop new GWB's if necessary and appropriate.
In total, 25 additional GWB's were developed. Using the same 75% agreement criterion, 290 more job-specific duties were classified into a generalized work behavior category. Out of a total of 1,438 job-specific duties, only 42, or about 3%, could not be classified into a generalized work behavior. Table 1 shows two examples out of the total 57 GWB's developed.
Table 1 Examples of Generalized Work Behaviors

1. Presents information about work of the organization to others: e.g., Describes agency programs and services to individuals or groups in community or to higher management.

2. Applies regulations to organizational programs and activities: e.g., Selects and interprets laws to ensure uniform application on wage and hour or safety and occupational health issues and in the sale and leasing of property.
Rating the Generalized Work Behaviors
These 57 GWB's were included in a five-section inventory that was sent to about 14,000 incumbents in 113 occupations. Approximately 7,000 inventories were completed and returned. Of the 7,000 inventories that were received, about 6,000 were from the 94 occupations under study herein. As part of the inventory, incumbents were first asked to read all the GWB's and then check the ones they perform.
One of the first questions we investigated was what types of GWB's are performed most often across jobs. Table 2 presents the six GWB's that are performed the most as well as the six that are performed the least.
Table 2 Most Frequently and Least Frequently Performed Generalized Work Behaviors

Most frequently performed

Writes correspondence, memoranda, manuals, technical reports, or reports of activities and findings.

Interviews or confers with persons to obtain information not otherwise conveniently available or gathers facts on specific issues from knowledgeable persons: e.g., Interviews persons, visits establishments, or confers with technical or professional specialists to obtain information or clarify facts.

Analyzes and interprets information and makes recommendations based on findings: the information can be numerical or presented in verbal or pictorial form.

Responds to inquiries from the public, other agencies, Congress, etc., concerning the work of the activity.

Keeps records and compiles statistical reports.

Reviews documents for conformance to standard procedures, verifying correctness and completeness of data and authenticity of documents: e.g., May audit financial data.

Least frequently performed

Performs policing functions such as arresting and detaining persons and seizing contraband.

Writes, tests, and documents computer programs.

Sells property or arranges for disposal of property, supplies or records: e.g., Inventories, advertises and sells a delinquent taxpayer's seized property or disposes of archival records.

Plans and directs organization's public relations function.

Inspects persons, baggage, or other materials. Inspection involves at least some physical action by the inspector.

Drafts regulations based on an analysis of information: e.g., Drafts regulations on transportation systems or employment and training legislation.
As can be seen in Table 2, writing, interviewing, record-keeping, ensuring compliance with regulations and providing information to the public are the GWB's that are most frequently performed across these professional and administrative occupations. The least frequently performed GWB's are those that are more specific to a particular occupation, such as police work.
I General management and supervisory functions
II Evaluating programs and ensuring compliance with regulations
III Dissemination of information
IV Gathering, classifying, and organizing information
V Budgeting and accounting functions
VI Application of rules and regulations - making determinations
VII Planning and developing policy and procedures
VIII Computer utilization
IX Police functions
X Investigating and arbitrating
XI Interviewing
The next question addressed was "how do the job clusters differ on the 11<br />
factors of GWB's?" A dimension score was calculated for each factor by<br />
summing the item scores which loaded on that factor. These scores were then<br />
standardized. A mean profile on the 11 factors was computed for each of the<br />
six job clusters formed from the Q-factor analysis of the GWB's. Table 3<br />
lists for each job cluster the GWB factors that were rated above the mean for<br />
relative time spent.<br />
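The dimension-scoring procedure just described (sum the items loading on each factor, standardize across jobs, then average within each cluster) can be sketched in modern Python. The item scores, factor keys, and cluster labels below are invented stand-ins, not the study's data; unit-weighted sums and population standardization are assumptions.<br />

```python
import numpy as np

def dimension_profiles(item_scores, factor_items, clusters):
    """item_scores: (n_jobs, n_items) mean time-spent ratings.
    factor_items: dict mapping factor name -> indices of items loading on it.
    clusters: length-n_jobs list of cluster labels.
    Returns dict: cluster -> {factor -> mean standardized dimension score}."""
    n_jobs = item_scores.shape[0]
    z = {}
    for f, items in factor_items.items():
        raw = item_scores[:, items].sum(axis=1)   # sum of items loading on factor
        z[f] = (raw - raw.mean()) / raw.std()     # standardize across jobs
    profiles = {}
    for c in set(clusters):
        idx = [i for i in range(n_jobs) if clusters[i] == c]
        profiles[c] = {f: float(z[f][idx].mean()) for f in z}
    return profiles
```

Factors rated "above the mean" for a cluster are then simply those with a positive mean standardized score.<br />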
Table 3<br />
Important generalized work behavior factors by occupational cluster<br />
I. General Business and Administration<br />
A. General management and supervisory functions<br />
B. Evaluating programs and ensuring compliance with regulations<br />
C. Gathering, classifying and organizing information<br />
D. Budgeting and accounting functions<br />
E. Planning and developing policy and procedures<br />
F. Computer utilization<br />
II. Claims Examining Occupations<br />
A. Application of rules and regulations - making determinations<br />
B. Investigating and arbitrating<br />
III. Law Enforcement Occupations<br />
A. Police functions<br />
B. Investigating and arbitrating<br />
C. Interviewing<br />
D. Application of rules and regulations - making determinations<br />
IV. Public Information Occupations<br />
A. Dissemination of information<br />
B. Interviewing<br />
V. Industrial/Labor Relations<br />
A. Investigating and arbitrating<br />
B. Interviewing<br />
C. Evaluating programs and ensuring compliance with regulations<br />
VI. Specialized Program Analysis<br />
A. Gathering, classifying and organizing information<br />
SUMMARY<br />
This exploratory study was one of the first applications of the GWB's.<br />
Certainly, the GWB's need refinement but, at this stage of development, the<br />
results look promising. The results obtained in this study make sense<br />
intuitively in terms of the GWB's performed the most and least across jobs,<br />
job dimensions, and clusters of related jobs.<br />
REFERENCES<br />
Ford, J.K., MacCallum, R.C., & Tait, M. (1986). The application of exploratory<br />
factor analysis in applied psychology: A critical review and analysis.<br />
Personnel Psychology, 39, 291-314.<br />
Leaman, J., & Steinberg, A.G. (1990). Factor analysis versus CODAP<br />
hierarchical clustering for a leadership task analysis. Paper presented at<br />
the 98th Annual American Psychological Association Conference, Boston, MA.<br />
Outerbridge, A.N. (1981). The development of generalizable work behavior<br />
categories for a synthetic validity model. Washington, D.C.: U.S. Office of<br />
Personnel Management, Personnel Research and Development Center.<br />
SAS Institute, Inc. (1985). SAS user's guide: Statistics (Version 5). Cary,<br />
NC: SAS Institute, Inc.<br />
U.S. Equal Employment Opportunity Commission, U.S. Civil Service Commission,<br />
U.S. Department of Labor, & U.S. Department of Justice. (1978). Uniform<br />
guidelines on employee selection procedures. Federal Register, 43(166),<br />
38290-38303.<br />
A COMPARISON OF HOLISTIC AND TRADITIONAL<br />
JOB-ANALYTIC METHODS<br />
Brian S. O'Leary, Julie Rheinstein, and Donald E. McCauley, Jr.<br />
U.S. Office of Personnel Management<br />
Washington, D.C.<br />
INTRODUCTION<br />
Job analysis is the foundation of many personnel systems including<br />
selection, performance appraisal, and training. Most often,<br />
lengthy inventories are developed and administered to job<br />
incumbents. This process can be very time-consuming and cost-intensive.<br />
Several researchers have looked at methods of reducing the time and<br />
the cost of job analysis. Grouping jobs on the basis of work<br />
behaviors provides one way of reducing the cost of examination<br />
development while not sacrificing test validity. Barnes and<br />
O'Neill (1978) grouped jobs for examination development in the<br />
Canadian Public Service. Rosse, Borman, Campbell, and Osborn<br />
(1984) clustered U.S. Army enlisted jobs into homogeneous groups<br />
according to rated job content in order to choose a representative<br />
sample of MOS's for test validation purposes. Rosse et al.<br />
clustered the jobs by sorting them on the basis of holistic job<br />
descriptions.<br />
Using a methodology similar to that used by Rosse et al.,<br />
Rheinstein, McCauley, and O'Leary (1989) compared sources of job<br />
information (i.e., the people doing the sorts). McCauley, O'Leary,<br />
and Rheinstein (1989) compared the job groupings that resulted when<br />
the sorters received varying amounts of job information. These<br />
studies provided some of the data to be presented below.<br />
The purpose of the present study was to compare a traditional<br />
method of job analysis (administering an inventory to a large<br />
sample of job incumbents) to the more holistic methods described<br />
above.<br />
METHOD<br />
Data Collection for the Holistic Methods<br />
A) Eighty-seven professional and administrative occupations in the<br />
Federal civilian work force were studied. Personnel research<br />
professionals and staffing specialists grouped the occupations into<br />
categories according to similarity of work behaviors. These raters<br />
were given descriptions of the 87 jobs which were taken from the<br />
Federal Government's Handbook of Occupational Groups and Series of<br />
Classes (1964). The job descriptions consisted of the job title<br />
and a brief narrative which summarized the major duties of the job.<br />
These job descriptions were printed on 5 x 9 cards and given to the<br />
raters for sorting. The General Schedule (GS) series numbers were<br />
not included. Raters were asked to sort the jobs according to<br />
similarities in work behaviors. No limitations were put on the<br />
number of categories each rater could generate.<br />
Two groups completed the sort: (1) nine members from the Office of<br />
Personnel Research and Development (OPRD) at the U.S. Office of<br />
Personnel Management consisting of eight personnel research<br />
psychologists and a personnel staffing specialist (the<br />
"psychologists") and (2) seven personnel staffing specialists from<br />
seven different federal agencies (the "staffing specialists").<br />
B) A second group of staffing specialists sorted just the job<br />
titles. The GS series numbers were not included. These raters<br />
also were asked to sort the jobs according to what they perceived<br />
to be similarities in work behaviors based on the job titles. No<br />
limitations were put on the number of categories the rater could<br />
generate.<br />
The categories resulting from each of the sorts were transformed<br />
into an 87 by 87 matrix for each rater wherein a one in a cell<br />
indicated that those two jobs were placed in the same category by<br />
the rater and a zero in a cell indicated that the two jobs were not<br />
placed together. The matrices thus derived were added together<br />
producing three summary matrices - one for the psychologists, one<br />
for the staffing specialists using job descriptions, and one for<br />
the staffing specialists using job titles only. These matrices<br />
were then factor analyzed. The six-factor solutions accounted for<br />
68.1% of the variance for the psychologists, 70.4% for the staffing<br />
specialists using the job descriptions, and 68.3% for the staffing<br />
specialists using job titles only. The overall agreement between<br />
the psychologists and the staffing specialists using the job<br />
descriptions was 60%. There was an agreement of 56.6% in the<br />
classification of the jobs between the staffing specialists using<br />
the job descriptions and the staffing specialists using job titles<br />
only.<br />
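The sort-to-matrix step described above can be sketched compactly in Python. The toy sorts and data structures below are illustrative assumptions; the subsequent factor analysis is not shown, since the paper does not detail its extraction method.<br />

```python
import numpy as np

def cooccurrence_matrix(sort, n_jobs):
    """sort: list mapping job index -> category label from one rater's sort.
    Returns an n_jobs x n_jobs 0/1 matrix; cell (i, j) is 1 when the rater
    placed jobs i and j in the same category, 0 otherwise."""
    m = np.zeros((n_jobs, n_jobs), dtype=int)
    for i in range(n_jobs):
        for j in range(n_jobs):
            if sort[i] == sort[j]:
                m[i, j] = 1
    return m

def summary_matrix(sorts, n_jobs):
    """Add the per-rater matrices together, as in the study, yielding a
    co-occurrence count for every pair of jobs across a group of raters."""
    return sum(cooccurrence_matrix(s, n_jobs) for s in sorts)
```

The resulting summary matrix (one per rater group) is what would then be submitted to factor analysis to extract job groupings.<br />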
Data Collection for the Job Inventory Method<br />
A five-section inventory that included a section of generalized<br />
work behaviors (GWB's) developed specifically for the professional<br />
and administrative occupations under study was administered to job<br />
incumbents. Approximately 14,000 inventories were sent out to<br />
incumbents, and approximately 6,000 inventories were completed and<br />
returned. As part of the inventory, incumbents were first asked to<br />
read all the GWB's and then check the ones they perform.<br />
Incumbents were then asked to rate the GWB's in terms of relative<br />
time spent using a five-point scale ranging from "1 -Very much<br />
below average time" to "5 - Very much above average time." Mean<br />
time spent ratings were calculated for each GWB for each job.<br />
These means were factor analyzed to produce job groupings. The<br />
six-factor solution accounted for 77.4% of the variance.<br />
Experimental Design<br />
The results of the factor analyses derived from each of the<br />
holistic methods were compared to the results derived from the job<br />
inventory method. For this study, the job inventory method was<br />
considered to be the criterion and the holistic methods were<br />
considered predictors. An agreement was defined as occurring when<br />
a predictor agreed with the criterion concerning the placement of<br />
a job in a group. The percentage of agreement is the total number<br />
of agreements divided by the total number of jobs.<br />
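Under this definition, agreement reduces to a simple label match once the groupings from the two methods have been aligned to a common numbering (as the study's six groupings evidently were). A minimal sketch, with invented labels:<br />

```python
def percent_agreement(criterion, predictor):
    """criterion, predictor: equal-length sequences of group labels, one per
    job, already aligned to a common numbering.  An agreement occurs when the
    predictor places a job in the same group as the criterion."""
    assert len(criterion) == len(predictor)
    agreements = sum(c == p for c, p in zip(criterion, predictor))
    return 100.0 * agreements / len(criterion)
```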
RESULTS<br />
The number of jobs in each grouping for the three holistic methods<br />
and for the job inventory method is shown below in Table 1.<br />
Table 1<br />
Number of Jobs in Each Grouping for Each Method<br />
Grouping: 1 2 3 4 5 6<br />
Job Inventory: 3 45 8 10 17 4<br />
Psychologist (Job Description): 17 24 7 10 16 13<br />
Staffing Specialist (Job Description): 4 34 7 9 14 19<br />
Staffing Specialist (Job Titles): 2 19 12 10 30 14<br />
As one can see from this table, the number of jobs per grouping was<br />
relatively stable across all four methods in Groupings 3 and 4.<br />
Groupings 1, 2, and 5 produced relatively good agreement in terms<br />
of the number of jobs to be included between the job inventory<br />
method and two of the three holistic methods. In Grouping 6, there<br />
was relatively good agreement across the three holistic methods but<br />
not with the job inventory method.<br />
Table 2 below illustrates the degree of agreement between each of<br />
the holistic methods and the job inventory method. In this table,<br />
the number of jobs correctly assigned to each grouping is presented<br />
for each holistic method. The percentage of agreement is also<br />
presented for each holistic method.<br />
Table 2<br />
Number of Jobs Correctly Assigned for Each Holistic Method<br />
Grouping: 1 2 3 4 5 6 | Total | Percentage of Agreement<br />
Psychologist (Job Description): 1 20 5 9 12 2 | 49 | 56.3%<br />
Staffing Specialist (Job Description): 0 25 5 8 10 2 | 50 | 57.5%<br />
Staffing Specialist (Job Titles): 0 18 0 8 13 3 | 42 | 48.3%<br />
The agreement with the job inventory method was relatively similar<br />
for the two groups working with the short job descriptions and<br />
somewhat lower for the group working only with job titles.<br />
When the factor loadings derived from the job inventory method were<br />
examined more closely, it was found that for 19 jobs the difference<br />
between the primary loading and the secondary loading was less than<br />
0.1. This finding indicates that, in terms of generalized work<br />
behaviors, these jobs could be classified equally well in either of<br />
two groupings. It was decided that the definition of agreement<br />
could reasonably be expanded to include agreement with either the<br />
primary or the secondary grouping for these 19 jobs. Under this<br />
revised definition, the percentages of agreement rise to 64.4% for<br />
the staffing specialists working with job descriptions, 63.2% for<br />
the psychologists, and 54% for the staffing specialists working only<br />
with job titles.<br />
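The expanded-agreement rule (count either the primary or the secondary grouping when the top two loadings differ by less than 0.1) can be sketched as follows. For simplicity this assumes each grouping corresponds to one factor; the loading matrix and predictor labels are invented.<br />

```python
import numpy as np

def expanded_agreement(loadings, predictor, margin=0.1):
    """loadings: (n_jobs, n_groups) factor loadings from the inventory method,
    with groups identified with factors.  predictor: group index assigned to
    each job by a holistic method.  Jobs whose top two loadings differ by
    less than `margin` count as agreements if the predictor matches either
    the primary or the secondary grouping."""
    hits = 0
    for job, pred in enumerate(predictor):
        order = np.argsort(-loadings[job])            # factors by descending loading
        primary, secondary = int(order[0]), int(order[1])
        if loadings[job][primary] - loadings[job][secondary] < margin:
            hits += pred in (primary, secondary)      # ambiguous job: either counts
        else:
            hits += pred == primary                   # unique job: primary only
    return 100.0 * hits / len(predictor)
```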
Similar results were obtained when the percentages of agreement<br />
were calculated using only the 68 jobs for which there were unique<br />
factor loadings (i.e., where the difference between the primary and<br />
secondary loadings was greater than 0.1). Using these 68 jobs, the<br />
percentages of agreement were 64.7% for the staffing specialists<br />
using job descriptions, 66.2% for the psychologists, and 55.9% for<br />
the staffing specialists using job titles only.<br />
DISCUSSION<br />
The findings of this study are somewhat hard to interpret.<br />
Agreements of 56% to 66% are too high to conclude that the holistic<br />
methods have no merit for the purpose of grouping jobs but not high<br />
enough to advocate their replacing traditional job inventory<br />
procedures. The cause of this inability to make a clear<br />
determination may well be the criterion measure itself (i.e., a job<br />
inventory based on work behaviors) since there was extremely high<br />
agreement between holistic and traditional approaches when the jobs<br />
were viewed in terms of ability requirements rather than work<br />
behaviors (Rheinstein, O'Leary, and McCauley, 1990).<br />
There are two factors that should be examined as causing this lack<br />
'of clarity in the criterion. The first is the nature of the jobs<br />
under study. Agreement was consistently higher across all four<br />
methods for some groupings (Groupings 4 and 5) than for others.<br />
The jobs within Group 4 were primarily enforcement jobs, and those<br />
in Group 5 were primarily jobs dealing with claims examining. The<br />
jobs in the other groups were more general in nature. The fact<br />
that there was no clear factor loading for 19 jobs (21.8%) means<br />
that there was much overlap of work behaviors among the jobs and<br />
that they could be equally well grouped in more than one way.<br />
The second factor to consider is the use of generalized work<br />
behaviors. It may be that the 57 GWB's used in this study were not<br />
sufficient to distinguish clearly among the 87 jobs. This<br />
hypothesis is supported by the fact that when the job-specific<br />
duties were grouped to develop the GWB's, there were 42 duties (or<br />
3% of the total number of duties) which could not be classified<br />
into one of the 57 GWB's (O'Leary, Rheinstein, and McCauley, 1990).<br />
The development and use of additional GWB's could add other<br />
dimensions upon which groupings would differ more distinctly,<br />
thereby facilitating the assignment of jobs.<br />
Despite the shortcomings mentioned above, the use of elements such<br />
as the GWB shows promise for grouping jobs on the basis of work<br />
behaviors. An inventory that consisted of truly job-specific<br />
duties (or tasks) would not only be unwieldy but would also not<br />
permit grouping of jobs because there would be little or no overlap<br />
of work behaviors across jobs.<br />
Until further advances are made in this area, the question of the<br />
efficacy of holistic methods of job grouping remains unresolved.<br />
However, the degree of agreement obtained in this study argues for<br />
pursuing research in this area.<br />
REFERENCES<br />
Barnes, M. & O'Neill, B. (1978). Empirical analysis of selection<br />
test needs for 10 occupational groups in the Canadian Public<br />
Service. Paper presented to the meeting of the Canadian<br />
Psychological <strong>Association</strong>, Ottawa, June, 1978.<br />
McCauley, D.E., O'Leary, B.S., & Rheinstein, J. (1989). A<br />
comparison of two holistic rating methods for grouping<br />
occupations. Presentation at the Conference of the Military<br />
Testing Association, San Antonio, TX.<br />
O'Leary, B.S., Rheinstein, J. & McCauley, D.E. (1990). Developing<br />
job families using generalized work behaviors. Presentation at<br />
the Conference of the Military Testing Association, Orange<br />
Beach, AL.<br />
Rheinstein, J., O'Leary, B.S., & McCauley, D.E. (1990). Addressing<br />
the issue of "quantitative overkill" in job analysis.<br />
Presentation at the Conference of the Military Testing<br />
Association, Orange Beach, AL.<br />
Rheinstein, J., McCauley, D.E., & O'Leary, B.S. (1989). Grouping<br />
jobs for test development and validation. Presentation at the<br />
Conference of the International Personnel Management<br />
Association Assessment Council, Orlando, FL.<br />
Rosse, R.L., Borman, W.C., Campbell, C.H., & Osborn, W.C. (1984).<br />
Grouping Army occupational specialties by judged similarity.<br />
Unpublished paper.<br />
U.S. Civil Service Commission. (1964). Handbook of occupational<br />
groups and series of classes. Washington, DC: U.S. Civil<br />
Service Commission.<br />
DeLayne R. Hudspeth<br />
Paul R. Fayfich<br />
The University of Texas at Austin<br />
John S. Price, SQNLDR, RAAF<br />
Air Force Human Resources Laboratory<br />
This research was conducted at the Air Force Human Resources Laboratory<br />
(AFHRL) under the 1990 Summer Research Program for faculty and graduate<br />
students, sponsored by the Air Force Office of Scientific Research.<br />
Introduction<br />
The USAF Occupational Measurement Squadron (OMSQ), Randolph Air Force Base<br />
(AFB), Texas is responsible for the preparation, administration, and analysis<br />
of USAF occupational surveys. Using current procedures, the time from<br />
initial survey mail-out to initial data processing ranges from seven to nine<br />
months. Current methods for collecting and processing data for occupational<br />
analysis studies are slow, complicated, and expensive. OMSQ has requested<br />
AFHRL to investigate a more efficient system of administering occupational<br />
surveys. Of particular interest is the possibility of automating the process<br />
using personal computers and the use of the Defense Data Network for<br />
distribution of surveys and collection of responses.<br />
Objectives of the Research Effort<br />
Five objectives for the effort were determined: (1) to create a<br />
computerized version of the Chapel Management job survey (chosen because it<br />
was about to be administered via traditional means); (2) to prepare and<br />
execute a research design for comparing the two forms of administration; (3)<br />
to collect data; (4) to analyze the data and describe the results; and (5) to<br />
provide recommendations for further research and development.<br />
The computerized job inventory for use in the survey was developed using<br />
Microsoft's QuickBasic version 4.5. This third generation language allowed us<br />
to write, test and place software in the field in less than four weeks.<br />
Modular development and formative evaluation were used throughout this<br />
process.<br />
The software consists of two independent modules that are chained<br />
together. The first module contains the Biographical and Background sections<br />
of the survey, and has 13 subprocedures and 2576 lines of code. The second<br />
module is the Duty-Task Section and has 18 subprocedures and 1341 lines of<br />
code. It contains two major procedures: the first has the job incumbent<br />
reviewing each of 407 tasks and identifying the tasks performed in his or her<br />
job; the second procedure has the job incumbent rating relative time spent,<br />
with a nine-point scale, on only those tasks identified in the first<br />
procedure. In both procedures the incumbents could "backup" to review or<br />
change answers and ratings.<br />
To summarize, information generated from the second module included the<br />
identification of tasks performed by incumbents in their present job and time<br />
rating data for each task. The program also collects data on the amount of<br />
time spent in using the module, how many times an incumbent backed up and<br />
changed answers, and how many error messages were written to the screen.<br />
All data were written out to data files and captured on floppy disk.<br />
Research Design<br />
The two independent variables were form and sequence: the two forms were<br />
paper/pencil (P) and computer-based (C); and the three types of sequence<br />
were: (1) P followed by P, (2) P followed by C, and (3) C followed by P.<br />
(Note: Although a fourth variation of C followed by C would have been<br />
desirable, it was a condition of using this population that all persons had<br />
to take a paper/pencil survey. We judged that asking airmen to take the same<br />
survey three times would affect reliability of the data.)<br />
Test/re-test procedures were used for comparing the two survey<br />
administrations and each respondent served as his or her own control. This<br />
was necessary as no assumption could be made about the central tendency and<br />
dispersion of scores. Each respondent had a unique pattern of responses which<br />
described his or her job.<br />
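With each respondent as his or her own control, the basic comparison is between one person's two task-selection patterns. A minimal sketch in modern Python, with invented task numbers, computing the match percentage and the tasks gained and lost between administrations:<br />

```python
def selection_overlap(first, second):
    """first, second: sets of task numbers a respondent selected at Time 1
    and Time 2.  Returns (% of Time-1 tasks also selected at Time 2,
    tasks gained at Time 2, tasks lost at Time 2)."""
    both = first & second
    pct_match = 100.0 * len(both) / len(first) if first else 0.0
    return pct_match, second - first, first - second
```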
The three treatments were:<br />
Treatment / Time 1 (T1) / Time 2 (T2) / Original N<br />
Treatment #1..... P followed by P (P-P) / 40<br />
Treatment #2..... P followed by C (P-C) / 20<br />
Treatment #3..... C followed by P (C-P) / 21<br />
In terms of elapsed time between administrations, a number of factors were<br />
considered. A review of the literature generally supported the decision that<br />
a lapse of two to four weeks between time one and time two would be<br />
acceptable.<br />
The purpose of treatment #1 (P-P) was to provide a baseline against which<br />
the other treatments could be compared with respect to the variability of<br />
test/re-test. Although a perfect match was not expected, a reasonably high<br />
match was anticipated, as jobs seldom change much in two weeks.<br />
The P-C and C-P treatments provided comparison data to examine effects due<br />
to form of administration. For example, the second survey might yield a<br />
higher number of tasks selected than the first, since the first administration<br />
might sensitize incumbents to the nature of their jobs. However, this effect<br />
could be confounded for the P-C and C-P sequences because it is possible that<br />
all C administrations would yield a higher number of responses because C<br />
respondents were forced to look at each separate task, whereas with the paper<br />
version they might accidentally skim past a relevant task statement.<br />
Administration and Data Collection<br />
About May 24, the traditional P survey was dispatched Air Force-wide by<br />
OMSQ using traditional means and methods of distribution. The first<br />
airmen whose surveys were returned to OMSQ were immediately sent a second<br />
P administration with an explanatory letter. By July 5, 30 second returns<br />
were received and used for analysis.<br />
For this research effort, printed surveys for the P-C treatment were<br />
hand-delivered to the Survey Control Officers at Bergstrom, Kelly, and<br />
Randolph AFBs. The computerized version was then given within 2-4 weeks<br />
following the paper administration. For all computerized administrations,<br />
local Z-248 personal computers (PCs) were used. Each respondent used a<br />
separate disk for taking the survey. For the C-P treatment a reverse process<br />
was used starting with Brooks and Lackland AFBs. All paper versions were<br />
machine-scanned by OMSQ to create a data file on disk, which was then matched<br />
and merged with the data strings collected via the PCs.<br />
Results<br />
Any interpretation of the data must take into account that the Chapel<br />
Management Specialty, selected for convenience, may not be generally<br />
representative of all Air Force jobs. Second, the number of airmen for each<br />
treatment was small. Third, there is no "correct" or ideal selection of<br />
either job tasks or the relative time spent ratings, hence statistics that<br />
rely on central tendency could not be used. Finally, a number of more<br />
qualitative techniques such as "think aloud" protocols or follow-up<br />
questionnaires that could have addressed some of the issues raised by<br />
these data were not possible in the given time frame.<br />
Table 1 summarizes data in terms of the total number of tasks selected by<br />
all individuals for each administration, the mean number of tasks selected by<br />
each person, the percent of tasks selected in both the first and second<br />
administrations, and the mean change in number of tasks selected per<br />
individual.<br />
Table 1<br />
Summary of Task Selection Data<br />
Treatment: P1 - P2 (N=30) | P1 - C2 (N=17) | C1 - P2 (N=17)<br />
Total tasks selected: 3,684 3,878 | 1,808 1,950 | 2,418 2,267<br />
Mean each respondent: 123 129 | 106 115 | 142 133<br />
Selected both administrations: 81% | 82% | 75%<br />
Mean change: +6.5% | +8.4% | -8.9%<br />
Another research issue was whether, for the job tasks selected by<br />
incumbents, the estimates of "Time Spent in Present Job" would vary as a<br />
function of type of administration. (This is estimated with a nine-point<br />
scale: 1 = "Very small amount.") Table 2 shows the change in ratings from the<br />
first administration to the second where, for example, for the P1 - P2<br />
administration of 2,988 tasks chosen, 1,283 time ratings were the same, 387<br />
tasks were rated one point higher for more time spent, 342 were rated one<br />
point lower for less time spent, etc.<br />
Table 2<br />
Variation in Time Spent Ratings<br />
Change: P1 - P2 (N = 30) | P1 - C2 (N = 17) | C1 - P2 (N = 17)<br />
-3: 124 | -- | 113<br />
-2: 239 | -- | 229<br />
-1: 342 | 154 | 364<br />
0: 1,285 | 560 | 657<br />
+1: 387 | 231 | 133<br />
+2: 281 | 137 | --<br />
+3: 146 | 84 | 19<br />
+4: 76 | 78 | --<br />
Total (all changes, -8 to +8): 2,988 | 1,488 | 1,812<br />
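The change tabulation underlying Table 2 can be sketched as follows, for the tasks a respondent rated in both administrations; the task IDs and ratings below are invented for illustration:<br />

```python
from collections import Counter

def rating_changes(ratings1, ratings2):
    """ratings1, ratings2: dicts mapping task number -> 1..9 time-spent
    rating from the first and second administrations.  Tallies the
    second-minus-first difference over tasks rated both times."""
    common = ratings1.keys() & ratings2.keys()
    return Counter(ratings2[t] - ratings1[t] for t in common)
```

Summing the tally over respondents in a treatment yields a column of Table 2.<br />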
We also wanted to examine the data which reflected job tasks chosen for<br />
one administration but not the other, as to whether estimates of "Time<br />
Spent..." varied between the two forms of administration. These data are<br />
displayed in graphic form in Tables 3, 4 and 5.<br />
Table 3<br />
Time Ratings P1 - P2, tasks not selected in both administrations (N = 30)<br />
P1 not P2 = 696; P2 not P1 = 890<br />
[Bar chart: number of tasks by rating scale value, 1-9]<br />
Table 4<br />
Time Ratings P1 - C2, tasks not selected in both administrations<br />
P1 not C2 = 320; C2 not P1 = 460<br />
[Bar chart: number of tasks by rating scale value, 1-9]<br />
Table 5<br />
Time Ratings C1 - P2, tasks not selected in both administrations (N = 17)<br />
C1 not P2 = 606; P2 not C1 = 455<br />
[Bar chart: number of tasks by rating scale value, 1-9]<br />
Discussion<br />
Perhaps the most important benefits of this research are that OMSQ now<br />
has a prototype computerized survey, and the potential of automation has been<br />
demonstrated. A job survey was administered with microcomputers (and could be<br />
distributed electronically). The data were captured electronically and<br />
analysis was accomplished in a few hours.<br />
Tables 4 and 5 demonstrate the need for additional research where there<br />
seems to be a disproportionate number of responses for "1" and "5". Informal<br />
feedback suggests that the instructions for estimating "Time Spent on Present<br />
Job" can be interpreted in more than one way. This suggests that further<br />
study of the effects of the wording of these instructions is warranted.<br />
The P-P baseline data, compared with P1-C2, suggest that these forms of<br />
administration are comparable in terms of test/re-test. They both indicate<br />
that taking the inventory causes increased sensitivity to one's job in terms<br />
of number of tasks chosen. If something like the Hawthorne Effect were<br />
operating, an increase in tasks selected in P for the C-P treatment should be<br />
evident, and is not found. Data that support the confounding effects of<br />
having to see each item on the computer are evident. In Table 1, for P1-C2<br />
and C1-P2, there was an increase in the mean number of items selected of 8.4%<br />
when using the computer for the second administration and a decrease of 8.9%<br />
when using paper.<br />
The computer version also demonstrated, on a limited basis, how a survey<br />
can be "branched" or tailored to only display job tasks relevant for a given<br />
person based on prior responses. Improved accuracy and significant time<br />
savings could result. Also, the ability of the computer to process data<br />
during the administration of a job survey could result in new methods and<br />
levels of review by job incumbents.<br />
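The branching idea mentioned above, displaying only tasks relevant to a respondent's prior answers, can be sketched as a simple filter; the duty areas and task statements below are invented for illustration, not taken from the Chapel Management inventory:<br />

```python
def branched_tasks(task_bank, duty_areas_claimed):
    """task_bank: dict mapping duty area -> list of task statements.
    duty_areas_claimed: duty areas the respondent said apply to his or her
    job in an up-front screening section.  Only tasks in the claimed areas
    are presented for selection and time-spent rating."""
    return [t for area in duty_areas_claimed for t in task_bank.get(area, [])]
```

A respondent who claims only one duty area would then see only that area's tasks instead of all 407.<br />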
Recommendations<br />
We recommend that the USAF begin immediately to develop computerized job<br />
inventories. In particular: (1) a comprehensive electronic network needs to<br />
be designed that will allow OMSQ to electronically administer, process and<br />
archive occupational surveys ("Personnel Concept III" (PC-III), currently<br />
under development by Air Force Military Personnel Center, with gateways to<br />
each Air Force Consolidated Base Personnel Office, should be investigated<br />
further); (2) because the computer offers display, review and reporting<br />
capabilities not available with traditional paper administration, we strongly<br />
recommend that planning efforts to use this capability be undertaken as soon<br />
as possible; and (3) research is needed to optimize design of computerized<br />
surveys which might include branching or tailoring, differential feedback<br />
based on individual patterns of response, procedures for review and<br />
correction of responses by incumbents, and other survey design features<br />
which are unique to the computer.<br />
This report describes research which compared paper and pencil versus<br />
computer-based administration of a USAF Job Inventory for the Chapel<br />
Management Specialty. Test/re-test administration procedures, with each<br />
subject acting as his or her own control, were used. The data show there was<br />
an 81% match for Paper (P) followed by P, 82% for P followed by Computer (C),<br />
and 75% for C followed by P. The data suggest that computer-based<br />
administration will improve the yield of tasks chosen, but that its use to<br />
collect estimates of "Time Spent..." ratings is problematic with the current<br />
survey instructions. Computerizing of job surveys is feasible. Additional<br />
efforts are needed to create a valid, reliable Air Force wide automated<br />
system.<br />
MPT ENHANCEMENTS TO THE<br />
OCCUPATIONAL RESEARCH DATA BANK<br />
Joe Menchaca, Jr., Capt, USAF<br />
Jody A. Guthals, 2Lt, USAF<br />
Air Force Human Resources Laboratory (AFHRL/MOD)<br />
Brooks Air Force Base, Texas<br />
Lou Olivier, Glenda Pfeiffer<br />
OAO Corporation<br />
INTRODUCTION<br />
The Occupational Research Data Bank (ORDB) is an on-line, data<br />
repository providing users immediate access to a variety of<br />
occupational information about Air Force specialties (AFS) and the<br />
people who perform duties in them. The combination of several<br />
unique subsystems gives ORDB the ability to retrieve many otherwise<br />
dispersed sets of data from a consolidated data bank. Instead of<br />
the normal laborious and time-consuming task of finding personnel<br />
background information by formal requests to computer data bases,<br />
searching Air Force regulations, or searching a library of<br />
technical reports and previous studies, the ORDB allows the user to<br />
streamline occupational data retrieval by providing easy access to<br />
data from all these sources. Two years ago a paper was presented<br />
discussing some planned enhancements and applications of the ORDB<br />
to assist manpower, personnel, and training (MPT) decision makers<br />
and analysts in the acquisition of Air Force weapon systems<br />
(Longmire and Short, 1988). The purpose of this paper is to<br />
describe implementation of these enhancements and discuss some<br />
actual MPT applications.<br />
BACKGROUND/OVERVIEW<br />
Plans for the development of the ORDB began in 1978. While<br />
vast quantities of information were available about Air Force<br />
occupations, the data were widely dispersed among many different<br />
organizations, with varying formats and degrees of<br />
coverage. At that time, the Air Force Human Resources Laboratory<br />
(AFHRL) maintained 29 different types of computer files from by<br />
many different sources. Also, AFHRL housed Air Force technical<br />
reports dating back to 1943 and was the official Air Force<br />
repository of all occupational study data files generated by the<br />
USAF Occupational Measurement Center (USAFOMC). Other organizations<br />
(HQ USAF, ATC, AFMPC, etc.) had their own data bases and generated<br />
numerous recurring reports, regulations, and studies.<br />
Occupational researchers needed consolidated information that was<br />
easily and rapidly accessible.
The ORDB was designed and continues to reside on the AFHRL<br />
UNISYS 1100/82 mainframe at Brooks AFB, Texas. The programs within<br />
ORDB were created in a user-friendly, tutorial environment so that<br />
even the most novice of computer users could access its<br />
information. Beyond the original scope of ORDB's development, the<br />
current enhancements to the system focus on ways to make the system<br />
more useful to a variety of users such as researchers, OMC<br />
analysts, and MPT managers who determine MPT requirements for<br />
already existing weapon systems and who must forecast similar<br />
requirements early in the planning stages of new weapon system<br />
acquisitions (Longmire and Short, 1988).<br />
The ORDB provides storage and on-line retrieval of a variety<br />
of occupational data within its seven major subsystems. Figure 1<br />
diagrams the ORDB. It also shows each subsystem's primary area of<br />
use. The check marks within the circles indicate new subsystems<br />
which are described below.<br />
(1) The CODAP (Comprehensive Occupational Data Analysis Programs)<br />
Subsystem allows rapid retrieval of reports from the most recent<br />
occupational study on an AFS.<br />
(2) The Enlisted AFSC Information Subsystem (EAIS) contains AFSC<br />
descriptions (for ladder and career field), progression ladders,<br />
and prerequisites for the years 1978 to the present, and number<br />
change history (1965 - present).<br />
(3) The Officer AFSC Information Subsystem (OAIS) allows retrieval<br />
of officer AFSC information similar to that available in the EAIS<br />
(1976 - present).<br />
Figure 1. ORDB Subsystems<br />
(4) The Computer-Assisted Reference Locator (CARL) provides<br />
listings of occupational studies, technical reports, films, and<br />
other documents related to Air Force jobs.<br />
(5) The Enlisted Statistical Subsystem (ESS) provides statistical<br />
distributions of selected data elements for enlisted personnel on<br />
the Uniform Airman Record (UAR) file at the end of the calendar<br />
year as well as personnel with records on the Pipeline Management<br />
System (PMS) file (1987 - present).<br />
(6) The Archived Statistics Subsystem contains pre-generated<br />
statistics on demographic, aptitude, education, training, turnover,<br />
and duty-related information on Air Force enlisted personnel,<br />
previously generated for calendar years 1980-1986. The CY 89 task<br />
phased out pre-generated statistics, which are now accessed from<br />
this subsystem.<br />
(7) The Weapon System Information Subsystem (WSIS) permits access<br />
and retrieval of Air Force occupational information by weapon<br />
system, special experience identifier (SEI), or AFSC.<br />
The capabilities which ORDB developers are seeking are best<br />
summarized as an up-to-date occupational research data base,<br />
containing a wide variety of both historical and current<br />
information on United States Air Force enlisted and officer career<br />
fields. The CARL subsystem is continually updated as material<br />
becomes available; AFSC descriptions are updated semi-annually;<br />
Occupational Measurement Center study reports are loaded into the<br />
system on a continual basis as soon as an analysis is complete; and<br />
all modifications are documented in the User's Manual and/or<br />
Procedural Guide on a day-to-day basis (Olivier, Pfeiffer, & Menchaca, 1990). Overall,<br />
the conscientious effort to update and maintain the ORDB is the key<br />
to its success.<br />
RECENT ENHANCEMENTS<br />
Presently, research is continuing with the ORDB to facilitate<br />
the planning and analysis of MPT requirements earlier in the weapon<br />
system acquisition process. There are two primary areas<br />
of improvement. First was the development of the Weapon System<br />
Information Subsystem (WSIS). A second major enhancement was the<br />
conversion of the Statistical Variable Subsystem from aggregated<br />
occupational statistics to current user-defined population<br />
statistic variables from the UAR and PMS.<br />
As was mentioned earlier, the WSIS allows users to obtain<br />
information cross referenced between a specific Air Force weapon<br />
system, SEIs and enlisted AFSCs, or any combination thereof. An<br />
enlisted AFSC is a six-character field (e.g., 41131C) including<br />
suffix. Prefixes are not used. Special Experience Identifiers are<br />
three-digit numeric codes which identify special experience not<br />
otherwise reflected in the USAF enlisted classification structure.<br />
SEIs are used to achieve greater flexibility in the management of<br />
personnel, particularly in the quick identification of specially<br />
qualified resources to support contingency operations or<br />
situations. All SEI information was derived from AFR 39-1, Airman<br />
Classification, and within the WSIS has been matched to appropriate<br />
weapons systems and AFSCs.<br />
The WSIS can retrieve information by calendar year beginning<br />
with a base year of 1987. It allows a user to enter a weapon<br />
system and obtain all the related enlisted AFSCs and SEIs, or vice<br />
versa. Weapon system identification was derived from the 1988 AF<br />
Magazine Almanac with the intention of creating a comprehensive<br />
listing of existing/active USAF Weapon Systems including all<br />
airplanes, helicopters, missiles, etc. The data are arranged by<br />
mission type (e.g., Strategic Bombers, Trainers, Helicopters)<br />
with the actual weapon systems listed for each mission type.<br />
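The WSIS cross-referencing described above amounts to a multi-key index over (weapon system, AFSC, SEI) associations per year. A minimal sketch in modern terms follows; the records, identifiers, and function names are all invented for illustration and are not the actual WSIS implementation (which runs on the UNISYS mainframe):

```python
from collections import defaultdict

# (year, weapon_system, afsc, sei) association records -- invented examples
RECORDS = [
    (1987, "F-16", "45131C", "123"),
    (1987, "F-16", "42631A", "456"),
    (1988, "B-1B", "45731B", "789"),
]

def build_index(records):
    """Index each identifier so a lookup works in any direction."""
    index = defaultdict(set)
    for year, ws, afsc, sei in records:
        for key in (ws, afsc, sei):
            index[(year, key)].add((ws, afsc, sei))
    return index

def lookup(index, year, key):
    """Return all (weapon system, AFSC, SEI) triples related to `key`."""
    return sorted(index[(year, key)])

index = build_index(RECORDS)
print(lookup(index, 1987, "F-16"))  # both F-16 triples for 1987
```

Indexing every identifier to its full triple is what makes "enter a weapon system and obtain all the related enlisted AFSCs and SEIs, or vice versa" a single lookup.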
In the past, statistics on 125 variables were computed each<br />
year against the most current UAR, Airmen Gain/Loss (AGL), and PMS<br />
data files and then uploaded into the system. With the new<br />
Enlisted Statistical Subsystem (ESS), this process has recently<br />
changed for the sake of providing current data as soon as it<br />
becomes available. As was mentioned earlier, the ESS is comprised<br />
of records of all enlisted personnel on the UAR file as of 31 Dec<br />
of each calendar year and all personnel with records on the PMS<br />
file who completed training in that year. Statistical data is<br />
requested by Duty AFSC or PMS Course ID. One and two-way<br />
distributions for selected variables are included in the output.<br />
Where appropriate, means, standard deviations, and row/column<br />
counts are also listed.<br />
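The one- and two-way distributions, means, and standard deviations the ESS reports can be sketched as follows. The records and variable names are invented stand-ins, not actual UAR or PMS fields:

```python
from collections import Counter
from statistics import mean, pstdev

# Invented enlisted records standing in for UAR variables
records = [
    {"grade": "E3", "marital": "S", "afqt": 52},
    {"grade": "E3", "marital": "M", "afqt": 61},
    {"grade": "E5", "marital": "M", "afqt": 74},
]

def one_way(records, var):
    """One-way frequency distribution of a single variable."""
    return Counter(r[var] for r in records)

def two_way(records, row_var, col_var):
    """Two-way cross-tabulation; keys are (row value, column value)."""
    return Counter((r[row_var], r[col_var]) for r in records)

print(one_way(records, "grade"))
print(two_way(records, "grade", "marital"))
afqt = [r["afqt"] for r in records]
print(mean(afqt), pstdev(afqt))  # mean and population standard deviation
```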
The UAR as of 31 Dec for each year contains all enlisted<br />
personnel to include active duty projected gains, some recent<br />
losses, etc. AFHRL personnel scrub the file so that the resultant<br />
file contains records of all enlisted personnel on "active duty".<br />
Only certain selected fields are used in the ORDB ESS. A total of<br />
44 UAR variables have been selected for the ESS.<br />
The PMS files contain records of all personnel who attended<br />
training at Air Force Technical Schools. For the purpose of the<br />
ESS, only active duty enlisted personnel who have completed<br />
training in the given year are selected. Four variables have been<br />
selected for the ESS to bring the total to 48 variables. The<br />
variables are listed in Table 1. For a two-way distribution, one<br />
variable must be marked with an asterisk.<br />
At present, information stored in ORDB is AFSC-specific.<br />
Current modifications to the Weapon System Information Subsystem<br />
(WSIS) and the Enlisted Statistical Subsystem will soon yield<br />
additional information by weapon system. This capability should be<br />
available by the end of CY90. The result will be an improved<br />
occupational research data source containing a wide variety of both<br />
historical and current information on enlisted and officer career<br />
fields of the United States Air Force. There has also been a<br />
recent proposal to place the data base on compact disc, for use on<br />
a write-once-read-many (WORM) drive. A significant increase in the<br />
number of users who could access the system would result.<br />
 1  Duty AFSC              18  Primary AFSC                35* Number of Dependents<br />
 2* Secondary AFSC         19* ASVAB-Electronic            36* ASVAB-Mechanical<br />
 3* ASVAB-General          20* ASVAB Admin.                37  Unfavorable Info. File<br />
 4* Subst. Abuse-Lvl.      21  Duty AFSC Prefix            38  Substance Abuse-Type<br />
 5  Primary AFSC Prefix    22* AFQT Score                  39* APR-Most Recent<br />
 6  Secondary AFSC Pre.    23* Current Grade               40* EPR-Most Recent<br />
 7  Base of Assignment     24* Time in Grade               41* Current Flying Status<br />
 8  Major Command          25* TAFMS-Months                42  Current Location<br />
 9  SEI-PAFSC-1st          26* Cat. of Enlist.             43  SEI-PAFSC-2nd<br />
10* Age-Years              27  SEI-PAFSC-3rd               44  Security Clearance<br />
11  SEI-PAFSC-4th          28* Training Status             45  SEI-PAFSC-5th<br />
12  Ethnic Group           29  Mental Category             46  Cat. of Enlisted Status<br />
13* Marital Status         30  Academic Ed.                47* Program Element Code<br />
14* PMS Training Length    31* Sex                         48* Conus-Overseas<br />
15  PMS Final Rate         32  Race<br />
16  PMS Course ID (AFSC)   33  Prof. Military Education<br />
17  PMS Term. Reason       34  Military Status of Spouse<br />
Table 1. ESS VARIABLES<br />
MPT APPLICATIONS<br />
ORDB relates many dispersed sets of data into a consolidated,<br />
rapidly accessible data base. Instead of the normal laborious and<br />
time-consuming task of finding background information by formal<br />
requests to computer data bases, searching Air Force regulations,<br />
or searching a library of technical reports and previous studies,<br />
the ORDB allows users to streamline data retrieval while saving<br />
computer resources. ORDB is valuable for aiding research design,<br />
conducting historical and cross-specialty analyses, and guarding<br />
against duplication of effort and inconsistencies between data<br />
bases. ORDB access facilitates planning and analysis support of<br />
MPT requirements earlier in the weapon system acquisition process.<br />
The WSIS is proving to be helpful to MPT planners and analysts<br />
requiring occupational information by AFSC or total weapon system.<br />
Researchers within AFHRL are primary users of the ORDB. A<br />
recent example of the ORDB's many uses was a CODAP retrieval of all<br />
duty descriptions for certain AFSCs. These descriptions were then<br />
used in the development of a taxonomy determining skill knowledge<br />
and ability to be used in weapon system acquisition to determine<br />
MPT requirements. The ORDB has been identified as a key component<br />
of several high-priority AFHRL research projects. Some of these<br />
are the Training Decisions System (TDS), the Advanced On-the-Job<br />
Training System (AOTS), Job Performance Measurement, and the Basic<br />
Job Skills Project. The Advanced On-the-Job Training System (AOTS)<br />
program used the ORDB CODAP subsystem for its initial research.<br />
Future use of the ORDB to support the program at the base level,<br />
called the Base Training System (BTS), is presently being<br />
considered. The proposed portable WORM-drive ORDB would enable<br />
more people to use the ORDB.<br />
The ORDB is a critical resource to projects underway as part<br />
of the MPT Integration effort. Work at ASD/ALH, the Air Force MPT<br />
Directorate, continues to require use of the ORDB. DOD Directive<br />
5000.53 calls for MPT integration early in weapon system<br />
acquisition. ASD/ALH made extensive use of the ORDB in April 1989<br />
when a rapid analysis of MPT and safety factors for the A-16 was<br />
called for. Included in this analysis was a data retrieval of the<br />
target population, maintenance personnel, and demographics. The<br />
study's objective was to determine which maintenance personnel<br />
were applicable to the A-16 and what their jobs entail.<br />
There are several other key ORDB users. At the Occupational<br />
Measurement Center, the ORDB is used to provide quick in-depth<br />
orientation to AFSCs and as a rapid response tool to high level<br />
management queries. The Training Performance Data Center (TPDC) in<br />
Orlando, Florida, has benefitted from accessing the system to<br />
obtain prompt, up-to-date data on Air Force specialty structures<br />
which have in turn been made available to a number of DOD agencies.<br />
TPDC researchers will soon be providing the Laboratory with a<br />
process mapping equipment to occupations, which will be a vital<br />
component in the Weapon System Information Subsystem's development.<br />
The Air Force Management Engineering Agency (AFMEA) is hoping to<br />
use the ORDB as a cross reference for manpower studies. AFMEA is<br />
conducting special interest studies in support of an Air Staff<br />
requested study to determine which career fields report excessive<br />
man hours. Finally, AFMEA is doing a comparison of skill and<br />
experience to determine changes in the force structure. The ORDB<br />
will provide needed information to find a relative value of<br />
experience in Air Force personnel.<br />
Access to the ORDB by users outside the Laboratory is<br />
available via commercial and DSN telephone lines and through the<br />
Defense Data Network (DDN), a capability which conveniently serves<br />
a number of outside agencies currently having or requesting access.<br />
REFERENCES<br />
Longmire, K. M., and Short, L. O. (1988, December). The<br />
Occupational Research Data Bank: A Key to MPTS Analysis.<br />
Proceedings of the 30th Annual Conference of the Military<br />
Testing Association (pp. 262-267). Arlington, VA.<br />
Longmire, K. M., and Short, L. O. (1989, July). Occupational<br />
Research Data Bank: A Key to MPTS Analysis Support<br />
(AFHRL-TP-88-71). Brooks AFB, TX: Manpower and Personnel<br />
Division, Air Force Human Resources Laboratory.<br />
Olivier, L., Pfeiffer, G., and Menchaca, J., Jr. (1990, January).<br />
Occupational Research Data Bank User's Manual<br />
(AFHRL-TP-89-62). Brooks AFB, TX: Manpower and<br />
Personnel Division, Air Force Human Resources Laboratory.<br />
ASCII CODAP: PROGRESS REPORT ON APPLICATIONS<br />
OF ADVANCED OCCUPATIONAL ANALYSIS SOFTWARE *<br />
William J. Phalen, Air Force Human Resources Laboratory<br />
Jimmy L. Mitchell, McDonnell Douglas Missile Systems Company<br />
Darryl K. Hand, Metrica, Inc.<br />
Abstract<br />
The development of automated procedures for selecting job and task module types from a<br />
hierarchical clustering solution and the interpretive software associated with these procedures were<br />
reported at the 1987 and 1988 MTA conferences. Over the last two years, operational testing and<br />
evaluation of this software has demonstrated its value in terms of enhanced analytic capabilities and<br />
accelerated completion of the analytic process. This report provides informative examples and<br />
experiences to illustrate how complex analyses have been accomplished by using the job and task<br />
module type selection and interpretation software to extract, organize, and display latent bits of<br />
relevant information from a CODAP database.<br />
Introduction<br />
The principal occupational analysis technology in the United States Air Force is the<br />
Task Inventory/Comprehensive Occupational Data Analysis Programs (CODAP) approach.<br />
This system has supported a major occupational research program within the Air Force<br />
Human Resources Laboratory (AFHRL) since 1962 (Morsh, 1964; Christal, 1974), and an<br />
operational occupational analysis capability within Air Training Command’s USAF<br />
Occupational Measurement Squadron since 1967 (Driskill, Mitchell, & Tartell, 1980;<br />
Weissmuller, Tartell, & Phalen, 1988). The CODAP system is now used by all the U.S. and<br />
many allied military services, as well as a number of other government agencies, academic<br />
institutions, and some private industries (Christal & Weissmuller, 1988; Mitchell, 1988).<br />
Recently, the CODAP system was rewritten to make it more efficient and to expand its<br />
capabilities (Phalen, Mitchell & Staley, 1987). In the process of developing this new ASCII<br />
CODAP system, several major innovative programs were created to extend the capabilities<br />
of the system for assisting analysts in identifying and interpreting potentially significant jobs<br />
(groups of similar cases) and task modules (groups of co-performed tasks). Initial<br />
operational tests of these automated analysis programs were conducted and preliminary<br />
results were reported at previous conferences (Phalen, Staley, & Mitchell, 1988; Mitchell,<br />
Phalen, Haynes, & Hand, 1989).<br />
Over the last two years, operational testing and evaluation of new interpretive software<br />
has continued and these programs have demonstrated their value in terms of enhanced<br />
analytic capabilities and their potential to accelerate completion of an occupational analysis.<br />
Some of these programs have been released into the operational version of ASCII CODAP<br />
while others remain experimental; i.e., they are not yet in final operational form. In this<br />
presentation, we want to provide some examples of this continuing work. Such examples will<br />
also serve to illustrate how complex analyses can be accomplished more expeditiously by<br />
using the job and task module type interpretation software to extract, organize, and display<br />
latent bits of relevant information from an occupation-specific CODAP database.<br />
* Approved for Public Release; Export Authority 22CFR125.4 (b)(13).<br />
A Suite of Advanced Interpretive Assistance Programs<br />
A set of seven programs has evolved gradually over the last few years which are meant to<br />
assist analysts in interpreting job and task clusters; some of these were completed in time to<br />
be released with the initial version of ASCII CODAP. Others are still being refined and thus<br />
are not yet ready for operational use. It is helpful to have an overview of the entire set of<br />
programs, so everyone can see how the programs relate to one another and to their ultimate<br />
objective. These programs are shown in Figure 1 below.<br />
                                        Case Clusters     Task Clusters<br />
                                        (Job Types)       (Task Modules)<br />
Identify Appropriate Clusters           JOBTYP            MODTYP<br />
Identify/Display Core Tasks             CORTAS            TASSET<br />
Identify/Display Core Cases             CASSET            CORCAS<br />
Relationship of Task Clusters<br />
  to Job Clusters                               JOBMOD<br />
Figure 1. The Set of Advanced Interpretive Assistance Programs<br />
(Boldface = operational program in ASCII CODAP; Italic = experimental, not yet released).<br />
The operational programs are briefly described as follows:<br />
JOBTYP automatically identifies stages in most branches of a hierarchical clustering<br />
DIAGRM which represent the “best” candidates for job types. First, core task homogeneity,<br />
task discrimination, a group size weight, and a loss in “between” overlap for merging stages<br />
are calculated for all stages and these values are used to compute an initial evaluation value<br />
(for JOBTYP equations, see Haynes, 1989). This value is used to pick three sets of initial<br />
stages; these are then inserted into a super/subgroup matrix for additional pairwise<br />
evaluation, in order to further refine the selection of candidate job type groups. Three final<br />
sets of stages (primary, secondary, and tertiary groups) are then reported for the analyst to<br />
use as starting points for selecting final job types.<br />
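The stage evaluation JOBTYP performs can be sketched as a weighted combination of the four criteria named above. The weights and combining rule below are invented for illustration; the actual equations are given in Haynes (1989):

```python
# Hedged sketch of a JOBTYP-style composite evaluation per clustering stage.
# Weights and the additive rule are invented; see Haynes (1989) for the
# operational equations.

def evaluate_stage(homogeneity, discrimination, group_size, overlap_drop,
                   w=(1.0, 1.0, 0.5, 1.0)):
    """Combine the four stage criteria into one evaluation value.

    homogeneity    - core task homogeneity (within-group overlap)
    discrimination - task discrimination against other groups
    group_size     - weight favoring reasonably sized groups
    overlap_drop   - loss in "between" overlap at the merging stage
    """
    criteria = (homogeneity, discrimination, group_size, overlap_drop)
    return sum(weight * value for weight, value in zip(w, criteria))

def top_candidates(stages, k=3):
    """Rank stages by evaluation value; return the k best stage ids."""
    ranked = sorted(stages, key=lambda s: evaluate_stage(*s[1:]), reverse=True)
    return [stage_id for stage_id, *_ in ranked[:k]]

stages = [  # (stage id, homogeneity, discrimination, size weight, drop)
    (321, 0.62, 0.40, 0.8, 0.10),
    (384, 0.75, 0.55, 0.6, 0.25),
    (461, 0.31, 0.20, 0.9, 0.05),
]
print(top_candidates(stages, k=2))  # [384, 321]
```

The ranked stages here play the role of the "initial stages" JOBTYP picks before its pairwise super/subgroup refinement.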
CORTAS compares a set of group job descriptions (“contextual” groups) in terms of<br />
number of core tasks performed, percent members performing and time spent on each core<br />
task, and the ability of each core task to discriminate each group from all other groups in<br />
the set. It also computes for each group an overall measure of within-group overlap called<br />
the “core task homogeneity index”, an overall measure of between-group difference called the<br />
“index of average core task discrimination per unit of core task homogeneity”, and an<br />
asymmetric measure of the extent to which each group in the set qualifies as a subgroup or<br />
supergroup of every other group in the set.<br />
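The asymmetric super/subgroup measure can be illustrated with a simple set-containment stand-in; the operational CORTAS index is computed from overlap statistics, so this is only an analogy with invented task sets:

```python
# Analogy for CORTAS's asymmetric super/subgroup measure: the share of
# group A's core tasks that also appear in group B. Asymmetry is the point:
# subgroup_index(a, b) and subgroup_index(b, a) generally differ.

def subgroup_index(core_a, core_b):
    """Fraction of A's core tasks contained in B (1.0 = A looks like a subgroup of B)."""
    return len(core_a & core_b) / len(core_a)

a = {"T1", "T2", "T3"}
b = {"T1", "T2", "T3", "T4", "T5"}
print(subgroup_index(a, b))  # 1.0 -> A qualifies as a subgroup of B
print(subgroup_index(b, a))  # 0.6 -> B is more like a supergroup of A
```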
TASSET compares clusters of tasks (modules) in terms of the degree to which each cluster<br />
of tasks is co-performed with every other task cluster (supergroup/subgroup matrix). Within<br />
each cluster, TASSET computes the average co-performance of each task with every other<br />
task in the cluster (representativeness index) and the difference in average co-performance<br />
of the same tasks with all other task clusters (discrimination index). TASSET also identifies<br />
tasks which meet the co-performance criterion for inclusion in clusters in which they were not<br />
placed (potential core tasks), as well as tasks that are highly co-performed with all clusters<br />
except the cluster under consideration (negatively unique tasks).<br />
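The representativeness and discrimination indices can be sketched directly from a co-performance matrix. The matrix values here are invented, and the operational TASSET definitions may differ in detail:

```python
# Hedged sketch of TASSET's two per-task indices over a task co-performance
# matrix, following the description in the text: representativeness is the
# average co-performance with the task's own cluster; discrimination is that
# average minus the average co-performance with tasks outside the cluster.

def representativeness(co, task, cluster):
    """Average co-performance of `task` with the other tasks in its cluster."""
    others = [t for t in cluster if t != task]
    return sum(co[task][t] for t in others) / len(others)

def discrimination(co, task, cluster, all_tasks):
    """Representativeness minus average co-performance with outside tasks."""
    outside = [t for t in all_tasks if t not in cluster]
    out_avg = sum(co[task][t] for t in outside) / len(outside)
    return representativeness(co, task, cluster) - out_avg

# Symmetric co-performance matrix for four tasks (invented values)
co = {
    "A": {"A": 1.0, "B": 0.8, "C": 0.7, "D": 0.1},
    "B": {"A": 0.8, "B": 1.0, "C": 0.6, "D": 0.2},
    "C": {"A": 0.7, "B": 0.6, "C": 1.0, "D": 0.1},
    "D": {"A": 0.1, "B": 0.2, "C": 0.1, "D": 1.0},
}
cluster = {"A", "B", "C"}
print(round(representativeness(co, "A", cluster), 2))  # 0.75
print(round(discrimination(co, "A", cluster, co), 2))  # 0.65
```

A task like "D" with high co-performance everywhere except its own cluster would surface as a "negatively unique" task under this scheme.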
The experimental programs are as follows:<br />
MODTYP - Just as the JOBTYP program automatically selects from a hierarchical<br />
clustering of cases the “best” set of job types based on similarity of time spent across tasks,<br />
the MODTYP (module typing) program selects from a hierarchical clustering of tasks the<br />
“best” set of task module types based on task co-performance across cases. The term “best”<br />
means that the evaluation algorithm initially optimizes on four criteria simultaneously (i.e.,<br />
within-group homogeneity, between-group discrimination, group size, and drop in “between<br />
overlap” in consecutive stages of the hierarchical clustering). After all stages of the clustering<br />
have been evaluated on these criteria, primary, secondary, and tertiary sets of mutually<br />
exclusive task clusters are selected as first-, second-, and third-best representations of the<br />
modular structure of the hierarchical clustering solution. The three sets of groups are then<br />
input to another evaluation algorithm which computes super- and subgroup indices between<br />
all pairs of groups in the primary solution within the same TPath range. Based on the<br />
combined results of both evaluations, the sets of groups are revised. The final set of primary<br />
groups is input to the TASSET and CORCAS programs to provide analytic and interpretive<br />
data for each primary cluster of tasks. MODTYP output also reports the initial and final sets<br />
of primary, secondary, and tertiary groups and their evaluation indices.<br />
In addition to the data summaries of groups noted above, which can be very complex, a<br />
graph of all final stages in TPATH sequence is generated to help the analyst understand the<br />
relationship among the possible levels of clustering. An example of such a graph is shown<br />
in Figure 2. In this case, Level 1 = primary group; Level 2 = secondary group; and Level<br />
3 = tertiary group. By showing a different symbol for each level, the graph highlights the<br />
most likely choices of groupings (task modules) for the analyst’s consideration. Used in<br />
conjunction with the Task Cluster Diagram, this display provides a quick way for analysts to<br />
make preliminary judgments as to the appropriate groups to select for further evaluation.<br />
MODTYP MODULE TYPING TEST RUN R-1  Avionics Test Station, AFS 451X7                        Page 13<br />
Graph of All Final Stages in TPATH Sequence<br />
                          1                  289                  578                  867                 1157<br />
Stage  TPATH Range  Level +--------------------+--------------------+--------------------+--------------<br />
 321    1 -  72       2   ------<br />
 384    1 -  33       1   Xx<br />
 461   36 -  37       1   X<br />
 575   38 -  40       1   X<br />
 766   42 -  43       1   X<br />
 342   46 -  72       1   XxXx<br />
 333    1 -  43       3   ...<br />
 362   46 -  63       3   ...<br />
 401   64 -  71       3   ..<br />
 365   73 -  84       3   ..<br />
 469   73 -  76       1   X<br />
Figure 2. Example MODTYP Graph of All Final Stages in TPATH Sequence (AFS 451X7)<br />
CORCAS - The CORCAS report characterizes task clusters selected by the analyst for<br />
further evaluation in terms of the people who most perform it, and especially those principal<br />
performers whose jobs are concentrated in this task cluster to the exclusion of all or most<br />
other task clusters. The CORCAS report may contain any type of background variable<br />
information describing a case that will fit in the allocated space, just as on a PRTVAR<br />
report; however, “base of assignment” and “job title” are often the most useful variables. An<br />
example is shown in Figure 3 below.<br />
CORCAS    CORE CASES FOR TASK MODULES                                                          Page 82<br />
Summary Statistics for Target Module ST0046<br />
CS0001 Stage 41: PS0001 435 to 437<br />
<br />
Description                                     Value    Description                                          Value<br />
Number of tasks in target module                    3    Average number of tasks performed by all cases         .05<br />
Number of core cases in target module              11    Average number of tasks performed by core cases       1.82<br />
Percent of module time covered by core cases    70.11    Average percent time spent in module by all cases      .05<br />
Core case homogeneity index (CCHI)              35.48    Average percent time spent in module by core cases    1.29<br />
<br />
Co-performance   Task Title<br />
         22.81   G 208  Evaluate water survival performances of students not wearing pressure suit assemblies<br />
         18.40   K 302  Perform minor repairs of life rafts, such as patching or replacing spray shields<br />
         27.88   K 319  Store life rafts<br />
<br />
Case Level Statistics for Target Module ST0046 (n = 3)<br />
Core Cases Sorted on Average Task Importance Values<br />
<br />
                                                       Average     Number    Percent    Percent   Sorted<br />
                                                       Task        of Core   of Tasks   Time in   Performance<br />
KPATH  Grade  DAFSC  Supvsd.  Base    Job Title        Importance  Modules   Performed  Module    Emphasis<br />
  146  E5     91150  05       Mather  NCOIC Admin           78.52        9     100.00      1.99       117.24<br />
   41  E3     91150  00       Mather  Arspc Physlgy         59.71       15      66.67      1.30        67.49<br />
   40  E3     91130  02       Mather  Ar. Phy. Spec         56.38        7      66.67      1.65        58.84<br />
   13  E5     91170  01       Brooks  Supv Aero Phy         53.47       26     100.00       .54        54.08<br />
  202  E3     91150  00       Mather  Arspc Phy Spec        52.78        5      66.67      1.22        19.51<br />
  139  E5     91150  04       Mather  Asst NCOIC Acad.      42.52        9      66.67      1.40        40.07<br />
Figure 3. Example CORCAS Report Showing Types of Data Which Can Be Displayed<br />
This example illustrates how the program can be useful in interpreting task clusters; in this<br />
case note that almost all cases are individuals assigned to Mather AFB, CA, where the Air<br />
Force conducts its navigator training. By assessing these data in conjunction with the three<br />
tasks in the module, an analyst can begin to make sense out of the tentative task module.<br />
The CORCAS report makes it apparent that the three tasks are co-performed because they<br />
are all a part of the navigator training course at Mather AFB. Note also that the KPATH<br />
number for each case is also shown; this means that by crossmapping KPATH sequences and<br />
analyst-assigned job type names, we could also display the job type for each member (but<br />
would have to sacrifice some other data in order to have room in the display). We have<br />
done this experimentally and found it very useful; in some cases, it leads the analyst to<br />
reconsider the job type names initially assigned.<br />
CASSET - Whereas CORCAS characterizes a task cluster (module) in terms of those cases<br />
whose jobs are most representative of the task module, the CASSET program generates<br />
displays of cases whose jobs are most representative of the job types (group of cases) within<br />
a given set of job clusters. This approach permits an analyst to quickly characterize a job<br />
type by the salient features of its most representative and discriminating members. Like<br />
CORCAS, the CASSET report may contain any type of background variable information<br />
describing a case that will fit in the allocated space, just as on a PRTVAR report, with “base<br />
of assignment” and “job title” often being the most useful variables to aid analysts’<br />
interpretations.<br />
JOBMOD - The JOBMOD (Job Type versus Task Module mapping) program aggregates<br />
the case- and task-level indices computed by the four advanced analysis programs and uses<br />
these aggregate measures to relate task clusters to job types and vice versa. The description<br />
of job types by a handful of discriminant clusters of tasks, and the association of each task<br />
cluster with the types of jobs of which it is an important component, is a basic requirement<br />
for defining and integrating the MPT components of an existing or potential Air Force<br />
specialty or weapons system. If AFSs are to be collapsed or shredded out, or new jobs are<br />
to be assigned to an occupational area, or old jobs are to be moved to another occupational<br />
area, such highly summarized, yet meaningfully discriminant hard data are essential (Phalen,<br />
Staley, & Mitchell, 1989:4-5).<br />
Within a specialty being studied, a JOBMOD printout is generated for each job group<br />
showing the relationships of the set of task modules to the cases representative of the job.<br />
An example of such a printout is given below:<br />
JOBMOD    ANALYSIS OF TASK MODULES WITHIN A JOB GROUP                                          Page 17<br />
ST0035 Centrifuge Operators (n = 5)<br />
<br />
01 = Number of tasks in module<br />
02 = Average Percent Members Performing (PMP) within group performing tasks within the module<br />
03 = Average sum of Percent Time Spent (PTS) for group performing tasks within the module<br />
04 = Percent of most time-consuming task's time covered by tasks in module<br />
05 = Percent of tasks in module which are core tasks for the group<br />
06 = Percent of the group's core tasks which are in the module<br />
07 = Percent of tasks in module which are discriminating or unique for the group<br />
08 = Percent of group's discriminating or unique tasks which are in the module<br />
<br />
Module  Description                    01     02     03     04     05     06      07     08<br />
GPO001  Hypobaric Chamber Operations   28  19.29   8.28  15.02    .00    .00     .00    .00<br />
GPO002  Classroom Instruction          18   3.33   1.24   2.84    .00    .00     .00    .00<br />
GPO003  Emergency Escape & Survival    12   1.67    .11    .31    .00    .00     .00    .00<br />
GPO004  Parachute Familiarization      20    .00    .00    .00    .00    .00     .00    .00<br />
GPO022  Centrifuge Operations          22  69.09  42.58  87.88  54.55  66.67  100.00  13.10<br />
GPO023  Research Chamber Operations    42   8.57   6.69   9.88   2.38   5.56   33.33   8.33<br />
GPO024  TU-103 Training                 6    .00    .00    .00    .00    .00     .00    .00<br />
GPO035  General Tasks                   3  33.33   1.10  10.40  33.33   5.56     .00    .00<br />
Figure 4. Example JOBMOD Report Showing Summary Relationships of Task Modules to a Job Group<br />
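Two of the JOBMOD summary fields above (02 and 03) can be sketched as simple aggregations over a module's tasks within one job group. The data values and helper names are invented, and the operational computation may differ in detail:

```python
# Hedged sketch of JOBMOD fields 02 and 03 for one job group: field 02 is
# the mean Percent Members Performing (PMP) across the module's tasks, and
# field 03 is the group's Percent Time Spent (PTS) summed over those tasks.
# All task ids and statistics here are invented.

def avg_pmp(pmp_by_task, module_tasks):
    """Field 02: mean PMP over the tasks in the module."""
    return sum(pmp_by_task.get(t, 0.0) for t in module_tasks) / len(module_tasks)

def sum_pts(pts_by_task, module_tasks):
    """Field 03: percent time spent, summed across the module's tasks."""
    return sum(pts_by_task.get(t, 0.0) for t in module_tasks)

# Group-level task statistics for a hypothetical job group
pmp = {"T1": 80.0, "T2": 60.0, "T3": 0.0}
pts = {"T1": 5.0, "T2": 3.0, "T3": 0.0}
module = ["T1", "T2", "T3"]

print(round(avg_pmp(pmp, module), 2))  # 46.67
print(round(sum_pts(pts, module), 2))  # 8.0
```

Computing such aggregates for every (job group, task module) pair is what lets a single printout summarize which modules are important components of a job.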
Discussion<br />
The advanced analysis assistance programs outlined here represent a substantial advance<br />
in the automation of CODAP analysis, aimed at permitting the occupational analyst to focus<br />
attention on making critical judgments, rather than spending hours and hours examining<br />
various case data or task data summaries in an attempt to develop an overall perspective on<br />
a specialty or occupational area. By using somewhat standardized displays which focus on<br />
possible job types or task clusters (modules) and defining relationships within and between<br />
given sets of jobs or modules, these programs permit an analyst to quickly decide what the<br />
potentially meaningful clusters are, and to proceed with other aspects of the analysis.<br />
There still remains some work to be done in terms of polishing the three still-experimental<br />
programs. After they are refined and finalized through additional operational testing, they<br />
will be released into the operational ASCII CODAP system, and will become available for<br />
implementation in military occupational analysis programs. Suggestions for additional<br />
analysis assistance programs which might be needed and useful are also welcome.<br />
References
Christal, R.E. (1974). The United States Air Force occupational research project (AFHRL-TR-73-75, AD-774 574). Lackland AFB, TX: Occupational Research Division, Air Force Human Resources Laboratory.
Christal, R.E., & Weissmuller, J.J. (1988). Job-task inventory analysis. In S. Gael (Ed.), Job analysis handbook for business, industry, and government. New York: John Wiley and Sons, Inc. (Chapter 9.3).
Driskill, W.E., Mitchell, J.L., & Tartell, J.E. (1980, October). The Air Force occupational analysis program - a changing technology. Proceedings of the 22nd Annual Conference of the Military Testing Association. Toronto, Ontario, Canada: Canadian Forces Personnel Applied Research Unit.
Haynes, W.R. (1989, January). JOB-TYPING, Job-typing programs. In: Comprehensive Occupational Data Analysis Programs. San Antonio, TX: Analytic Systems Group, The MAXIMA Corporation. Prepared for the Air Force Human Resources Laboratory [program documentation available on the AFHRL Unisys computer].
Mitchell, J.L. (1988). History of job analysis in military organizations. In S. Gael (Ed.), Job analysis handbook for business, industry, and government. New York: John Wiley and Sons, Inc. (Chapter 1.3).
Mitchell, J.L., Phalen, W.J., Haynes, W.R., & Hand, D.K. (1989, October). Operational testing of ASCII CODAP job and task clustering methodologies (AFHRL-TP-88-74). Brooks AFB, TX: Manpower and Personnel Division, Air Force Human Resources Laboratory.
Morsh, J.E. (1964). Job analysis in the United States Air Force. Personnel Psychology, 17, 7-17.
Phalen, W.J., Staley, M.R., & Mitchell, J.L. (1987, May). New ASCII CODAP programs and products for interpreting hierarchical and nonhierarchical clusters. Proceedings of the Sixth International Occupational Analysts' Workshop. San Antonio, TX: USAF Occupational Measurement Center.
Phalen, W.J., Staley, M.R., & Mitchell, J.L. (1988, December). ASCII CODAP programs for selecting and interpreting job and task clusters. Proceedings of the 30th Annual Conference of the Military Testing Association. Arlington, VA: U.S. Army Research Institute.
Weissmuller, J.J., Tartell, J.E., & Phalen, W.J. (1988, December). Introduction to operational ASCII CODAP: An overview. Proceedings of the 30th Annual Conference of the Military Testing Association. Arlington, VA: U.S. Army Research Institute.
PROFESSIONAL SUCCESS OF FORMER OFFICERS IN CIVILIAN OCCUPATIONS
Paul Klein<br />
Studying at Federal Armed Forces Universities<br />
Owing to the fact that the recruitment of personnel for military service was
becoming increasingly difficult, the Federal Minister of Defense set up a commission
to reorganize education and training in the Federal Armed Forces. In
mid-1971, the commission presented a report suggesting, among other things, the
reorganization of education and training for officers. In doing so, the commission
proceeded on the assumption that only by providing a system of education<br />
and training which - besides military requirements - also “considers to an increasing<br />
extent the soldiers’ individual interests regarding further education<br />
could we expect the Federal Armed Forces to become more attractive for volunteers,<br />
resulting in an increasing number of applicants” (Lippert/Zabel 1977,<br />
page 52).<br />
With respect to officer education and training, this statement consequently led<br />
up to the introduction of an academic course of studies as part of the officer<br />
education program. On the one hand, this course of study was to facilitate the
transition to civilian occupations for temporary-career volunteers after leaving<br />
the armed forces, thus making this type of career again more attractive for<br />
volunteers. On the other hand, it was also expected to be of benefit to all<br />
officers in the course of their service, in particular to regular officers with<br />
staff assignments, as the commission assumed that “the functions of officers in<br />
the fields of leadership, organization, training, and their responsibilities<br />
towards their subordinates today make different demands on them than they did in<br />
the past, and that these demands can hardly be met by a system of officer education<br />
and training which emphasizes a rather practical approach, i.e., passing<br />
on experience previously gained" (Ellwein et al. 1974, page 12). Finally, the
course of studies was to provide an alternative for regular officers who - for<br />
whatever reasons - might decide to correct their original choice of occupation.<br />
For pragmatic, economic, and academic reasons the commission suggested
that the Federal Armed Forces should establish their own universities. Lectures
at these universities commenced on 1 October 1973 in Munich and Hamburg.
For all officers with an extended period of enlistment, studying at one of the<br />
two Federal Armed Forces universities is an obligatory part of their education.<br />
An officer has three and a half years to complete his course of study. To make<br />
maximum use of this study period, which is rather short as compared with courses<br />
at civilian universities, studies are based on the trimester system.<br />
When the universities opened in 1973, the courses offered in Hamburg and Munich<br />
included mechanical engineering, electronics, economics and managerial science
as well as pedagogics, with additional courses provided in Munich in the fields
of aerospace engineering, civil engineering including geodesy, and computer
science. Additional courses have been added in the meantime.<br />
From a technical point of view, there are no major differences between the<br />
courses of studies provided at the two Federal Armed Forces universities and the<br />
corresponding courses provided at civilian universities. Just like there,<br />
studies are completed by taking the diploma examination. Students who pass the
examination successfully are conferred an academic degree, such as "Diplomingenieur".
Concept and Conduct of the Study
In 1983 the Federal Minister of Defense assigned to the Federal Armed Forces<br />
Institute of Social Sciences the task of conducting a study on the opportunities<br />
and problems involved in the transition of officers with extended periods of<br />
enlistment to the civilian working life. The study was to consider both officers<br />
who completed a course of studies at the Federal Armed Forces universities and<br />
temporary-career volunteers with an extended period of enlistment who left the<br />
Federal Armed Forces at the end of their service without an academic education,<br />
as well as jet pilots and weapons system operators who retired from service when<br />
reaching the age of 41.<br />
The study was designed as what is called a panel survey, i.e., all the officers
with an extended period of enlistment who left the armed forces in 1984 and 1985<br />
were questioned for the first time when retiring from active service, and a<br />
second time three and a half years later using a standardized questionnaire. The
results presented here have been obtained from the second survey
among the retired officers. This second survey was conducted from late 1987
through early 1989 and included almost 60 % of all temporary-career volunteers
with a twelve-year period of enlistment, as well as the pilots who left the
Federal Armed Forces in 1984 and 1985.<br />
Results<br />
If we take for granted that the results obtained are representative of officers
with an extended period of enlistment who have retired from active service, we
may say that those who graduated from a Federal Armed Forces university have<br />
managed to become integrated into the civilian working life.<br />
Using the answers obtained from the officers as a basis, we may permit ourselves
to state that the majority of university graduates had no major difficulties in<br />
adapting themselves to the demands made in the civilian sphere, and that they<br />
were successful in their subsequent civilian careers. Except for pedagogics,
all courses of study provided at the Federal Armed Forces universities
may well be said to pave the way for a civilian career as well.
Owing to the fact that the situation on the labor market was extremely unfavorable,<br />
it was difficult for those who graduated in pedagogics to find a civilian<br />
occupation closely related to their field of studies in the Federal Republic<br />
of Germany. The fact that they nevertheless managed to find occupations, even<br />
though many times outside the field of pedagogics, testifies to the flexibility<br />
and adaptability of these officers.<br />
Some three and a half years after leaving the Federal Armed Forces 80.8 % of the<br />
420 university graduates who had been questioned were employed, 1.9 % were<br />
trainees, and 0.2 % were out of work. Of those employed, 72.1 % worked for private<br />
enterprises and 18.6 % had joined the civil service. 7.7 % had set up on<br />
their own by that time, or worked freelance.<br />
There were clear differences with regard to satisfaction on the job, the income<br />
situation, and career prospects between officers who had decided to work for a<br />
private enterprise and those who had decided in favor of the civil service.<br />
Generally speaking, it may be said that those who chose the “safe” way of a<br />
civil service career - possibly because thinking along the lines of job security<br />
and shying away from taking risks - had to pay the price by having to put up<br />
with limited perspectives regarding income and promotion.<br />
Since in the Federal Republic of Germany the pay grades of officers are comparable
to the pay grades of other civil service members, we were able to find
out that the majority of the officers who decided in favor of a civil service<br />
career had not achieved a higher-ranking position as compared with the last<br />
position they held in their military career. Correspondingly, the same may be<br />
said of their financial situation.<br />
Those questioned who had opted for private enterprise revealed a quite different<br />
development. A mere 7.1 % of them stated that they earned less now than they had<br />
in their last assignment in the armed forces, as opposed to 80.0 % who said that
their income had increased slightly or even considerably. Particularly those<br />
working in the field of engineering regarded their financial position to be<br />
quite favorable. More than 75 % of them pointed out that their salary was now
considerably higher than the pay they had received as officers. All of the
computer scientists who were questioned said that their income had increased<br />
considerably.<br />
Of the university graduates questioned, 84.4 % were largely content with their<br />
civilian occupations and career prospects. Their expectations, so they said, had<br />
been met. 12.3 % said their satisfaction was limited and their expectations had<br />
not come true in many cases. Among those who spoke of limited satisfaction and<br />
disappointment on the job was a relatively large number of those who had studied<br />
pedagogics but also two of the seven computer scientists. Typically enough, only<br />
a few of the “disappointed” ones were employed with private enterprises; most of<br />
them had joined the civil service, with many of them using a way of access which<br />
the legislature primarily provided for non-commissioned officers retired from<br />
active service. (See Table 1.)<br />
The positive assessment of military service resulted in more than half of the<br />
graduates saying that they would go the same way again, if they had to decide<br />
once more. (See Table 2. )<br />
Dropouts and temporary-career volunteers or regular officers without a university<br />
education had to, cope with a much more difficult transition to a civilian<br />
occupation than had their graduate counterparts. Since, as a rule, they lacked
training in a civilian occupation, only some of them managed to find adequate
civilian employment immediately upon leaving the armed forces. Only 25 % of the
dropouts questioned and merely 10 % of the temporary-career volunteers and regular
officers without a university education found some civilian employment
immediately after leaving the armed forces. The considerable number who did not
had to undergo some sort of vocational training, either as in-plant trainees or
by attending schools, to meet the requirements for employment in the civilian
sphere. This was not easy for them, in particular if long-term training was
required. In those cases, financial bottlenecks and austerity became almost<br />
inevitable attendant circumstances, the more so since in not only a few cases<br />
they had to go through a period of unemployment - even if only brief in most<br />
cases - before starting their training period.<br />
Table 1
Overall Assessment of Military Service
Made by Officers Who Graduated From University

                                                 Number (%) of Answers Received
Assessment                                       Pedagogics  Economics  Engineering  Computer
                                                                        Subjects     Science
I am very content. Without any exceptions
my expectations and hopes have been fulfilled.    2 ( 3.6)       ?           ?           ?
I am content. My expectations and hopes
have mainly been fulfilled.                      40 (71.4)       ?       141 (82.4)   3 (42.9)
I am not content. Many of my expectations
and hopes have not been fulfilled.               11 (19.6)       ?        13 ( 7.5)   2 (28.6)
I am very discontent. None of my expectations
and hopes have been fulfilled.                    3 ( 5.4)    4 ( 4.0)     4 ( 2.3)      -
I cannot answer the question.                        -           ?           ?           ?
Number                                           56          100         171           7
(? = cell value illegible in the source reproduction)
Table 2
The Inclination of Officers Graduated From University
Towards (Hypothetical) First Enlistment

                                                 Number (%) of Answers Received
Options for Answering                            Pedagogics  Economics and  Engineering  Computer
                                                             Managerial Sc. Subjects     Science
I would enlist again for the same
number of years.                                     ?            ?             ?            ?
I would enlist again for a shorter
period of time.                                      ?        18 (18.0)     20 (11.6)       -
I would not join the Federal Armed Forces
again under any circumstances.                       ?        12 (12.0)     17 ( 9.9)       ?
I have not thought about this yet.
OR: I cannot say yet.                             3 ( 5.5)        ?             ?        3 (50.0)
Number                                           55          100           171           6
(? = cell value illegible in the source reproduction)
Without training in a civilian occupation, the chances on the labor market were
small for dropouts and temporary-career volunteers without an academic education.
Those who underwent vocational training had no major difficulties in
being integrated afterwards. The question as to whether this is also true of
officers who took up academic studies only after they had left the armed forces
cannot be answered at present, as they have not yet completed the respective
courses of studies.
Owing to the difficulties experienced in the transition to civilian life, dropouts
assessed the time they spent in the military less favorably than did university
graduates; temporary-career officers without an academic education,
however, seemed to be quite content with their military service.
Table 3
Overall Assessment of Military Service
Made By Dropouts and Temporary-Career Volunteers Without University Education

                                                 Number (%) of Answers
Assessment                                       Temporary-Career Volunteers   Dropouts
                                                 w/o Univ. Education
I am very content. Without any exceptions
my expectations and hopes have been fulfilled.       3 ( 3.4)                   2 ( 3.4)
I am content. My expectations and hopes
have mainly been fulfilled.                         66 (75.0)                  28 (47.5)
I am not content. Many of my expectations
and hopes have not been fulfilled.                  15 (17.0)                  15 (25.4)
I am very discontent. None of my expectations
and hopes have been fulfilled.                       2 ( 2.3)                  14 (23.7)
I cannot answer the question.                        2 ( 2.3)                      -
Number                                              88                         59
Among pilots there was a relatively high degree of discontent with regard to the<br />
civilian occupations they held three and a half years after retiring from active<br />
duty. This was primarily owing to the fact that these officers had attained high<br />
ranks before leaving the armed forces and had based their expectations on what<br />
they had achieved. If these expectations should remain unchanged it will be very<br />
difficult to remedy their situation. This applies in particular to those officers<br />
who not only expect their civilian salary to correspond to the rank they<br />
had attained (in most cases lieutenant colonel) but also try to continue their<br />
career as civilian pilots.<br />
Literature
Bildungskommission beim Bundesminister der Verteidigung (Ed.): Neuordnung der Ausbildung und Bildung in der Bundeswehr, Bonn 1971.
Ellwein, Th./Müller, A.v./Plander, H. (Ed.): Hochschule der Bundeswehr zwischen Ausbildungs- und Hochschulreform, Opladen 1974.
Hitpass, J./Mock, A.: Das Image der Universitäten, Düsseldorf 1972.
Klein, P.: Der Übergang längerdienender Zeitoffiziere in das zivile Berufsleben, München 1984.
Klein, P.: Die Bewährung ehemaliger Offiziere der Bundeswehr im Zivilberuf, München 1987.
Klein, P.: Truppendiensttauglich? Zur Bewährung von Absolventen der Bundeswehruniversitäten in der Truppe, in: W.R. Vogt (Ed.): Militär als Lebenswelt, Leverkusen 1988, S. 241-250.
Lippert, E./Zabel, R.: Bildungsreform und Offizierkorps, in: Sozialwissenschaftliches Institut der Bundeswehr, Berichte H.3, München 1977, S. 49-156.
A MILITARY OCCUPATIONAL SPECIALTY (MOS)<br />
RESEARCH AND DEVELOPMENT PROGRAM: GOALS AND STATUS<br />
Dorothy L. Finley and William J. York, Jr.<br />
U.S. Army Research Institute Field Unit<br />
Fort Gordon, Georgia<br />
Threat, force modernization, doctrine, and force structure
often change in ways which influence what is required with
respect to soldier performance. Responses to changes in soldier
performance requirements to assure adequate operation and
maintenance of the Army's inventory of systems often include
changes in MOS and CMF designs. These changes, called MOS
restructuring in this program, are its focus. MOS restructuring
is defined as the addition or deletion of tasks in an existing
MOS, the merger or deletion of MOSs, or the assignment of tasks
to a new MOS.
The Army is faced with growing and more varied inventories
of equipment (older equipments often cannot be disposed of due to
the insufficient numbers of new equipments), reduced manpower
ceilings, and a reduced and changing manpower pool. The
decisions made about MOS and Career Management Field (CMF)<br />
restructuring determine the number of soldiers in units versus in<br />
training (given manpower ceilings), the number of operators and<br />
maintainers needed to staff the equipments, the design of the<br />
training system, and the levels of aptitudes required. Analyses<br />
have demonstrated that what appears to be the best MOS<br />
restructuring option with respect to one of these factors may be<br />
a very bad option with respect to the other factors. The goal of<br />
this program is to develop decision aids to facilitate the<br />
identification of optimal, not suboptimal, MOS restructuring<br />
solutions with respect to manpower, personnel, and training<br />
resource considerations, and the requirements for unit<br />
performance.<br />
There are several considerations and constraints involved in
any action to restructure MOSs. A fundamental concern is task
and equipment commonalities and differences. One does not want
to assign a set of tasks to a soldier which are so different and
numerous as to impose too large a training requirement or
require too high a level of too many different aptitudes. One
does, on the other hand, want to assign a sufficiently large
number of tasks such that the soldier will be fully employed and can
be flexibly assigned. This concern must be considered within the
contexts of both requirements and constraints. Requirements
include such items as aptitude and gender job requirements,<br />
manpower utilization and training requirements, and the need for<br />
career progression opportunities. The constraints include such<br />
items as manpower pool characteristics and size, manpower<br />
ceilings, available training resources, geographical and
organizational distribution of the equipments, and the size of<br />
the MOS and relative percentages of soldiers across the grade<br />
levels. Overall, MOS restructuring can be summarized as a
complex, multi-dimensional decision. The considerations, and the
constraints weighed against the requirements, relate at least to
training impacts, personnel characteristics, force structure,
equipment design, personnel resources, manpower resources, and
task structure.
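The kind of multi-dimensional trade-off involved can be sketched abstractly. The following is a hypothetical illustration of weighted multi-criteria scoring of restructuring options, not one of the program's actual algorithms; the option names, dimensions, scores, and weights are all invented:

```python
# Hypothetical sketch of the trade-off the planned decision aids must support:
# each restructuring option is scored on several dimensions, and a weighted sum
# exposes options that look good on one factor but poor overall.

WEIGHTS = {"training_cost": -0.4, "manpower_fit": 0.3,
           "aptitude_demand": -0.2, "career_progression": 0.1}

options = {
    "merge_A_B": {"training_cost": 8, "manpower_fit": 9,
                  "aptitude_demand": 7, "career_progression": 6},
    "new_MOS":   {"training_cost": 3, "manpower_fit": 5,
                  "aptitude_demand": 4, "career_progression": 8},
}

def score(opt):
    # Negative weights penalize dimensions where "more" is worse.
    return sum(WEIGHTS[d] * v for d, v in opt.items())

best = max(options, key=lambda name: score(options[name]))
for name in options:
    print(name, round(score(options[name]), 2))
print("preferred:", best)
```

An option that minimizes training cost alone could still rank last here, which is the point made above: the best restructuring choice on one factor may be a very bad one overall.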
As noted above, the program objective is to develop aids to
facilitate MOS and CMF restructuring decisions regarding such
questions as: Is restructuring needed at all? Should a new MOS
be created? Should MOSs be merged? Is an overall redesign of
the branch MOSs and CMFs needed? Whatever is done impacts
directly on the branch training system design. The addition or
deletion of tasks which require training imposes a requirement to
modify the training system to accommodate those changes.
Program Overview<br />
The current formulation of the program is presented in Figure
1. Work has been accomplished or is projected for the near
future on: The Army Authorization Documentation System (TAADS) (a
manpower data base), and personnel and training data bases; the
ability, equipment, and task domains; and trade-off algorithms.
Recent accomplishments with respect to the TAADS data base, and
the ability and equipment domains, will be described in the next
section.
As depicted in Figure 1, the intent is to provide the
analyst with the tools needed to identify desirable MOS
restructuring possibilities, and to consider these within
manpower, personnel, and training resource constraints, and then
to provide the means to do tradeoffs between the alternatives
with respect to manpower, personnel, and training impacts. In
Figure 1, under "Trade-Off Algorithms", both operations-based and<br />
requirements-based are noted. Operations-based analyses are<br />
those performed, in the Army, by the Personnel Proponent as the<br />
basis for preparing the paperwork which will actually cause a MOS<br />
restructure action to be implemented. These analyses are<br />
sometimes triggered by the outcomes of requirements-based<br />
analyses. Requirements-based analyses often take place when<br />
there is a major change in equipment inventories, doctrine,<br />
organization, or force structure. These requirements-based<br />
analyses tend to be performed by the combat developers in<br />
coordination with the training developers and personnel<br />
proponents.<br />
TAADS and PMAD Data Base
Personnel and Training Data Base
Ability Domains
Equipment Domains
Task Domains

[In Figure 1, these components, together with current and projected position and
manpower, personnel, and training resources, feed operations-based and
requirements-based trade-off algorithms, which yield optimum manpower,
personnel, and training alternatives.]

Figure 1. Overview of the MOS restructuring program to develop decision aids.
Recent Accomplishments
Equipment Domains
Equipment domains are defined as groupings of equipments<br />
based on their similarities with respect to equipment<br />
descriptors. Human factors specialists dealing with the<br />
development of new systems have always defined tasks, ability<br />
requirements, etc. in terms of the design of that new item of<br />
equipment. Many MOSs, however, deal with many systems and, when
a new item of equipment is entered into the inventory, then one
must consider inventory groupings in making MOS assignment or
restructuring decisions. The identification of appropriate
descriptors began as a part of assisting the Signal Branch
Personnel Proponent in developing a training strategy to support
the merger of three MOSs, with two of them becoming an Additional
Skill Identifier (ASI) to the merged MOS. After investigation it
one of the MOSS should be assigned an ASI. This finding resulted<br />
in training cost savings, a reduced training attrition rate,<br />
improved position fill capability, and increased potential to the<br />
soldiers for promotion. Drawing upon this research, an initial<br />
Equipment Domains Assessment Procedure (EDAP) has been developed<br />
which shows promise for identifying equipment domains appropriate<br />
for operators. Identifying equipment domains appropriate for<br />
maintainers is a more complex problem and will take further<br />
research.<br />
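In the spirit of the EDAP described above, grouping equipments by descriptor similarity might look like the following sketch; the equipments, descriptors, similarity measure, and threshold are all invented for illustration, since the actual procedure is not detailed here:

```python
# Illustrative sketch only: grouping equipments whose descriptor profiles agree
# on a sufficient fraction of descriptors. All names and values are invented.

EQUIPMENT = {
    "radio_A":  {"power": "battery", "interface": "keypad",  "signal": "VHF"},
    "radio_B":  {"power": "battery", "interface": "keypad",  "signal": "UHF"},
    "switch_C": {"power": "vehicle", "interface": "console", "signal": "wire"},
}

def similarity(a, b):
    """Fraction of descriptors on which two equipments agree."""
    keys = set(a) | set(b)
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)

def group(equip, threshold=0.6):
    """Greedy single-pass grouping: join an item to the first group whose
    seed item it resembles at or above the threshold."""
    groups = []
    for name, desc in equip.items():
        for g in groups:
            if similarity(desc, equip[g[0]]) >= threshold:
                g.append(name)
                break
        else:
            groups.append([name])
    return groups

print(group(EQUIPMENT))  # [['radio_A', 'radio_B'], ['switch_C']]
```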
Ability Domains<br />
The Army Research Institute Fort Huachuca Field Unit has
refined the Job Abilities Assessment System (JAAS) and added a
part C, in addition to existing parts A and B, specific to
military intelligence. Mr. York will present a paper at this
meeting describing the application of the refined JAAS, parts A
and B, to MOSs in the Signal Branch. I am going to describe
their application to intelligence MOSs to derive ability
requirements profiles that can be compared to assess the
reasonableness of MOS assignment to a new system. It is of
interest to us because the profiles are derived through analysis
of the tasks assigned to the soldier and, therefore, provide a
means of appraising whether the proposed restructuring, i.e., the
reassignment of tasks to MOSs, creates too great a demand on
ability requirements.
JAAS consists of a taxonomy of 50 abilities (e.g., dynamic<br />
strength, written expression) which, for presentation purposes,<br />
are often grouped into eight clusters (e.g., gross motor skills,<br />
communication skills) and a set of procedures for making scalar<br />
judgments regarding the level of each of the 50 abilities<br />
required to perform a set of tasks. This technique was used to<br />
develop ability profiles for several intelligence MOSs and to
appraise the ability requirements for a new intelligence system.<br />
It was determined that some of the intelligence MOS ability<br />
requirements profiles were distinctly different. It was further<br />
determined that the particular MOS selected to perform operations<br />
and control tasks on the new system was a good choice in that the<br />
ability requirements profile for the MOS closely matched the<br />
ability requirements profile for those tasks on the new system.<br />
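Profile comparison of this kind can be illustrated with a small sketch; the ability names, scale values, and the distance metric below are assumptions for illustration, not the procedure used in the study:

```python
# Illustrative sketch only: comparing a JAAS-style ability requirements profile
# for an MOS with the profile for tasks on a new system. Ability names and
# rating values are invented.

mos_profile = {"oral comprehension": 5.2, "written expression": 3.8,
               "selective attention": 6.1, "dynamic strength": 2.0}
system_profile = {"oral comprehension": 5.0, "written expression": 4.0,
                  "selective attention": 6.3, "dynamic strength": 2.4}

def profile_distance(p, q):
    """Mean absolute difference across abilities; smaller = closer match."""
    return sum(abs(p[a] - q[a]) for a in p) / len(p)

d = profile_distance(mos_profile, system_profile)
print(round(d, 2))  # a small distance suggests the MOS is a reasonable assignment
```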
TAADS Data Base<br />
Up to 60% of the effort required on the part of the
Personnel Proponent to prepare a MOS restructuring action is
devoted to position data analysis. Position data analysis is an
analysis of the TAADS and Personnel Management Authorization Data
(PMAD) data bases for each of the impacted MOSs. These contain
detailed information on each MOS position currently authorized
(TAADS) and projected (PMAD). This analysis is largely performed
manually and, hence, very time consuming and error prone. There
are criteria as to appropriate grade structure, etc., and it is
essentially a "zero sum game". The TAADS and PMAD constitute the
constrained manpower data base at a MOS position-by-position
level with a great deal of associated information.<br />
A Position Data Analysis Job Aid-l (PDAT-JA-1) software<br />
TAADS analysis tool has been developed which will be installed at<br />
the first Personnel Proponent office (at the Signal Branch) in<br />
December. It automates manipulation of the TAADS data base and<br />
provides analysis tools. A place holder is in the program for<br />
the PMAD data base when it becomes available in the form needed<br />
for our purposes. The PDAT-JA-1 outputs are:<br />
* Quantitative summaries of MOS authorization for each<br />
grade level by: grand total, ASI, Skill Qualification Identifier _,<br />
(SQI), major command (MACOM), tables of organization and<br />
equipment (TOE), tables of distribution and allowances (TDA),<br />
continental United States (CONUS), and outside CONUS (OCONUS);<br />
* Deviations from the Average Grade Distribution matrix;
* Deviations from criteria regarding space imbalanced MOS
(SIMO), gender, ASIs, and SQIs;
* Development of an acceptable grade structure; and<br />
* (When PMAD program implemented) Identification of TAADS<br />
and PMAD mismatches by unit identification code and grade.<br />
The development of an acceptable grade structure is enabled
through the provision of work sheets which allow the analyst to
create modified TAADSs (the original data base always remains
intact). Each modified TAADS can then be run to produce the first
three outputs above until an acceptable grade structure is
realized.
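The first output listed above amounts to cross-tabulated counts of position records. A minimal sketch with invented records and field names (the real TAADS records carry far more detail):

```python
# Illustrative sketch only: quantitative summaries of MOS authorizations per
# grade, broken out by a second field such as MACOM.

from collections import Counter

positions = [
    {"grade": "E4", "macom": "FORSCOM"}, {"grade": "E4", "macom": "TRADOC"},
    {"grade": "E5", "macom": "FORSCOM"}, {"grade": "E4", "macom": "FORSCOM"},
]

by_grade = Counter(p["grade"] for p in positions)
by_grade_macom = Counter((p["grade"], p["macom"]) for p in positions)

print(dict(by_grade))        # grand totals per grade
print(dict(by_grade_macom))  # per-grade totals within each MACOM
```

The same pattern extends to the other breakouts named above (ASI, SQI, TOE/TDA, CONUS/OCONUS) by changing the key tuple.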
APPLICATION OF THE JOB ABILITY ASSESSMENT SYSTEM TO<br />
COMMUNICATION SYSTEM OPERATORS<br />
WILLIAM J. YORK, JR. and DOROTHY L. FINLEY<br />
U.S. ARMY RESEARCH INSTITUTE FIELD UNIT<br />
FORT GORDON, GEORGIA<br />
As the Army introduces major new equipment into its
inventory, there is a need to restructure Military Occupational
Specialties (MOS) and to reclassify soldiers from old MOSs into
new MOSs. Identification and quantification of specific soldier
abilities required to perform in a new MOS would enhance both the
training development and personnel management decision processes
associated with major reclassification actions. A method for
mapping soldier ability requirements from old to new MOSs would
provide Army managers with a useful tool in the areas of force
structure design and personnel or job classification.
In support of this, the ARI Fort Gordon Field Unit is<br />
conducting research using the JAAS methodology developed by<br />
Fleishman to determine whether significant differences in the<br />
JAAS abilities exist among Signal MOSs and whether unique ability<br />
patterns are strong enough to support mapping from old MOSs<br />
to new ones. Moreover, we hope to identify a group of abilities<br />
that could be measured by existing tests. This effort supports a<br />
need to determine how best to reclassify soldiers from several<br />
existing Signal MOSs into two new MOSs.<br />
These two new MOSs support a recently introduced area<br />
communication system that is to replace the majority of the<br />
current division and corps Signal equipment and structure.<br />
Reclassification and training of current MOS holders to perform<br />
in the new MOSs is a critical issue. The feasibility of using the<br />
JAAS methodology to determine which new MOS is most similar to<br />
existing MOSs is the primary research goal.<br />
Our initial effort focused on existing communication MOSs.<br />
Using the JAAS abilities shown in Figure 1 and the ability<br />
description and scale shown in Figure 2, two groups of<br />
subject matter experts (SMEs) rated four Signal operator MOSs:<br />
31C, 31M, 31L, and 72E. Group A, consisting of seven senior<br />
personnel, rated all four MOSs. Group B, consisting of nine to<br />
eleven SMEs per MOS, rated only their own MOS. Mean scores by ability<br />
and ability cluster were calculated for each MOS. Interrater<br />
reliability was determined by applying Kendall's coefficient of<br />
concordance to the rank-ordering of the eight ability clusters.<br />
As shown in Table 1, rater agreement varied significantly<br />
among the four MOSs, as well as between the two groups of<br />
raters. Figures 3 and 4 depict the difference in profiles<br />
between the two groups. Table 2 shows examples of actual ratings<br />
for two MOSs, the best and the worst in terms of rater<br />
agreement.<br />
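The interrater reliability check described above (Kendall's coefficient of concordance applied<br />
to rank-orderings of the eight clusters) can be sketched as follows; the rank matrix shown is<br />
illustrative only, not the study's ratings.<br />

```python
# Kendall's coefficient of concordance W for m raters each ranking the
# same n items (here, ability clusters). Illustrative data only.

def kendalls_w(ranks):
    """ranks: m rows, each a permutation of 1..n for the same n items."""
    m = len(ranks)                      # number of raters
    n = len(ranks[0])                   # number of items ranked
    # Sum of ranks each item received across raters.
    totals = [sum(row[j] for row in ranks) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # W = 12 S / (m^2 (n^3 - n)); no correction for ties.
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three raters in perfect agreement over four clusters: W = 1.0.
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))  # → 1.0
```

W runs from 0 (no agreement) to 1 (perfect agreement), which is the scale on which the<br />
values in Table 1 should be read.<br />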
TABLE 1<br />
Kendall's coefficient of concordance W by MOS and rater group<br />
MOS     GROUP A     GROUP B<br />
72E     0.324852    0.343915<br />
31C     0.333605    0.445887<br />
31L     0.242180    0.124285<br />
31M     0.090452    0.286190<br />
TABLE 2<br />
Examples of actual cluster rankings for the worst (31M, Group A)<br />
and best (31C, Group B) MOSs in terms of rater agreement<br />
31M (Group A):<br />
           COMM  CON  REA  SPLD  PER-V  PER-A  PSY  GRMO<br />
RATER 1      1    3    2    7     8      5      4    6<br />
RATER 2      7    3    5    8     1      2      6    4<br />
RATER 3      1    5    3    1     7      6      4    8<br />
RATER 4      8    4    2    1     5      3      6    7<br />
RATER 5      1    3    4    6     2      5      8    7<br />
RATER 6      8    7    5    4     1      6      2    3<br />
RATER 7      5    6    7    2     3      1      4    8<br />
FINAL W = 0.090452<br />
31C (Group B):<br />
           COMM  CON  REA  SPLD  PER-V  PER-A  PSY  GRMO<br />
RATER 1      1    6    4    2     5      3      7    8<br />
RATER 2      1    7    5    3     6      4      2    8<br />
RATER 3      2    4    8    7     3      5      1    6<br />
RATER 4      2    4    6    3     5      1      7    8<br />
RATER 5      4    3    7    8     5      1      2    6<br />
RATER 6      3    7    8    5     6      1      2    4<br />
RATER 7      1    6    7    5     3      2      4    8<br />
RATER 8      1    2    3    4     5      6      7    8<br />
RATER 9      1    3    5    4     7      2      6    8<br />
RATER 10     3    4    8    5     7      1      2    6<br />
RATER 11     6    4    7    3     8      1      2    5<br />
FINAL W = 0.445887<br />
Correlation analyses between each pair of the four MOSs were<br />
conducted using the MOS mean of each of the 50 abilities. Ratings<br />
for Groups A and B were combined for this analysis. Results<br />
are shown in Table 3.<br />
TABLE 3<br />
CORRELATION MATRIX<br />
        72E     31C     31L     31M<br />
72E      -     .7830   .0862   .5782<br />
31C             -      .0824   .5527<br />
31L                     -      .1280<br />
31M                             -<br />
Statistical analyses of mean differences between MOSs by ability<br />
and ability cluster have not been conducted, but visual<br />
examination indicates that differences do exist at both the<br />
ability and cluster levels, as depicted in Figures 5 and 6.<br />
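The pairwise coefficients in Table 3 are ordinary product-moment correlations over the mean<br />
ability ratings; a minimal sketch follows, using short stand-in profiles rather than the<br />
study's 50-ability vectors.<br />

```python
# Pearson correlation between two MOS mean-ability profiles.
# The short profiles below are stand-ins for the 50-ability vectors.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Two perfectly proportional profiles correlate at 1.0.
print(round(pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]), 6))  # → 1.0
```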
As already shown, rater reliability is poor, but this may be a<br />
function of the analysis approach; additional analysis will focus<br />
on the ability level instead of the cluster level. We found the<br />
correlation results highly interesting in that the degree of<br />
relationship between the MOS pairs is highly supportive of the<br />
relationships subjectively thought to exist. Moreover, we expect<br />
to see even stronger relationships, both positive and negative,<br />
at the ability subset level. For example, a correlation analysis<br />
between each MOS pair using the ability groupings of<br />
communications, auditory, psychomotor, and gross motor should<br />
reveal this increase. We believe that this approach will also be<br />
applicable to a comparison analysis between current MOSs and new<br />
MOSs.<br />
Our future efforts will focus on several areas. First, we<br />
plan to determine rater agreement for each ability across each<br />
MOS. Using these results in conjunction with analysis of mean<br />
differences between abilities and MOSs, we intend to focus on a<br />
reduced set of abilities. This subset will be a function of rater<br />
reliability and discriminative power between MOSs; in other words,<br />
those abilities that have the best rater agreement and tend to<br />
discriminate between MOSs will be used. In addition to the<br />
development of a refined subset of abilities, we plan to analyze<br />
MOSs at the major-duty level. Two new MOSs, 31D and 31F, will<br />
be analyzed using the JAAS procedure and then compared with the<br />
total MOS and major duty profiles of the four MOSs already<br />
completed. These two MOSs are the operators for the new area<br />
communications system and are the MOSs into which a significant<br />
number of Signal soldiers will be reclassified. It is hoped that<br />
the profile comparisons will provide objective information for<br />
this reclassification effort.<br />
Figure 1: Revised list of abilities and clusters (conceptual skills; perceptual skills:<br />
vision; perceptual skills: audition; gross motor skills). [The numbered ability list is not<br />
recoverable from this copy.]<br />
Figure 2: Ability description and rating scale (example anchor: "bicycle 20 miles to work").<br />
[Scale details are not recoverable from this copy.]<br />
Preferences for <strong>Military</strong> Assignments in German Conscripts<br />
Introduction<br />
K. Arndt<br />
Federal Office of Defense Administration<br />
Bonn, Germany<br />
What German conscripts know about available military assignments is primarily based on<br />
information from friends and acquaintances who have already done their military service. The<br />
media, the military counselor and visits to military units (open day) constitute an additional but<br />
less important source of information. It is generally true to say that knowledge and a general<br />
overview of all the military assignments that are available depend on whether information has<br />
been obtained passively or actively. Preconceived ideas about military activities often lead<br />
to discrepancies between expectations and everyday military life. Lack of motivation, indifferent<br />
feelings about military service and discontent with the draft procedure are the consequences.<br />
In addition to the need for objective standardized information on military assignments, measures<br />
were required to counteract the negative image of the armed forces, since the willingness of<br />
young men to do military service has continually been decreasing over the past years. Against<br />
the background of a military threat, which was perceived to be real, the majority of those liable<br />
to military service passively agreed to military service, but as early as 1987 this percentage<br />
declined to less than 50 % for the first time. It must be assumed that this development has<br />
continued to date. As a result of these findings, it was decided to develop a transparent and<br />
efficient method designed to provide young men liable to military service with an overview of<br />
the requirements and qualifications for military assignments and a clear picture of job descriptions.<br />
The "Assignments - Interests - List" (AIL) is the result of this development, during which variant<br />
models of information transfer and target-oriented representation were pretested. The results<br />
of a nation-wide AIL test are reported.<br />
Description of AIL<br />
On the basis of expert ratings, the 117 possible assignments for conscripts were reduced to 25<br />
representative assignments covering both fighting and non-fighting troops. A brief description<br />
was compiled for each of the selected assignments, including a picture of a typical activity and<br />
an account of the most important requirements and features of the job.<br />
The Assignments-Interests-List comprises the military assignments as described in Table 1. The<br />
AIL method can be used for groups or individuals. Each item is looked at without any time limit.<br />
Pretest results<br />
An initial pretest was carried out with a sample of 105 persons liable to military service. Of the 452<br />
preferences stated, 256 (57 %) went to assignments with non-fighting troops and 196 (43 %) to<br />
fighting troops. Application of the AIL method produced a marked increase in the number of<br />
desired assignments indicated: without AIL, the average number of assignments considered to<br />
be interesting was 2.6, while use of AIL produced an increase to an average of 4.4. Education<br />
level had no ascertainable influence on the preferences expressed. The time to work through<br />
the test ranged from 6 to 22 minutes.<br />
Table 1<br />
Military Assignments in the AIL<br />
Fighting Troops              Non-Fighting Troops<br />
military policeman           clerk<br />
light infantryman            radar operator<br />
mountain trooper             teletypist<br />
paratrooper                  supplyman<br />
mechanized infantryman       second cook<br />
gunner                       driver<br />
missile gunner               electronics technician<br />
gunlayer                     radiomechanic<br />
engineer                     aircraft mechanic<br />
deck hand                    armament repairman<br />
signal construction man      automotive vehicle mechanic<br />
signal operating man         medical corpsman<br />
radio operator<br />
Sample<br />
1,225 persons liable to military service were tested by means of AIL during the psychological<br />
qualification and placement test before they were drafted into military service. The composition<br />
of the sample ensured regional and educational representativeness.<br />
Method<br />
The 25 items of the AIL are listed in a fixed order. The testee is asked to give his opinion on<br />
each item by indicating whether he is interested or uninterested in the described assignment.<br />
Consequently, the response is obtained by the “forced-choice” method. The testee indicates his<br />
judgement on a response form which offers two categories. The preferences are numerically<br />
represented by a preference score P, which is defined as the interest-to-disinterest ratio:<br />
P = total interest / total disinterest<br />
P = 1.00: interest in and disinterest in the item are equally great, i.e. there is indifference;<br />
P > 1.00: interest outweighs disinterest, i.e. the item is preferred;<br />
P < 1.00: disinterest outweighs interest, i.e. the item is rejected.<br />
The preference scores P were placed in a rank order to show the preferences for assignments of<br />
the AIL items. Rank order comparisons between subsamples were carried out by applying<br />
nonparametric procedures (Kendall’s coefficient of concordance W). The statistical evaluations<br />
were performed on an IBM AT PC with SPSS/PC+ standard software.<br />
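The preference score defined above is simply the interest-to-disinterest ratio per item; a<br />
minimal sketch, using invented forced-choice responses rather than the survey data:<br />

```python
# Preference score P for one AIL item: number of "interested" responses
# divided by number of "uninterested" responses. Responses are invented.

def preference_score(responses):
    """responses: booleans, True = interested, False = uninterested."""
    interested = sum(responses)
    uninterested = len(responses) - interested
    return interested / uninterested

# 30 of 40 testees interested: P = 3.0 > 1.0, i.e. the item is preferred.
print(preference_score([True] * 30 + [False] * 10))  # → 3.0
```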
Acceptance of AIL<br />
Following the pretest, the testees could express their opinion on AIL by answering the following<br />
three questions:<br />
Question 1: Do you consider such information on military assignments to be a necessary part of<br />
the qualification test?<br />
Question 2: Has AIL provided you with any details about military assignments which you did<br />
not know before?<br />
Question 3: Are you interested in further information?<br />
Table 2 shows the response frequencies for the response categories.<br />
Table 2<br />
Response frequencies of the acceptance poll (yes = + / no = - / partly = 0; in percent)<br />
                            EDUCATIONAL LEVEL<br />
              total        1)           2)           3)           4)<br />
Question    +   -   0    +   -   0    +   -   0    +   -   0    +   -   0<br />
    1      86   6   8   67  22  11   85   6   9   88   3   9   92   4   4<br />
    2      57   9  34   82   9  39   57   9  44   56   8  46   34  13  31<br />
    3      56  26  18   55  29  16   54  26  20   59  25  16   54  28  18<br />
1) without school-leaving certificate of secondary-level primary school<br />
2) with school-leaving certificate of secondary-level primary school<br />
3) lower secondary school-leaving certificate<br />
4) secondary school graduation<br />
The results of the acceptance poll reveal that information on military assignments provided by<br />
AIL is considered to be necessary by the majority. The higher the education level, the more<br />
widespread is this opinion.<br />
AIL offers only basic information on selected military assignments, but its information content<br />
is detailed enough for more than 50 % of the respondents to gain additional information they<br />
did not have before. In contrast to surveys which point to a monotonic relationship between<br />
knowledge of military service and education level, there is no ascertainable difference in this<br />
regard.<br />
Regardless of education level, more than 50 % of all respondents wished to obtain more<br />
information about the military assignments presented. Bearing in mind previous surveys looking<br />
into the attitudes towards military service of young men liable to military service which showed<br />
a growing rejection with increasing education levels, this result was somewhat unexpected.<br />
Despite increasingly negative attitudes towards military service, interest in obtaining information<br />
does not decline with higher education levels.<br />
In conclusion, the results of the acceptance poll show that<br />
- the respondents do not have much information on specific military assignments,<br />
- the respondents are keen to obtain additional information,<br />
- the majority of respondents welcome information on military assignments.<br />
Results<br />
The opinions expressed on the 25 AIL items in the test sample were analyzed with a view to<br />
answering the following questions:<br />
(1) Which military assignments arouse the greatest interest and which tend to appear uninteresting?<br />
(2) Are opinions about military assignments influenced by educational level or regional factors?<br />
(3) Do assignments with fighting and non-fighting troops meet with different degrees of interest?<br />
The following figure shows the preference scores P for the 25 AIL items based on the two<br />
response categories (interesting/not interesting). It clearly highlights the fact that only the<br />
assignment as driver achieves a P-score above 1.0. Since military driving licenses continue to be<br />
valid in civilian life upon completion of military service, assignment to a driver's job is of great<br />
benefit to military conscripts. The other AIL items received much lower preference scores (all<br />
less than 1.0). The scores are shown in ranked order.<br />
Figure 1: Preference scores for AIL items in the overall sample (N = 1,225).<br />
Preference scores of the first ten rank positions:<br />
Rank  AIL item               P<br />
 1    driver                3.73<br />
 2    clerk                  .74<br />
 3    auto. vehic. mech.     .74<br />
 4    military police        .70<br />
 5    radar operator         .59<br />
 6    armament repair        (illegible)<br />
 7    aircraft mech.         .52<br />
 8    gunner                 .46<br />
 9    light infantry         .44<br />
10    paratrooper            .42<br />
The subsamples based on school education and regional background produced preferences<br />
completely different from those obtained for the overall sample. As highlighted in Table 3, the<br />
number of preferred military assignments (P > 1.0) increases as the level of school education<br />
rises.<br />
Table 3<br />
Preference scores P > 1.0 for subsamples based on school education<br />
and regional background<br />
                     R e g i o n s<br />
           Northern  Central  Southern   Sum<br />
level 1)       1        5        2        8<br />
level 2)       1        5        2        8<br />
level 3)       1        6        3       10<br />
level 4)       4        5        7       15<br />
Total          7       21       14       41<br />
1) without school-leaving certificate of secondary-level primary school<br />
2) with school-leaving certificate of secondary-level primary school<br />
3) lower secondary school-leaving certificate<br />
4) secondary school graduation<br />
A statistical analysis of the rank-ordered P-scores using Kendall's coefficient of concordance<br />
W did not produce any significant differences between the school education samples within the<br />
specific regions.<br />
The results presented in Table 3 show that although the AIL items were appraised in the regions<br />
with varying degrees of intensity, the ranking positions of the preference scores largely coincide.<br />
To compare the preferences for assignments with fighting as opposed to non-fighting forces, the<br />
mean ranking positions of the individual assignments were taken. In both the overall sample<br />
and the regional subsamples, assignments with non-fighting forces achieved a significantly<br />
better mean ranking position than those with fighting forces.<br />
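The group comparison above averages the rank positions of the items belonging to each troop<br />
category; sketched below with invented ranks (1 = most preferred):<br />

```python
# Mean ranking position for fighting vs. non-fighting assignments.
# The item ranks below are invented for illustration (1 = most preferred).

ranks = {"driver": 1, "clerk": 2, "gunner": 8, "paratrooper": 10}
fighting = ["gunner", "paratrooper"]
non_fighting = ["driver", "clerk"]

def mean_rank(items):
    """Average rank position of the listed items; lower = more preferred."""
    return sum(ranks[i] for i in items) / len(items)

print(mean_rank(fighting), mean_rank(non_fighting))  # → 9.0 1.5
```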
Table 4<br />
Mean ranking positions for assignments with fighting<br />
and non-fighting forces (* = p < .05; ** = p < .01)<br />
                                                Difference in<br />
Region     Fighting forces  Non-fighting forces  ranking position<br />
Northern        16.4              9.1                7.3 **<br />
Central         15.0             10.4                4.6 *<br />
Southern        15.8              9.6                6.2 **<br />
Total           14.9             10.7                4.2 *<br />
Preference for assignments with non-fighting forces was found to be broadly the same throughout<br />
the overall sample and the regional subsamples. In the subsamples based on school education,<br />
however, there was no such uniform appraisal (Table 5).<br />
Table 5<br />
Mean ranking positions for assignments with fighting/non-fighting forces<br />
in subsamples based on school education 1)<br />
                                                       Difference in<br />
School education  Fighting forces  Non-fighting forces  ranking position<br />
1)                     12.8              12.8               0.0<br />
2)                     13.5              12.2               0.7<br />
3)                     16.2               9.0               7.2 **<br />
4)                     15.9               9.8               6.1 **<br />
Total                  14.9              10.7               4.2 *<br />
1) School education levels (1 to 4): see Table 4.<br />
It would appear that the preference for assignments with non-fighting forces increases with the<br />
level of school education. Groups with a lower education level showed little or no difference in<br />
their preference for fighting or non-fighting forces, while those with a higher level of education<br />
clearly preferred assignments with non-fighting forces.<br />
The following cross-tabulation with the determinants "region" and "school education" highlights<br />
the differences between assignments with non-fighting and fighting forces. Conscripts with a<br />
lower education level show no or non-significant differences for assignments with fighting or<br />
non-fighting forces. In contrast, these differences are clearly pronounced and highly significant<br />
in the case of conscripts with a higher level of education. With the exception of those with a<br />
school-leaving certificate of a secondary-level primary school in the southern region and those<br />
with a lower secondary school-leaving certificate in the northern region, regional impacts on the<br />
preferences are negligible. When compared to corresponding samples in other regions, these two<br />
samples exhibit significantly high differences in the mean ranking positions.<br />
Table 6 / Figure 2: Differences in the mean ranking position for assignments with fighting and<br />
non-fighting forces according to school education (levels 1 to 4, see Table 4) and region<br />
(significant differences underlined). [The cross-tabulated values are not recoverable from this<br />
copy.]<br />
The results presented here concerning military assignment preferences in samples with different<br />
educational and regional backgrounds are based on the mean ranking positions for the various<br />
assignments with fighting and non-fighting forces. An analysis of the appraisals of the individual<br />
assignments produces quite divergent results. For example, in all regions those with the highest<br />
school-leaving certificate most strongly prefer the assignment as "gunlayer" among the fighting<br />
forces, but it is only in the central region that there is a clear preference for the assignment as<br />
"paratrooper".<br />
Conclusions<br />
AIL is an effective and objective way of providing information and ascertaining the assignment<br />
preferences of those liable to military service. In addition to individual assignment preferences,<br />
which are important for placement, it is possible to find out about the main preferences of those<br />
liable to military service and the way they are affected by their regional and educational<br />
background. On this basis, information measures can be taken in the pre-draft phase<br />
(e.g. in recruitment campaigns). Changes in preference scores will reveal whether such measures<br />
are effective.<br />
The AIL procedure is beneficial both to the Federal Armed Forces as an organization and to those<br />
liable to military service. Aptitude diagnosis is thus understood to be a cooperative process<br />
between equal partners which gives the prospective conscript adequate guidance and allows<br />
room for initiative and active participation. The "classical" diagnostic criteria of objectivity,<br />
reliability and validity are supplemented by features such as fairness, transparency, acceptance,<br />
counselling and innocuousness. Aptitude diagnosis in this form seeks to benefit both sides<br />
(testing organisation and individual candidate) equally.<br />
Aptitude-Oriented Replacement of Conscript Manpower<br />
in the German Bundeswehr<br />
Retrospective View<br />
S. B. Schambach<br />
Federal Office of Defense Administration, Bonn, Germany<br />
In 1990, the Psychological Service of the German Federal Armed Forces (GFAF) is celebrating<br />
a very special jubilee: the Aptitude and Placement Examinations (EVP) for Draftees at the<br />
Subregion Recruiting Offices have been carried out for 25 years.<br />
Before the EVP was introduced, manpower requirements for draftees had been met solely on the<br />
basis of medical fitness, the final assignment of a position being controlled by a lot system. At<br />
his muster, each draftee obtained a rank number chosen at random. The slots for replacement<br />
were assembled in a list and also given rank numbers. The draftees were called up by the order<br />
of the list until the slots were filled. For numerous assignments, though, only men were called<br />
up who had a specified civilian occupational training.<br />
The effect of the lot system was that many men of restricted medical fitness were assigned jobs<br />
which they were hardly apt for, while well qualified men failed to be called up for service. In<br />
contrast to this, the increasing standard of technical equipment of the forces required higher<br />
ability of the assigned manpower. In public, criticism of the call-up “lottery game” was growing.<br />
It was for these reasons that in 1965 the Aptitude and Placement Examination was instituted to<br />
be taken by each draftee who was found medically fit at his muster. The examination method<br />
was modeled on the US-American "Army Alpha" test battery, a modified version of which<br />
had already been successfully applied to army volunteers.<br />
The EVP comprises a sophisticated biographical questionnaire mainly referring to interests,<br />
skills and general performance factors, but its core is a test battery covering the areas of general<br />
ability and educational level, perception and reaction, mechanical and electrical engineering<br />
comprehension, as well as some further faculties related to specialist functions. In defined cases,<br />
a group situation test or an interview with the psychologist, or more test procedures can be<br />
applied.<br />
The same test battery is applied to army and air force volunteer applicants (except officer<br />
applicants) in order to facilitate psychological diagnostics for those applicants who are liable to<br />
compulsory service as well. The test battery for volunteers includes additional test procedures.<br />
Table 1: Aptitude and Placement Examination for GFAF Draftees (EVP)<br />
Biographical analysis, with special regard to performance factors, interests and activities, school<br />
and occupational training<br />
Test battery: General intelligence and educational level, technical comprehension (mechanical<br />
and electric engineering), perception-reaction capacity; under defined circumstances also:<br />
Perception tests, group situation test, interview, etc.<br />
Evaluation of behavior characteristics and expression in writing<br />
Aptitude-Oriented Manpower Replacement on the Basis of EVP Assignment<br />
Proposals - Some Remarks on the GFAF Recruiting System<br />
The EVP psychologist, on the basis of his diagnostic findings, works out for each draftee<br />
proposals forhis aptitude-oriented placement in military service. The psychologist’s assignment<br />
options are mere recommendations to the recruiting agencies since as yet a draftee has no legal<br />
claim to be trained according to his EVP aptitude assessment. The administration officials in<br />
charge of personnel replacement are instructed to give priority to the EVP results.<br />
Each recruiting official has to record, by data input into the central computer, the degree to which<br />
he has taken aptitude objectives into account in every single replacement decision, i.e. regarding<br />
every single draftee who was given an assignment. The following levels of quality in personnel<br />
replacement are discerned:<br />
1 - Aptitude-oriented replacement<br />
2 - Job-related replacement<br />
3 - Occupation-related replacement<br />
4 - “Quantitative” replacement (regardless of aptitude).<br />
In this list, consideration of aptitude criteria decreases from step to step.<br />
1) Aptitude-Oriented Personnel Replacement<br />
Each military occupation on entrance level is characterized by a job title and a corresponding<br />
specialty number stating the military service (army, air force, navy) and the type of job. Groups<br />
of similar jobs are combined and labeled by an alphanumeric “assignment symbol”. These<br />
symbols (over a hundred) were specially designed to facilitate personnel replacement. Most of<br />
the symbols comprise several specialties of equal medical and psychological job requirements.<br />
The assignment symbols and their respective job titles and specialty numbers are listed in the<br />
so-called Personnel Requisition Table where the symbols are again grouped with respect to<br />
different fields of service, e.g. artillery functions, aircraft repair, medical duties. The Personnel<br />
Requisition Table also contains additional requirements and hints for placement, linked to<br />
assignment symbols, as e.g. a certain civilian occupational training which is a prerequisite of an<br />
assignment, or whether high school graduates are wanted for these jobs.<br />
The troops announce their manpower requirements by giving the assignment symbols. At<br />
present, the core requirements for each of the four annual call-ups are announced half a year<br />
before. Only shortly before each call-up term can the complete personnel requisition be set up<br />
which includes personnel fluctuations by drop-outs, organizational changes, changes in the<br />
degree of medical fitness, enlistment as volunteer, etc.<br />
The recruiting organization can, after its preparatory activities during the course of<br />
conscription (registration of men liable to service, muster, EVP), dispose of accumulated and<br />
computer-represented data on every single man due for conscription. Even before psychological assignment<br />
proposals are present, the computer will automatically pick out a provisional assignment<br />
symbol corresponding to a man’s civilian occupational training (if he has any). The psychological<br />
assignment proposals are also recorded in the central computer. They, too, are given in<br />
terms of assignment symbols, relating to the above-mentioned Personnel Requisition Table. At<br />
present, the psychologist may propose up to 9 different assignments for a draftee.<br />
Following a computer-aided optimizing model, the manpower requirements of the troops - in<br />
terms ofassignment symbols- are shared out between the subregional recruiting offices. Those<br />
recruiting offices of a military district which will, according to their stock of assignment<br />
symbols, best be able to meet the required symbols are allotted the requisitions.<br />
The computer will also support the recruiting official in placing a draftee on a specified slot.<br />
The machine will automatically provide a placement proposal by fitting a requisition symbol at<br />
issue to one of a man's given psychologically based assignment symbols. Yet a great number<br />
of placements are still carried out manually, because persons with special characteristics or<br />
certain personal and social circumstances (unemployed, married, medical doctors, etc.) have to<br />
be considered with priority.<br />
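The automatic placement step described above can be read as a simple matching procedure; a<br />
sketch follows, with invented symbols (the real system works on the central computer's<br />
requisition data, not on small in-memory lists):<br />

```python
# Fit an open requisition symbol to one of a draftee's proposed
# assignment symbols. Symbols are invented for illustration.

def propose_placement(open_requisitions, proposed_symbols):
    """Return the first proposed symbol with an open requisition, else None."""
    open_set = set(open_requisitions)
    for symbol in proposed_symbols:     # psychologist's order of preference
        if symbol in open_set:
            return symbol
    return None                         # left for manual placement

requisitions = ["A12", "B07", "C03"]
proposals = ["D99", "B07"]              # up to 9 proposals per draftee
print(propose_placement(requisitions, proposals))  # → B07
```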
2) Job-Related Personnel Replacement<br />
Sometimes a requisition of a certain job and respective symbol is at issue for which a man due<br />
for that call-up term, and bearing a corresponding assignment symbol, cannot be found.<br />
Conversely, there may be a draftee who, for social or occupational reasons, is to be called<br />
up for a certain term when there happen to be no requisitions for the symbols proposed for him.<br />
To help the recruiting official in cases like these, the Psychological Service has supplied special<br />
lists which indicate whether a given symbol may be substituted by another one because ofsimilar<br />
aptitude factors, or whether a symbol for a highly qualified job may be replaced by another one<br />
calling for less qualification. For instance, a draftee whose aptitude as a paratroop er has been<br />
stated, will likewise be apt as a guard and security soldier, even if this assignment symbol should<br />
not be proposed for him.<br />
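The substitution lists can be pictured as a simple lookup table that expands a draftee's proposed symbols by their permitted substitutes. The symbol names and substitution pairs below are invented for illustration; the real lists are maintained by the Psychological Service.

```python
# Sketch of the substitution lists: a symbol may be replaced by another
# with similar aptitude factors, or by one demanding less qualification.
# Table contents are illustrative, not the actual lists.

SUBSTITUTES = {
    "paratrooper": ["guard_and_security"],   # similar aptitude factors
    "radio_relay": ["field_cable"],          # lower qualification demanded
}

def usable_symbols(proposed):
    """Expand a draftee's proposed symbols by their permitted substitutes."""
    expanded = list(proposed)
    for symbol in proposed:
        for sub in SUBSTITUTES.get(symbol, []):
            if sub not in expanded:
                expanded.append(sub)
    return expanded

symbols = usable_symbols(["paratrooper"])
```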
3) and 4): Occupation-Oriented and Quantitative Replacement<br />
In these cases, a position is filled regardless of the psychologist's aptitude-oriented assignment<br />
proposals, or, as an exception, there are no such proposals at all. Under condition 3), the<br />
assignment will at least follow the man's civilian occupational training. Most of these cases<br />
concern draftees in exceptional life situations. In such placement<br />
decisions, the psychologist in charge shall collaborate as a consultant.<br />
In about 95 % of the call-ups, the psychologist's assignment proposals have been taken into<br />
account by the recruiting officials, as is shown by the following table which reflects a long-term<br />
state of affairs:<br />
Table 2: Quality of Personnel Replacement with Regard to Psychological<br />
Aptitude Criteria - May 1990 -<br />
Percentage of Placements<br />
              Aptitude-oriented   Job-related   Occupation-related   Quantitative<br />
Army                 95                4                0                  0<br />
Air Force            95                4                1                  1<br />
Navy                 92                6                0                  1<br />
Total                95                4                0                  1<br />
The Psychologist’s Method of Proposing Assignment Symbols<br />
In formulating his assignment proposals, the psychologist goes by the system of symbols laid<br />
down in the Personnel Requisition Table. The psychological aptitude prerequisites for the<br />
military jobs are compiled in the so-called Symbol Assignment Table. This table was issued by<br />
the Psychological Service and is structured in the same way as the Personnel Requisition Table,<br />
giving the assignment symbols instead of specialty numbers. In this table, psychological aptitude<br />
profiles are set up for each assignment symbol according to the method of multiple cut-off scores.<br />
Different kinds of prerequisites are attached to each symbol which have to be observed by the<br />
psychologist:<br />
Table 3: Psychologically-Based Assignment Proposals in Accord with:<br />
- medical requirements<br />
- basic intelligence level<br />
- cut-off scores in the relevant subtests<br />
- (for certain symbols:) additional indispensable or desirable aptitude prerequisites such as<br />
knowledge of the English language, driver's license, etc.<br />
- administrative remarks<br />
- specified civilian occupational training, if indicated in the Personnel Requisition Table<br />
The psychologist will compare the total pattern of his diagnostic findings to the job characteristics<br />
of the assignment symbols, especially to their concretized medical, occupational, test<br />
and other implications, and pick out the ones corresponding to the draftee’s aptitudes. Assignment<br />
symbols ruled out medically are absolutely excluded. With respect to the other aptitude<br />
prerequisites, the psychologist is normally given a high judgment factor:<br />
a) He may go below the prescribed cut-off scores (regarding intelligence level and subtest<br />
results) if the difference is within the confidence limits given by the test reliability.<br />
b) Major deviations from the test profiles are permitted in individual cases if psycho-diagnostically<br />
founded. The same is valid for deviations from occupational and other prerequisites.<br />
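The multiple cut-off method, together with the reliability-based tolerance of point a), can be sketched as follows. The cut-off values, standard deviation, and reliability are invented for illustration; the tolerance uses the standard error of measurement implied by the test reliability.

```python
import math

def meets_cutoffs(scores, cutoffs, sd, reliability, z=1.0):
    """Multiple cut-off screening: every relevant subtest must reach its
    cut-off score, but the psychologist may go below a cut-off within
    the confidence limits implied by the test's reliability
    (standard error of measurement = sd * sqrt(1 - reliability))."""
    sem = sd * math.sqrt(1.0 - reliability)
    tolerance = z * sem
    return all(scores[t] >= cutoffs[t] - tolerance for t in cutoffs)

# Hypothetical profile for one assignment symbol (T-score metric, sd = 10)
cutoffs = {"figure_reasoning": 50, "signal_discernment": 55}
scores = {"figure_reasoning": 52, "signal_discernment": 52}
ok = meets_cutoffs(scores, cutoffs, sd=10, reliability=0.84)
```

With reliability .84 and sd 10, the tolerance is 4 score points, so a 52 against a cut-off of 55 still passes, while a larger shortfall would not.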
The psychologist then ranks the assignment symbols he is going to propose. The priorities are<br />
subject to his judgment. He will take into account for which symbol the aptitude is best, and<br />
which are the draftee’s personal interests and preferences. If a draftee is suited for a so-called<br />
deficit symbol for which manpower replacement is difficult, this is regularly given priority. A<br />
list of these symbols is available for the psychologist.<br />
Methodical Deficiencies in Aptitude Assessment<br />
The test profiles and cut-off scores established for the assignment symbols were set up according<br />
to expert ratings. They were not based on detailed job analyses. Systematic research on the<br />
validity of the EVP test methods and the performance of draftees in the jobs assigned to them is<br />
missing for most assignment symbols. A formal comparison via central computer data between<br />
the assignment for which a man was called up, and the specialty he was awarded after basic<br />
training, gave only 75 % congruence. Deficiencies of psychological prognostic<br />
methods are only partly the cause of this low rate. Many conscripts have to be moved during their<br />
basic training for various organizational and medical reasons, so that they will complete their<br />
service in a job other than the one assigned by the recruiting agencies. Nevertheless, the aim of<br />
the Psychological Service is to increase the percentage of correspondence by methodological<br />
improvements.<br />
Table 4: Aptitude Characteristics Relevant in <strong>Military</strong> Jobs<br />
Perception and Reaction          Reasoning<br />
Signal Shape Discernment         Verbal Skills<br />
Spatial Imagination              Memory<br />
Achievement Motivation           Mechanical/Technical Comprehension<br />
Reliability                      Electric Engineering Comprehension<br />
Concentration/Stress Tolerance   Psychomotor Coordination<br />
Arithmetical Comprehension       Social Competency<br />
As a first step, aptitude characteristics were identified by expert rating, in cooperation with the<br />
military services, for more than 400 military jobs listed in the Personnel Requisition Table. For the<br />
14 characteristics found, see Table 4. An attempt to operationalize these characteristics by<br />
psycho-diagnostic constructs and corresponding test methods which might allow for aptitude<br />
assessment, showed that only part of these constructs are covered by traditional EVP test<br />
methods. Important characteristics, such as<br />
- spatial imagination<br />
-memory<br />
- psycho-motor coordination<br />
- stress tolerance<br />
do not seem to be represented in our test procedures.<br />
Thorough validation studies therefore seem indispensable. At present, 36 psychologists of the<br />
recruiting organization are investigating some 30 military jobs (listed under 22 assignment<br />
symbols). The studies include:<br />
- detailed job description and analysis of aptitude demands<br />
- identification and operationalization of probation criteria such as award of specialty, successful<br />
completion of courses, assessment by superiors, as well as personal criteria such as job<br />
satisfaction, or interest in later enlistment as a volunteer<br />
- studies on probation and validation of the traditional EVP examination methods (test procedures,<br />
biographical data, etc.)<br />
- implementation and examination of other test methods, and development of new test methods<br />
if necessary; probation study on these methods.<br />
Table 5: Study on Job Characteristics: Radio Relay Soldier (Scale: 1 [best] to 7)<br />
[Columns: Traditional Test Profile; Proposed Test Profile (Operationalized Job Characteristics)]<br />
Test Methods: General Intelligence Index, Figure Reasoning Test, Word Analogy Test,<br />
Arithmetical Comprehension Test, Orthography Test, Mechanical Comprehension Test,<br />
Electric Engineering Comprehension Test, Reaction-Perception Test, Signal Discernment Test,<br />
Memory Test, Spatial Imagination Test, Concentration Test<br />
Most of the researchers have presented sophisticated job analyses and identified probation<br />
criteria. Results of the job analyses show that several job titles listed under the same<br />
assignment symbol in the Personnel Requisition Table differ in their aptitude characteristics<br />
to a degree that suggests separating them. For numerous jobs, psycho-diagnostic constructs<br />
were found for which our EVP methods do not provide sufficient information (see Table 5 for<br />
the radio relay soldier). They will probably be supplemented by test procedures which will allow<br />
for prognosis of concentration and stress tolerance, memory, and spatial imagination.<br />
Summary<br />
A random-based system of conscript manpower replacement in the German Bundeswehr proved<br />
unable to ensure sufficient qualification of recruits in their military jobs. Since 1965,<br />
conscripts have undergone a psychological Aptitude and Placement Examination (EVP) before they are<br />
called up for service. Roughly 75 % of conscripts complete their training regularly by being<br />
awarded the specialty corresponding to their assignment. The aim of the Psychological Service<br />
is to increase this percentage by detecting as yet unexploited abilities in conscripts, and making<br />
use of them in personnel replacement. This implies improvements in the methodology of<br />
aptitude diagnosis, especially also the application of new types of tests. By means of psychological<br />
job analysis, work characteristics which have not been covered by EVP diagnostics, are<br />
to be identified, and appropriate examination methods are to be developed. Additionally,<br />
empirical studies are to be carried out to investigate the validity of our present examination<br />
methods with regard to military job demands. As a first step, aptitude characteristics of the<br />
military jobs taking part in the quarterly replacement were categorized and operationalized by<br />
psycho-diagnostic constructs which might allow for aptitude assessment. Inspection of these<br />
constructs shows that part of them are covered by traditional EVP test methods while some<br />
important characteristics do not seem to be methodically represented in our Entrance Examination.<br />
At the moment, validation studies are being carried out on 28 different military jobs for<br />
which requirements are urgent and in which one single aptitude characteristic is prominent.<br />
Investigation designs and some first results are available.<br />
DEVELOPING A TRAINING TIME & PROFICIENCY MODEL FOR ESTIMATING<br />
AIR FORCE SPECIALTY TRAINING REQUIREMENTS OF NEW WEAPON SYSTEMS<br />
David S. Vaughan, Jimmy L. Mitchell, & J. R. Knight<br />
McDonnell Douglas Missile Systems Company<br />
Winston R. Bennett & David V. Buckenmyer<br />
Training Systems Division, Air Force Human Resources Laboratory<br />
Abstract<br />
Estimating training costs and training capacity constraints are among the major manpower,<br />
personnel, and training issues in the development of new weapon systems. Use of the recently<br />
developed Training Decisions Modeling Technology in the systems acquisition process is<br />
problematic since no occupational survey data will be available as a basis for modeling the specialty,<br />
its jobs, and its training. This paper reports an innovative experimental approach using subject<br />
matter experts’ ratings of generic skill and knowledge categories for the anticipated work to predict<br />
training time and proficiencies (training setting-specific learning curves). Regression analysis<br />
indicates that substantial proportions of the variance in training time curves can be predicted from<br />
such ratings. This approach may improve training decision making and logistic support analyses<br />
early in the new weapon system acquisition process.<br />
Background<br />
[The Training Decisions System (TDS) is a technology for determining the training] required<br />
for an occupation (Ruck, 1982). It includes procedures for developing data bases<br />
and modeling the dynamic flow of people through jobs and through both formal training and<br />
on-the-job training. Furthermore, the system includes modeling and optimization capabilities<br />
which provide estimates of training quantities, costs, and capacities for both formal training<br />
and on-the-job training (Vaughan et al., 1989).<br />
Problem - MPT Decisions in the New Weapon Systems Acquisition Process<br />
In the New Weapon Systems Acquisition Process (NWSAP), the assessment of changes<br />
required in manpower, personnel, and training programs is difficult (Gentner, 1988) - the<br />
problem is particularly acute for the largely-hidden on-the-job training (OJT) costs and OJT<br />
capacity of units which will receive the new system. The TDS, with its capability to estimate<br />
such costs and capacities, may be of considerable value in helping evaluate MPT costs and<br />
capacities in NWSAP studies, if TDS procedures can be adapted to predict needed task<br />
characteristics and to model expected impacts on job and training patterns.<br />
TDS Training-Time Models<br />
Training-time models are important components of the TDS data base. These models<br />
may be thought of as learning curves; they translate training time on a group of tasks (task<br />
module) into the proficiency, relative to full proficiency, obtained from such training. Figure<br />
1 illustrates a set of training-time models for an aircraft maintenance task. Note that<br />
separate learning curves were developed for several major training settings or training<br />
delivery methods, including classroom, correspondence course, guided hands-on, and OJT.<br />
These training-time models permit different training delivery methods to be traded off to<br />
find the best way, or combination of ways, to deliver training for a particular task. These<br />
training-time models play a critical role in the TDS model. In particular, they are the basis<br />
for estimating OJT training quantities.<br />
Figure 1. Training Allocation Curve for Aircraft Environmental Systems Task Module 34.<br />
(Axes: Proficiency (%) against Training Hours, 0-40.)<br />
In the TDS R&D work, training time models were developed from SMEs' judgments<br />
concerning training times in various training settings required to reach full proficiency. This<br />
approach proved satisfactory for ongoing MPT planning applications (Vaughan et al., 1989).<br />
However, it poses several problems for the weapon-system-design application. First, it<br />
requires SMEs who are familiar with training on the subject tasks. For a new weapon<br />
system, there are no SMEs with “hands on” experience with training. This is a common<br />
problem in Logistics Support Analysis (LSA); the usual solution is to find existing systems<br />
that are comparable to the new system; data and SMEs from those existing comparable<br />
systems are then used. In general, this approach involving comparable existing systems could be used to<br />
estimate training time models for the new weapon system tasks. However, because the new<br />
system often makes use of technology not incorporated in any comparable existing system,<br />
some of the new tasks have no counterparts on existing systems. Thus, the comparable<br />
existing system approach is not entirely satisfactory for our use.<br />
Experimental Approach<br />
In the weapon-system-design application, the TDS should be sensitive to design changes<br />
and should provide feedback to designers concerning which features or aspects of their<br />
designs are the primary drivers of training requirements. The training-time-modeling method<br />
does not identify which task features or characteristics determine a task’s training time model<br />
and cannot provide feedback to designers concerning how to change a design in order to reduce<br />
training requirements. That method would rely entirely on SMEs’ judgments based on global<br />
task experience to obtain the training time models. As a consequence, the method is not<br />
likely to be very sensitive to impacts of design changes on task training times.<br />
Equation 1 is the model that we used on the TDS R&D to estimate the training time<br />
curves such as illustrated in Figure 1:<br />
p = a_cl h_cl + a_cl2 h_cl^2 + a_cor h_cor + a_cor2 h_cor^2 + a_ho h_ho + a_ho2 h_ho^2 + a_ojt h_ojt + a_ojt2 h_ojt^2<br />
[Equation 1]<br />
where p = relative proficiency, the a's = regression weights, the h's = training hours in the various<br />
training settings, and the subscripts for training settings are defined as:<br />
cl = classroom,<br />
cor = correspondence course,<br />
ho = guided hands on (Air Force field training detachment courses, etc.), and<br />
ojt = on-the-job training.<br />
The model of equation 1 has several features. First, it has no additive constant; zero<br />
training hours produces zero proficiency. Second, each curve of Figure 1 corresponds to the<br />
second-order polynomial equation section of equation 1 associated with a particular training<br />
setting. Third, the polynomial equation segment associated with each training setting is<br />
negatively accelerated. All eight model parameters associated with a particular task were<br />
estimated simultaneously in a single regression analysis.<br />
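Because Equation 1 has no additive constant, the eight parameters for a task can be estimated by ordinary no-intercept least squares on the linear and squared training-hour terms. The sketch below uses synthetic data; it only illustrates the estimation step, not the actual TDS data or software.

```python
import numpy as np

def fit_training_time_model(hours, proficiency):
    """hours: (n, 4) array of training hours in the four settings
    (cl, cor, ho, ojt); proficiency: (n,) relative proficiencies.
    Returns the 8 regression weights (a linear and a squared term per
    setting). No additive constant: zero hours -> zero proficiency."""
    X = np.hstack([hours, hours ** 2])        # h and h^2 for each setting
    coef, *_ = np.linalg.lstsq(X, proficiency, rcond=None)
    return coef

# Synthetic illustration: positive linear and small negative quadratic
# weights give the negatively accelerated curves described in the text.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 40, size=(200, 4))
true = np.array([0.04, 0.03, 0.05, 0.02, -4e-4, -3e-4, -5e-4, -2e-4])
p = np.hstack([hours, hours ** 2]) @ true
coef = fit_training_time_model(hours, p)
```

On noiseless synthetic data the weights are recovered exactly; with real ratings the fit is of course approximate.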
For the weapon system design TDS application, our objective is to replace the separate<br />
training time equations for each task module with a single equation that can be applied to<br />
any task. In the desired equation, task modules are described by scores on scales which<br />
reflect various skill and knowledge requirements. The first step in developing such a training<br />
time model is to generalize the task-specific training time model of equation 1 to cover many<br />
tasks. This can be done by introducing dummy-coded task identification variables:<br />
p = sum(i=1,t) [ a_cl,i h_cl x_i + a_cl2,i h_cl^2 x_i + a_cor,i h_cor x_i + a_cor2,i h_cor^2 x_i + a_ho,i h_ho x_i + a_ho2,i h_ho^2 x_i + a_ojt,i h_ojt x_i + a_ojt2,i h_ojt^2 x_i ]<br />
[Equation 2]<br />
where x_i = dummy-coded task identification variable for task i, i = 1...t, and<br />
x_i = 1 if the current observation is for task i, 0 otherwise.<br />
Equation 2 may be thought of as a model whose variables are interactions of tasks and<br />
training hours. This model contains 8 times t (number of task modules) interaction predictor<br />
variables. Consider a model in which the task indicator variables x_i are replaced<br />
with task descriptions in the form of scores for tasks on skill and knowledge scales:<br />
p = sum(j=1,r) [ a_cl,j h_cl y_j + a_cl2,j h_cl^2 y_j + a_cor,j h_cor y_j + a_cor2,j h_cor^2 y_j + a_ho,j h_ho y_j + a_ho2,j h_ho^2 y_j + a_ojt,j h_ojt y_j + a_ojt2,j h_ojt^2 y_j ]<br />
[Equation 3]<br />
where y_j = score for the current task on rating scale j, j = 1...r.<br />
Equation 3 may be thought of as a special case of equation 2, in which the task by<br />
training hour interaction is restricted to that portion attributable to the task rating scale<br />
scores. If the scores measure the task features that drive their training time models, then<br />
equation 3 will account for most of the proficiency variation that equation 2 can account for.<br />
The next step in building our training time model was to identify a set of standardized<br />
skill and knowledge scales. For this purpose, we adopted a set of 26 skill and knowledge<br />
dimensions that was developed by occupational analysts at the USAF Occupational<br />
Measurement Center for classifying tasks in various occupations (Bell & Thomasson, 1984).<br />
More recently, these task dimensions have been revised by researchers for use in assessing<br />
skill transferability between occupations (Lance, Kavanagh, & Gould, 1989).<br />
We obtained ratings on each of the 26 scales (see Figure 2) for all task modules in the<br />
Aircraft Environmental Systems Maintenance (Air Force Specialty 423X1) occupation. This<br />
occupation contains 57 task modules, each composed of one or more occupational survey<br />
tasks (Perrin, Knight, Mitchell, Vaughan, & Yadrick, 1988). The ratings were obtained<br />
specifically for this R&D work from five Air Force Non-commissioned Officers (NCOs) who<br />
were experienced in the Aircraft Environmental Systems Maintenance occupation.<br />
Agreement among the five raters was measured for each scale by the intraclass correlation<br />
or omega-squared (Hayes & Winkler, 1971). This intraclass correlation is equivalent to the<br />
R_kk measure often used to evaluate occupational survey task factor ratings. For raw ratings,<br />
intraclass correlations for many scales were zero<br />
or negative. A standardization transformation to<br />
remove scale use differences among raters was<br />
needed. For these scales, a zero rating has<br />
absolute meaning--that a task requires no skills<br />
or knowledges related to a particular scale. A<br />
standardization transformation should not change<br />
zero ratings. For this reason, the following<br />
standardization was applied to the ratings:<br />
y_ijk = x_ijk / [ sum(k=1,57) x_ijk ] [Equation 4]<br />
where y_ijk = standardized rating for rater i on scale j and task k, and<br />
x_ijk = raw rating for rater i on scale j and task k.<br />
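Equation 4 in code form (illustrative data, three task modules instead of 57): dividing each rating by the rater's own total over all task modules removes scale-use differences between raters while leaving absolute zeros at zero.

```python
# Equation 4 as code: each rater's rating on a scale is divided by that
# rater's total over all task modules. A rater who uses the scale twice
# as generously ends up with the same standardized profile; zeros stay zero.

def standardize(ratings):
    """ratings: {rater: [raw ratings over task modules]} for one scale.
    Returns the same structure with each rating divided by the rater's sum."""
    out = {}
    for rater, raw in ratings.items():
        total = sum(raw)
        out[rater] = [x / total if total else 0.0 for x in raw]
    return out

# Rater r2 uses only half of r1's range, but their profiles agree
raw = {"r1": [2, 0, 6], "r2": [1, 0, 3]}
std = standardize(raw)
```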
Figure 2 presents interrater agreement<br />
statistics for the 26 rating scales after this<br />
standardization. The ratings have acceptable<br />
interrater agreement. Three of the scales,<br />
Medical-Patient Care, Medical-Equipment<br />
Oriented, and Medical Procedures had no<br />
non-zero ratings for tasks in this occupation.<br />
Thus, meaningful intraclass correlation statistics<br />
could not be computed for these scales although<br />
all raters agreed on all ratings for these scales<br />
(zero).<br />
For modeling purposes, we augmented the<br />
training-time data file from the TDS R&D with<br />
scores on the 26 skill and knowledge scales. We<br />
used mean standardized ratings across raters for<br />
each task and scale. If the 26 scales are a useful<br />
basis for estimating training-time models, then<br />
equation 3, which uses scores on the scales along<br />
with training times to predict proficiency, should<br />
account for most of the proficiency variation<br />
accounted for by equation 2, which includes<br />
actual task identities.<br />
Results<br />
Scale                                               Omega2   Rkk<br />
1. Clerical                                           .28    .66<br />
2. Computational                                      .13    .44<br />
3. Office Equipment Operation                         .34    .72<br />
4. Mechanical                                         .13    .43<br />
5. Simple Mechanical Equipment/Systems Operation      .06    .23<br />
6. Complex Mechanical Equipment/Systems Operation     .01    .06<br />
7. Mechanical-Electrical                              .15    .47<br />
8. Mechanical-Electronic                              .20    .56<br />
9. Electrical                                         .11    .38<br />
10. Electronic                                        .20    .56<br />
11. Electrical-Mechanical                             .22    .58<br />
12. Electrical-Electronic                             .13    .43<br />
13. Electronic-Mechanical                             .19    .54<br />
14. Simple Physical Labor                             .00    .00<br />
15. Medical-Patient Care                               *      *<br />
16. Medical-Equipment Oriented                         *      *<br />
17. Medical Procedures                                 *      *<br />
18. Simple Nontechnical Procedures                    .02    .13<br />
19. Communicative-Oral                                .20    .55<br />
20. Communicative-Written                             .49    .83<br />
21. General Tasks or Procedures                       .13    .43<br />
22. Reasoning/Planning/Analyzing                      .05    .21<br />
23. Scientific Math Reasoning or Calculations         .08    .31<br />
24. Special Talents                                   .05    .21<br />
25. Supervisory                                       .27    .65<br />
Note: Omega squared is the intraclass correlation (inter-rater agreement); it is equivalent<br />
to R_kk1. R_kk is the estimated reliability for the mean rating from five raters.<br />
*All tasks had zero ratings; inter-rater agreement statistics are meaningless.<br />
Figure 2. Interrater Agreement Data for the 26 Skill and Knowledge Scales.<br />
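R_kk in Figure 2 is the Spearman-Brown stepped-up reliability of the mean of the five raters, computed from the single-rater intraclass correlation (omega squared); the function below checks this relation against the Clerical and Office Equipment Operation rows of the figure.

```python
def mean_rating_reliability(omega_sq, k=5):
    """Spearman-Brown prophecy formula: reliability of the mean of k
    raters, given the single-rater intraclass correlation (omega^2)."""
    return k * omega_sq / (1 + (k - 1) * omega_sq)

# Clerical row of Figure 2: omega^2 = .28 should step up to R_kk = .66
rkk = round(mean_rating_reliability(0.28), 2)
```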
Our first modeling activity was to fit the regression model of equation 2. The R^2 for this<br />
model was .65, which is statistically significantly greater than zero: F(451,2255) = 9.5, p < .001.<br />
Next, we fit the regression model of equation 3, which replaced task identification variables<br />
with scores on the skill and knowledge scales. The R^2 for this model was .52, which is also<br />
statistically significant: F(184,2525) = 14.9, p < .001. If one views the skill and<br />
knowledge-based model (equation 3) as a restricted version of the full task identity model<br />
(equation 2), the R^2 increase associated with the full model may be tested. That test shows<br />
that the R^2 difference between these models, .13, is statistically significant: F(267,2258) =<br />
3.1, p < .01. However, the skill and knowledge scales model accounts for 80% of the variance<br />
accounted for by the full-task model. Thus, the skill and knowledge model has great<br />
practical value for estimating TDS training time models.<br />
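The test of the R^2 increase reported above is the standard F test for nested regression models; with the paper's numbers (full R^2 = .65, restricted R^2 = .52, 267 numerator and 2258 denominator degrees of freedom) it reproduces the reported value of about 3.1.

```python
def nested_f(r2_full, r2_restricted, df_diff, df_resid):
    """F statistic for the R^2 increase of a full regression model over
    a nested restricted model:
    F = ((R2f - R2r) / df_diff) / ((1 - R2f) / df_resid)."""
    return ((r2_full - r2_restricted) / df_diff) / ((1 - r2_full) / df_resid)

# Numbers from the text: .65 vs .52 with F(267, 2258)
f = nested_f(0.65, 0.52, 267, 2258)
```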
Discussion<br />
The skill-and-knowledge scale model is much more accurate than we expected. Also, it<br />
permits training-time models--learning curves--to be estimated for tasks early in the design<br />
process, and it provides feedback to designers concerning which particular task skill and<br />
knowledge requirements are causing high training times. For these reasons, we believe that<br />
the skill-and-knowledge training-time model represents a significant step forward in our<br />
ability to design systems with acceptable training requirements, and to incorporate MPT<br />
considerations into the weapons system design process.<br />
REFERENCES<br />
Bell, J., & Thomasson, M. (1984). Job Categorization Project. Randolph AFB, TX: United States Air Force<br />
Occupational Measurement Center.<br />
Christal, R.E., & Weissmuller, J.J. (1988). Job-task inventory analysis. In S. Gael (Ed.), Job Analysis Handbook<br />
for Business, Industry, and Government. New York: John Wiley and Sons, Inc. (Chapter 9.3).<br />
Gentner, F.C. (1988, December). USAF Model Manpower, Personnel and Training Organization--An Update.<br />
Proceedings of the 30th Annual Conference of the Military Testing <strong>Association</strong>. Arlington, VA: U. S. Army<br />
Research Institute.<br />
Hayes, W.L., & Winkler, R.L. (1971). Statistics: Probability, Inference and Decision. New York: Holt, Rinehart<br />
& Winston.<br />
Lance, C.E., Kavanagh, M.J., & Gould, R.B. (1989, August). Development and Convergent Validity of Cross-Job<br />
Movement Indices. Paper presented at the annual meeting of the American Psychological <strong>Association</strong>, New<br />
Orleans, LA.<br />
Mitchell, J.L., Ruck, H.W., & Driskill, W.E. (1988). Task-based training program development. In S. Gael (Ed.),<br />
Job Analysis Handbook for Business, Industry, and Government. New York: John Wiley & Sons, Inc.<br />
Perrin, B.M., Knight, J.R., Mitchell, J.L., Vaughan, D.S., & Yadrick, R.M. (1988). Training Decisions System:<br />
Development of the Task Characteristics Subsystem (AFHRL-TR-88-15). Brooks AFB, TX: Training Systems<br />
Division, Air Force Human Resources Laboratory.<br />
Ruck, H.W. (1982, February). Research and development of a training decisions system. Proceedings of the<br />
Society for Applied Learning Technology. Orlando, FL.<br />
Ruck, H.W., & Birdlebough, M.W. (1977). An innovation in identifying Air Force quantitative training<br />
requirements. Proceedings of the 19th Annual Conference of the Military Testing <strong>Association</strong>. San Antonio, TX:<br />
Air Force Human Resources Laboratory and the USAF Occupational Measurement Center.<br />
Vaughan, D.S., Mitchell, J.L., Yadrick, R.M., Perrin, B.M., Knight, J.R., Eschenbrenner, A.J., Rueter, F.H., &<br />
Feldsott, S. (1989, June). Research and Development of the Training Decisions System (AFHRL-TR-88-50).<br />
Brooks AFB, TX: Training Systems Division, Air Force Human Resources Laboratory.<br />
EVALUATING TRAINING PROGRAM MODIFICATIONS<br />
Deborah Lawson McCormick and Paul L. Jones<br />
Naval Technical Training Command<br />
Evaluating changes in training programs is never a simple<br />
task, even under laboratory conditions where threats to validity<br />
can be controlled. In operational settings evaluation may appear<br />
to be an insurmountable problem -- one in which good evaluation<br />
methodology does not seem feasible. One major problem for<br />
evaluators in operational settings is that they are often not<br />
consulted until after training modifications have already been<br />
initiated. As a result, both experimental control and<br />
opportunities for data collection are severely limited.<br />
Even in those rare cases where evaluators are a part of the<br />
implementation from its onset, problems exist. For example, an<br />
evaluation design which uses equivalent control and experimental<br />
groups is often not possible in on-going training programs. In<br />
addition, operational settings are inherently dynamic environments;<br />
consequently, the effects of deliberate program changes are<br />
confounded with effects of other random factors which 'constantly<br />
impact the program. In these cases, isolating effects directly and<br />
unquestionably attributable to factors of the program change is<br />
impossible.<br />
This difficulty in establishing definite cause and effect<br />
relationships is sometimes used as a reason to forego evaluation.<br />
Rather than attempting a seemingly futile task, the tendency is to<br />
rely on intuition. The argument goes something like this: "These<br />
changes make sense, the students like them, the instructors like<br />
them . . . they probably work."<br />
However, increased competition for funding dollars makes the<br />
need to verify training improvement and justify additional funds<br />
crucial. Increasingly, funding sources are requiring hard data in<br />
support of dollars spent. As evaluators, we are being forced to<br />
accept that a less than perfect evaluation (that is, one which only<br />
suggests, rather than "proves," cause and effect) is better than no<br />
evaluation at all.<br />
This paper describes an evaluation model which we feel is<br />
flexible enough to prove useful in most evaluation circumstances,<br />
from the ideal condition, where evaluation has been planned in<br />
conjunction with change implementation, to those evaluation<br />
nightmares, where change implementation is complete before the<br />
evaluator is consulted. Following a brief description of the<br />
model, an application of its use is discussed.<br />
EVALUATION MODEL<br />
The evaluation system we recommend approaches evaluation on<br />
two levels. At the more immediate level, we attempt to determine<br />
effects on student performance in the specific training areas<br />
modified. For example, changes in test scores or training time in<br />
those specific content areas might be analyzed. The second level<br />
attempts to determine how the training program as a whole was<br />
affected by the modifications. Such measures as course attrition,<br />
total training time, performance in other areas of the program,<br />
etc., would be considered. Inferring cause and effect<br />
relationships becomes riskier as one moves to these more general<br />
measures of effect -- measures further removed from the proposed<br />
cause. However, modification in one area of the training program<br />
should ultimately affect the program as a whole and become<br />
manifested in these general measures. In reality, it is this<br />
broader impact which serves as the bottom line point of interest<br />
for most of our clients.<br />
The evaluation model we use can be condensed to a six-step<br />
process described below:<br />
1. Begin evaluation planning early, before implementation of the<br />
program change if possible. The evaluator should be involved as<br />
soon as possible, ideally during the modification planning stage --<br />
certainly prior to modification implementation. Many threats to<br />
validity can be anticipated and controlled if the evaluator is<br />
involved in this manner. Realistically, however, we know that this<br />
scenario seldom occurs. More often, the evaluator is called in<br />
after the modification has been implemented. For this reason, we<br />
usually find ourselves beginning with step two.<br />
2. Know the program you are about to evaluate. A thorough<br />
understanding of the nature of the program change and its impact on<br />
the general operation of the training program is critical to good<br />
evaluation. The evaluator must understand the program's<br />
objectives, the anticipated impact of the change on these<br />
objectives, and the methods used to accomplish them. In addition,<br />
the evaluator must determine what data is currently being collected<br />
to evaluate program performance and whether this data might be<br />
useful in evaluating the program change. Most importantly, a<br />
definitive statement of how the change is intended to affect the<br />
program (that is, the goal of the program change) must be<br />
formulated.<br />
3. Determine data collection procedures and gather baseline data.<br />
The purpose of baseline data is to develop a snapshot of how well<br />
the program is performing in the area to be modified prior to the<br />
change. Often you will find existing measures of performance, such<br />
as test scores, which directly address this question. In other<br />
cases, it will be necessary to introduce new data collection<br />
instruments, e.g., surveys, questionnaires, etc.<br />
Whether data collected should be restricted to only the area<br />
modified, to a broader segment of the program, or to the program in<br />
totality depends on the expected effect of the change. In general,<br />
the interdependence of program parts usually warrants a complete<br />
evaluation.<br />
4. Monitor the implementation of the change. Keep informed about<br />
how the implementation is proceeding. Document associated factors<br />
which might impact the success of the change, such as changes in<br />
instructor or student attitudes, changes in quality of either<br />
instructors or students, or changes in resources.<br />
5. After the program has stabilized following incorporation of the<br />
change, gather data for comparison with the baseline. This step<br />
involves collecting data corresponding in type to the baseline data<br />
for a sample of students under the modified program. This step<br />
will involve readministration of instruments developed for the<br />
evaluation, for example attitude questionnaires.<br />
6. Analyze data and interpret results. Most of our clients have<br />
neither the time nor the propensity to wade through a morass of<br />
statistics. Although we sometimes use fairly sophisticated<br />
statistical procedures and usually include these analyses in the<br />
report, we always attempt to synthesize the findings for our<br />
clients. We try to answer the general question of how the training<br />
program was affected by the modification in an easily accessible<br />
one-page (or one table or figure) summary.<br />
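The baseline-versus-comparison logic of steps 3 and 5 can be made concrete with a minimal sketch. This is not the authors' procedure: the counts below are invented, and the two-proportion z-test is just one reasonable way to compare a rate such as attrition before and after a program change.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Compare two rates (e.g., baseline vs. post-change attrition).
    x = number of students attriting, n = group size.
    Returns (z statistic, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # pooled rate under H0: no change
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_two_sided

# Hypothetical counts: 56 of 200 students attrited before the change,
# 38 of 200 after.
z, p = two_proportion_z(56, 200, 38, 200)
print(round(z, 2), round(p, 3))   # → 2.12 0.034
```

A significant z here supports, but does not prove, a causal claim; the caution above about inferring cause and effect from more general measures still applies.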
MODEL APPLICATION<br />
In 1987, a project known as the Model School program was<br />
initiated at Electrician Mate's (EM) School at the Naval Training<br />
Center, Great Lakes. The purpose of this project was to examine<br />
the training program for EM's at this school, explore ways to make<br />
that training better, and implement those that were feasible. As<br />
a result of this project, a number of changes took place in this<br />
program over the next two years. For instance, a technology-based<br />
learning center was instituted, changes in remediation occurred,<br />
the testing program was revised somewhat, etc.<br />
In the spring of 1990, the Research Branch of the Navy<br />
Technical Training Command was tasked with conducting an evaluation<br />
of the impact of these changes on the training program. Because we<br />
were not involved during the implementation of the project, we used<br />
a modification of the six-step approach described above.<br />
In this case, our first step was getting to know the program<br />
we were tasked with evaluating. From talking with the school staff<br />
and various other sources, we became familiar with the training<br />
program as it existed prior to the Model School Project, the<br />
objectives of the project and changes made to the training program<br />
as a result of it, and other occurrences which, although they may<br />
have been coincidental, had potential for impacting the training<br />
program. We found that many of the changes had potential for very<br />
subtle impact; for example, the staff's optimism for the program<br />
probably improved their teaching, but this notion is difficult to<br />
substantiate. Also, additional out-of-class study aids had been<br />
developed and introduced throughout the training program.<br />
Cumulatively, one would expect these changes to result in improved<br />
student performance; however, we felt the attempt to isolate and<br />
attribute effects to individual factors would be impossible in a<br />
post hoc evaluation design.<br />
With these ideas in mind, we approached the evaluation with<br />
two broad questions: (1) How did the performance of pre-Model<br />
School project students compare with the performance of post-Model<br />
School students, in terms of attrition rate, setback (repeating of<br />
course segments) rate, test scores, and number of retests? (2)<br />
What changes occurred in the intervening time period which may have<br />
impacted student performance?<br />
Next we constructed baseline data for a group of students<br />
attending the training prior to modification for use as a quasi-control<br />
group. The school had been utilizing an automated testing<br />
program which maintained students' scores on tests and number of<br />
retests taken. This data gave us a picture of performance in the<br />
individual content areas, as well as an overall measure of<br />
performance. We also collected more general performance measures<br />
such as course attrition and setback data.<br />
Corresponding information was collected for a comparison group<br />
who received the training after Model School Program<br />
implementation. Because academic ability levels of students in<br />
fundamental training courses have historically varied<br />
systematically with the season of the year, we selected our<br />
comparison group from months corresponding to those of the “control”<br />
group in an attempt to maximize equivalency of the two groups.<br />
Data were analyzed and major findings presented in the summary<br />
format shown in Figure 1. This graphic representation enabled us<br />
to overlay potential impacting factors with major measures of<br />
student performance. Our clients liked this format because it<br />
provided an at-a-glance picture of both the changes to the training<br />
program and corresponding variations in terms of student<br />
performance. In this instance, the overall impact of the program<br />
appeared to be positive in that the two major indicators of<br />
training success, attrition and setback rates, both improved.<br />
[Figure 1 -- overlay of program changes and student performance measures -- not reproduced]<br />
CONCLUSION<br />
Sometimes evaluators are hesitant to perform evaluations such<br />
as the one just described because they are too messy and imprecise.<br />
When we are coerced to perform them, we tend to apologize for the<br />
product. These evaluations do little more than provide a<br />
historical description of the program in terms of its components and<br />
its performance. But even these evaluations serve two important<br />
functions. First, they provide concise, accurate descriptive<br />
information to program managers, information otherwise not<br />
available to them. Secondly, they establish a climate conducive to<br />
evaluation. Managers become aware that many questions they would<br />
like to have answered conclusively could be answered if evaluators<br />
are consulted early in the modification process.<br />
In summary, our advice is to accept those messy evaluation<br />
projects, adapt proven evaluation methodology/procedures to your<br />
particular set of circumstances, and conduct the evaluation,<br />
exercising controls for threats to validity wherever possible. At best, you'll<br />
be able to analyze the data and draw cause and effect relationships<br />
with reasonable accuracy. At worst, you'll be able to synthesize<br />
the data and describe changes, suggesting possible causes. In<br />
either case, the client will have more information and be better<br />
equipped to make informed decisions than he/she would have been<br />
otherwise.<br />
The Effect of Reading Difficulty<br />
on Correspondence Course Performance<br />
Dr Grover E. Diehl (ECI)<br />
During the 1989-90 academic year the Air Force Extension Course<br />
Institute (ECI) broadly examined the impact of reading level on<br />
correspondence courses in the Career Development Course (CDC)<br />
program. First, reading grade levels (RGLs) of CDCs were<br />
determined using the FORCAST method. The FORCAST method involved<br />
the manual counting of words in samples of text. These RGLs were<br />
then examined to determine whether the RGLs had increased significantly<br />
on a year-by-year basis, and were also compared with 1982 data to<br />
determine whether there were significant differences between the<br />
samples. Next, RGLs were correlated with end-of-course performance<br />
by percent of first time exam failures and then by proportion<br />
of overall course failures. Following this, the FORCAST<br />
RGLs were correlated with target RGLs prepared by the Air Force<br />
Human Resources Laboratory and with computer generated RGLs<br />
using a Flesch-Kincaid formula.<br />
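For reference, the two readability formulas mentioned above can be sketched as follows. These are the standard published forms of FORCAST (RGL = 20 minus one-tenth the number of one-syllable words in a 150-word sample) and the Flesch-Kincaid grade-level formula; the vowel-group syllable counter is a crude stand-in for the manual counting the study describes, not ECI's actual procedure.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count vowel groups.
    A crude heuristic, not the manual count used in the study."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def forcast_rgl(words):
    """FORCAST: RGL = 20 - (one-syllable words in a 150-word sample) / 10."""
    sample = words[:150]
    one_syllable = sum(1 for w in sample if count_syllables(w) == 1)
    return 20 - one_syllable / 10

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade =
       0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59

# A sample of all one-syllable words yields the formula's lowest grade level:
print(forcast_rgl(["the"] * 150))   # → 5.0
```

Because the two formulas weight word and sentence features differently, they sit on different scales, which is consistent with the scale differences reported in the findings below.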
A basic intervening variable in the assessment of reading difficulty,<br />
however, was the fact that personnel and Air Force jobs<br />
were matched during enlistment processing, so that the most intellectually<br />
demanding specialties were filled with the most intellectually<br />
able personnel. One way around this problem was to<br />
calculate difference scores between the RGL targets and the obtained<br />
FORCAST RGLs -- a measure of the perceived difficulty of the<br />
material to the student -- and correlate these with failure rate.<br />
This procedure treated student ability as a covariate, with a<br />
corresponding reduction in the error portion of the prediction<br />
equation, without the necessity of using analysis of variance. An<br />
analysis of difference scores constituted the last question to<br />
be addressed.<br />
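The difference-score idea can be made concrete with a short sketch. The course-level data below are invented for illustration (the paper reports r = .2117 and r = .2459 for the real samples); the Pearson computation itself is standard.

```python
import statistics as st

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = st.mean(x), st.mean(y)
    sx, sy = st.pstdev(x), st.pstdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

# Hypothetical course-level data (illustrative only):
# difference score = obtained FORCAST RGL minus target RGL
# (positive = material written above the target reader's level).
forcast_vals = [11.2, 10.5, 12.0, 9.8, 11.6]
target_vals  = [10.0, 10.5, 10.8, 10.2, 11.0]
fail_rate    = [12.0,  8.0, 15.0,  6.0, 11.0]   # first-time exam failures, %
deficit = [f - t for f, t in zip(forcast_vals, target_vals)]
print(round(pearson_r(deficit, fail_rate), 3))   # → 0.952
```

Correlating the deficit rather than the raw RGL removes the portion of reading demand already matched to student ability, which is why the paper describes the procedure as treating ability as a covariate.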
Findings<br />
FORCAST RGL and Edition Date. No statistically significant association<br />
was found between the FORCAST reading level and the edition<br />
date of the materials (a period of about 12 years). The<br />
Pearson Product Moment Correlation coefficient (r) of FORCAST<br />
RGL with edition date was .0742 (N = 215, p =.279). To check<br />
for possible curvilinearity, a scatterplot was prepared which<br />
suggested a completely random occurrence pattern. FORCAST reading<br />
level did not vary in a linear way from year to year.<br />
Difference Between RGLs Sampled in 1982 and 1990. There was apparently<br />
sufficient variation within the samples to be statistically<br />
significant. Hotelling's T2 was 5.1998 with a probability<br />
of .027 at 1 and 2 degrees of freedom. It should be noted, however,<br />
that the test was made on a group-to-group basis and there<br />
was no indication which individual pairs may have changed the<br />
most. It was in fact possible that no pairs would significantly<br />
vary even though the full model rejected the null hypothesis.<br />
Also, since the test was non-directional, it was not possible to<br />
identify which group contained higher RGLs than the other although<br />
they appear to have been higher more recently.<br />
An observation related to this, however, was the significant<br />
correlation of the RGLs of the two samples (r = .4709, p =<br />
.001). This raised an interesting situation in which samples<br />
taken in 1982 and 1990, although significantly different, were<br />
nonetheless related. The relatedness, however, was not developmental<br />
over time. One possible solution to this ambiguity was<br />
that RGL was varying with frequent but intermittent corrections<br />
using a current "clearly written text" standard.<br />
1990 RGLs and First Time Examination Failures. RGLs were not<br />
significantly related to first time examination failure rates (r<br />
= .0741 and probability = .327).<br />
1990 RGL and Overall Course Failures. All students failing the<br />
first final examination were provided a retest. Course failure<br />
required failure of both the first examination and the<br />
reexamination. As was the case with first time exam failures,<br />
course failures were not significantly related to RGL in the<br />
1990 sample (r = .0404, probability = .403).<br />
FORCAST RGL and AFHRL Targets. The correlation between FORCAST<br />
RGLs of course materials in the 1990 sample and AFHRL targets of<br />
actual student reading ability (50th percentile reading ability)<br />
was .0695 and was not significant (p = .333). A reduced target<br />
at the 15th percentile also failed to be significantly related<br />
to the obtained FORCAST RGL (r = .0249 with probability = .439).<br />
The data failed to demonstrate that the variation within the<br />
reading ability of personnel was linearly related to FORCAST<br />
RGLs of the CDC material.<br />
FORCAST RGL and Flesch-Kincaid RGL Comparison. Due to resource<br />
limitations on the Flesch-Kincaid RGL side, comparisons were<br />
made on only one CDC consisting of four volumes. The means and<br />
SDs were 11.2725 and .5187 for FORCAST and 9.0800 and .1619 for<br />
Flesch-Kincaid. The obvious difference between the averages was<br />
significant, with Hotelling's T2 of 92.8489 and probability equal to<br />
.004 (df = 1 and 3). Flesch-Kincaid generated significantly<br />
lower RGL estimates than did FORCAST. The correlation, although<br />
large by research standards (r = .5252), was not statistically<br />
significant (p = .237).<br />
Difference Scores and Failure Rates. Using a 50th percentile<br />
personnel reading ability as a target base, the correlation of<br />
the RGL deficits with first time exam failure rate was .2117<br />
with a probability equal to .101 -- not statistically significant.<br />
When a 15th percentile target base was used, the correlation<br />
of the deficits with first time failure rates was also<br />
not significant (r = .2459, p = .068). Similar analyses of<br />
course failure rates yielded the same result.<br />
Conclusion<br />
FORCAST reading grade levels were not significantly associated<br />
with end-of-course test performance, reading grade level targets<br />
using the Air Force Reading Ability Test scale, or Flesch-Kincaid<br />
reading difficulty obtained from a computer analysis.<br />
Additionally, FORCAST reading grade levels had not changed consistently<br />
over time. There was evidence that RGL had risen<br />
slightly sometime during the eight year period but it was unclear<br />
whether the rise was continuing.<br />
Careful examination of the summed evidence suggested, however,<br />
that the null outcomes were possibly due to an aggressive<br />
"clearly written text" program within ECI. This effort, which<br />
replaced FORCAST in the mid-1980s, introduced an ongoing conscious<br />
effort on the part of the text writers and reviewers to<br />
ensure the readability of the materials. Earlier information<br />
suggested that use of FORCAST was associated with a reduction in<br />
reading difficulties to the point where FORCAST was no longer<br />
predictive. Present data suggested that the "clearly written<br />
text" standard may continue to limit the value of FORCAST as a<br />
predictive indicator.<br />
Discussing more generally the issue of attention to RGL, it was<br />
noted that most ways of determining RGL and tests designed to<br />
assess the reading ability of students were highly correlated --<br />
often as highly intercorrelated as the validity coefficients of<br />
the individual measures. Differences in outcome values were<br />
typically due to scale. The task of maintaining acceptably low<br />
reading difficulty within written materials was primarily one of<br />
maintained focus on the problem using any of several means. FORCAST<br />
was one means easily calculated by hand. The Flesch-Kincaid<br />
RGL provided here by Right-Writer, although almost<br />
necessitating a computer, was a viable option especially when<br />
the written material was already in an acceptable word processing<br />
medium. The Right-Writer output in fact contained considerable<br />
ancillary information which could be useful to writers. For<br />
example, suggestions for making writing more direct and improving<br />
sentence structure were given, and there was a listing<br />
of negative words, jargon, colloquial and misused words,<br />
questionable spellings, and words which readers may not understand.<br />
External reports such as these served to alert writers<br />
and reviewers to idiosyncrasies which may distract the student<br />
from the material and to maintain focus on reading difficulty<br />
and level.<br />
Audience: Instructional developers.<br />
For more information contact Dr Grover Diehl, ECI, Gunter AFB AL<br />
36118-5643,<br />
AUTOVON 446-3641 or commercial 205-279-3641.<br />
Navy Basic Electricity Theory Training: Past, Present, and Future<br />
Steve W. Parchman, John A. Ellis, & William E. Montague<br />
Navy Personnel Research and Development Center<br />
THE OPINIONS EXPRESSED IN THIS PAPER ARE THOSE OF THE AUTHORS,<br />
ARE NOT OFFICIAL AND DO NOT NECESSARILY REFLECT THE VIEWS OF<br />
THE NAVY DEPARTMENT<br />
Introduction<br />
Basic electricity and electronics theory training (BETT) in the Navy has historically<br />
had high attrition and setbacks and has been plagued by questions about the relevance of<br />
the course content to Navy jobs. BETT is taught as a separate topic at the beginning of<br />
more than twenty Navy A schools to more than 20,000 students annually. BETT<br />
material historically has proven difficult for students to learn and has resulted in high<br />
attrition and set-back rates. For example, in FY 88 attrition in five electrical A schools<br />
averaged 28% (AE, ET, EM, IC, DS; total annual throughput = 5000). Average setback<br />
rate for these same schools was 69%. Approximately 70% of these losses occurred in the<br />
BETT phase of these courses. Further, the abstract nature of this content has raised<br />
questions about its relevancy for vocational jobs. For example, recent research has<br />
shown that trainees who have passed course tests fail to pass relatively simple practical<br />
exercises. These problems with trainee learning have remained even in the face of<br />
substantial expenditure of effort to revise the content and to change the method of<br />
delivering it.<br />
Research on learning and training suggests that more fundamental changes in<br />
curriculum structure can lead to improvements in learning. Research and development is<br />
needed to develop and test alternative methods for training electrical and electronic<br />
theory, with the goal of reducing both attrition and setback rates by a minimum of 25%.<br />
This paper discusses Navy basic electricity and electronics theory training (BETT)<br />
with some suggestions for development of future training programs. It begins by briefly<br />
reviewing the history of Navy BETT training followed by a discussion of alternative<br />
approaches to this training that have been tried. Finally, several options for training<br />
improvements are presented.<br />
BETT History<br />
Through the 1950s and 60s, Navy electronics training was both theory and math<br />
intensive. Well qualified trainees were amply available, thanks in part to the draft. “A”<br />
School electronics courses, often eight months long, challenged the trainees and also<br />
prepared them for the rigors of the “B” schools. “B” schools of up to fifteen months were<br />
available to qualified re-enlistees. These schools resembled university engineering<br />
programs.<br />
Perhaps it was inevitable that two dozen or more schools around the country<br />
independently teaching similar content would generate pressure for consolidation. In the<br />
early 1960s, consolidations were carried out, and a common syllabus, based on Bureau of<br />
Personnel publications, was adopted at each of the major training centers.<br />
Two factors which came into play in the 1960s and 70s resulted in major changes in<br />
Navy electronics training. First, the Programmed Instruction (PI) movement reached its<br />
peak of popularity in the 60s. Use of this approach in the Navy was judged desirable,<br />
and a contractor (Westinghouse) was funded by the Bureau of Personnel to convert the<br />
basic or introductory portion of Electricity/Electronics courses into a self-teaching<br />
format. The contractor’s charter was not to change the substance of the course, but rather<br />
to convert it into a different “delivery system.” With the assistance of a committee of<br />
E/E instructors from San Diego schools, the basic lectures of the BuPers syllabus were<br />
converted into narrative and PI materials, summaries were written, test items were<br />
inserted as progress checks, and module tests, midterms, and final exams were also<br />
prepared from existing test items. In 1968, a partially self-paced compromise version of<br />
several variations of instructor-taught basic electricity and electronics was offered.<br />
Second, in 1967, NPRDC (then Navy Personnel and Training Research Laboratory)<br />
and CNATECHTRA began work on a computer-managed instruction (CMI) system.<br />
One of the courses eventually put on the CMI system was the Westinghouse conversion<br />
of the basic E/E course. With only minor modifications, it was on-line at NAS Memphis<br />
as a CMI course in 1973.<br />
A major organizational change also influenced BE/E training. Following<br />
recommendations of a review board (the Cagle Board), control of technical training was<br />
moved from Bureau of Personnel in 1972, and vested in two new organizations, Chief of<br />
Naval Education and Training (CNET) and Chief of Naval Technical Training (CNTT),<br />
with the latter absorbing the functions of CNATECHTRA. These new organizations<br />
evaluated the CMI course and concluded that this form of training could be effective and<br />
economical. A CNTT in-house group (MIISA) was created to improve and expand the<br />
CMI software. Basic E/E was consolidated in four schools and incorporated into the<br />
CMI system. In 1975, the Westinghouse version of the San Diego compromise of the<br />
BuPers version of Basic E/E was standardized throughout the Navy. Between 1975 and<br />
1984, while there were cosmetic changes to the course material, the only substantive<br />
modifications were to increase the CMI system’s ability to output various summary<br />
reports, to eliminate some “nice-to-know” material, and to add some modules on newer<br />
technologies such as transistors and advanced circuitry. The system was effective in<br />
meeting its goal of graduating large numbers of students in a significantly reduced<br />
amount of training time.<br />
In late 1984, CNET made two major decisions regarding BETT: (1) the courses<br />
would be converted from self-paced to group-paced instruction, and (2) the training would be<br />
integrated into the appropriate “A” schools (thus, BETT courses would disappear as<br />
separate entities). The conversions began in 1985 and the majority were completed by<br />
1989. The BETT courses were phased out. These conversions did not result in any<br />
major redesign of BETT training. Instead the existing BETT materials were adapted and<br />
added as the initial phase to the existing “A” school courses. The decision to add BETT<br />
to ‘A’ school courses presented an opportunity to increase the job relevance of this<br />
training. However, the schools were unable to do this during the initial adaptation phase<br />
because they did not have the resources for making major course revisions.<br />
In general, the problems with the current materials are: (1) there has not been a<br />
recent job or task analysis; the instruction on electricity was adopted from simplified<br />
older physics courses*, (2) there are opportunities for course improvements, including<br />
instructional design, test items, and laboratory experience, that should be explored, and<br />
(3) students do not seem to develop a good practical understanding of electricity and<br />
electronics. Attention to all of these issues could lead to lower attrition and setback rates,<br />
and improved transfer to job tasks.<br />
*Physicists regard the content and sequence of BETT to be outmoded theoretically. “The study of<br />
electricity at rest - “electrostatics” - used to bulk large in elementary physics. It was all that was<br />
Practical Test<br />
A practical hands-on performance exam was developed by NPRDC to test the<br />
transfer of the BETT curriculum to more relevant job situations. The BETT course<br />
objectives were evaluated to determine which would be the most common to the job a<br />
technician might do in the fleet. From this analysis five hands-on test questions were<br />
developed using real components (resistors, capacitors, conductors, and a flashlight<br />
battery) which tested the trainee’s ability to recognize and identify electrical<br />
components, determine a component’s operating condition using a multimeter, and<br />
analyze its effect in an operating circuit. Since the BETT course materials focus on use of<br />
the multimeter, the test assumed a similar focus. Prior to giving the test, the school staff<br />
evaluated it and thought that their students would have little difficulty achieving a good<br />
score.<br />
The test was given at two Navy class ‘A’ schools: once in 1986 at the Avionics<br />
Technician class ‘A’ school in Memphis, and again in 1988 at the Electrician’s Mate<br />
class ‘A’ school in Great Lakes, Illinois.<br />
The Memphis Study.<br />
The hands-on test was given to determine whether students could apply knowledge<br />
and skills learned in BETT in practical situations. The data from the hands-on<br />
performance test showed that these students were performing at very low levels. The mean<br />
score for this test was 61.3 (n=105) of 104 possible points. The mean score was<br />
considered below passing in most Navy schools.<br />
The Great Lakes Study.<br />
In June 1987 the Chief of Naval Education and Training (CNET) designated the<br />
Electrician’s Mate (EM) ‘A’ School at Great Lakes, Illinois a “model school.” The goal<br />
was “to apply the best techniques and instructional technologies available... so that we<br />
will have in place curriculum, technologies, and management techniques which reflect<br />
the very best we currently know about teaching and learning.” The Navy Personnel<br />
Research and Development Center (NPRDC) was asked to participate in the model<br />
school effort.<br />
Our first research effort was to evaluate EM ‘A’ school Phase I training, which is<br />
the basic electronics and electricity (BETT) portion of the course. The hands-on<br />
performance test was given to 44 trainees from the first two phases of the course and<br />
23 trainees awaiting initial instruction. The objective was to determine if Phase I and<br />
Phase II trainees could solve practical problems using the knowledge and skills taught in EM<br />
‘A’ and what knowledge a typical trainee brings with him to the school.<br />
*(continued) known of electricity two centuries ago, and tradition dies hard. It makes a poor beginning for<br />
modern electric circuits. Now you need some knowledge of it for atomic physics. How much you<br />
see and learn of this part of physics will depend on apparatus, weather, and instructor. On the<br />
whole the less the better.” From Rogers, E.M. (1977). Physics for the inquiring mind. Princeton,<br />
NJ: Princeton University Press, page 533.<br />
All subjects were given the six practical problems to solve. The mean score for the<br />
two trained groups was 44.6 out of 140 points possible. An additional test item<br />
developed by the EM school staff accounts for the increase in total possible points.<br />
The average trainee had difficulty measuring values in simple electrical devices<br />
using the multimeter. For the most part the trainees know how to use the multimeter.<br />
However, they have difficulty knowing when or where to use it. Further, even after<br />
completing Phase II training, most trainees are not able to accurately interpret meter<br />
readings to identify an open or short, which is fundamental to equipment maintenance.<br />
Trainees did much better recognizing the various electrical components than they<br />
did measuring them. However, less than 50 percent of the Phase I and Phase II trainees were able<br />
to identify a capacitor, and a significant number of Phase I trainees had problems identifying a<br />
battery, conductor, and resistor. After the second phase, component identification<br />
improves significantly.<br />
Alternative Approaches<br />
Over the last 40 years a number of alternative approaches to teaching BETT have been<br />
developed and tested. This section will summarize some of the more significant work in<br />
this area. While none of these projects specifically reports cost data, all of them report<br />
decreases in attrition and setback rate, and some report decreases in course length. All of<br />
the decreases directly relate to cost savings.<br />
The first test was done in the Navy School of Electronics in 1949 (Johnson 1951).<br />
The course that was changed was the basic electricity and electronics for radio, sonar,<br />
and radar maintenance. The results were that the course was shortened from 26 to 18<br />
weeks, attrition as compared to a control group was reduced by 66% and the setback rate<br />
dropped significantly.<br />
The second, was the LIMIT project done by HumRRO in the late 1950s (Goffard,<br />
Heimstra, Beecroft & Oppenshaw 1960). It reorganized the three week basic electricity<br />
section of a field radio repair according to job-oriented training (JOT) principles. A<br />
comparison of conventional students with JOT students showed that the latter achieved<br />
higher test scores.<br />
The third was project REPAIR again done by HumRRO in the late 1950s and early<br />
1960s (Brown et al. 1959, Shoemaker 1960). The course modified was the entire field<br />
radio repair course. Approximately 100 students completed the new field radioman’s<br />
class. When their performance was compared with that of graduates of the traditional<br />
class, it was found that they were “significantly superior” to the traditional class in four<br />
of seven tests administered--troubleshooting, test equipment, repair skills, and<br />
achievement. No improvement was noted on the alignment, manuals, and schematics<br />
tests. An interesting finding was that the experimental course graduates were superior to<br />
the standard course graduates on each of the 8 problems that made up the troubleshooting<br />
test. This is impressive since 3 of the problems involved equipment on which the<br />
experimental course students received only 4 hours of familiarization training, compared<br />
with 38 hours of training for each student in the standard course.<br />
The fourth project was X-ET which was done at NPRDC in the mid 1960s<br />
(Pickering & Anderson 1966, VanMatre & Steinemann 1966, Steinemann, Harrigan, &<br />
VanMatre 1967). An experimental electronics technician (X-ET) course was developed<br />
that differed from ongoing courses in that it accommodated students previously<br />
considered unqualified in terms of aptitude scores and education. Training was oriented<br />
towards job skills and minimized nonessential mathematics and electronics theory. The<br />
results showed that the X-ETs were taught to required levels of proficiency in a<br />
substantially shorter time than in the conventional course. In follow up studies of job<br />
performance it was found that, in general, the X-ETs were performing their duties<br />
satisfactorily in comparison with a control group and on the basis of ratings by<br />
supervisors and peers. They were superior to control ETs in troubleshooting, even<br />
though they scored lower on paper-and-pencil tests of electronics knowledge.<br />
The fifth project, SUPPORT, applied JOT to the Army’s medical corpsman’s course<br />
(Ward, Fooks, Kern & McDonald 1970). (This was not a BETT revision.) The course<br />
was changed from a lecture-based, theory-oriented course to a more job-oriented course<br />
where the content was organized so the relevance of each new topic was readily<br />
apparent. The evaluation revealed that JOT students performed better than<br />
conventionally trained corpsmen in 21 out of 26 tests, including both paper-and-pencil<br />
tests and extensive job-sample, simulated performance tests. In addition, JOT students<br />
were faster than conventional corpsmen in attending to serious battlefield wounds.<br />
There was a project related to the SUPPORT project that was aimed at extending the JOT<br />
methods used in the corpsmen training to radio operator training (Goffard, Polden &<br />
Ward 1970). The findings were that the recycle rate for trainees was reduced by 30<br />
percent in comparison with the standard course, and attrition was reduced by about 50<br />
percent. These outcomes were achieved even though the JOT classes were 40 percent<br />
larger and contained twice as many mental category IV personnel as the standard course.<br />
The final project was APSTRAT (Weingarten, Hungerland, Brennan, & Allred<br />
1971). This project was specifically targeted for low aptitude personnel admitted under<br />
Project 100,000. The findings were that the redesigned Army field wireman’s course had<br />
35 percent less attrition and that set back rates were cut from 30 percent to zero.<br />
Future Direction for BETT Training Development<br />
Based on the findings of the above studies and the results of the recent NPRDC<br />
hands-on practical tests, below are two alternative approaches to BETT training that<br />
could be used to make future Navy technicians better equipped to maintain the<br />
sophisticated weapons systems in tomorrow's Navy.<br />
Develop a job oriented BETT course that is generic to all electrical schools. The<br />
basic electricity front-end that has been added to the ‘A’ schools would be converted<br />
from an abstract, mathematics- and physics-oriented course to one where job-relevant<br />
skills are practiced in a situation of actual use. The current ‘A’ school phases<br />
would remain the same. The new front-end training would build on the trainee’s<br />
knowledge of familiar electrical devices to teach basic electrical operation and<br />
maintenance concepts. The knowledge acquired using these devices should transfer to<br />
the equipment used in the later phases of ‘A’ school and on-the-job. The new job<br />
oriented training would stress hands-on trainee performance. Hands-on experience would<br />
increase from the current twenty-five percent to sixty percent or more of total class time.<br />
The training would be developed, implemented and evaluated at one electrical school to<br />
determine the feasibility of implementing it in other electrical schools.<br />
Develop job oriented electricity theory training using equipment and tasks specific<br />
to each ‘A’ school. The basic electricity theory training would be integrated into the ‘A’<br />
school equipment operation and maintenance lessons. There would be no separate front-end<br />
theory training. The trainee would learn basic electrical operation and maintenance<br />
concepts on the equipment used on-the-job in the fleet or on reasonable simulations. The<br />
training would be sequenced so the easier/more familiar devices would be taught first,<br />
with more difficult concepts and techniques being taught with the more complicated<br />
devices. For example, for initial basic theory training, a radio receiver at ET ‘A’ school,<br />
or the small boat lighting system at EM ‘A’ school could be used to teach students basic<br />
circuit operation, preventive, and corrective maintenance methods. Those simple devices<br />
could be followed with more complicated devices that have more advanced concepts. As<br />
in the generic option, the emphasis would be placed on trainees learning hands-on<br />
practical skills. Laboratory time would increase to allow sufficient time for the trainees<br />
to become skilled in the application of the theories and concepts learned.<br />
References<br />
Brown, G., Zaynor, W., Bernstein, A., & Shoemaker, H. (1959). Development and evaluation of an<br />
improved field radio repair course. HumRRO-TR-58-59. Alexandria, VA: Human Resources<br />
Research Organization.<br />
Goffard, S., Heimstra, N., Beecroft, R., & Openshaw, J. (1960). Basic electronics for minimally<br />
qualified men: An experimental evaluation of a method of presentation. HumRRO-TR-61-60.<br />
Alexandria, VA: Human Resources Research Organization.<br />
Goffard, S., Polden, D., & Ward, J. (1970). Development and evaluation of an improved radio<br />
operator course (MOS 05&X7). HumRRO-TR-70-8. Alexandria, VA: Human Resources<br />
Research Organization.<br />
Johnson, H. (1951). The development of more effective methods of training electronics technicians.<br />
Washington, DC: Working Group on Human Behavior Under Conditions of Military Service,<br />
Research and Development Board, Department of Defense.<br />
Pickering, E., & Anderson, A. (1966). A performance-oriented electronics technician training program:<br />
I. Course development and evaluation. STB 67-2. San Diego: U.S. Naval Personnel Research<br />
Activity.<br />
Shoemaker, H. (1960). The functional context method of instruction. IRE Transactions on Education,<br />
E-3(2), 52-57.<br />
Steinemann, J., Harrigan, R., & VanMatre, N. (1967). A performance-oriented electronics training<br />
program: IV. Fleet follow-up evaluation of graduates of all classes. SRR-68-10. San Diego: U.S.<br />
Naval Personnel Research Activity.<br />
VanMatre, N., & Steinemann, J. (1966). A performance-oriented electronics technician training<br />
program: II. Initial fleet follow-up evaluation of graduates. STB-67-15. San Diego: U.S. Naval<br />
Personnel Research Activity.<br />
Ward, J., Fooks, N., Kern, R., & McDonald, R. (1970). Development and evaluation of an integrated<br />
basic combat/advanced individual training program for medical corpsman (MOS 91A10).<br />
HumRRO-TR-70-1. Alexandria, VA: Human Resources Research Organization.<br />
Weingarten, K., Hungerland, J., Brennan, M., & Allred, B. (1971). The APSTRAT instruction model.<br />
HumRRO-PP-6-71. Alexandria, VA: Human Resources Research Organization.<br />
USING EVENT HISTORY TECHNIQUES TO ANALYZE TASK PERISHABILITY:<br />
A SIMULATION<br />
Stanley D. Stephenson<br />
Southwest Texas State University<br />
Julia A. Stephenson<br />
University of North Texas<br />
Until now, task perishability, the point in time at which a<br />
task drops out of an airman's inventory of tasks performed, has<br />
not been researched. This lack of research could be for two reasons.<br />
First, task performed/not performed is usually of more<br />
interest than is when a task leaves a job inventory. Second,<br />
perhaps measurement techniques for determining task perishability<br />
are either unavailable or unknown. In any event, little is known<br />
about task perishability.<br />
For a variety of reasons, knowledge about task perishability<br />
would be useful, primarily in training. For instance, the decision<br />
about when and where to train a task (formal school or OJT)<br />
could depend on how long the task is going to be used. Perhaps<br />
the most obvious use of task perishability would be in cross-training.<br />
If a task can be determined to have a relatively short<br />
residual life, perhaps training on that task is not necessary,<br />
even though the task is currently in the job inventory of comparable<br />
time-in-grade airmen. Also, task perishability is<br />
obviously related to skill decay. A skill can be retained long<br />
after a task stops being performed. However, if a task perishes<br />
from an airman's inventory of tasks performed, the corresponding<br />
skill will eventually leave that airman's inventory of skills.<br />
Before skill decay can be measured, information about task perishability<br />
should be known.<br />
This paper will study the feasibility of measuring task perishability<br />
using a technique called Event History Analysis.<br />
There are two major features of event history as it applies to<br />
task perishability. First, it incorporates time in the analysis;<br />
i.e., how long did a task stay in an airman's job inventory?<br />
Second, it has the ability to handle censored data. Censored<br />
data is data on which you have only partial information. For<br />
example, not all airmen complete their first-term enlistment. Of<br />
those who leave early, some will have stopped doing a task, but<br />
some will still be performing the task. Consequently, information<br />
about when the task would have left the censored airmen's<br />
job inventories if they had stayed in the Air Force is unknown;<br />
however, that the task "lived" until the point of censoring is<br />
known. Rather than discarding these censored data, event history<br />
incorporates the available information and, although incomplete,<br />
produces more precise estimates of task survivability.<br />
To study task perishability with event history techniques,<br />
this paper used a simulated data base of a type which could be<br />
derived from the data produced by the USAF Occupational Survey<br />
Program and other sources.<br />
USAF Occupational Survey Program<br />
The USAF job analysis program is frequently referred to by<br />
the term, CODAP (or TI/CODAP) (Christal & Weissmuller, 1988).<br />
CODAP usually involves taking a snapshot of the entire work force<br />
at one point in time; i.e., rather than being longitudinal, the<br />
data collected is vertical. Consequently, the data do not provide<br />
information about what an individual airman does over a 20<br />
year career. Instead, CODAP provides information about what all<br />
airmen are doing in groups such as first term, second term, or<br />
career enlistees. Also, the survey is administered to essentially<br />
100 percent of the career field and produces a response<br />
rate of over 80 percent.<br />
Event History Analysis<br />
Event history analysis enables the researcher to determine<br />
probabilities associated with the length of time for a binary,<br />
dependent variable to change states. Another requirement is<br />
knowledge of the time from the start of the experiment to the<br />
change in state of the dependent variable. Both the origin time<br />
and the exact point at which the dependent variable changes must<br />
be precisely defined. Also, the length of time must always be a<br />
positive value. The last assumption is that the sample should be<br />
homogeneous (Cox & Oakes, 1984).<br />
One of the strengths of event history analysis is the ability<br />
to include some information concerning censored data. An item is<br />
considered to be censored if it is removed from the sample before<br />
the experiment is terminated and the dependent variable has not<br />
changed states. A second type of censoring occurs if the experiment<br />
ends before the dependent variable changes. In most parametric<br />
statistical analyses, such data would have to be omitted<br />
from the sample. However, the fact that the item had not changed<br />
at the point of leaving or ending the experiment can provide some<br />
relevant information that should be incorporated into probabilities<br />
associated with the time at which the dependent variable<br />
changes states.<br />
Several probabilities are associated with event history analysis.<br />
The failure and the survival functions represent cumulative<br />
distributions about when the dependent variable changes<br />
states. Failure is defined as the change in the dependent variable;<br />
survival is the lack of change. The hazard function represents<br />
the conditional probability that the dependent variable<br />
will change states in a specific time period, given that it had<br />
not changed states in the previous period (Kalbfleisch & Prentice,<br />
1980). The mean life residual function represents the<br />
average length that the dependent variable will survive beyond<br />
the specified time period (Oakes & Dasu, 1990).<br />
All of these functions are related mathematically. For example,<br />
once the survival curve is estimated, the mean life residual can<br />
be determined. The most widely used method for computing the<br />
survival function is the product limit estimator proposed by<br />
Kaplan and Meier (1958).<br />
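The product-limit idea can be made concrete in a few lines. The sketch below is an illustration under assumptions, not code from the paper, and the ten records are invented:<br />

```python
# Illustrative sketch of the Kaplan-Meier product-limit estimator applied
# to task-perishability records.  Each record is (months_observed, perished);
# perished=False marks a censored airman who separated while still
# performing the task.  The data values below are invented for illustration.

def kaplan_meier(records):
    """Return {perish_time: S(t)}, the estimated survival curve."""
    n_at_risk = len(records)
    survival = 1.0
    curve = {}
    for t in sorted({t for t, _ in records}):
        perished = sum(1 for ti, ev in records if ti == t and ev)
        removed = sum(1 for ti, _ in records if ti == t)
        if perished:
            # Product-limit step: S(t) = S(t-) * (1 - d_t / n_t).
            survival *= 1.0 - perished / n_at_risk
            curve[t] = survival
        n_at_risk -= removed  # perished and censored cases both leave the risk set
    return curve

# Ten airmen: seven observed perish times, three censored at month 48.
data = [(18, True), (24, True), (30, True), (36, True), (42, True),
        (48, False), (48, False), (48, False), (54, True), (60, True)]
curve = kaplan_meier(data)  # e.g., curve[36] is about 0.60
```

Discarding the three censored records instead would bias the estimate downward (S at month 36 falls from 0.60 to about 0.43), which is exactly the distortion that motivates using event history techniques here.<br />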
Method<br />
At first glance, event history analysis does not seem appropriate<br />
for examining Air Force occupational data. One problem is<br />
that the Air Force maintains little data on persons who leave the<br />
service. Also, the actual time that a person stops doing a specific<br />
task is not recorded. However, data gathered by occupational<br />
surveys do meet the required assumptions.<br />
Event history requires that the dependent variable be<br />
binary. For task perishability, this translates to whether or<br />
not a task is being performed. In an occupational survey,<br />
respondents check if they are performing a task; thus, task performance<br />
is known.<br />
The second assumption of event history is that the origin and<br />
exact point at which a task leaves a person's inventory must be<br />
specified. Actually, however, the only information that is necessary<br />
is the length of time that a person holds a specific task<br />
in the job inventory. To meet this requirement, a small mental<br />
transformation of the data is necessary. Occupational surveys<br />
provide information on the percent members performing a task in<br />
each time interval. The difference in percent members performing<br />
over two intervals is in essence a measure of those who have<br />
stopped doing a task. Therefore, occupational survey data meet<br />
the two primary assumptions of event history analysis. However,<br />
a problem is the inclusion of the censored data. While the Air<br />
Force does have information regarding AFSC attrition rates,<br />
whether the specific task is in an airman's inventory when he<br />
leaves the service (attrites) is unknown.<br />
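The transformation described above is simple differencing; a toy example with invented survey percentages:<br />

```python
# Toy illustration (invented percentages, not survey data): the drop in
# "percent members performing" between consecutive survey intervals
# estimates the share of airmen who stopped performing the task.

percent_performing = {12: 100.0, 24: 91.0, 36: 80.0, 48: 54.0, 60: 47.0}

months = sorted(percent_performing)
stopped = {later: percent_performing[earlier] - percent_performing[later]
           for earlier, later in zip(months, months[1:])}
# stopped[48] == 26.0: an estimated 26 percent of members stopped doing
# the task between the month-36 and month-48 surveys.
```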
For the purposes of this study, we generated a 1000 person<br />
data base. This data base included actual data points for a task<br />
leaving an airman's job inventory, as well as censored data,<br />
which simulated those airmen who leave the Air Force prior to the<br />
task leaving their inventory. While this model is not specific<br />
to any career field, it does incorporate several facts which are<br />
intrinsic to the job/career development in the Air Force. For<br />
instance, many airmen spend up to 12 months in training before<br />
actually being assigned to a workplace. Thus, this model starts<br />
simulating at the thirteenth month, which is actually the first<br />
point in time that a task could leave an incumbent's inventory.<br />
Another consideration is the large change in status at the 48th<br />
month. At this point many airmen leave the service; of those who<br />
do continue in the Air Force, some change career fields. This<br />
change results in many censored data points at the 48th month.<br />
In summary, single task performance data for an initial set<br />
of 1000 airmen was simulated over a 6-year (72-month) period.<br />
Using the type of data available from Occupational Survey<br />
Reports, percent members performing for each month interval were<br />
created. Censoring was also generated for this simulation.<br />
Although exact censoring data cannot be determined from current<br />
Air Force data bases, historical attrition data are available.<br />
The censored data values can then be estimated from the attrition<br />
data using the information from current percent members performing<br />
a single task. A total of 300 (30%) censored data points<br />
were inserted into the data base using a random number procedure.<br />
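A data base of this kind can be sketched as follows. This is a minimal illustration under assumed distributions (exponential perish times, half of the censoring concentrated at the 48th month); it is not the authors' actual generator:<br />

```python
# Hedged sketch of a simulated task-perishability data base: 1000 airmen,
# roughly 30% censored, perish times no earlier than month 13, and a
# spike of censoring at the 48-month first-term point.  The distributional
# choices are assumptions made for illustration.
import random

def simulate(n=1000, seed=1990):
    """Generate (months_observed, perished) records."""
    random.seed(seed)
    records = []
    for _ in range(n):
        if random.random() < 0.30:
            # Censored: airman separates or changes career fields while
            # still performing the task; half censor exactly at month 48.
            censor = 48 if random.random() < 0.5 else random.randint(13, 72)
            records.append((censor, False))
        else:
            # Perished: the task cannot leave an inventory before month 13.
            perish = 13 + int(random.expovariate(1 / 24.0))
            records.append((min(perish, 72), True))
    return records

records = simulate()  # 1000 airmen observed over months 13-72
```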
From this simulated data base, three functions were calculated:<br />
the Survival function, the Hazard function, and the Mean<br />
Life Residual function. All calculations were performed using<br />
the LIFETEST procedure in SAS. Examples of the survival and the<br />
mean life residual functions are given in this paper.<br />
Results<br />
Figure 1 shows the survival function for the simulated data<br />
base. It represents the probability of an airman at a specific<br />
time period performing the task. For example, at the 36th month,<br />
the probability of an airman still performing this task is .54.<br />
Figure 2 represents the mean life residual function for the<br />
simulated data base. This function can be interpreted as the<br />
average length that an airman will be performing the task beyond<br />
a specific time period. At the 36th month, on average, an airman<br />
will be performing this task 13.8 more months.<br />
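For integer-month data, the mean life residual can be read off the survival curve via the discrete-time identity MRL(t) = sum over u >= t of S(u)/S(t). A hedged sketch using an invented survival curve (not the paper's simulated one):<br />

```python
# Sketch of the discrete-time identity MRL(t) = sum_{u>=t} S(u) / S(t),
# where S(u) is the probability of still performing the task after month u.
# The geometric curve below (2% of performers stop each month) is invented
# for illustration.

def mean_residual_life(S, t, horizon=2000):
    """Average additional months of task performance, given survival to t."""
    return sum(S(u) for u in range(t, horizon)) / S(t)

S = lambda u: 0.98 ** u          # invented: 2% of performers stop per month
mrl = mean_residual_life(S, 36)  # about 50 months
```

Because a geometric curve is memoryless, this toy MRL is constant over t; the simulated curve in Figure 2 instead declines with month of service.<br />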
[Figure 4. Mean Life Comparison: mean life residual in months, plotted by month of service (12-72), for the data base including censored cases versus the data base omitting all censored cases.]<br />
Figures 3 and 4 show a comparison between the data base with<br />
all 1000 airmen (event history analysis) and the data base with<br />
700 airmen (i.e., all censored data omitted). The difference in<br />
the two survival functions (figure 3) is greatest at the 48th<br />
month, the point at which censoring is heaviest.<br />
The difference between the two mean life residual functions<br />
(Figure 4) is greatest at the beginning of the 13th month, basically<br />
because excluding the 300 censored data points removes some<br />
information about how long a task is performed. At the 48th<br />
month the two curves become very similar. Thus, censoring after<br />
the first term has less of an effect on the mean life residual<br />
function.<br />
These data could also be presented in table format. A portion<br />
of these functions is shown in Table 1.<br />
Table 1<br />
Comparison Data<br />
          Survival Function       Mean Life Residual<br />
Month     Include     Omit        Include     Omit<br />
          Censors     Censors     Censors     Censors<br />
36        .544        .377        13.307      9.273<br />
37        .527        .364        13.264      8.871<br />
38        .516        .340        12.647      8.2U<br />
39        .496        .313        12.078      7.969<br />
40        .479        .293        11.472      7.602<br />
Discussion<br />
The results of this study show that event history analysis<br />
can be used to investigate task perishability. Due to the method<br />
of collecting task data in the Air Force's Occupational Survey<br />
Program, accurate figures can be obtained for the change in state<br />
of the binary variable, i.e., task perishability. Historical<br />
attrition data are available for all career fields. Thus, censoring<br />
is the only unknown variable, and it can be accurately<br />
estimated by combining occupational and attrition data. Therefore,<br />
an appropriate data base can be created for any AFSC.<br />
The results of the analysis also show the advantage of using<br />
event history to analyze task perishability. Figures 3 and 4<br />
vividly illustrate the difference between analyzing task perishability<br />
[Figure 1. Survival Function: probability of an airman still performing the task, by month.]<br />
using event history analysis, which can accommodate censored<br />
data, and using conventional analytical procedures, which essentially<br />
discard censored data. Estimations of both the survival<br />
and mean residual life functions are more accurate using event<br />
history analysis. Therefore, the results of this study strongly<br />
suggest that analyzing task perishability with event history<br />
techniques should continue to be studied.<br />
The use of event history analysis to examine occupational<br />
data, such as task perishability, is a new application of this<br />
statistic. Thus, several research issues need further examination.<br />
Of primary concern is the inclusion of censored data. As<br />
mentioned earlier, the Air Force does not maintain records of the<br />
tasks performed by persons who attrite. Therefore, determining<br />
the number of censored data points at each interval will have to<br />
be modeled. A logical start point would be to use the known<br />
information on percent (of those who complete the occupational<br />
survey) members performing as an estimation of the percentage of<br />
those who attrited but still held the task in their inventories.<br />
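That start point can be made concrete with a small sketch. The allocation rule below is the suggestion just described, not an established Air Force procedure, and all counts are invented:<br />

```python
# Hypothetical allocation: spread each interval's known attrition count in
# proportion to the percent of survey respondents still performing the
# task, yielding an estimated number of censored records per interval.
# All numbers are invented for illustration.

attrition = {24: 40, 36: 30, 48: 400}            # airmen leaving, per interval
pct_performing = {24: 0.90, 36: 0.54, 48: 0.45}  # from occupational survey

censored = {m: round(attrition[m] * pct_performing[m]) for m in attrition}
# e.g., of the 400 airmen leaving at month 48, an estimated 180 still held
# the task and would enter the analysis as censored data points.
```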
The assumption that 100% of the airmen are performing the<br />
task at the start of the career field raises a potential theoretical<br />
issue. However, the math underlying the model is primarily<br />
based on conditional probabilities; thus, deviating from this<br />
assumption would not seem to have a severe effect on the task<br />
performance probabilities. Another theoretical question concerns<br />
the homogeneity of the persons in a particular career field. A<br />
more accurate analysis of when a task leaves an airman's job<br />
inventory may be accomplished by sub-grouping the career field<br />
with a covariate such as present grade, skill-level, or gender.<br />
Also, some tasks may perish more rapidly for airmen who are in<br />
their second career field. These and other theoretical issues<br />
need to be researched.<br />
An area of interest for further research is task emergence.<br />
The model set forward in this study could easily be restructured<br />
to analyze when a task enters a job inventory. A strength of<br />
this type of analysis is that it would provide information on a<br />
continuum, by month, instead of chunking by 1st term, 2nd term,<br />
etc. Perhaps task emergence and task perishability could be linked<br />
to provide more information on when and by whom a task, or group<br />
of tasks, is performed in a career field.<br />
References<br />
Christal, R. E., & Weissmuller, J. J. (1988). Job-task inventory<br />
analysis. In S. Gael (Ed.) The job analysis handbook for business,<br />
industry, and government (Vol II), pp. 1036-1050. New<br />
York: Wiley.<br />
Cox, D. R., & Oakes, D. (1984). Analysis of survival data. New<br />
York: Chapman and Hall.<br />
Kalbfleisch, J. D., & Prentice, R. L. (1980). The statistical analysis<br />
of failure time data. New York: Wiley.<br />
Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from<br />
incomplete observations. Journal of the American Statistical<br />
Association, 53, 457-481.<br />
Oakes, D., & Dasu, T. (1990). A note on residual life. Biometrika,<br />
77, 409-410.<br />
A FIRST LOOK AT THE EFFECT OF INSTRUCTOR BEHAVIOR<br />
IN A COMPUTER-BASED TRAINING ENVIRONMENT<br />
STANLEY D. STEPHENSON<br />
SOUTHWEST TEXAS STATE UNIVERSITY<br />
Computer-based Training (CBT) research has typically focused<br />
on comparing a CBT course with a corresponding traditional<br />
instruction (TI) course. Compared to a similar TI course, CBT<br />
generally, but not always, produces increases in learning and<br />
retention while concurrently requiring less time than TI<br />
(Fletcher & Rockway, 1986; Goodwin et al., 1986; Kulik & Kulik,<br />
1986, 1987; McCombs et al., 1984; O'Neil, 1986). However, CBT<br />
results have not always been positive; there are many instances<br />
in which CBT did not produce increases in performance or<br />
decreases in learning time (Goodwin et al., 1986; McCombs et al.,<br />
1984). In general there has been very little research on maximizing<br />
performance within a CBT system (Gillingham & Guthrie, 1987).<br />
Conversely, there is a long history of research on variables<br />
which influence achievement in TI systems. One of the most<br />
researched variables is instructor behavior. TI research has<br />
produced a relatively high degree of consensus as to what an<br />
effective instructor does versus what a not-so-effective instructor<br />
does, with effective being defined in terms of academic<br />
achievement (Brophy, 1986; Brophy & Good, 1986; Rosenshine,<br />
1983). Yet, CBT research has neglected the role of the instructor<br />
(Moore, 1988). Therefore, little is known about whether or<br />
not TI instructor variables transfer to CBT.<br />
In one of the few studies which did examine the role of the<br />
CBT instructor, Moore (1988) found that students who had positive<br />
teachers scored significantly higher than those in classes with<br />
negative teachers. McCombs et al. (1984) reviewed various early<br />
CBT courses and found that two factors were critical to the success<br />
of the CBT courses. These were: (a) adequate opportunities<br />
for student-instructor interactions, and (b) the incorporation of<br />
group activities with individualized training. McCombs (1985)<br />
reviewed the role of the instructor in CBT from a theoretical<br />
perspective and developed several practical suggestions for<br />
instructor use. One of her suggestions was that "...instructors<br />
must have meaningful roles in the management and facilitation of<br />
active student learning, if the CBT system is to be maximally<br />
effective" (p. 164).<br />
As noted in McCombs et al. (1984), student-instructor interaction<br />
was a critical factor with regard to success of a CBT system.<br />
This is a significant finding since one of the most consistently<br />
reported positive TI instructor behaviors is frequent but<br />
short student-instructor interactions; i.e., an increase in student-instructor<br />
interactions produces an increase in achievement<br />
(Brophy, 1986; Brophy & Good, 1986; Rosenshine, 1983). Therefore,<br />
a TI instructor behavior which may transfer to CBT is student-instructor<br />
interaction.<br />
The purposes of this study were two-fold. First, this study<br />
was an attempt to begin to explore the effect of instructor<br />
behavior on achievement in CBT. Second, this study specifically<br />
examined the effect of student-instructor interaction in CBT.<br />
Based on the TI instructor literature, it was hypothesized that<br />
increased student-instructor interaction would produce increased<br />
achievement.<br />
Method<br />
Subjects<br />
Subjects were 25 (15 female and 10 male) college juniors and<br />
seniors enrolled in a Business Statistics class. As part of a<br />
project designed to teach students how to use computer spreadsheet<br />
software to perform statistical computations, Ss volunteered<br />
to participate in a spreadsheet tutorial for extra credit.<br />
The extra credit was awarded for project completion, not for<br />
project performance. All Ss completed a survey to assess their<br />
personal computer (PC) literacy.<br />
Experimental Materials<br />
The spreadsheet tutorial was part of a larger commercial<br />
software tutorial package designed for an integrated spreadsheet/<br />
word processing/database program. The tutorial is basically linear<br />
and learner-controlled; however, Ss did have the capability<br />
to repeat a lesson if desired.<br />
For the purposes of this study, the larger tutorial was modified<br />
to include just the introduction to the integrated package<br />
plus that portion of the tutorial software devoted to the use of<br />
the spreadsheet. The introduction portion (Part A) contained<br />
four lessons, and the spreadsheet portion (Part B) contained<br />
eight lessons. The tutorials were run on Tandy 1000SX PCs.<br />
An exercise designed to evaluate mastery of the spreadsheet<br />
tutorial commands was added to the experimental software. Since<br />
the students were volunteers from a Business Statistics class,<br />
the exercise used simple statistical concepts as the vehicle for<br />
evaluating spreadsheet mastery. Consequently, the experimental<br />
material consisted of a CBT spreadsheet tutorial modified to<br />
include a statistics-based exercise.<br />
Procedure<br />
Ss were randomly assigned by spreadsheet/PC literacy to one<br />
of two student-instructor interaction modes. Group I (n=13) had<br />
essentially no instructor-initiated interactions. All Group I<br />
interactions were initiated by the student and consisted of<br />
requests by the students for help in overcoming an obstacle in<br />
the tutorial. Group II (n=12) experienced the same type of student-initiated<br />
interactions experienced by Group I. In addition<br />
Group II was exposed to multiple instructor-initiated interactions.<br />
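The literacy-balanced assignment might be sketched as pairing Ss by survey score and splitting each pair at random. The helper and the scores below are hypothetical; the paper does not describe its exact matching procedure:<br />

```python
# Hypothetical sketch of blocked random assignment: rank subjects by
# PC-literacy score, then randomly split each adjacent pair between the
# two interaction conditions.  Subject ids and scores are invented.
import random

def assign_by_literacy(scores, seed=7):
    """Map each subject to condition 'I' or 'II', balanced on literacy."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)  # order subjects by literacy
    groups = {}
    for i in range(0, len(ranked), 2):
        pair = ranked[i:i + 2]               # adjacent-literacy pair
        rng.shuffle(pair)                    # coin flip within the pair
        for subject, group in zip(pair, ("I", "II")):
            groups[subject] = group
    return groups

# 25 hypothetical subjects, matching the study's n; the odd subject out
# lands in Group I, giving the study's 13/12 split.
scores = {f"S{k}": (k * 7) % 10 for k in range(25)}
groups = assign_by_literacy(scores)
```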
Both groups worked the CBT tutorial in three sessions. In<br />
session one, all Ss started on lesson A1 and worked in the tutorial<br />
for 90 minutes. In the second session, all Ss started on<br />
lesson B1 and worked through the last lesson, B8. In the third<br />
session, all Ss started on lesson B3 and again worked through the<br />
last lesson, lesson B8. Consequently, all Ss had a single exposure<br />
to lessons A1 through A4 and repeated exposure to lessons B1<br />
through B8. Also, since each S went at his/her own speed, Ss'<br />
total time on task varied. At the completion of lesson B8 on day<br />
3, all Ss were given an exercise designed to evaluate their mastery<br />
of the tutorial material. Ss had 30 minutes to work on the<br />
exercise.<br />
During the startup period of the project (i.e., the first 15<br />
minutes of the first session), the instructor responded to all<br />
questions in both groups to ensure that the Ss were properly<br />
logged into the tutorial. For both groups, the instructor also<br />
responded to all student-initiated interactions with one or more<br />
of three responses: (1) "Try pushing the [ESCAPE] key;" (2) "Try<br />
pushing the [SPACE] bar;" or (3) "Re-boot the system and start<br />
over." These suggestions were given in sequence; e.g., if "Try<br />
pushing the [ESCAPE] key," did not work, then the S was told to<br />
"Try pushing the [SPACE] bar." For Group I Ss, these suggestions<br />
were the only instructor interactions experienced after the first<br />
15 minutes of session one. In a sense, Group I's instructor performed<br />
an impersonal, course administrator role.<br />
In addition to the interactions listed above, Group II Ss<br />
also experienced instructor-initiated interactions. In the first<br />
session, the instructor initiated four interactions with each S.<br />
In sessions two and three, the instructor initiated three and one<br />
interactions, respectively. These interactions were related to<br />
the location of keys on the Tandy keyboard. For example, shortly before<br />
needing to use the Back Slash (\) key, the instructor would tell<br />
the students where that key was located. Key location was<br />
explained and diagrammed in instructions given to all Ss, but for<br />
most students key location on the Tandy keyboard was a minor<br />
problem due to previous exposure to an IBM keyboard. Instructor-initiated<br />
interactions lasted between 5 and 10 seconds.<br />
It should be noted that in no instance did the instructor<br />
provide information which was not available to the student elsewhere.<br />
Also, in no instance did the instructor comment, provide<br />
feedback, or give praise on the student's performance on the<br />
tutorial.<br />
Dependent Measures<br />
Two dependent measures were recorded. First, the Ss' performance<br />
on the exercise was scored. Second, Ss also recorded<br />
which spreadsheet commands they used. Since most procedures can<br />
be performed in more than one way (e.g., a cell entry can be<br />
changed via an EDIT command or by simply re-typing the entry),<br />
this second measure was recorded to assess how many different<br />
spreadsheet commands were actually used during the exercise.<br />
Results<br />
Means and standard deviations for Spreadsheet Performance and<br />
Use of Spreadsheet Commands are given in Table 1. Due to the<br />
small sample sizes (and possible problems with the assumption of<br />
normality), the Mann-Whitney U non-parametric test statistic was<br />
used to analyze differences between Group I (no instructor-initiated<br />
interaction) Ss and Group II (instructor-initiated<br />
interaction) Ss.<br />
Table 1<br />
Spreadsheet Performance and Use of Spreadsheet Commands<br />
Means and Standard Deviations<br />
Spreadsheet Performance<br />
Mean SD<br />
Group I (No Interaction) 58.000 18.257<br />
Group II (Interaction) 72.417 7.403<br />
Use of Spreadsheet Commands<br />
Mean SD<br />
Group I (No Interaction) 32.308 7.250<br />
Group II (Interaction) 30.833 8.483<br />
Exercise Performance<br />
Group II (instructor-initiated interaction) Ss significantly<br />
outperformed Group I (no instructor-initiated interaction) Ss<br />
(Mann-Whitney U = 34.50, p = .017).<br />
Use of Spreadsheet Commands<br />
There was no difference in command usage between Group I Ss<br />
and Group II Ss (Mann-Whitney U = 82.00, p = .824).<br />
Sex Differences<br />
Sex differences were not significant (for Spreadsheet Performance,<br />
Mann-Whitney U = 56.00, p = .289; for Use of Spreadsheet<br />
Commands, Mann-Whitney U = 69.50, p = .755).<br />
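The group comparisons above can be reproduced with a small Mann-Whitney U computation. The sketch below is illustrative only (the scores are invented, not the study's data); it counts, over all between-group pairs, how often one group's score exceeds the other's, and adds the large-sample normal approximation for the test statistic.

```python
# Illustrative Mann-Whitney U computation (hypothetical scores, not the
# study's data). U counts pairwise wins of one group over the other;
# the z value uses the normal approximation for moderate sample sizes.
import math

def mann_whitney_u(x, y):
    """Return (U, z): the smaller U statistic and its normal approximation."""
    # Count, over all (x, y) pairs, how often an x-score beats a y-score
    # (ties count one half).
    u_x = sum(1.0 if xi > yi else 0.5 if xi == yi else 0.0
              for xi in x for yi in y)
    n1, n2 = len(x), len(y)
    u_y = n1 * n2 - u_x
    u = min(u_x, u_y)
    mu = n1 * n2 / 2.0                                  # mean of U under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)   # SD of U under H0
    return u, (u - mu) / sigma

group_i = [40, 55, 62, 48, 70, 58]    # hypothetical exercise scores
group_ii = [66, 74, 71, 80, 69, 77]
u, z = mann_whitney_u(group_i, group_ii)
print(u, round(z, 2))
```

A large negative z (the observed U far below its null mean) corresponds to the kind of group difference reported for Spreadsheet Performance.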
Discussion<br />
The hypothesis that increased student-instructor interaction<br />
would lead to increased achievement was supported. Given the<br />
limited length of the CBT program used in this experiment, the<br />
size of the achievement difference between the two<br />
groups was surprising. For some reason, having the instructor<br />
interact with, take notice of, or care about the student affected the<br />
student to the point where it increased his/her achievement.<br />
The underlying cause for the difference in achievement did not<br />
seem to be knowledge. All Ss seemed to "learn" the commands presented<br />
in the tutorial; there was no difference between groups in<br />
the number of commands used to solve the exercise. The difference<br />
was in how well the commands were used.<br />
Nor was the difference in achievement due to praise or<br />
feedback. Neither group received praise for their performance.<br />
Unless relatively brief human interaction is defined as praise,<br />
praise was not a factor in this study. Extra credit for higher<br />
performance on the exercise also was not a factor; all Ss<br />
received the same amount of extra credit regardless of their performance.<br />
A clue as to why Group I Ss did not perform as well as Group<br />
II Ss comes from observations made by the instructor. It<br />
seemed that Group I Ss used the space bar more frequently than<br />
did Group II Ss. In this study's tutorial, Ss had the capability<br />
to literally space-bar their way through the tutorial. That is,<br />
rather than actually performing the requested tutorial action, Ss<br />
could depress the space bar and step through the program.<br />
Although not measured, Group I Ss seemed to take this approach<br />
more frequently. Consequently, while both groups were equally<br />
exposed to the material, Group II Ss seemed to actually perform the<br />
tutorial more. This 'space-bar' behavior could account for the<br />
difference in achievement. The difference in standard deviation<br />
between the two groups could also be a result of the reduced<br />
practice by Group I Ss.<br />
If the explanation offered above is accurate, it suggests<br />
that brief human interaction serves to keep students on task more<br />
so than no human interaction. If no one is aware of what I am<br />
doing, I am more likely to try to ease my way through the CBT<br />
course. However, if someone is aware of what I am doing, irrespective<br />
of whether or not that someone gives me praise or feedback,<br />
then I had best stay on task.<br />
Due to the manner in which the Group II interactions<br />
occurred, instructor monitoring of the students was confounded<br />
with interaction. For the instructor to know when to interact<br />
with an appropriate comment, the instructor had to know when a<br />
student was approaching a particular point in the tutorial. In<br />
order to know this, the instructor had to constantly monitor the<br />
students' progress. Consequently, while the Group I instructor<br />
sat at a desk and waited for students to request assistance, the<br />
Group II instructor was constantly walking around the room and<br />
visually checking on where Ss were in the tutorial. Therefore,<br />
it may be that monitoring, and not interaction, was the basis for<br />
Group II's higher achievement.<br />
These results add to the results reported by Moore (1988), who<br />
found that, even in CBT, teachers with positive attitudes produced<br />
higher achievement than teachers with negative attitudes.<br />
Evidently, instructor interaction can also affect achievement.<br />
Whether or not the interaction needs to be tied to course content<br />
is unknown. In this study, the instructor's comments were not<br />
content-based. Therefore, it may be that CBT instructors should<br />
interact with students in order to maximize achievement, but that<br />
the interactions may not need to be related to the material being<br />
covered.<br />
Implications<br />
The relatively short-term nature of the tutorial used in this<br />
experiment obviously limits the generalization of this study's<br />
results. That limitation notwithstanding, the specific conclusion<br />
from this study is that brief instructor-initiated interactions<br />
can increase achievement in CBT. However, instructor monitoring<br />
without interaction may produce the same result.<br />
Since the role of the instructor in CBT is frequently undefined,<br />
the results from this study give some direction as to what<br />
a CBT instructor can do to influence achievement. Moreover,<br />
since instructor-initiated interactions are controlled by the<br />
instructor, these interactions can be both built into the larger<br />
learning system (which includes the CBT subsystem) and also<br />
included in the instructor evaluation system.<br />
A larger implication from this study is that instructor<br />
behavior does seem to influence achievement in CBT. The results<br />
obviously support Moore's (1988) research and McCombs' (1985) suggestions.<br />
There is simply something about having another human<br />
around and aware of your actions that alters your behavior. Even<br />
in the best designed, best built, and best implemented CBT systems,<br />
instructor behavior may still influence achievement.<br />
Rather than trying to design a CBT system which does away with<br />
the instructor (or to design a system which essentially ignores<br />
the instructor), CBT developers should try to find ways in which<br />
to use instructor presence to maximize achievement.<br />
References<br />
Brophy, J. E. (1986). Teacher influences on student achievement.<br />
American Psychologist, October, 1069-1077.<br />
Brophy, J. E., & Good, T. L. (1986). Teacher behavior and<br />
student achievement. In M. C. Wittrock (Ed.), Third handbook<br />
of research on teaching (pp. 328-375). New York: Macmillan.<br />
Fletcher, J. D., & Rockway, M. R. (1986). Computer-based education<br />
in the military. In J. A. Ellis (Ed.), Military contributions<br />
to instructional technology. New York: Praeger.<br />
Gillingham, M. G., & Guthrie, J. T. (1987). Relationships<br />
between CBT and research on teaching. Contemporary Educational<br />
Psychology, 12, 189-199.<br />
Goodwin, L. D., Goodwin, W. L., Nansel, A., & Helms, C. P.<br />
(1986). Cognitive and affective effects of various types of<br />
microcomputer use by preschoolers. American Educational<br />
Research Journal, 23, 348-356.<br />
Kulik, C. C., & Kulik, J. A. (1986). Effectiveness of computer-based<br />
education in colleges. AEDS Journal, Winter/Spring,<br />
81-108.<br />
Kulik, J. A., & Kulik, C. C. (1987). Review of recent<br />
research literature on computer-based instruction. Contemporary<br />
Educational Psychology, 12, 222-230.<br />
McCombs, B. L. (1985). Instructor and group process roles in<br />
computer-based training. Educational Communication and Technology<br />
Journal, 33, 159-167.<br />
McCombs, B. L., Back, S. M., & West, A. S. (1984). Self-paced<br />
instruction: Factors critical to implementation in Air Force<br />
technical training - A preliminary inquiry (AFHRL-TP-84-23).<br />
Lowry Air Force Base, CO: Air Force Human Resources Laboratory,<br />
Training Systems Division.<br />
Moore, B. M. (1988). Achievement in basic math skills for low<br />
performing students: A study of teachers' affect and CAI. The<br />
Journal of Experimental Education, 5, 38-44.<br />
O'Neil, H. F., Anderson, C. L., & Freeman, J. A. (1986).<br />
Research in teaching in the Armed Forces. In M. C. Wittrock<br />
(Ed.), Third handbook of research on teaching (pp. 971-987). New<br />
York: Macmillan.<br />
Rosenshine, B. (1983). Teaching functions in instructional<br />
programs. The Elementary School Journal, 83, 335-351.<br />
Transfer of Training with Networked Simulators¹<br />
David W. Bessemer<br />
U.S. Army Research Institute<br />
Field Unit-Fort Knox, Fort Knox, Kentucky<br />
The Armor Officer Basic (AOB) Course in the Fort Knox Armor<br />
School includes three weeks of tactical instruction followed by<br />
ten days of Mounted Tactical Training (MTT) in the field. During<br />
MTT, students rotate among tank crew and unit leader positions as<br />
they perform platoon mission exercises. Late in 1988, two days<br />
of similar training in networked tank simulators were added just<br />
before the MTT. Additional platoon movement training using<br />
wheeled vehicles also began with the next class after simulator<br />
training started. These changes set up a quasi-experimental<br />
comparison between baseline classes that graduated before the<br />
changes and later classes that received added training. Student<br />
records provided performance measures in an interrupted time-series<br />
design (Cook & Campbell, 1979) that permitted transfer<br />
from simulator training to field performance to be assessed.<br />
The simulator networking (SIMNET) system used for the AOB<br />
training was produced as a test-bed for Defense Advanced Research<br />
Projects Agency R&D on technologies capable of large-scale<br />
interactive simulation of land combat. Training devices using<br />
these technologies could provide increased collective training<br />
for units, while reducing factors such as cost, time, and maneuver<br />
space that now restrict combined arms training. However,<br />
unit training in simulators must be shown to be effective to<br />
justify further development and acquisition of networked simulator<br />
training devices. Evidence supporting the effectiveness of<br />
SIMNET training for some platoon tasks was obtained in a test<br />
with a small number of units (Gound & Schwab, 1988). Results<br />
reported here supplement the previous findings by specifically<br />
examining officer training for platoon leadership. An important<br />
issue in interpreting the results was whether SIMNET training<br />
caused the observed effects or if other factors contributed, such<br />
as the added wheeled vehicle training.<br />
Method<br />
Sample<br />
One group of 1098 students was enrolled in 24 AOB classes<br />
graduating in a 68-week baseline period. Another group of 607<br />
students came from 12 later classes in a 33-week period after<br />
tactical training in SIMNET was added to the course. There were<br />
one to five student platoons in a class, adding up to 71 platoons<br />
¹The views, opinions, and findings contained in this paper<br />
are those of the author, and should not be construed as the<br />
official position of the U.S. Army Research Institute or as an<br />
official Department of the Army position, policy, or decision.<br />
in baseline classes and 39 platoons in SIMNET-trained classes.<br />
Platoons were supervised by a group of 16 officer and senior<br />
noncommissioned officer (NCO) instructors, called Team Chiefs,<br />
each assisted by a team of NCO tank crew instructors. Every<br />
platoon had one Team Chief guiding all of its tactical training.<br />
Equipment<br />
SIMNET Training. Training was conducted in the Combined<br />
Arms Tactical Training Center (CATTC) at Fort Knox that houses<br />
the SIMNET system. AOB classes used four Ml tank modules per<br />
platoon with a terrain data base portraying the Fort Knox areas<br />
used for AOB field training. Vehicle crews operate SIMNET<br />
modules interactively through a local area computer network (LAN)<br />
in a manner similar to real vehicles. Scenes shown in simulated<br />
sights and vision blocks respond to control inputs to create the<br />
illusion of moving and fighting on the battlefield. Crews can<br />
detect and shoot enemy vehicles, and communicate both within the<br />
crew and to other vehicles and organizations. Operating together<br />
as a unit, crews can use many standard tactical techniques to<br />
execute a combat mission.<br />
Field Training. Each AOB student crew in SIMNET-trained<br />
classes (except for the first such class) used High Mobility<br />
Multi-Purpose Wheeled Vehicles (HMMWVs) for some MTT-like preparatory<br />
training on cavalry operations. All student crews used an<br />
M60A3 tank (U.S. Department of the Army, 1979) and basic issue<br />
items furnished with the tank during MTT.<br />
Training Procedure<br />
SIMNET Exercises. In the first day of simulator training,<br />
the students were introduced to the operation of SIMNET tank<br />
modules, and conducted a tactical road march mission as a tank<br />
company. Platoons then practiced techniques of movement and<br />
battle drills, and performed a movement to contact mission<br />
against static unreactive target vehicles placed on the terrain.<br />
Two force-on-force (FOF) exercises were completed on the next<br />
day, with pairs of platoons alternating in offensive and defensive<br />
roles. For every exercise, the platoon Team Chief selected<br />
two students to act as platoon leader and platoon sergeant. The<br />
Team Chief gave these students a company-level mission order, and<br />
allowed them about an hour to plan and prepare the platoon<br />
mission. The Team Chief controlled the execution of the mission<br />
by acting in the role of company commander. After an exercise,<br />
the Team Chief led an after-action review (AAR) in which the<br />
platoon assembled to discuss strong and weak points exhibited in<br />
planning and executing the mission. After a FOF exercise, the<br />
opposing platoons met for a joint AAR.<br />
Field Exercises. Student platoons completed from two to<br />
four on-tank exercises per day during MTT. For several days the<br />
exercises were relatively elementary, gradually increasing in<br />
complexity and difficulty. Initially, the exercises consisted of<br />
road marches and unopposed cross-country movement. Then there<br />
were several movement to contact and other simple offensive<br />
missions with light simulated enemy contact. Defensive exercises<br />
began the relatively advanced level of training. Complex offense<br />
and defense mission exercises were intermingled in the later days<br />
of the MTT. The student platoons were in the field continuously<br />
during the 10-day MTT training period. The students' positions<br />
in crews were rotated frequently, and new individuals were chosen<br />
to serve as platoon leader, platoon sergeant, and TCs after most<br />
exercises. Usually each student served once in both the platoon<br />
leader and platoon sergeant positions during the MTT in either<br />
order. The sequence of events in field exercises was like that<br />
used in SIMNET. The Team Chief gave a company mission order, and<br />
then the leader and sergeant planned, prepared, and executed the<br />
platoon mission under the command of the Team Chief.<br />
Measures<br />
Crew instructors rated performance of students acting as<br />
platoon leaders or platoon sergeants in the field exercises, with<br />
final review and approval of the ratings by the Team Chief.<br />
Elements of planning, movement and control, and conduct of the<br />
operation were rated on a three-point scale. More than 80% of<br />
the ratings were in the middle (average or satisfactory) category,<br />
showing a strong central bias. Ratings coded as 1, 0, and -1<br />
were averaged for 17 items, omitting items judged "not applicable"<br />
in a particular exercise, to form a field performance index<br />
ranging between ±100, with zero set at the middle scale category.<br />
The number of ratings was also used to indicate the relative<br />
number of field exercises completed in a platoon. The number of<br />
ratings roughly corresponds to twice the number of exercises.<br />
Separate counts were made for the categories of elementary<br />
movement and contact exercises, and for advanced exercises.<br />
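The field performance index described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' scoring program; the ratings below are invented.

```python
# Sketch (hypothetical data) of the field performance index: each item is
# rated below average (-1), average (0), or above average (+1); items
# judged "not applicable" (None) are omitted; the mean is scaled so the
# index ranges between -100 and +100, with zero at the middle category.
def performance_index(ratings):
    """ratings: list of -1/0/+1 values, with None for 'not applicable'."""
    scored = [r for r in ratings if r is not None]
    return 100.0 * sum(scored) / len(scored)

example = [1, 0, 0, -1, 1, None, 0, 1]  # invented ratings for one exercise
print(performance_index(example))
```

With over 80% of ratings in the middle category, most index values cluster near zero, which is consistent with the central bias the authors note.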
At course graduation, the crew instructors evaluated overall<br />
leadership qualities exhibited by the students during the platoon<br />
tactics phase of the AOB course. Team Chiefs also reviewed and<br />
approved these Comprehensive Student Evaluation ratings. The<br />
ratings showed a strong ceiling effect, with over 90% of the<br />
ratings judged in the highest category ("yes," indicating the<br />
student possessed the rated quality) on a three-point scale. The<br />
platoon average percentage of 10 items given the "yes" rating was<br />
used as a measure of graduate quality. The inverse sine transformation<br />
was applied to the percentages before analysis.<br />
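The inverse sine transformation can be illustrated briefly. The paper does not spell out its exact formula; the common variance-stabilizing form arcsin(sqrt(p)) is assumed here, mapping proportions onto angles.

```python
# Sketch of the inverse sine (arcsine) transformation applied to platoon
# percentages before analysis. The exact variant used in the paper is not
# stated; the standard form arcsin(sqrt(p)) is assumed, which maps
# proportions 0..1 onto angles 0..pi/2 and stabilizes the variance of
# percentages near the ceiling (over 90% "yes" ratings here).
import math

def arcsine_transform(percentage):
    p = percentage / 100.0            # convert percent to proportion
    return math.asin(math.sqrt(p))    # angle in radians

print(round(arcsine_transform(90.0), 4))   # a platoon with 90% "yes" ratings
```

The transform spreads out values compressed near 100%, which is why it helps with the strong ceiling effect in these ratings.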
Statistical Analyses<br />
A quasi-experimental comparison of time trends between the<br />
baseline and SIMNET-trained groups was used to assess transfer<br />
effects from the added training. The date of graduation for each<br />
class was the main independent variable in regression analyses.<br />
The effects of primary interest were changes in intercept and<br />
slope of the trend over time from those shown by the baseline<br />
platoons. Team Chiefs, coded as dummy control variables, were<br />
used to partial out differences in platoon averages associated<br />
with instructor teams. Other variables in the analysis of field<br />
exercise ratings were leader position and day of MTT. Effects of<br />
these variables were found to be independent of the time trends,<br />
and are not presented here. See Bessemer (In publication) for<br />
further details on the analyses and statistical results.<br />
Results<br />
In baseline AOB classes, the number of movement evaluations,<br />
but not contact evaluations, declined for elementary exercises, as<br />
Figures 1 and 2 show. The total elementary evaluations in Figure<br />
3 combine these categories. The baseline change reflects efforts<br />
made to conserve training resources. Contact evaluations were<br />
reduced further in classes with SIMNET and HMMWV training. In<br />
contrast, for baseline classes in Figure 4, evaluations counted<br />
in advanced exercises showed no trend. These evaluations then<br />
increased in number after the added training began. Thus, SIMNET<br />
and/or HMMWV training produced some immediate savings in the<br />
amount of elementary MTT training, which then was replaced by<br />
more advanced training exercises in the later AOB classes.<br />
The effect for field ratings shown in Figure 5 was like that<br />
for advanced evaluations. Average student ratings across classes<br />
gradually increased after the SIMNET and HMMWV training was added<br />
to the AOB course, indicating positive transfer to performance in<br />
the student's initial MTT exercise emerged in later classes.<br />
For the graduate quality measure, the best-fitting trends<br />
shown in Figure 6 were not quite significant. Results for the<br />
first SIMNET-trained class are aberrant owing to a change in the<br />
wording of the rating scale in the next class. Omitting the<br />
first class after the baseline, a rank-sum test showed that<br />
graduate quality increased significantly in later classes.<br />
Discussion<br />
The tactical training added to the AOB Course was associated<br />
with three major effects. First, elementary contact exercises<br />
conducted in the MTT decreased in number, and were gradually<br />
replaced by additional advanced exercises involving defense and<br />
offense missions. Second, positive transfer in terms of improved<br />
field exercise performance in the MTT emerged gradually after the<br />
pre-MTT training was expanded by SIMNET training and HMMWV field<br />
exercises. Third, there were indications that the transfer<br />
effect persisted to enhance the judged quality of AOB graduates,<br />
at least for the last classes examined. Careful consideration of<br />
several possible confounding factors led to the conclusion (see<br />
Bessemer, In publication) that SIMNET training, rather than HMMWV<br />
training, was largely responsible for the observed transfer<br />
effects. The gradual emergence of these effects over an extended<br />
time was interpreted as reflecting the accumulation of instructor<br />
experience in using SIMNET to train platoon tactics.<br />
153
Figure 1. Adjusted number of<br />
performance evaluations per<br />
platoon for AOB students in<br />
movement exercises during MTT.<br />
Figure 3. Adjusted number of<br />
performance evaluations per<br />
platoon for AOB students in<br />
elementary MTT exercises.<br />
Figure 2. Adjusted number of<br />
performance evaluations per<br />
platoon for AOB students in<br />
contact exercises during MTT.<br />
Figure 4. Adjusted number of<br />
performance evaluations per<br />
platoon for AOB students in<br />
advanced MTT exercises.<br />
This evidence for positive transfer helps the Army to show<br />
that its investment in networked simulation devices has value for<br />
officer school training. More importantly, these findings have<br />
significant general implications for how the Army conducts device<br />
training effectiveness tests, and how it uses devices. The value<br />
of training devices may be seriously underestimated in tests if<br />
trainers are not allowed sufficiently extended experience to<br />
learn how to train effectively using the device. Instructors in<br />
many tests have only been taught to operate the device, and have<br />
trained few soldiers on the device before training the test<br />
Figure 5. Adjusted mean<br />
performance rating by platoon<br />
for AOB students in their first<br />
exercise rated during MTT.<br />
Rating limits are ±100.<br />
Figure 6. Adjusted mean arcsine<br />
percentage of items rated "yes"<br />
on the Comprehensive Student<br />
Evaluation for AOB platoons.<br />
Angle limits are ±π/2.<br />
sample. Quasi-experimental test designs can help overcome this<br />
problem, as well as limited statistical power imposed by small<br />
sample size. Many military training exercises are performed<br />
repeatedly by units in an annual training cycle. Collection of<br />
training records and appropriate performance measures can provide<br />
a large sample of baseline data to compare with results achieved<br />
with new training devices.<br />
The full benefits of training will not be obtained from<br />
fielded devices without consistently giving every trainer adequate<br />
experience to learn how to train most effectively. Turnover<br />
in unit trainers, and infrequent device use are factors that<br />
work against keeping instructor experience at a high level, and<br />
reduce the potential effectiveness of device training.<br />
References<br />
Bessemer, D. W. (In publication). Transfer of SIMNET training in<br />
the Armor Officer Basic Course (ARI Technical Report). Alexandria,<br />
VA: U.S. Army Research Institute for the Behavioral<br />
and Social Sciences.<br />
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation:<br />
Design and analysis issues for field settings. Chicago: Rand-<br />
McNally.<br />
Gound, D., & Schwab, J. (1988, March). Concept evaluation<br />
program of Simulation Networking (SIMNET) (Final Report,<br />
TRADOC TRMS No. 86-CEP-0345). Fort Knox, KY: U.S. Army Armor<br />
and Engineer Board. (Available from Commander, U.S. Army Armor<br />
Center and Fort Knox, ATTN: ATZK-DS, Fort Knox, KY 40121-5180)<br />
CONTINGENCY TASK TRAINING SCENARIO GENERATOR<br />
1Lt Todd S. Dart<br />
2Lt Jody A. Guthals<br />
Maj Timothy M. Bergquist<br />
Air Force Human Resources Laboratory<br />
INTRODUCTION<br />
The Contingency Task Training (CTT) project was directed at determining<br />
critical skills necessary in wartime or during mid- to low-intensity conflicts.<br />
Subsequently, this knowledge would be used for training. The Air Force Human<br />
Resources Laboratory (AFHRL) was tasked with developing the methodology at the<br />
request of Headquarters Air Training Command (HQ ATC) and the U.S. Air Force<br />
Occupational Measurement Center (USAFOMC). The concept for CTT originated in a<br />
study performed by USAFOMC in 1979, entitled the Air Base Ground Defense Tactics<br />
Analysis. A task survey for security police (SP) personnel was combined with a<br />
simple scenario in order to determine which tasks are more important during a<br />
given situation. The study was highly effective in restructuring the SP field,<br />
so much so that HQ ATC requested the technology be developed for combining task<br />
surveys with contingency scenarios. USAFOMC in turn produced Request For<br />
Personnel Research (RPR) 84-02, 'Contingency Task Training Requirements,' asking<br />
AFHRL to develop and validate the contingency technology.<br />
AFHRL started the CTT project in 1988. In order to develop, test, and<br />
validate scenarios for use with task surveys, the project was divided into two<br />
phases. Phase I was the development of the scenario generation technology.<br />
Phase II involved coupling scenarios to task surveys. The scenario and task<br />
survey would then be sent to senior noncommissioned officers (NCOs) who would<br />
review the scenario and rate each task listed for their respective jobs as to<br />
training emphasis. The results would then be validated against Specialty<br />
Training Standards (STS) which list skills each airman is to be instructed in to<br />
reach certain levels of proficiency. Some of the skills in the STS are marked<br />
with an asterisk signifying tasks to be taught during wartime. All other tasks<br />
not marked are to be dropped from instruction. The method of choosing which<br />
tasks to mark has always been left up to the senior NCO. In the past, marking<br />
wartime skills has been done at the last minute during course re-evaluation.<br />
Also, marked skills have never been validated.<br />
The purpose of the CTT project was to provide a method to validate wartime<br />
skills. AFHRL undertook the task of creating scenario generation technology and<br />
subsequent validation via task surveys. AFHRL has now completed Phase I of the<br />
project (Dart, 1990). The Phase II task survey will be performed by USAFOMC.<br />
CRITERIA<br />
HQ ATC and USAFOMC were consulted to determine exactly what the scenario<br />
generator should comprise. Initial research indicated a need for a scenario<br />
generator able to generate both natural disaster scenarios and conflict/wartime<br />
scenarios. Later, the focus changed to conflict/wartime scenarios only. A<br />
disaster scenario generator used in developing training for a disaster situation<br />
occurring infrequently in a small region would not be cost effective. The<br />
technology should concentrate on the mission of the Air Force, national defense,<br />
and the implementation of U.S. Armed Forces as part of national policy.<br />
The scenario must be short, concise, and realistic. A poorly written<br />
credible scenario is better than a well written unbelievable one. According to<br />
experts in scenario generation, a scenario should provide the minimum amount of<br />
information to describe the situation (deLeon, 1973). People can only absorb a<br />
finite amount of data, and fine detail may distract the reader from the overall<br />
intent of the scenario. Only critical scenario variables had to be selected.<br />
The scenario descriptions are intended for use with task surveys; hence, they<br />
must 'paint' a conflict situation with application to all Air Force Specialties<br />
(AFSs).<br />
Consideration of the user group is also important. The CTT scenario<br />
generator is intended for use by people inexperienced with creating a scenario.<br />
Additionally, the primary user group, USAFOMC, is relatively small.<br />
To ensure necessary scenario generation guidelines are followed, and<br />
because of user inexperience, the scenario generator should be automated. The<br />
optimal design, with users in mind, would be a small program operable on DOS-<br />
compatible microcomputers. The inclusion of an on-line help function providing<br />
definitions of all contingency variables was also deemed important.<br />
APPROACH<br />
Initial Research Existing scenario generators were investigated prior to<br />
any development. Typically, scenario generators are used in war games. They<br />
mainly deal with overall battle management as opposed to individuals.<br />
Therefore, the standard scenario generator used for combat tactics was of little<br />
to no use for CTT.<br />
A preliminary scenario design system was being developed by the U.S. Army<br />
for training intelligence gathering skills to intelligence officers. The Low<br />
Intensity Conflict Study Group of the U.S. Army Intelligence Center and School<br />
at Fort Huachuca, AZ developed a non-automated scenario generator for creating<br />
low-intensity conflict (LIC) scenarios (Smiley, 1989). Since the material<br />
suited the needs of the CTT project, permission was obtained to use the<br />
variables in the CTT scenario generator.<br />
The Army’s material was appropriate for use in a LIC scenario. Future<br />
warfare is forecast to be primarily in the LIC arena, but may also include<br />
'normal' or high-intensity conflicts and mid-intensity conflicts such as<br />
Vietnam. The CTT scenario generator enhanced the Fort Huachuca version to<br />
include variables pertinent to all levels of combat. Also, certain definitions<br />
and variables were modified to apply directly to the Air Force and its mission.<br />
The different levels of conflict intensity provided structure for further<br />
scenario generator development. Definitions of high-, mid-, and low-intensity<br />
conflicts were extracted from Army FM 100-20, and are listed in Table 1.<br />
Numerous other sources also provided input into the scenario generator<br />
design. Work done by the AFHRL Logistics and Human Factors Division (LR), called<br />
the Combat Maintenance Capability, provided information on collecting<br />
contingency skills information (Dunigan et al., 1985). They had developed a<br />
methodology to determine wartime maintenance tasks. Maintenance specialists<br />
were asked to indicate the work unit codes (WUCs) for repairs performed<br />
on aircraft. The scenario was set at Hahn AB, Germany during a Warsaw Pact<br />
offensive. Their study provided information helpful to contingency scenario<br />
design and task data collection. The Combat Maintenance Capability study<br />
evaluated several computer models, the most notable being the Logistics<br />
Composite Model (LCOM), the Theater Simulation of Airbase Resources (TSAR),<br />
and the Theater Simulation of Airbase Resources inputs using AIDA (TSARINA).<br />
157
HIGH INTENSITY CONFLICT is war between two or more nations and their<br />
respective allies, if any, in which the belligerents employ the most modern<br />
technology and all resources in intelligence; mobility; firepower (including<br />
nuclear, chemical, and biological weapons); command, control, and<br />
communications; and service support.<br />
MID-INTENSITY CONFLICT is war between two or more nations and their<br />
respective allies, if any, in which belligerents employ the most modern<br />
technology and all resources in intelligence; mobility; firepower (excluding<br />
nuclear, chemical, and biological weapons); command, control, and<br />
communications; and service support for limited objectives under definitive<br />
policy limitations as to the extent of destructive power that can be<br />
employed or the extent of geographic area that might be involved.<br />
LOW INTENSITY CONFLICT is internal defense and development assistance<br />
operations involving actions by U.S. combat forces to establish, regain, or<br />
maintain control of specific land areas threatened by guerrilla warfare,<br />
revolution, subversion, or other tactics aimed at internal seizure of power.<br />
Table 1. CONFLICT DEFINITIONS<br />
Additional information on LIC scenarios came from the Army-Air Force Center<br />
For Low Intensity Conflict (CLIC) and the Joint Warfare Center.<br />
Work to determine medical wartime tasks is being done by the Medical<br />
Wartime Hospital Integration Office (MWHIO) at Fort Detrick, MD. The project,<br />
titled WARMED, is designed to determine the critical wartime skills needed by<br />
medical personnel (Meinders, 1987). Concerns by WARMED directors that the CTT<br />
project would overlap their own results and recommendations led AFHRL to avoid<br />
the medical field entirely in the scenario generator design.<br />
Other information sources included the Air Training Command Office of Wartime<br />
Plans (ATC/DPX) and the Headquarters Air Force Management Engineering Agency<br />
(HQ AFMEA), Wartime Manpower Division. HQ AFMEA was concerned about common<br />
tasks, those critical for a wartime situation yet performed by all AFSs. For<br />
example, personnel in any specialty should know the tasks required for<br />
donning protective chemical gear. Most common tasks are survival skills in<br />
which everyone should be trained. The Air Force, while training some common<br />
tasks, does not have an active program of ensuring common tasks are learned and<br />
maintained by all personnel. The Army does have such a program and routinely<br />
tests all soldiers' skills listed in a series of pamphlets appropriately<br />
entitled the Soldier's Manual of Common Tasks (STP 21-1-SMCT, 1987). The<br />
concept of common tasks and peacetime tasks is best illustrated in Figure 1.<br />
Preliminary Design The result of the literature review and consultations<br />
was a manual scenario generator consisting of several categories with pertinent<br />
variables. A variable dictionary was also created to aid in choosing the<br />
correct variables for a scenario. Variables not found in the Army’s LIC<br />
generator but included in the CTT scenario generator are factors describing the<br />
environment in more detail. Choice of variables was based on deLeon's work,<br />
which recommended appropriate material to include in any scenario.<br />
Once a written version of the scenario generator was developed, the<br />
necessity, and later the feasibility, of an automated process became apparent. A<br />
computer program designed to create the scenario greatly enhances the speed<br />
and consistency of scenario generation.<br />
158
[Figure 1. AFS Task Relationships: peacetime tasks alongside contingency<br />
tasks, the latter spanning low-, mid-, and high-intensity conflict, with<br />
conventional and nuclear/biological/chemical components.]<br />
Scenario Generator Pascal was selected as the programming language since it<br />
provided the necessary versatility and relative ease of use. The program was<br />
written in Turbo Pascal version 4.0 and is very simple to use. The<br />
program's operation speed increases when it is loaded onto a hard drive or RAM<br />
drive. It will run on any IBM (DOS) compatible microcomputer with 512K RAM. A<br />
color monitor is recommended but not required. The program consists of 2 files<br />
and is easily contained on a single 360K floppy disk.<br />
The main program, CTT.EXE, is menu driven. It presents all the<br />
variable categories developed for the manual version. In a step-by-step process<br />
beginning with the selection of intensity level, variable categories are<br />
presented with the specific variables listed for user selection. Table 2<br />
provides a listing and description of the variable categories in the program.<br />
There are nine categories possible for high-intensity conflict and eight for mid<br />
intensity. Low-intensity conflict has twelve categories, four more than for mid<br />
intensity, due to the complex nature of LICs.<br />
Variables are entered one at a time and stored in the computer until the<br />
last variable is selected, whereupon all variables are placed in a standard<br />
scenario format. In most cases, the variables are listed in a sentence format<br />
with no additional information. However, for a few of the variables, additional<br />
information is drawn from a small 'library' within the program and displayed in<br />
the final scenario. This feature serves to enhance the quality of the scenario<br />
produced. Time constraints prevented displaying additional information for all<br />
variables, although such an improvement is recommended in future development.<br />
The second file, CTT-REV.HLP, contains variable definitions. This on-line<br />
‘help’ function is a useful feature of the program. When accessed, it provides<br />
a complete definition of all variables listed in the scenario generator.<br />
Supplementary information for many of the variables is also provided. This file must be<br />
loaded on the same disk with the scenario generator program to be accessed.<br />
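The step-by-step flow just described can be sketched in a few lines. This is an illustration only, written in Python rather than the Turbo Pascal of the actual CTT.EXE; the category names are abridged from Table 2, and the choice values and 'library' text are invented placeholders.

```python
# Illustrative sketch only: the actual CTT.EXE was a menu-driven Turbo
# Pascal 4.0 program, so this Python rework is a stand-in.  Category
# names follow Table 2 (abridged; the real program has nine categories
# for high, eight for mid, and twelve for low intensity), while the
# choice values and the supplementary "library" text are invented.

CATEGORIES = {
    "high": ["Conflict Intensity", "Highest NBC Threat Level", "Attrition",
             "Logistics", "Terrain", "Season", "Mission Duration"],
    "mid":  ["Conflict Intensity", "Attrition", "Logistics", "Terrain",
             "Season", "Mission Duration"],
    "low":  ["Conflict Intensity", "Attrition", "Logistics", "Terrain",
             "Season", "Mission Duration", "LIC Situation", "Threat Type"],
}

# Small 'library' of supplementary text displayed for a few variables.
LIBRARY = {
    ("Terrain", "desert"): "Expect extreme day/night temperature swings.",
}

def generate_scenario(intensity, selections):
    """Place the user's selections into the standard scenario format."""
    lines = ["CONTINGENCY SCENARIO (%s INTENSITY)" % intensity.upper()]
    for category in CATEGORIES[intensity]:
        choice = selections[category]
        lines.append("%s: %s" % (category, choice))
        extra = LIBRARY.get((category, choice))
        if extra:                      # enrich the scenario where possible
            lines.append("  " + extra)
    return "\n".join(lines)
```

In the real program the user picks each variable from a menu one at a time; here the selections are passed in as a dictionary so the assembly step can be shown in isolation. The resulting text could then be printed or written to an ASCII file, as the program allows.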
The program will allow the user to generate low-, mid-, and high-intensity<br />
conflicts. Special emphasis is given to low intensity conflicts as they are<br />
typically the most intricate. Interestingly, the low-intensity conflict is<br />
rapidly becoming the most common conflict the U.S. will face in coming years.<br />
159<br />
-Conflict Intensity: choice of the degree of conflict intensity.<br />
-Highest NBC Threat Level (High Intensity Only): description of the<br />
amount of nuclear, biological, and chemical protective clothing worn.<br />
-Attrition: the amount of critical personnel and equipment<br />
damaged/wounded and destroyed/killed per month.<br />
-Logistics: the amount of critical supplies required to perform the<br />
mission which actually reach the combat area.<br />
-Environment<br />
-- Area and Size: the size or area of operations.<br />
-- Sub-Terrain: choice of three areas of man-made environments.<br />
-- Terrain: choice of terrain combined with season presents a<br />
detailed climatic description.<br />
-- Season: choice of four seasons.<br />
-Mission Duration: the length of time the scenario will last.<br />
-Command And Control: presents choices for particular commands or joint<br />
operation.<br />
-Low Intensity Conflict Only<br />
-- LIC Situations: eight of the most common types of situations.<br />
-- Operational Category: describes the general intent of the<br />
military operation undertaken to either combat or facilitate<br />
the LIC situation.<br />
-- Threat Type: type of threat US forces are expected to face<br />
during the LIC situation.<br />
-- Threat Support: the type of popular support the threat will<br />
receive.<br />
Table 2. VARIABLE CATEGORIES<br />
The final scenarios produced by the program are short and simple as was<br />
specified by experts in scenario design. The user has options to print the<br />
scenario or send it to a computer disk file. If copied to a disk the scenario<br />
can then be modified using any word processor program that reads ASCII.<br />
The Contingency Scenario Generator User’s Manual (Dart & Guthals, 1990)<br />
provides additional information on the variables and the use of the program.<br />
VALIDATION<br />
Evaluation involved conducting what was termed a ‘reality check’. To<br />
perform the reality check, the program was taken to several wartime planning<br />
offices.<br />
The HQ ATC Technical Training (HQ ATC/TTIR) division, HQ AFMEA, and the<br />
School of Aerospace Medicine, Battlefield Readiness (USAFSAM/EDO) office, Brooks<br />
AFB, were asked to review the program and provide input into its improvement.<br />
In addition to the above mentioned sources for scenario evaluation, other<br />
sources were contacted concerning specific aspects of the generator. Most<br />
notable was the value for attrition given in the scenario. The Air Force<br />
Wartime Manpower, Personnel and Readiness Team (AFWMPRT) at Fort Ritchie, MD<br />
provided valuable information in this regard.<br />
The evaluation of the scenario generator by war planning experts led to<br />
several recommendations for further development. Those that were easy and<br />
straight-forward to implement in the time available were incorporated into the<br />
scenario generator. Unfortunately, to implement several recommendations would<br />
160
have involved complicated procedures or a major reprogramming of the generator.<br />
Therefore, although they would enhance the generator, those recommendations were<br />
not implemented.<br />
CONCLUSION<br />
During the CTT project a methodology was developed to design contingency<br />
scenarios. They can be used with task surveys to identify wartime tasks and<br />
subsequently, the needed training requirements. The project makes use of the<br />
latest information in scenario design and variable definition from both the Air<br />
Force and the Army.<br />
Phase I of the CTT project has been completed. The CTT scenario generator<br />
has proven to be successful in its attempt to provide a suitable contingency<br />
scenario. In fact, while the program was originally designed for use with task<br />
surveys at USAFOMC, it has already been adopted by USAFSAM/EDO for designing<br />
scenarios for contingency instruction of medical officers.<br />
Phase II of the CTT project, determining wartime skills through task<br />
surveys, will be undertaken and completed as appropriate by USAFOMC.<br />
REFERENCES<br />
Army Field Manual 100-20 (1988). Low Intensity Conflict. Washington, D.C.:<br />
Department of the Army.<br />
Dart, T. S., & Guthals, J. A. (1990). Contingency Scenario Generator User's<br />
Manual (AFHRL-TP-90-74). Brooks AFB, TX: Manpower and Personnel<br />
Division, Air Force Human Resources Laboratory.<br />
Dart, T. S. (1990). Contingency Task Training (AFHRL-SR-90-73). Brooks AFB,<br />
TX: Manpower and Personnel Division, Air Force Human Resources Laboratory.<br />
deLeon, P. (1973). Scenario Designs: An Overview (R-1218-ARPA). Santa<br />
Monica, CA: Rand Corporation.<br />
Dunigan, J. M., Dickey, G. E., Borst, M. B., Navin, D., Parham, D. P., Weiner,<br />
R. E., & Miller, T. M. (1985). Combat Maintenance Capability: Executive<br />
Summary (AFHRL-TR-85-35). Wright-Patterson AFB, OH: Logistics and Human<br />
Factors Division, Air Force Human Resources Laboratory.<br />
Meinders, M. (December 1987). Talking Paper on Wartime Medical (WARMED) Work<br />
Center Description (WCD). Fort Detrick, MD: Medical Wartime Hospital<br />
Integration Office (MWHIO).<br />
Smiley, A. A. (January 1989). Low Intensity Conflict Scenarios. Fort Huachuca,<br />
AZ: Low Intensity Conflict Study Group, U.S. Army Intelligence Center and<br />
School.<br />
Soldier Training Publication 21-1-SMCT (October 1987). Soldier's Manual of<br />
Common Tasks. Washington, D.C.: Department of the Army.<br />
USAFOMC (April 1979). Air Base Ground Defense Tactics Analysis (AFPT 90-812-<br />
137, 90-812-138). Randolph AFB, TX: Occupational Survey Branch, USAFOMC.<br />
161
COOPERATIVE LEARNING IN THE ARMY<br />
RESEARCH AND APPLICATION<br />
Angelo Mirabella<br />
U.S. Army Research Institute for the<br />
Behavioral and Social Sciences<br />
Cooperative learning (CL) is something many of us did in<br />
college when we took chemistry, physics, or calculus - courses<br />
built around problem solving exercises. We joined with a few other<br />
students to do homework or study for a test. We shared our<br />
understandings of the problems, helped each other correct<br />
misconceptions, and then reached consensus on how to solve the<br />
problems. Today, formally organized cooperative learning groups<br />
in classroom settings do the same things. Therefore, CL is not new<br />
or revolutionary. Yet, as a formal, institutionalized philosophy<br />
and methodology of instruction, it has been slow to take root,<br />
especially in the military. Public primary schools have progressed<br />
further. In Columbia, Maryland, for example, there is an elementary<br />
school whose classes are taught completely according to CL<br />
principles. The teacher primarily facilitates the work of small<br />
groups. The students and their activities rather than the teachers<br />
are the centers of attention.<br />
It is ironic that CL has taken so long to root since the more<br />
traditional teacher-centered approach, and even more modern<br />
individualized approaches are social arrangements contrary to what<br />
is demanded of people once they leave the school house (Raspberry,<br />
1987, 1988). This contradiction was especially blatant in my<br />
elementary school days when talking among students was punished<br />
with demerits, detention, or dreaded calls to parents to come to<br />
school for a conference. I remember vividly a visit by my father,<br />
who had to lose a day's pay to learn from my teacher that I talked<br />
too much in class. I also recall the back of his hand when I tried<br />
to explain my behavior and give my side of the story. Anyone who<br />
knows me would be astonished to learn that I once suffered from<br />
talkativeness. I often wonder what destructive social conditioning<br />
was imposed by such an environment.<br />
In contrast, cooperative learning implicitly recognizes that<br />
interpersonal relationship, i.e. communication, is one of the most<br />
pervasive and critical sets of skills anyone can learn. And what<br />
better place to foster such skills than the school house.<br />
Cooperative learning, is, at the same time, an effective way to<br />
develop other skills and thereby stretch instructional resources.<br />
At least that is the emerging conclusion of many years of basic<br />
research and some very preliminary applied research by the Army<br />
Research Institute. I'm hedging a bit because CL in the Army is in<br />
its infancy, and the work to be reported, while providing<br />
converging evidence for the effectiveness of CL, was not done in<br />
pristine, antiseptic laboratories.<br />
162<br />
I'd like to start with a brief overview of basic research on<br />
CL and then review efforts by the Army to test the methodology in<br />
Training and Doctrine Command (TRADOC) schools.<br />
McNeese (1989) provides a useful tabulation of CL research.<br />
He cites Slavin's 1983 review showing that of 46 studies, 29 had<br />
shown favorable effects, 15 no differences, and 2 showed advantages<br />
for "traditional" education. Johnson & Johnson (1985) reported that<br />
out of 26 studies 21 were favorable, two showed mixed results, and<br />
3 no differences; on balance, strong support for the value of CL.<br />
However, the reviews suggest that merely grouping students is not<br />
enough. Students do have to cooperate. Slavin goes further and says<br />
that group incentives, coupled with individual responsibility, are<br />
essential.<br />
Otherwise, CL works in many different circumstances. Johnson,<br />
Maruyama, Johnson, Nelson, and Skon (1981), in a review of 122<br />
studies extending from 1924 to 1981, found that CL was effective<br />
across a range of ages, subjects, and tasks. What emerges, from<br />
these reviews, is a type of conclusion often found for new<br />
performance technology. Cooperative Learning can be effective if<br />
properly designed. This conclusion applied to academic settings.<br />
Would it also apply to Army Schools?<br />
Research to answer this question was done at Fts. Lee and Knox<br />
and implemented at Ft. Lee under TRADOC-ARI partnerships called<br />
Training Technology Field Activities (TTFAs). I first want to say<br />
a word about these, to provide a perspective on why this research<br />
was undertaken. In 1983 the TRADOC Commanding General concluded<br />
that his schools were not capitalizing on a steady stream of new<br />
ideas and technology emerging from the training R & D community.<br />
He wanted to establish a formal link from basic research to the<br />
Army's training community. Accordingly he invited ARI to join with<br />
selected schools in TTFAs. Their purpose was to test new training<br />
technology, on significant Army problems, using TRADOC testbeds.<br />
The schools and TRADOC HQ were to lead in identifying test bed<br />
problems, while ARI was to lead in identifying a prototype<br />
research-based solution. The partners were then to join forces in<br />
testing the solution.<br />
Activities were established at several schools including<br />
Quartermaster at Ft. Lee, Virginia and Armor at Ft. Knox, Kentucky.<br />
Cooperative learning projects were undertaken at these schools<br />
because basic research had shown that CL can be very efficient. But<br />
it had to be proven and implemented in Army settings. I'll mention<br />
the Knox work briefly and then focus on the work at Lee, since this<br />
was implemented and is still being used.<br />
Shlechter (1987) at Ft. Knox compared training effectiveness<br />
for cooperative groups of 2 or 4 students, and for individuals in<br />
the 19K MOS (Tank Commanders). From computer-based instruction (on<br />
MICROTICCIT) each student had to learn to interpret radio call<br />
signs and communicate in coded messages, tasks for which<br />
performance deficiencies had been documented.<br />
163
Improvements in performance were the same across training<br />
conditions, but the 4-student groups needed only two-thirds the<br />
time required by individuals to achieve comparable performance.<br />
Individuals and 2-student groups were statistically the same here.<br />
Both the 4- and 2-student groups made substantially fewer demands<br />
on instructor time as measured by "calls for proctor assistance,"<br />
e.g., 0 and 27 respectively, compared to 115 calls from individuals.<br />
At the Ft. Lee TTFA, Hagman and Hayes (1986) examined the<br />
effectiveness of cooperative methods in a more traditional,<br />
non-computer-based setting. They wanted to define specific conditions<br />
under which CL would and would not work. From a review of the<br />
literature, they hypothesized that the effectiveness of CL increases<br />
with increasing group size, though only when incentives were<br />
provided which encourage group members to share knowledge.<br />
Subjects were drawn from one unit (annex) of instruction in<br />
the 76C MOS Advanced Individual Training (AIT) course for supply<br />
clerks. Within this unit, the students receive a series of lectures<br />
each followed by a practical exercise (PE). Midway and at the end<br />
of the unit, the students are individually tested. Students who<br />
fail, go to study hall for remediation. Those who fail a retest are<br />
"recycled," i.e., required to repeat the annex.<br />
For the experiment, students were assigned to one of three group-size<br />
conditions. They did the PEs alone, in groups of 2, or in<br />
groups of 4. The groups of 2 and 4 were further divided into two<br />
incentive conditions. Under one condition (Group Incentive), if any<br />
student in the group failed, every group member went to study hall.<br />
Under a second condition (Individual Incentive) only the failing<br />
student went to study hall. Hagman and Hayes predicted that under<br />
a group incentive (i.e. everyone to study hall), performance would<br />
increase as group size increased, but decrease with increasing<br />
group size under individual incentive.<br />
Results partially supported this prediction. For each of the<br />
two tests in the annex, groups of four were clearly optimal under<br />
the group incentive. But statistically, groups of two did not<br />
outperform individuals. Recall that Shlechter had found a similar<br />
result at Ft. Knox. These similar results suggest, as a preliminary<br />
conclusion, that CL groups should contain more than two people.<br />
Other results supported the value of CL, but were inconsistent with<br />
the main hypothesis of the experiment. During the PEs, cooperative<br />
groups made fewer errors than did individuals, with or without the<br />
group incentive. In fact incentive made no difference at all.<br />
A potentially negative effect of CL was that groups took<br />
longer to complete PEs than did individuals. Not surprising since<br />
CL requires time for students to exchange information and ideas.<br />
However, if the added time does not exceed reasonable amounts of<br />
available instructional time or is offset by benefits, it can be<br />
discounted. Both conditions were satisfied in this study.<br />
164
Brooks et al (1987) did follow-up research, using the entire<br />
76C course as a test bed, to assess further the benefits of<br />
cooperative learning. This was actually a full-scale implementation<br />
test with an additional measure: recycle rate. Recycle rate is the<br />
percentage (per course) of students who fail end-of-annex tests,<br />
attend study hall for remedial review of material, fail a second<br />
time, and then repeat the annex. All students in three cooperative<br />
classes worked in groups of four - 34 groups for a total of 136<br />
students. These were compared with students in three other,<br />
regularly scheduled and conducted classes with a combined<br />
enrollment of 128 students.<br />
Results. The bad news was that the Hagman and Hayes finding<br />
of improved test scores for groups of four compared to individuals<br />
was not seen by Brooks et al. Compounding the bad news was the<br />
agreement with Hagman and Hayes that CL students took longer than<br />
individual students to complete PEs, though here again they<br />
finished within the allotted training time. The investigators<br />
checked to see if a treatment-aptitude interaction might be<br />
buried in the data. They divided subjects into high and low scorers<br />
on the ASVAB Clerical scale, but found no interaction with training<br />
method.<br />
The good news was that CL students made fewer errors in the<br />
PEs (as in the Hagman study) and that recycle rate was reduced from<br />
10.9% to 4.4%, i.e. 60% lower for CL students than for individuals.<br />
Brooks et al extrapolated this saving to a year's worth of classes<br />
(about 3,000 students) and estimated a cost reduction of $136,000.<br />
Not a large sum in the bigger scheme of things, but if CL were<br />
implemented Army-wide, the savings could be significant. Moreover,<br />
achievement scores in CL classes were not worse than in the<br />
"conventional" comparison classes. This would be especially<br />
constructive for CL in a computer-based classroom because it<br />
supports assigning one workstation to 3 or 4 students, thereby<br />
reducing the demands for expensive hardware. A potentially positive<br />
effect, demand on instructor time, was not assessed, but may have<br />
been present. Recall in the Shlechter studies, CL students required<br />
notably less instructor help than did individuals. Finally,<br />
students and instructors preferred CL to individual practice.<br />
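As a sanity check, the recycle-rate figures reported above can be reproduced with a few lines of arithmetic (sketched here in Python for illustration; note that the per-recycle cost is implied by the reported numbers rather than stated in the paper).

```python
# Rough check of the Brooks et al. extrapolation.  All inputs are the
# figures reported in the text; the per-recycle cost is derived.
baseline_rate = 0.109        # recycle rate, conventional classes
cl_rate       = 0.044        # recycle rate, cooperative-learning classes
students_per_year = 3000     # about a year's worth of classes
annual_savings    = 136_000  # dollars, as reported

reduction = (baseline_rate - cl_rate) / baseline_rate
recycles_avoided = students_per_year * (baseline_rate - cl_rate)
implied_cost_per_recycle = annual_savings / recycles_avoided

print(round(reduction * 100))            # about 60 (percent lower)
print(round(recycles_avoided))           # 195 fewer recycles per year
print(round(implied_cost_per_recycle))   # roughly 697 dollars each
```

The 60% figure in the text thus follows directly from the two recycle rates, and the $136,000 estimate corresponds to a cost on the order of $700 per avoided recycle.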
Outcome of the Research. The work by Shlechter, Hagman and<br />
Hayes, and Brooks et al, as well as a solid foundation of prior<br />
basic research led AR1 to recommend that the Quartermaster School<br />
implement cooperative learning. Brooks (1987) wrote an instructor's<br />
manual on how to set up and manage a CL classroom. The methodology<br />
has since been used in AIT for 76C MOS. Moreover, if and when<br />
computer-based instruction becomes wide-spread in the Army, this<br />
same methodology could save millions of dollars. With multiple<br />
students per workstation, the number of required stations could be<br />
reduced by two-thirds to three quarters. And cooperative learning<br />
could very well revolutionize the way the Army trains.<br />
165
REFERENCES<br />
Brooks, J.E. (1987). An Instructor's Guide for Implementing<br />
Cooperative Learning in the Equipment Records and Parts<br />
Specialist Course (ARI Research Product 87-35).<br />
Alexandria, VA: US Army Research Institute for the<br />
Behavioral and Social Sciences.<br />
Brooks, J.E., Cormier, S.M., Dressel, J.D., Glaser, M., Knerr,<br />
B.W., & Thoreson, R. (1987). Cooperative Learning: A New<br />
Approach for Training Equipment Records and Parts<br />
Specialists (ARI Technical Report 760). Alexandria, VA:<br />
US Army Research Institute for the Behavioral and Social<br />
Sciences.<br />
Hagman, J.D., & Hayes, J.F. (1986). Cooperative Learning: Effects<br />
of Task, Reward, and Group Size on Individual Achievement<br />
(ARI Technical Report 704). Alexandria, VA: US Army<br />
Research Institute for the Behavioral and Social Sciences.<br />
Johnson, D.W., & Johnson, R.T. (1985). The Internal Dynamics of<br />
Cooperative Learning Groups. In R.E. Slavin, S. Sharan, S.<br />
Kagan, R.H. Lazarowitz, C. Webb, & R. Schmuck (Eds.), Learning<br />
to Cooperate, Cooperating to Learn. New York: Plenum.<br />
Johnson, D.W., Maruyama, G., Johnson, R.T., Nelson, D., & Skon, L.<br />
(1981). The Effects of Cooperative, Competitive, and Individualistic<br />
Goal Structures on Achievement: A Meta-analysis. Psychological<br />
Bulletin, 89, 47-62.<br />
McNeese, M.D. (1989). Explorations in Cooperative Systems: Thinking<br />
Collectively to Learn, Learning Individually to Think (AAMRL-<br />
TR-90-004). Wright-Patterson Air Force Base, Ohio: Armstrong<br />
Aerospace Medical Research Laboratory.<br />
Raspberry, W. (1987, September 29). Why Should Kids Compete in<br />
Class? Washington Post.<br />
Raspberry, W. (1988, August 1). From School to the Real World.<br />
Washington Post.<br />
Shlechter, T.M. (1987). Grouped Versus Individualized Computer-<br />
Based Instruction (CBI) Training for Military Communications<br />
(ARI Technical Report 1438). Alexandria, VA: US Army Research<br />
Institute for the Behavioral and Social Sciences.<br />
Slavin, R.E. (1983). When Does Cooperative Learning Increase<br />
Achievement? Psychological Bulletin, 94, 429-445.<br />
166<br />
BATTLE-TASK/BATTLEBOARD TRAINING<br />
APPLICATION PARADIGM AND RESEARCH DESIGN<br />
John C. Eggenberger, PhD, Director, Personnel Applied Research and Training Division,<br />
SNC Defence Products Limited<br />
Ronald L. Crawford PhD, Professor, Concordia University<br />
1. Introduction<br />
Competitive advantage occurs when one protagonist creates and exploits superior relative certainty in<br />
an area which is uncertain or problematic within the industry. Porter, Khandwalla, Waterman & Peters,<br />
and others have proposed typologies of ways in which one can gain competitive advantage (viz. product,<br />
promotion, investment, scope, etc.), but they give a false impression that these represent institutional,<br />
executive level, quantum events. The opposite, in fact, is the typical case. Quinn, Mintzberg, Crawford,<br />
Gram, and Star, Cyert, March, Cohen, Drucker and others emphasize that competitive advantage is more<br />
typically achieved cumulatively through successions of w ram to locally evident ambiguities,<br />
threats, opportunities and variations. In other words, through voluntary improvisations which<br />
people undertake on their own initiative in relation to disciplined actions.<br />
The improvisation process has been studied extensively by members of the SNC Personnel Applied<br />
Research (PAR)team in both military and civilian settings. It corresponds very clearly to the behavioural<br />
theory of the firm, consisting broadly of:<br />
� applying heuristic diagnostic and response skills;<br />
� using experimentation to test beliefs, learn more, and influence the constellation of factors; and,<br />
� creating uncertainty among one’s competitors.<br />
In typical populations, the psychological readiness and capacity to exercise "disciplined initiative", and<br />
thence to improvise, are statistically uncommon. These capacities are characteristically stabilized well before<br />
entry to the work force, and are developed over long periods of intensive investment of time, energy, and<br />
resources. There is substantial evidence, however, that comparable skills can be achieved by adults,<br />
although most current examples tend to be costly and harrowing experiences for the participants. Real<br />
or simulated equivalents of combat, for example, do produce high levels of intuitive problem solving and<br />
experimental learning, but with significant casualty rates and considerable cost.<br />
In this regard, the PAR team has identified the following:<br />
� a method and content which can be employed in broad-based training and development<br />
settings to produce effective improvisations from the application of "Disciplined Initiative";<br />
� a reformulation of that content into a format which retains a high level of psychological<br />
engagement but reduces the resource requirements and real/psychological casualty rates;<br />
� a refinement of the content and method of instruction into a form suitable for field trial in a<br />
military setting; and<br />
� a development, from the field trial, of parallel curricula tailored to the context of other industry<br />
applications and levels of management.<br />
2. Discipline and Initiative<br />
What are the major determinants or sources of initiative? Discipline, on one hand, is acquired by<br />
learning how to deliver predictable and standardized outcomes when appropriately cued (certainty).<br />
Improvisation, on the other hand, is the delivery of a satisfactory outcome when initiative is exercised,<br />
i.e., action is called for but the cues have not been experienced before, nor is there an available<br />
repertoire of rehearsed responses to cope with the situation (uncertainty). A far more complete<br />
Copyright SSC PAR DIV Mar 1990<br />
treatment of these notions in relation to prior research has been done and reported elsewhere.<br />
For the military commander, regardless of appointment, as well as for other vocations, the distinction<br />
between “discipline” and “initiative” is important. The commanders of sections, platoons, companies,<br />
battalions, divisions, corps, and armies who can handle both the determinate and indeterminate<br />
aspects of their responsibilities would appear to possess a number of advantages, as follows:<br />
� the commander would cope more effectively with both foreseen and unforeseen events,<br />
� the commander would require substantially less attention or supervision, and<br />
� the commander would have a greater capacity to assume authority.<br />
Within the context of the military commander some of the research questions we propose to ask in<br />
relation to Discipline, Initiative and the capacity to Improvise are as follows:<br />
� How is discipline developed? � How is initiative developed? � How do discipline and<br />
initiative interact? � How does "disciplined initiative" influence improvisation outcomes?<br />
3. The major propositions are listed as follows:<br />
� the more a person experiences intimate, emotional, idealistic and reinforcing socialization<br />
experience, the more a person will have the propensity to exercise “disciplined initiative” under<br />
conditions of uncertainty;<br />
� the more a person exercises “disciplined initiative” under conditions of uncertainty, the<br />
more a person will be able to exploit available options (improvise) in a battle situation;<br />
� the more opportunities the person has to rehearse battle scenarios under controlled conditions,<br />
the more the person will exercise appropriate "disciplined initiative" decisions and actions;<br />
and,<br />
� the more a person acquires and copes with difficult assignments, the more a person will<br />
continue to exercise “disciplined initiative” under conditions of uncertainty.<br />
The matrix at Figure 1 shows the importance of DISCIPLINED INITIATIVE to the Military Commander.<br />
Clearly, it is important to design and deliver a curriculum of continuing training and education<br />
that will result in the bulk of the Commanders belonging to the upper left quadrant, and none<br />
found in the lower right quadrant.<br />
[Figure 1. Implications of Discipline and Initiative from the Perspective of the Military Commander: a matrix of high and low Discipline against high and low Initiative.]<br />
4. Situational Awareness and the Military Commander<br />
Situational awareness has also been developed to deal with recent reanalyses of the sorts of thinking<br />
that go on under complex and rapidly changing conditions, especially when information inputs<br />
and outputs are degraded by blockages and noises of various kinds and intensities. Essentially,<br />
the Commander must be able to act upon knowledge of himself and his forces and the disposition of the<br />
enemy forces, and anticipate the reaction of the enemy to his initiatives in the context of rapidly changing<br />
conditions and timelines.<br />
The basis of the Commander's action is input information COMMUNICATED to him, mainly<br />
audio (voice) and visual (eye), and output action information COMMUNICATED by him, mainly<br />
audio (voice) and psychomotor (eye-mind-finger-hand). What is of concern in the production of<br />
qualified COMMANDERS is the types and ranges of thinking that must occur in order to decide on,<br />
and communicate, courses of action that are appropriate for given scenarios.<br />
5. Training to Tactical and Strategic Actions<br />
The effectiveness of combat elements depends to a great extent upon the ability of their personnel to<br />
carry out three kinds of actions:<br />
� Highly efficient enactment of predictable routines, such as mobilizations, preparation for<br />
action, decamping, assembly, deployment into and out of movement formation, and establishing<br />
formations for classical types of actions. These are activities which recur with regularity in such<br />
consistent form that a well-drilled unit literally has them down to a well honed science. These are<br />
performed with minimal judgement because the “solution” is already known.<br />
� Applying “Directing Staff Solutions”, or “classical tactics” effectively in appropriate field<br />
and simulated situations. The clearest illustrations of these are the action sequences or drills in small<br />
unit tactical manuals. Those tell the participant what to do and how to do it under most tactical<br />
conditions. Directing Staff solutions require an element of active diagnosis of the context (i.e. a<br />
military appreciation), a choice among alternative responses from a standardized repertoire, and adaptation<br />
of those responses to match situational particulars.<br />
� Improvisation. Patton observed that plans never survive the initial engagement. Substantially<br />
the same sentiments of commanders and theoreticians across millennia demonstrate that under<br />
firing-line conditions, the classical solution sometimes cannot be ascertained, may not apply because<br />
of locally evident threats or opportunities (or may even be counterproductive if it represents definitive<br />
intelligence for the opposing force).<br />
Under all battle conditions improvisations are required. Typical kinds of improvisation include:<br />
� making tentative and partial diagnoses under uncertainty;<br />
� using action to test diagnoses, clarify the context, and alter the context; and<br />
� creating uncertainty for the opposing force.<br />
6. Relationship to Combat Related Doctrinal Training<br />
Training to doctrine usually takes the following three forms:<br />
a) SCRIPTED ROUTINES, comprised of:<br />
� Rationale; Components; Chained components; Whole (Insight - Gestalt).<br />
b) ADAPTED ROUTINES, comprised of:<br />
� Pattern recognition - more of the same situations.<br />
� Repertoire - more routines and variations.<br />
c) IMPROVISATIONS, comprised of:<br />
� An Act-Watch stream.<br />
� A Convergent stream - process of elimination; working backwards; partial solution; simplified modes; analogues.<br />
� An Enacting stream - via networks; forcing errors; tactics of mistakes.<br />
� A Competitive stream - using edge of certainty; creating uncertainty for others.<br />
7. Scripted Routines<br />
Scripted routines are the action equivalent of commodity strategies, or mass production. They<br />
depend for their effectiveness upon speed, precision, predictability and integration of more or less<br />
complex but fixed routines. Realtime thinking is largely replaced by decision loops and redundancy.<br />
The optimal training scenario for such manoeuvre is the rehearsal. In rehearsals, the “big picture”<br />
(e.g., a drill, movement, parade...) is broken down into its constituent components, such as tasks and<br />
actions. These are rehearsed until the trainee achieves complete command. The components are<br />
strung together in progressively longer trains of action until the entire routine is represented. Psychologically,<br />
the process is a direct application of behavioural conditioning (chaining).<br />
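As an illustration only (not drawn from the paper), the chaining progression just described, components rehearsed singly and then strung into progressively longer trains of action, can be sketched as follows; the component names are hypothetical examples, not the authors' drills:

```python
# Illustrative sketch of behavioural chaining: break a routine into
# components, then rehearse progressively longer trains of action
# until the entire routine is represented.

def rehearsal_chains(components):
    """Return the progressively longer trains of action to rehearse,
    ending with the complete routine."""
    return [components[:i] for i in range(1, len(components) + 1)]

# Hypothetical example routine:
drill = ["mobilize", "prepare", "decamp", "assemble", "deploy"]
for chain in rehearsal_chains(drill):
    print(" -> ".join(chain))
```

Each pass adds one more component to the rehearsed train, which is the "chaining" structure the paragraph above attributes to behavioural conditioning.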
The challenge of teaching scripted routines is that the actions involved tend not to be very exciting<br />
or involving. This requires developing imaginative training methods such as competing against the<br />
clock, against scoring systems, or against other teams. Varying the training content will also help:<br />
here it is important to move from board simulations to field simulations at an early stage (e.g.,<br />
convoying in snowstorms, at night without lights...).<br />
8. Adapted Routines<br />
The fundamental training objectives here are to create repertoires of stored situational patterns, and<br />
to match these with “DS Solutions”, or repertoires of behavioural routines appropriate for each situation.<br />
Unlike scripted events, where a workable outcome is effectively guaranteed by rote enactment<br />
of a fixed recipe (e.g., a parade or unobstructed road movement), enactment of adapted routines<br />
requires making more or less continuous judgements and reassessments. Those are necessary, first,<br />
because tactical situations all differ in important detail, and second, because they change as actions evolve.<br />
These judgements are also necessary to modulate actions and to maintain unit control and external<br />
coordination. Under operational conditions there is precious little time or attention for anything else.<br />
That means that the "basics" of situational appreciation (information gathering, interpretation, summarizing<br />
in a working model), then identifying and implementing the appropriate tactical response,<br />
must be almost reflexive.<br />
The classical approaches to situational assessment and theoretical doctrine actually work reasonably<br />
well. That is, breaking the processes into stages, and then working through numerous examples of<br />
each stage, starting with simple, small scale examples and working up to more difficult and complex<br />
examples, then chaining the stages successively into processes. The main problems with contemporary<br />
training are that it is not engaging or realistic enough, that trainers are usually insufficiently<br />
prepared or supported with aids, scenarios and materials, and that trainees do not experience<br />
enough examples, cycles, and variations for recognition and response to become second nature. The<br />
ultimate objective is to prepare someone who can act as if he has seen the situation before, understands<br />
it, can visualize the opponent's perspective, and can select and enact the actions needed to<br />
ensure favourable outcomes. The response of a BATTLE-TASK TRAINING/BATTLEBOARD<br />
based curriculum is to create surrogate experience, and to meter that experience at a controlled but<br />
challenging rate of exposure.<br />
9. Improvisations<br />
The objective behaviour here is taking effective local action under uncertainty or ambiguity that<br />
obviates rational, calculated decisions. These conditions are commonplace, yet they are poorly<br />
addressed (and in some quarters actively denied) in the training curricula. Improvisation accounts<br />
for the extent to which people are able to respond to, understand, exploit, and occasionally create transient,<br />
locally evident threats, opportunities, and ambiguities (versus becoming immobilized or simply<br />
plowing ahead according to the initial set of orders).<br />
Current knowledge of how people respond under uncertainty concentrates upon the interacting<br />
processes of:<br />
� heuristic problem solving,<br />
� learning by experimentation, and<br />
� creating opponent uncertainty & loss of initiative.<br />
To teach to these interacting processes, the simulation scenarios and curriculum are modified. The<br />
scenarios follow a simpler progression than pattern recognition and/or repertoire development.<br />
These scenarios directly link simple to complex and small scale to large scale. However, they will<br />
be made deliberately ambiguous, with cues of increasing subtlety regarding threats, opportunities,<br />
dispositions and intentions. The objectives will be attaining tactical certainty and initiative (i.e.,<br />
bringing the simulation back to an accepted routine format). The role of the umpire/trainer will be<br />
made much more active, as he will effectively be reorienting the scenario to reflect what is learned<br />
from each move as well as the objective outcomes. The discussion emphasis will shift from recognition<br />
to inference. Physically the simulations will employ progressive disclosure. As capacity develops,<br />
options such as fractures in command - coordination can be included. The core issues are:<br />
� how can I think and experimentally work my way into a situation where I know what’s<br />
going on and can employ my tactical handbook, and<br />
� how can I prevent my opponent from getting to that stage first?<br />
10. The Trials and the Setting<br />
The initial focus of the trials is the Army Reserve Unit. Decentralization, resource constraints, limited<br />
personnel time, the Army Reserve Unit's need for an experience payoff which will enhance both the unit<br />
and civilian career opportunities of participants, and the critical role of the Army Reserve Unit in a<br />
scenario of future national defence make the Army Reserve Unit a particularly attractive site for the trials.<br />
Further favouring the choice of an Army Reserve focus is the availability of the training simulation<br />
device BATTLEBOARD, a robust and transportable table-top terrain modeling simulator, and readily<br />
adapted doctrinally based Battle-Task training scenarios, as well as pre-existing knowledge of the<br />
general nature of tactical uncertainties of combat arms units, and the compressed time frames and clear<br />
field testing which Reserve settings offer.<br />
11. Interrupted Training Schedules<br />
Moreover, the Army Reserve Unit training curriculum has not yet been specifically addressed in terms of<br />
the real constraints confronting an Army Reserve Unit. The Army Reserve Unit soldier trains "part time",<br />
while the “Regular Force” soldier can train “fulltime”. Courses and exercises are not interrupted for the<br />
Regulars while they always are for the Reserve Unit. The usual method of fitting training requirements<br />
to Reserve Unit needs is to cut parts out of a curriculum, sequence the course curriculum differently, and/<br />
or stretch it out over a longer time period. Clearly these approaches will not be adequate for enlarged<br />
Reserve Units in a Total Force Army.<br />
Thus, throughout this project there is a concurrent activity devoted to applying the ingredients of a<br />
theory/model of linked learning. This theory/model is needed in order to accommodate the time<br />
available for training the Reserve Unit person. Usually this time is available in “dribs and drabs”.<br />
As a consequence, the curriculum must be parcelled out for the Reserve Unit in such a fashion that<br />
the results of training are the same as for the Regulars, who engage the curriculum as a coherent<br />
whole.<br />
The core assertion of this linked learning notion is that each BATTLE-TASK (e.g. “Advance to<br />
contact” - Infantry alone), is taught “wholistically”, in the teams that are in command, using terrain<br />
models, with the instructor using the “inductive” mode of instruction. The objective is to increase<br />
situational awareness in the team, and enable them to distinguish between "discipline" and "initiative",<br />
to increase the team's comprehension of, and use of, "Disciplined Initiative".<br />
12. Action<br />
Figures 2, 3, and 4, overleaf, portray the format of the trials, which conforms to the action science research<br />
strategy encouraged by Argyris et al. (1985), and the Personnel Applied Research method used by the PAR team.<br />
[Figures 2, 3, and 4. The Personnel Applied Research format and basic design. Combat formation: Infantry. The manoeuvre: advance to contact. The "deliverable": a kill taken from an enemy force. Tactical doctrine: (choose one). Combat formations trained using the BATTLEBOARD training system are compared with combat formations trained using current training systems; for each, one training option is chosen (scripted routines, adapted routines, or improvisations), the action is executed, the result is recorded, and the results are submitted to the comparative analysis.]<br />
COMBAT VEHICLE COMMANDER'S SITUATIONAL AWARENESS:<br />
ASSESSMENT TECHNIQUES<br />
Carl W. Lickteig<br />
Major Milton E. Koger<br />
U.S. Army Research Institute<br />
Field Unit-Fort Knox<br />
Captain Thomas F. Heslin<br />
2nd Squadron, 12th Cavalry Regiment<br />
Fort Knox, Kentucky<br />
Abstract<br />
The ability to "see the battlefield" is critical to<br />
successful execution of the battle. This precept is true at all<br />
echelons including commanders of small units and individual<br />
weapon systems. To train and foster this ability, however,<br />
methods for assessing and enhancing the commander's situational<br />
awareness (SA) are required. Recent efforts (Endsley, 1988;<br />
Fracker, 1988) have focused on the development of objective<br />
measures of fighter pilots' SA. This paper extends this effort to<br />
measures of SA for land combat vehicle commanders.<br />
As part of the Army Research Institute's (ARI) program of<br />
research in support of future Combat Vehicle Command and Control<br />
(CVCC) systems, small unit commander's SA was identified as a<br />
potentially important measure of system effectiveness. Parallel<br />
forms of two SA instruments were developed for objective<br />
assessment of a commander's perception, comprehension, and<br />
projection of the battlefield situation. This paper provides a<br />
description of these SA instruments and their utilization in<br />
support of the CVCC simulation-based program.<br />
Background<br />
The combatant's SA represents his knowledge of the world and<br />
his role in it. SA includes both lower and higher order mental<br />
processes ranging from the simple perception of individual<br />
elements of the situation to an assessment of their meaning and<br />
impact on immediate and overall mission objectives. Endsley's<br />
model of SA details three distinct levels--perception,<br />
comprehension, and projection--included in the following<br />
definition of SA: "...the perception of the elements in the<br />
environment within a volume of space and time, the comprehension<br />
of their meaning, and the projection of their status in the near<br />
future" (Endsley, 1988, p. 97).<br />
For ground forces, SA is more commonly described as the<br />
commander's ability to "see" the battlefield in relation to his<br />
mission and the overall mission. Combined arms combat,<br />
particularly for ground systems, entails coordination and support<br />
of multiple units. Situational awareness for combined arms<br />
commanders must include, perhaps more so than for combat pilots,<br />
the context of the combined mission.<br />
Typically a commander's awareness of a combat situation<br />
begins with the assignment of his unit's mission embedded in the<br />
concept or schema of the overall mission that his unit is<br />
supporting. The mission specifies the area of operations on the<br />
battlefield, the place(s) in the world that the commander is to<br />
occupy, as well as the objectives and time frame driving mission<br />
pace. The mission brief and order of operations describe the<br />
known and suspected enemy forces and activities in that area, key<br />
terrain features and locations related to mission accomplishment,<br />
and friendly combat, support, and service support units<br />
responsible for mission execution.<br />
Once the battle commences, the commander's perception<br />
(Endsley's SA Level 1) of the situation is enhanced by the direct<br />
or reported detection of enemy units. When initial contact and<br />
spot reports are received by the commander, his perception of the<br />
situation must be quickly updated. As a commander, he must also<br />
attempt to comprehend (SA Level 2) this information and its<br />
significance to his unit and mission. Given the reported type<br />
and number of enemy units detected, he may begin to estimate the<br />
size and type of the overall force committed, their weapon<br />
systems and range, their organization and support.<br />
As his understanding of the situation develops, the commander<br />
begins to project (SA Level 3) or reassess probable courses of<br />
action. Given the location and heading of units reported and his<br />
estimate of force structure, he may begin to calculate when, or<br />
if, the main unit will reach his location, at what point he may<br />
need to displace his unit from their current location, and what<br />
impact the current situation will have on the future situation<br />
such as his unit's next proposed location.<br />
This effort in SA development for ground systems is part of<br />
ARI's program of research in support of future CVCC systems.<br />
These systems will provide ground vehicle commanders a unique<br />
capability for the digital communication of text and graphic<br />
battlefield information, in addition to conventional FM radio.<br />
The CVCC program objective is the development of soldier-tested<br />
specifications for future automated command and control systems<br />
for ground combat vehicles. ARI conducts simulation-based tests<br />
of prototype CVCC systems using the Armor Center's Close Combat<br />
Test Bed (CCTB), formerly Simulation Networking-Developmental<br />
(SIMNET-D), at Fort Knox.<br />
Simulation-Based Methodology<br />
An objective measure of commander's SA is based on a<br />
comparison of the actual situation with the commander's<br />
assessment or report of the situation. Maintaining an accurate<br />
knowledge of the battlefield situation, however, is difficult for<br />
both commanders and SA researchers. For the latter, simulation-based<br />
scenarios provide a capability to control and know the<br />
battlefield situation.<br />
To ensure an accurate knowledge of the actual situation at<br />
the time of SA assessment, a set of battlefield situations,<br />
vignettes, were developed in which all the informational elements<br />
pertaining to the situation were prespecified and prerecorded.<br />
Prespecification ensured standardization of situation<br />
determinants.<br />
Prerecorded materials for the vignettes included simulation-based<br />
files designating commander and friendly unit locations,<br />
operational overlays to be displayed on the commander's Command<br />
and Control Display (CCD), and message sets to be received on his<br />
CCD during the vignette which would provide updates on his<br />
battlefield situation.<br />
At the start of each vignette, the<br />
commander was provided a map and map board with acetate<br />
operational and note overlays, and a brief description of the<br />
battlefield situation leading up to the vignette.<br />
The tactical situation for the vignette placed the commander<br />
in his tank simulator occupying a stationary defensive battle<br />
position (BP) for a delay-in-sector mission. The time frame for<br />
the vignette began after the postulated successful delay of<br />
initial enemy elements by his unit. The vignette was terminated<br />
prior to his unit's displacement to a subsequent BP.<br />
Immediately after a 10-minute message reception and<br />
processing phase, the commander was escorted out of his simulator<br />
to an adjacent workstation. He retained the map and map board<br />
used while receiving messages, but the operational and note<br />
overlays were replaced with another acetate sheet depicting only<br />
his own BP and the BPs of the adjacent companies. The commander<br />
was given one version of both the plotting and "seeing"<br />
questionnaires, described in the following section, and 10 minutes<br />
to record his answers. For each vignette, one<br />
questionnaire pertained to the current situation and the other to<br />
the future situation in a counterbalanced sequence across the<br />
series of vignettes.<br />
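As a hedged sketch only (the paper does not specify the authors' actual assignment procedure), the counterbalancing described above can be realized by alternating which questionnaire comes first on successive vignettes:

```python
# Assumed counterbalancing scheme: alternate which questionnaire
# (current vs. future situation) is administered first across the
# series of vignettes, so order effects balance out.

def counterbalance(n_vignettes):
    """Return the (first, second) questionnaire order for each vignette."""
    orders = []
    for i in range(n_vignettes):
        if i % 2 == 0:
            orders.append(("current", "future"))
        else:
            orders.append(("future", "current"))
    return orders
```

With an even number of vignettes, each questionnaire appears first equally often.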
SA Instruments<br />
The primary goals in the development of the situational<br />
measures for this effort were (a) to develop a set of items that<br />
addressed each of the primary levels of SA for small unit ground<br />
commanders, and (b) to develop a response format that supported<br />
objective scoring of a commander's SA responses.<br />
Perception: Plotting<br />
To capture the commander's perception of the situation, a<br />
situational awareness form was developed which required<br />
commanders to plot on a military map the locations of reported<br />
enemy units, friendly units, and key control measures. The<br />
location data selected for these items were based on SMEs'<br />
Table 1<br />
Situational Awareness Items: Plotting the Battlefield Situation<br />
Current Situation              Future Situation<br />
Largest unit engaged           Support unit to rear<br />
Largest unit approaching       Company's subsequent BP<br />
Friendly scout unit            Obstacle(s) to rear<br />
Target reference points        Enemy scouts to rear<br />
Largest unit outside sector    Mortar unit to rear<br />
estimates of the more important location information provided<br />
during the vignette. A five-item series of plotting questions<br />
was developed for both the commander's current situation and<br />
future situation as indicated in Table 1.<br />
The current situation was defined by informational elements<br />
of more immediate concern to the commander including enemy<br />
elements currently being engaged by his unit. The future<br />
situation was defined by less immediate information including<br />
enemy units in the area but well beyond current range, or<br />
information related to his next location, the subsequent BP.<br />
Comprehension and Projection: "Seeing"<br />
To assess the commander's comprehension and projection of the<br />
battlefield situation, a second SA form was developed. Items on<br />
this form required commanders to compile isolated report<br />
information into aggregate reports, to estimate the size of<br />
designated enemy units including main and attacking units, and to<br />
project the impact of the information received on his unit's<br />
current and future situations. Five close-ended items were<br />
developed for both the current and the future situation (Table 2).<br />
For the current situation the items addressed the commander's<br />
ability to comprehend the more immediate battlefield situation to<br />
the front of his current BP. The first two items required him to<br />
compile reported information received during the vignette into<br />
summary reports detailing the number and type of enemy units<br />
destroyed and damaged by his company, and the number and type of<br />
enemy units still approaching his current BP. The remaining<br />
items addressed the commander's ability to go beyond the data<br />
actually reported, to understand the nature of the threat facing<br />
both his company unit and the overall task force. These items<br />
asked the commander to estimate in turn the size and type of the<br />
enemy unit actually engaged, the unit approaching his company,<br />
and the total unit committed against the overall task force.<br />
Table 2<br />
Situational Awareness Items: "Seeing" the Battlefield Situation<br />
Current Situation                Future Situation<br />
Number & type enemy damaged      Distance/direction to main unit<br />
Size & type unit engaged         Heading of main enemy unit<br />
Number & type unit approaching   ETA main unit < 2,000 meters<br />
Size & type force approaching    Distance/direction next BP<br />
Overall size & type unit         Impact of obstacle(s) on unit's<br />
confronting the task force       next BP<br />
For the future situation, the items addressed the commander's<br />
ability to project beyond his immediate situation and use the<br />
information provided during the vignette to anticipate upcoming<br />
events. The initial items focused on the commander's awareness<br />
of the main enemy unit approaching his company sector. Reports<br />
received during the vignette had provided information about the<br />
heading and location of a relatively large enemy unit in the<br />
company's sector but well beyond engagement range. The commander<br />
was required to provide the location and heading of this main<br />
unit, and then estimate if, and when, that unit would approach<br />
within 2,000 meters of his current location.<br />
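The ETA judgement described above is, at heart, closing-rate arithmetic. A minimal sketch, using hypothetical figures that are not drawn from the vignettes:

```python
# Illustrative arithmetic only: minutes until an approaching unit closes
# to within a threshold range, given its reported range and closing speed.
# All numbers here are hypothetical examples.

def eta_to_range(current_range_m, closing_speed_kmh, threshold_m=2000):
    """Minutes until the unit is within threshold_m; None if not closing."""
    if closing_speed_kmh <= 0:
        return None  # unit is stationary or withdrawing
    if current_range_m <= threshold_m:
        return 0.0   # already inside the threshold
    metres_per_min = closing_speed_kmh * 1000 / 60
    return (current_range_m - threshold_m) / metres_per_min

# A unit reported at 8,000 m closing at 30 km/h:
print(eta_to_range(8000, 30))  # 12.0 (minutes)
```

The commander's version of this calculation is done mentally from reported locations and headings; the point of the sketch is only the underlying distance-over-speed relationship.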
The final two items assessed the commander's awareness of key<br />
information related to his unit's proposed future location. One<br />
item asked him to provide estimates of distance and direction to<br />
his unit's subsequent BP, and the final item asked him to assess<br />
the impact that reported obstacle(s) might have on movement to,<br />
and occupation of, that BP.<br />
Objective Responses: Scoring<br />
A key concern in the construction of the SA items was to<br />
develop a set of questions that clearly specified the situational<br />
information requested. The simulation-based vignettes driving<br />
the scenarios were designed by subject matter experts (SMEs) to<br />
provide a wide range of battlefield reports of differing<br />
relevance to the commander's mission. To ensure commanders<br />
clearly understood what information was being requested for each<br />
item, special attention was given to item wording. The item<br />
stems consistently provided and emphasized, for example,<br />
distinctions between enemy units engaged versus not engaged,<br />
locations in the unit's sector versus adjacent sectors, and<br />
elements to the front versus the rear of the unit's BP location.<br />
To meet the goal for SA instruments that could be objectively<br />
scored, the response formats required commanders to provide<br />
178
answers that precisely indicated their knowledge of the<br />
information requested. For items in which commanders were<br />
required to plot the locations of designated elements, objective<br />
assessment of location accuracy was straightforward. For the<br />
remaining items directed at comprehension and projection of the<br />
situation, a combination of fill-in-the-blank (e.g., enemy type,<br />
number) and multiple choice (e.g., mechanized rifle battalion<br />
versus tank company) item formats were used. SMEs assisted in<br />
the construction of all response options to provide commanders<br />
appropriate and meaningful response alternatives.<br />
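As an illustration, objective scoring of the location-plotting items reduces to comparing plotted against actual map coordinates. A minimal sketch follows; the grid coordinates, tolerance, and pass criterion are illustrative assumptions, not values from the instrument itself.

```python
import math

def score_plot(plotted, actual, tolerance_m=250.0):
    """Score one location-plotting item.

    Returns the plotting error in meters and whether it falls
    within an illustrative tolerance (not the instrument's
    actual criterion).
    """
    error = math.dist(plotted, actual)  # straight-line error in meters
    return error, error <= tolerance_m

# Hypothetical item: the commander plots the engaged enemy unit
# 200 m east of its actual grid location.
error, within = score_plot(plotted=(32200.0, 47000.0),
                           actual=(32000.0, 47000.0))
```

The same rule applies unchanged to each plotted element, which is what makes location accuracy, as the paper notes, straightforward to assess objectively.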
Two pilot sessions with active duty Armor commanders, three<br />
platoon leaders and one company commander per pilot, were<br />
conducted to obtain user feedback on the SA procedures and items<br />
developed. During the first pilot, commanders provided detailed<br />
feedback during structured debriefs. Their comments were<br />
particularly helpful in identifying items that required clearer<br />
or more explicit wording. Their recommendations were incorporated in<br />
revisions to the SA measures, and the revised questionnaires used<br />
for the second pilot appeared quite adequate.<br />
SA Utilization<br />
The SA forms are currently being used in ARI's CVCC program<br />
of research to investigate small-unit commanders' information<br />
requirements. An initial evaluation compared commanders' SA as a<br />
function of message sets received on their CCD that differed in<br />
volume, number of messages per set, and relevance to their<br />
battlefield situation. Results of this effort are expected to<br />
provide recommendations for improving the design of this future<br />
automated command and control system. In addition, these data<br />
will be used for empirical validation of the SA method and<br />
instruments described.<br />
A follow-on baseline evaluation of commanders using only<br />
conventional FM radio systems without a CCD will provide<br />
comparison data on the speed and accuracy of the CCD for<br />
receiving and relaying battlefield communications. As an<br />
additional dependent measure, the SA instruments will provide<br />
comparison data on the CCD's ability to help the commander<br />
integrate command and control information into a more accurate<br />
awareness of his battlefield situation.<br />
179<br />
An Aviation Psychological System for Helicopter<br />
Pilot Selection and Training<br />
F.Fehler<br />
Consulting Psychologist, German Army Aviation School, Bückeburg<br />
1. Current Situation in Aviation Psychology<br />
In Germany, aviation psychology looks back on an impressive<br />
history which had its beginnings way back in 1916, as some<br />
mythical accounts would have it. Although it is untrue that the<br />
late "Red Baron" made the acquaintance of aviation<br />
psychologists, it is certainly true to say that all German<br />
military pilots since the end of WW I have been confronted with<br />
aviation psychology in one way or another, if not with an<br />
actual aviation psychologist, then at least with aviation<br />
psychology methods and instruments. As a general rule, such<br />
instruments would include paper and pencil tests, and boxes<br />
with all kinds of levers, buttons, lights and bells. In the<br />
sphere of aviation, psychology was essentially synonymous with<br />
pilot candidate selection. Presumably this is also true for<br />
other countries where aviation psychology is practiced.<br />
On the other hand, aviation psychologists were surprisingly<br />
hesitant in touching two other important areas of aviation,<br />
namely<br />
- pilot training<br />
- psychological support for aviators.<br />
Obviously, this is a short-sighted attitude, for it is the<br />
training that will show whether or not the previous<br />
psychological screening methods were successful. Psychologists<br />
should therefore attend flight training, either by making<br />
active contributions or by merely acting as observers, to<br />
ensure that the criteria applied to conducting the training and<br />
to assessing the achievements made are the same as those that<br />
were applied to devising and evaluating their own test methods.<br />
Any other approach would not lead to representative validation<br />
coefficients.<br />
An aviation psychologist who descends from the heights of his<br />
ivory tower research to offer his knowledge to an aviation<br />
school and commit himself to solving its practical, everyday<br />
problems will soon find himself left alone and discover that he<br />
does not have the psychological tools required. What is the<br />
reason for this and what can be done about it?<br />
180
2. Identifying The Problem<br />
2.1. Screening Methods Used By Aviation Psychologists<br />
The need to screen pilot candidates is undisputed; screening not<br />
only serves the purpose of making the training cost-effective,<br />
it is also intended to save unsuitable applicants from having to<br />
abort a career. The problem of choosing suitable applicants seems to be<br />
an easy task for the practitioner in psychology, as he can choose<br />
freely from a plenitude of psychological methods that have been<br />
accumulated by two generations of psychologists having done<br />
extensive research in this particular area. On looking more<br />
closely at the existing literature, however, he will discover the<br />
following: the best achievements made so far are validation<br />
coefficients that lie at r = .5 in the most favorable cases!<br />
For the selection of applicants this means that he has to apply<br />
the most uncompromising cut-off-scores if he wants to satisfy the<br />
management with less than 10 % of candidates who have to be washed<br />
out from pilot training. It is obvious that such an approach would<br />
be synonymous with a sharp increase in the percentage of<br />
mistakenly rejected candidates, which is totally unacceptable<br />
unless one can draw on a large number of applicants. The latter is<br />
not the case in German army aviation. This means that the<br />
conventional testing methods are exhausted. Now that the old test<br />
methods have been modified and renamed over decades, it is hard to<br />
imagine the occurrence of a major breakthrough yielding validation<br />
coefficients that are clearly above r = .5.<br />
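The trade-off described above can be illustrated with a small Monte Carlo sketch: with a predictor of validity r = .5, pushing the washout rate among those selected down requires a high cutoff, which in turn rejects most of the suitable applicants. All figures below (the cutoff, the criterion, the 50% suitability base rate) are illustrative assumptions, not data from the German selection program.

```python
import math
import random

def selection_tradeoff(r=0.5, cutoff=1.5, criterion=0.0,
                       n=100_000, seed=1):
    """Monte Carlo sketch of selection with a predictor of validity r.

    Each applicant has a test score x and a later training outcome y,
    both standard normal with correlation r.  Applicants with
    x >= cutoff are selected; selected applicants with y < criterion
    wash out.  Returns (washout rate among selected, share of
    suitable applicants mistakenly rejected).
    """
    rng = random.Random(seed)
    selected = washed_out = suitable = rejected_suitable = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        if y >= criterion:
            suitable += 1
            if x < cutoff:
                rejected_suitable += 1
        if x >= cutoff:
            selected += 1
            if y < criterion:
                washed_out += 1
    return washed_out / selected, rejected_suitable / suitable

washout, false_reject = selection_tradeoff()
```

Under these illustrative settings the washout rate among those selected stays modest, but the large majority of suitable applicants never make the cut, which is tolerable only when one can draw on a large applicant pool.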
2.2. Pilot Training<br />
The contribution aviation psychology has made to pilot training is<br />
in fact very small when one looks at the contributions made in the<br />
field of pilot candidate selection. There is not much point in<br />
taking enormous pains to choose student pilots and then abandon<br />
them to their fates. In the German Army Aviation Branch, the<br />
psychologist will look after every student pilot who gets into<br />
trouble in the course of flight training. This approach has helped<br />
to gain the following experience:<br />
Problems resulted from<br />
- flight instructors being unable to establish a personal<br />
relationship with the student<br />
- vague learning objectives which were not clearly understood by<br />
students and interpreted differently among instructors<br />
- inconsistent teaching methods<br />
- the structure of the training program being based on too<br />
demanding a learning progression and paying no attention to<br />
students' individual training needs and learning speeds.<br />
181
In his attempt to overcome these difficulties, the aviation<br />
psychologist learns that he lacks an important tool: to develop<br />
and to test training concepts he must have free access to a full<br />
mission simulator and to helicopters. Availability and flexible<br />
use of both these types of systems under varying experimental<br />
conditions are extremely limited due to severe operational<br />
restrictions.<br />
2.3. Further Tasks Of Aviation Psychology<br />
The final success of the screening methods used in aviation<br />
psychology also depends on the following aspects:<br />
+ training provided to instructors<br />
+ prophylactic stress prevention and coping programs<br />
+ analysis of pilot behavior displayed in the cockpit.<br />
From all these tasks one can conclude that the aviation<br />
psychologist needs a simulator-like system the primary asset of<br />
which is the capability to simulate psychological demands rather<br />
than maximum realism in terms of aircraft control.<br />
3. Solution<br />
The solution to the problems arising from pilot candidate<br />
selection, flight training, ergonomics and psychological support<br />
must be looked for in the cockpit. However, the cockpits of full<br />
mission simulators or actual aircraft are not suited for the<br />
purposes of a scientific analysis for various reasons. This means<br />
that an aviation psychologist has to develop his own cockpit<br />
optimized for his specific aims. Such a concept will be described<br />
in the following paragraphs. The procurement of this system, called<br />
the "Aviation Psychological System/Helicopter (APS/H)", for the German<br />
Army Aviation Branch has already been initiated.<br />
3.1. Task Structures<br />
The task structures of the APS/H are based on psychological problems<br />
that typically arise in the course of actual training missions:<br />
3.1.1. Problem Area: Psychomotor Control<br />
The intricacies of controlling an aircraft may lead to psychomotor<br />
problems. The APS/H will therefore be equipped with all the<br />
controls that can be found in a helicopter. Control inputs made<br />
with the APS/H controls will not only convey the same feeling as<br />
those of a real helicopter; it will also be possible to change the<br />
control sensitivity over a wide range.<br />
182
3.1.2. Problem Area: Handling Complexity<br />
Sophisticated helicopter cockpits give rise to individual handling<br />
difficulties. It will therefore be possible to analyze these<br />
difficulties by switching displays and control elements on and off<br />
and thereby vary handling complexity.<br />
3.1.3. Problem Area: Mission Demands<br />
Certain flight missions and the demands that go with them may<br />
overtax the pilot mentally or physically. The APS/H therefore will<br />
have the capability to simulate all individual tasks and<br />
requirements pilots have to fulfil during typical missions.<br />
3.1.4. Clinical Aviation Psychology<br />
The special mission requirements inherent in flying military<br />
helicopters may also push experienced pilots to their performance<br />
limits. The APS/H is therefore designed such that flying-related<br />
requirements and psychotherapeutical measures (e.g.<br />
desensitisation, autogenic training etc.) can be combined with one<br />
another.<br />
3.1.5. Aviation Psychological Research<br />
It is a matter of course that designers of sophisticated equipment<br />
develop special test set-ups to be applied throughout the test<br />
phase in order to analyze and evaluate the operating performance<br />
of the device under development. APS/H will be a similar tool<br />
which will not only be used for solving ergonomic problems but<br />
also for<br />
- developing training systems and methods, and for<br />
- testing crew concepts.<br />
3.2. Realism Requirements For The APS/H<br />
Simulator realism is not an end in itself. An improvement in<br />
realism does not automatically improve simulator efficiency. An<br />
increase in realism will primarily go hand in hand with a linear<br />
increase in cost. So what is the degree of realism required for<br />
the APS/H to ensure maximum efficiency?<br />
3.2.1. Motor Realism<br />
Operation of the controls and the latency between control input and<br />
instrument display need to be as close to reality as possible,<br />
since automatic handling patterns internalized by the pilots would<br />
make it difficult to overcome the effects of negative transfer.<br />
183<br />
3.2.2. Motion Realism<br />
Motion is essentially perceived by the visual organs. APS/H can<br />
therefore do without a motion system. Nevertheless, vibrations<br />
typical of helicopter flying will be generated by fitting a<br />
vibration device to the pilot seat.<br />
3.2.3. Visual System Realism<br />
The APS/H needs a visual system for the following purposes:<br />
+ flight attitude-related visual feedback<br />
+ visual cueing for landing approaches<br />
+ projection of obstacle images for nap-of-the-earth flying<br />
+ projection of images of prominent terrain features for visual<br />
navigation.<br />
All these images will be schematic in nature. It should be clear<br />
that no gain will be made in dealing with the above-mentioned<br />
tasks by adding the image of leaves to the trees simulated.<br />
3.2.4. Acoustic Realism<br />
Realism in motion cueing will be enhanced by a realistic<br />
simulation of environmental sound patterns. This creates the need<br />
for an acoustic system with dummy-head microphone quality via a<br />
head set.<br />
All in all, a detailed analysis shows that the level of realism<br />
required for the APS/H need not be extraordinarily high to serve<br />
its purpose. Especially in the field of visual systems design,<br />
schematic images will do and thereby reduce overall costs.<br />
4. Summary<br />
Conventional test methods (paper/pencil etc.) are firmly<br />
established tools to be applied in all phases of pilot candidate<br />
selection processes, but it should be borne in mind that their<br />
validity is limited and cannot be improved considerably as can be<br />
seen from the experience gained. An in-depth analysis of the<br />
psychological potential required for meeting flying demands<br />
presupposes the existence of methods that are in keeping with<br />
real flying demands and, additionally, permit the application of<br />
scientific-experimental criteria. When one looks at physicists<br />
who, in search of minute particles, venture to demand equipment<br />
of inconceivable dimensions and are actually provided with it,<br />
then it is justified to say that the outlined APS/H, designed to<br />
study behavioral patterns of helicopter pilots, is a fairly<br />
modest demand.<br />
184
Analyzing User Interactions<br />
With Instructional Design Software<br />
J. Michael Spector<br />
Daniel J. Muraida<br />
Air Force Human Resources Laboratory<br />
Brooks AFB, TX 78235-5601<br />
Abstract<br />
Many researchers are attempting to develop<br />
automated instructional design systems to guide subject<br />
matter experts through the courseware authoring<br />
process. What appears to be lacking in a number of<br />
existing research and development efforts, however, is<br />
a systematic method for analyzing the interplay between<br />
user characteristics, the authoring tool's structure<br />
and organization, and the resulting quality of<br />
computer-based instruction (CBI). This paper describes<br />
the initial application of a particular approach that<br />
focuses on the analysis of inputs, processes, and<br />
outputs that occur in human-computer interactions (HCI)<br />
between end users and a prototype of a CBI design tool.<br />
Instructional Systems Design (ISD) is an established process<br />
for designing and developing instructional materials. ISD models<br />
were first elaborated in the 1950's using a behavioral learning<br />
paradigm and have since undergone many revisions and refinements<br />
(Andrews & Goodson, 1980). Traditionally, ISD has been viewed as<br />
the practical application of knowledge about learning and tasks<br />
to be learned to the design of instruction (Gagne, 1985).<br />
Many researchers have pointed out the need to provide an<br />
update of ISD based on the findings of cognitive science<br />
(Tennyson, 1989). What is also needed is an update of ISD that<br />
takes into account computer-based interactive methods for<br />
presenting instruction (Muraida, Spector, & Dallman, 1990).<br />
Using computers to design, develop, and deliver instruction<br />
complicates ISD considerations. Some instructional strategies<br />
appropriate for certain classroom-based settings are not<br />
appropriate for certain computer-based settings. For example,<br />
some common classroom strategies involve the teacher making<br />
provocative statements and asking leading questions. Likewise,<br />
it is possible to construct alternate computer models of various<br />
devices and simulate their performance; this is not easily<br />
possible in a classroom. As a result, instructional strategy<br />
differences exist between classroom and computer settings.<br />
In addition, the design of computer-based instruction (CBI)<br />
185<br />
must be accomplished with great care. In a classroom, there is<br />
usually an alert and experienced teacher to compensate for<br />
unclear or inadequate instructional presentations. In a computer<br />
setting, it is essential that the initial instruction be clear;<br />
otherwise, the instruction is likely to fail. Courseware is<br />
computer software that is designed for instructional purposes.<br />
Courseware that is not carefully designed is most likely to be<br />
expensive and ineffective (Jonassen, 1988). As a consequence, to<br />
make optimal use of CBI it will be necessary to develop<br />
techniques for evaluating the success and efficiency of various<br />
ISD methodologies applied in computer-based settings.<br />
Problem<br />
CBI has proven to be an appropriate instructional solution<br />
in many settings (Hannafin and Peck, 1988). CBI has also proven<br />
to be expensive and often ineffective (MacKnight & Balagopalan,<br />
1989). What is needed, then, is a means to ensure that CBI<br />
course designs are effective and produced in a cost-effective<br />
manner.<br />
There are two aggravating factors to this problem: 1) It is<br />
often true that courseware developers have had no special<br />
training in computer-based methodologies, and 2) It is not<br />
completely clear what cognitive aspects of learning are best<br />
instructed using various computer-based methodologies. In short,<br />
in determining how to optimize CBI developments it will be<br />
necessary to determine how novice and experienced CBI developers<br />
interact with the courseware authoring environment, and it will<br />
also be necessary to evaluate the success of the resulting<br />
courseware.<br />
The methodology proposed below represents an attempt to<br />
build an initial model of CBI authoring that can eventually be<br />
used as a predictor of success when combining particular<br />
courseware authoring environments, CBI developers, subject<br />
matter, and student populations. The Air Force Human Resources<br />
Laboratory (AFHRL) is interested in refining this model in order<br />
to evaluate the usability of transaction shells (Merrill, Li, &<br />
Jones, 1990) in the Advanced Instructional Design Advisor (AIDA),<br />
an automated and integrated set of tools to facilitate and guide<br />
the process of developing effective courseware (Muraida &<br />
Spector, 1990).<br />
The AIDA project focuses on the design and development of<br />
CBI (Spector, 1990). It is assumed that the Air Force will<br />
continue to expand its use of CBI, that the Air Force will<br />
continue to experience a shortage of courseware authors with<br />
backgrounds in instructional technology, and that the subject<br />
matter of immediate interest is maintenance training for<br />
apprentice level maintenance personnel.<br />
To provide CBI design guidance consistent with these<br />
assumptions, AFHRL has decided to pursue the use of intelligent<br />
186
lesson templates. Intelligent lesson templates have preestablished<br />
instructional parameters and are executable upon<br />
input of informational content by a subject matter expert. In a<br />
sense, intelligent lesson templates "know how" to present the<br />
kind of instruction they contain. Experienced instructors can<br />
alter the instructional parameters in order to customize<br />
instruction. The most noteworthy intelligent lesson templates<br />
are Merrill's transaction shells (Merrill et al., 1990).<br />
AFHRL and Merrill signed a Memorandum of Agreement wherein<br />
Merrill loaned two transaction shells to AFHRL for purposes of<br />
evaluation. AFHRL is using these transaction shells to develop a _<br />
model of CBI authoring interactions that affect the productivity<br />
and the quality of developed CBI courseware.<br />
Methodology<br />
The purpose of the initial evaluation study of Merrill's<br />
transaction shells was to develop a working model of user<br />
interactions with instructional design software. In addition to<br />
determining if Merrill's transaction shells with particular user<br />
interfaces were worthy of refinement and continued development,<br />
the aim was to establish an initial model with relevant<br />
characteristics that predict user success with other authoring<br />
environments.<br />
The answer to the question about the value of using<br />
transaction shell technology is that transaction shell technology<br />
appears to provide a very usable and productive courseware<br />
authoring environment. Details are elaborated in subsequent<br />
sections of this report.<br />
The primary question, however, concerned the establishment<br />
of a model of courseware authoring interactions that would<br />
influence the productivity and quality of a CBI authoring<br />
environment. Because all of the relevant characteristics were<br />
not known ahead of time, an approach that allowed iterative<br />
refinement of a quantifiable and predictive model was required.<br />
Falk's soft modeling technique satisfied this requirement and was<br />
used to guide the design of the study (Falk, 1987).<br />
The initial phase of developing a soft model consists of<br />
identifying inputs, processes, and outputs that are relevant to<br />
the task being modelled. Weighted links between input and<br />
process measurements and output measurements are then<br />
hypothesized. Additional subjects are then tested using the<br />
proposed tentative model. The model and its associated measures<br />
and weights are modified to reflect the outcome of new subjects.<br />
New input, process, or output measurements may be added as deemed<br />
necessary in the model development phase. Over time, the model<br />
stabilizes and can be used as a predictive or analytical tool.<br />
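The refinement cycle described above can be sketched as a set of weighted links that each new observation nudges toward the observed outcome. The sketch below uses a plain least-squares gradient step as a toy stand-in for the idea, not Falk's actual estimation procedure, and all measurement values are hypothetical.

```python
def update_weights(weights, inputs, observed_output, lr=0.05):
    """One refinement step for an illustrative soft model.

    The model links input/process measurements to an output
    measurement through weighted connections; each new subject's
    data nudges the weights toward the observed outcome via a
    simple least-squares gradient step.
    """
    predicted = sum(w * x for w, x in zip(weights, inputs))
    error = observed_output - predicted
    return [w + lr * error * x for w, x in zip(weights, inputs)]

# Hypothetical subject: normalized (instructional experience,
# subject-matter experience, computer experience) inputs, with
# normalized courseware quality as the observed output.
measurements, quality = [0.5, 1.0, 0.2], 0.8
weights = [0.0, 0.0, 0.0]
for _ in range(200):              # repeated refinement passes
    weights = update_weights(weights, measurements, quality)
```

Over repeated passes the predicted output converges on the observed one, mirroring the way the soft model is said to stabilize into a predictive tool as subjects accumulate.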
Initial input measures for this soft model included the<br />
187
following: instructional experience, subject matter experience,<br />
computer experience, and cognitive style. Some of this data was<br />
gathered by direct questioning and was easily quantified (e.g.,<br />
number of years of teaching experience). Cognitive style was<br />
determined by questioning and by observation, and was not as<br />
easily quantified. Some aspects of computer experience were<br />
easily determined by questioning (e.g., number of computer<br />
courses taken), but other aspects were not as straightforward<br />
(e.g., level of expertise with an operating system).<br />
Initial process measures for this soft model included the<br />
following: time spent on an authoring event, sequence chosen for<br />
authoring events, number of revisions attempted and accomplished,<br />
and purpose of revisions. Again some of these processes were<br />
easy to measure and to quantify, but other processes were more<br />
difficult to assess. For example, it was easy to measure how<br />
long an author spent indicating the particular function of a<br />
device that was part of the lesson content. However, determining<br />
the purpose of a particular revision without interrupting the<br />
integrity of the authoring process was more difficult. The only<br />
way to accomplish this was to note that a revision had been made<br />
and look at it; if its purpose was not obvious (correcting a<br />
misspelled word, for example, was an obvious revision), the<br />
author was asked after the session about the purpose of the<br />
revision.<br />
Initial output measures for this soft model included the<br />
following: total time to produce the lesson module, total cost<br />
to produce the lesson module, student achievement on tests,<br />
retention, student motivation concerning the material, level of<br />
interactivity of the lesson, instructor motivation to use the<br />
authoring environment in the future, and peer review by other<br />
instructional developers. Once again some of these measures are<br />
direct and straightforwardly quantifiable (e.g., total<br />
development time, student scores, etc.), while some are indirect<br />
and more qualitative (e.g., instructor and student motivation).<br />
The initial subject was observed completing a lesson module<br />
to teach the names, locations, and functions of 125 parts in<br />
the T-37 cockpit. The subject's experience was determined in an<br />
extensive interview prior to the study. The subject's motivation was<br />
observed throughout the study. In addition, the subject was<br />
queried midway through the study concerning his progress and<br />
problems encountered. The subject also kept a diary of authoring<br />
events, including problems encountered and general impressions.<br />
Results<br />
The relevant input measures of the subject were as follows:<br />
1) Medium instructional experience, 2) High subject matter<br />
experience, 3) Low computer experience, and 4) Reflective<br />
cognitive style with a self-directed locus of control. A formula<br />
for connecting each of these factors with output measures is<br />
currently being developed and will be tested in the second<br />
188
iteration of the evaluation study.<br />
The relevant process measures were as follows: 1) 4.75<br />
hours in introductory exercises, 2) 14.25 hours in on-line<br />
authoring, 3) 11.83 hours in off-line design and planning, 4) 10<br />
groupings, nested 3 levels deep, with a total of 21 lesson<br />
modules, top level module completed first, teaching 125 parts, 5)<br />
20 picture files identified and utilized, with minor revisions<br />
requested for 4, 6) Approximately two minor revisions per module,<br />
7) Approximately 5 minutes of debugging per individual module,<br />
and 8) Complete linkage of all modules into a course module in 20<br />
minutes. This data was collected by observation. The software<br />
has since been modified to collect and record this data<br />
automatically (Canfield & Spector, 1990).<br />
The relevant output measures were as follows: 1) 30.83<br />
hours in total development time (graphics were produced by<br />
support personnel and graphic production time is not included),<br />
2) 3-plus hours expected for student instructional time, 3) cost<br />
data not available, 4) student scores and motivation not<br />
available, 5) medium level of interactivity, 6) high instructor<br />
motivation (wants to be included in follow-on studies), and 7)<br />
acceptable quality of courseware (will be administered to cadets<br />
in lieu of current instruction).<br />
The subject's diary and responses to interview questions<br />
indicated a sustained high level of motivation and satisfaction<br />
with the authoring tool in spite of known deficiencies<br />
(occasional mouse failures). The subject experimented with<br />
default instructional parameters during the exercises but rarely<br />
changed the defaults for the instruction he developed. More<br />
specifically, the subject chose timed presentations for the<br />
student practice interaction rather than learner control. The<br />
subject also modified the default testing parameters to reflect 3<br />
samples per item instead of 2 and a criterion level of 75%<br />
instead of 90%. In addition, the subject altered allowable<br />
interactions per individual lesson as appropriate, which<br />
reflected complete understanding of the transaction shell<br />
environment.<br />
Conclusion<br />
This initial study prompted the addition of automatic data<br />
collection for both instructors and students to the transaction<br />
shell software. The general results indicate a high level of<br />
acceptability and productivity using transaction shells to author<br />
courseware. Assessment of the quality of the CBI produced has<br />
yet to be completed, although initial data collection on student<br />
performance is underway.<br />
Initial indications are that students require in excess of<br />
3 hours to complete the course module. This means that the<br />
subject's development time to instruction time ratio using this<br />
tool was approximately 10:1. Using traditional authoring tools<br />
189<br />
for this type of material (ignoring the time to create graphics)<br />
would have involved a 200:1 development to instruction time ratio<br />
(Lippert, 1989). Both the tool and the model are worth refining.<br />
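As a quick check on the reported figures, the three process times sum to the stated total, and the development-to-instruction ratio follows directly:

```python
# Reported process times in hours: introductory exercises,
# on-line authoring, and off-line design/planning.
intro, online, offline = 4.75, 14.25, 11.83

# They sum to the 30.83-hour total development time reported.
total = intro + online + offline

# Roughly 3 hours of resulting instruction gives the approximately
# 10:1 development-to-instruction ratio cited in the report.
ratio = total / 3.0
```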
References<br />
Andrews, D. H. & Goodson, L. A. (1980). A comparative analysis<br />
of models of instructional design. Journal of Instructional<br />
Design, 3(4), 2-16.<br />
Falk, R. F. (1987). A Primer for Soft Modeling. Berkeley, CA:<br />
University of California Institute for Human Development.<br />
Gagne, R. M. (1985). The Conditions of Learning and Theory of<br />
Instruction. New York, NY: Holt, Rinehart, and Winston.<br />
Jonassen, D. H. (Ed.) (1988). Instructional Designs for<br />
Microcomputer Courseware. Hillsdale, NJ: Lawrence Erlbaum<br />
Associates.<br />
Hannafin, M. J. & Peck, K. L. (1988). The Design, Development,<br />
and Evaluation of Instructional Software. New York, NY:<br />
Macmillan Publishing Company.<br />
Lippert, R. C. (1989). Expert systems: Tutors, tools, and<br />
tutees. Journal of Computer-Based Instruction, 16(1), 11-19.<br />
MacKnight, C. B. & Balagopalan, S. (1989). Authoring systems:<br />
Some instructional implications. Journal of Educational<br />
Technology Systems, 17(2), 123-134.<br />
Merrill, M. D., Li, Z., & Jones, M. C. (1990). Second generation<br />
instructional design (ID2). Educational Technology, 30(2),<br />
7-14.<br />
Muraida, D. J. & Spector, J. M. (1990). The advanced<br />
instructional design advisor (AIDA): An Air Force project<br />
to improve instructional design. Educational Technology,<br />
30(3), 66.<br />
Muraida, D. J., Spector, J. M., & Dallman, B. E. (1990).<br />
Establishing instructional strategies for advanced<br />
interactive technologies. Proceedings of the 12th Annual<br />
Psychology in the DOD Symposium, 12(1), 347-351.<br />
Spector, J. M. (1990). Designing and Developing an Advanced<br />
Instructional Design Advisor (Technical Report AFHRL-TP-90-52).<br />
Brooks AFB, TX: Training Systems Division.<br />
Tennyson, R. D. (1989). Cognitive Science Update of<br />
Instructional Systems Design Models (AFHRL Contract No.<br />
F3365-88-C-0003). Brooks AFB, TX: Training Systems Division.<br />
190
MILITARY TESTING ASSOCIATION<br />
1990 Annual Conference<br />
FORECASTING TRAINING EFFECTIVENESS (FORTE)<br />
Mark G. Pfeiffer and Richard M. Evans<br />
Naval Training Systems Center and Training Performance Data Center<br />
Orlando, FL<br />
A model was developed to simulate a variety of aviation training device<br />
evaluation outcomes. This simulation model is designed to explore sources<br />
of error threatening the sensitivity of device evaluations. Selection of<br />
evaluation designs is guided by a model that elicits information from<br />
experienced flight instructors. This practical knowledge is transformed<br />
into data that are used in simulating a training effectiveness evaluation.<br />
Effects of variables such as instructor leniency, task difficulty, and<br />
student ability are estimated by two different methods. Available in the<br />
output is an estimate of transfer ratios based on trials-to-mastery, a<br />
diagnosis of deficiencies, an exploration of possible sources of variance,<br />
and an estimate of statistical power and required sample size. Finally, all<br />
data analyses can be accomplished in less than 2 man-days and prior to the<br />
actual field experiment. Estimates of accuracy, reliability, and validity<br />
of the model are high and in an acceptable range.<br />
Background<br />
Major sources of error variance that can mask the true contribution of<br />
a training device to training effectiveness include instructor leniency,<br />
student ability, and task difficulty (McDaniel, Scott & Browning, 1983).<br />
First, instructors' grades are often unreliable criterion measures. Next,<br />
individual abilities among students vary widely. Finally, tasks vary<br />
greatly in difficulty level. Some tasks can be mastered by students in one<br />
or two trials, while others may require 30 trials. These sources of<br />
variance make ratings of students' performance insensitive measures of<br />
training device effectiveness. However, their magnitude can be identified<br />
with sensitivity analysis prior to actual field experiments.<br />
Sensitivity Analysis<br />
Sensitivity analysis is a planning technique (Lipsey, 1983) which<br />
focuses on the impact of variance on variables of interest. The device<br />
evaluation must be carefully planned if the results are to have practical<br />
value and show a true difference between experimental and control groups.<br />
During the planning phase for device evaluations an investment in time may<br />
help identify the problems that introduce unwanted error variance into the<br />
device evaluation. Performance data generated by flight instructors can be<br />
used for this purpose.<br />
The basic framework of the present "sensitivity" analysis differs from<br />
that described by Lipsey (1983) in that it employs the "insensitive"<br />
instructor's rating of students as a performance measure. Lipsey would<br />
rather seek a more sensitive measure. While this rating measure may not be<br />
a particularly good psychometric measure, it is dictated by operational<br />
constraints. Instructors' ratings are used extensively in the transfer of<br />
training literature.<br />
"Approved for public release; distribution is<br />
unlimited."<br />
191
SIMULATION MODEL<br />
The model described here is designed to simulate experimental and<br />
quasi-experimental training effectiveness evaluations of aviation devices.<br />
Values are generated by training experts. Major features of the model<br />
include the following:<br />
. programmable for microcomputers<br />
. extendable to different transfer designs<br />
. helpful in planning field experimental and quasi-experimental<br />
evaluations of devices<br />
. possible data collection by computer or by questionnaire.<br />
Input to the model comes from the ratings made by flight instructors. These<br />
expert judges make estimates of trials-to-mastery needed in the airplane by<br />
replacement pilots with and without prior simulator training using different<br />
device features. Estimates are made by two different methods to permit a<br />
check on cross-method variance and rater reliability.<br />
VARIABLES<br />
In order to gain a perspective of the scope or size of the model it is<br />
helpful for the reader to examine the levels permitted for key variables.<br />
These are shown in table 1. The model is designed so that these limits can<br />
be changed to fit a variety of evaluation designs (Pfeiffer & Browning,<br />
1984).<br />
Table 1<br />
Model Limits<br />
Variable                                          Levels Permitted<br />
Treatment (X1) (Experimental vs. Control)                2<br />
Student Ability (X2) (Fast-Average-Slow)                 3<br />
Task Difficulty (X3) (Easy-Average-Tough)                3<br />
Instructor Leniency (X4) (Easy-Average-Tough)            3<br />
192
DATA INPUT<br />
Two methods are provided for entering data: the interactive method and<br />
the additive method. The data from both interactive and additive methods<br />
are compatible with the following evaluation design: (X1) treatment, (X2)<br />
student ability, (X3) task difficulty, and (X4) instructor leniency.<br />
Combinations of two levels for X1 and three levels for X2, X3, and X4<br />
require 54 data elements.<br />
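The 54-element requirement follows directly from crossing the levels of the four variables (2 x 3 x 3 x 3 = 54). As an illustrative sketch (not part of the original model; the level labels are taken from the variable descriptions above), the full design can be enumerated as follows:<br />

```python
from itertools import product

# Level labels assumed from the variable descriptions in the text.
treatments = ["experimental", "control"]         # X1: 2 levels
abilities = ["fast", "average", "slow"]          # X2: 3 levels
difficulties = ["easy", "average", "tough"]      # X3: 3 levels
leniencies = ["easy", "average", "tough"]        # X4: 3 levels

# Full crossing of all levels: 2 * 3 * 3 * 3 = 54 data elements.
design = list(product(treatments, abilities, difficulties, leniencies))
print(len(design))  # 54
```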
Interactive Method<br />
An expert is asked to estimate the trials required for a replacement<br />
pilot to achieve mastery in the aircraft for each set of training conditions<br />
listed in table 2. These estimates are made twice: first, for the<br />
experimental group (e.g., with prior simulator training) and second for the<br />
control group (e.g., without simulator training). Training conditions and<br />
the data collection instrument for the interactive method are illustrated<br />
below as table 2.<br />
Table 2<br />
Interactive Questionnaire Instrument for Estimating Trials-to-Mastery<br />
CONDITION   INSTRUCTOR   STUDENT   TASK   ESTIMATED TRIALS<br />
1 Easy Fast Easy<br />
2 Easy Fast Tough<br />
3 Easy Slow Easy<br />
4 Tough Fast Easy<br />
5 Easy Slow Tough<br />
6 Tough Fast Tough<br />
7 Tough Slow Easy<br />
8 Tough Slow Tough<br />
The model calls for data on trials-to-mastery for the 27 combinations of<br />
conditions describing the experimental group and the 27 combinations of<br />
conditions describing the control group, a total of 54 conditions. Training<br />
experts need only estimate trials for eight conditions in each group, a<br />
total of 16 conditions. The remaining 38 values (representing the<br />
difference between 16 and 54) are estimated by a regression subroutine in<br />
the model.<br />
193<br />
Valuable time of experts is saved by having the model compute intermediate<br />
data elements.<br />
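One way to picture the regression subroutine is to code the eight expert-rated corner conditions of table 2 as -1/+1, fit a main-effects linear model, and predict the full 27-cell grid for one group. This is a hypothetical sketch: the report does not publish its regression equations, and the trials-to-mastery values below are invented.<br />

```python
import numpy as np

# Expert estimates for the 8 corner conditions of one group, coded
# (instructor leniency, student ability, task difficulty) with
# -1 = easy/fast and +1 = tough/slow.  Rows follow table 2 order;
# the trial counts are invented for illustration.
corners = np.array([
    [-1, -1, -1],  # easy instructor, fast student, easy task
    [-1, -1, +1],
    [-1, +1, -1],
    [+1, -1, -1],
    [-1, +1, +1],
    [+1, -1, +1],
    [+1, +1, -1],
    [+1, +1, +1],
])
trials = np.array([3, 6, 8, 5, 12, 9, 11, 16], dtype=float)

# Fit an intercept plus three main effects to the expert data.
X = np.column_stack([np.ones(len(corners)), corners])
coef, *_ = np.linalg.lstsq(X, trials, rcond=None)

# Predict trials-to-mastery for all 27 combinations of the three
# factors, filling in the cells the expert was never asked about.
levels = (-1, 0, +1)
grid = np.array([(i, s, t) for i in levels for s in levels for t in levels])
predicted = np.column_stack([np.ones(len(grid)), grid]) @ coef
print(len(predicted))  # 27 cells per group
```

With the balanced corner design the fitted intercept equals the mean of the eight expert estimates, so the middle cell of the grid falls at the expert's average.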
Parameters. The parameters identified in table 3 were selected to make<br />
the model flexible, i.e., capable of simulating conditions where the<br />
relative importance of the variables listed can be changed at will by the<br />
analyst. By using a computer terminal, the analyst may input alternative A,<br />
B, C, D, E, or F to establish the relative importance of the variables in<br />
determining expected trials-to-mastery. Relative importance of these<br />
variables is expected to vary from one aircraft community to another.<br />
Table 3<br />
Parameters for Weighting Trials-to-Mastery<br />
Parameter        Relative Importance<br />
A                Instructors   Students      Tasks<br />
B                Students      Instructors   Tasks<br />
C                Tasks         Instructors   Students<br />
D                Instructors   Tasks         Students<br />
E                Students      Tasks         Instructors<br />
F                Tasks         Students      Instructors<br />
Additive Method<br />
The mean trials-to-mastery for the experimental and control groups,<br />
obtained by the interactive method, are used as a basis for the values used<br />
in the additive method. Here the same expert is asked to estimate<br />
trials-to-mastery for each of the conditions one at a time. The questions<br />
are phrased as deviations around the mean trials-to-mastery (table 4).<br />
Training experts estimate six conditions in each group, a total of 12<br />
conditions. The remaining 42 values (representing the difference between 12<br />
and 54) are estimated by the computer model according to the rules of<br />
additive conjoint measurement (Luce & Tukey, 1964).<br />
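Under the additive rule, each of the 27 cells in a group can be reconstructed as the group mean plus one deviation per factor level, which is why six expert answers per group suffice. A minimal sketch (the mean and deviation values below are invented for illustration):<br />

```python
from itertools import product

# Hypothetical mean trials-to-mastery for one group, carried over
# from the interactive method, and invented expert deviations for
# the six additive questions (average levels deviate by zero).
mean_trials = 8.0
instructor = {"easy": -2.0, "average": 0.0, "tough": +2.0}
student = {"fast": -3.0, "average": 0.0, "slow": +3.0}
task = {"easy": -2.5, "average": 0.0, "tough": +2.5}

# Additive rule: every cell is the mean plus the three level deviations.
cells = {
    (i, s, t): mean_trials + instructor[i] + student[s] + task[t]
    for i, s, t in product(instructor, student, task)
}
print(len(cells))                        # 27
print(cells[("easy", "fast", "easy")])   # 0.5
```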
Reliability Check<br />
Since each training expert is asked for inputs to the model by two<br />
different methods, a check on methodological variance is possible by<br />
correlating the values obtained by the interactive and additive methods (N =<br />
54). This correlation is computed across methods for experimental and<br />
control groups.<br />
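The cross-method check is an ordinary Pearson correlation over the 54 paired values. A standard-library sketch (the two estimate lists are invented stand-ins for one expert's interactive and additive data):<br />

```python
import statistics as st

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = st.mean(xs), st.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented stand-ins for one expert's 54 interactive and additive estimates.
interactive = [4, 5, 7, 6, 9, 8, 11, 10, 13, 12] * 5 + [4, 5, 7, 6]
additive = [v + 1 for v in interactive]  # perfectly linear relation
print(round(pearson(interactive, additive), 3))
```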
SUMMARY OF MODEL FLOW<br />
Input, output, and interactive aspects of the model are summarized in<br />
figure 1 and figure 2.<br />
194
Figure 1. Model flow and data estimating procedure.<br />
Figure 2. Analysis of experimental and control groups and data storage.<br />
195
Table 4<br />
Additive Questionnaire Instrument for Estimating Trials-to-Mastery<br />
IF AN AVERAGE STUDENT REQUIRES *N* TRIALS TO LEARN TO<br />
MASTERY, HOW MANY TRIALS WILL A . . . FAST LEARNER REQUIRE?<br />
. . . SLOW LEARNER REQUIRE?<br />
IF AN AVERAGE INSTRUCTOR REQUIRES *N* TRIALS TO TRAIN<br />
STUDENTS, HOW MANY TRIALS WILL . . . AN EASY INSTRUCTOR NEED?<br />
. . . A TOUGH INSTRUCTOR NEED?<br />
IF *N* TRIALS ARE NEEDED FOR AVERAGE TASKS, HOW MANY<br />
TRIALS WOULD . . . AN EASY TASK REQUIRE?<br />
. . . A TOUGH TASK REQUIRE?<br />
VALIDATION AND APPLICATION<br />
The model was validated in the helicopter community using a concurrent<br />
validation design. Criterion data for the simulation were collected during<br />
an experimental evaluation of Device 2F64C, an SH-3 simulator located at the<br />
Naval Air Station, Jacksonville, Florida. Trials-to-mastery obtained from<br />
the simulation model were compared with the trials-to-mastery obtained from<br />
the field experiment (Evans, Scott & Pfeiffer, 1984).<br />
SUBJECTS AND PROCEDURE<br />
Thirteen flight instructors currently involved in training pilots in<br />
Device 2F64C were asked to estimate trials-to-mastery by two different<br />
methods. The subjects, one at a time, made their estimates at a computer<br />
terminal. One half-hour per subject was required to complete both the<br />
additive and interactive rating tasks.<br />
VARIABLES<br />
Four independent variables were included in the validation<br />
design: (X1) device feature, (X2) student ability, (X3) task<br />
difficulty, and (X4) instructor leniency. All combinations of two levels for<br />
X1 and three levels for X2, X3, and X4 produced 54 data points for a<br />
regression analysis against estimated trials-to-mastery. Trials-to-mastery<br />
(Y) in the aircraft was the dependent variable.<br />
EVALUATION DESIGN SENSITIVITY<br />
The usual purpose of a device feature evaluation is to extract the<br />
variance due to the device features, e.g., visual and motion vs. motion<br />
only. The modeled data can also be used to do a power analysis of the<br />
one-trial difference (actually 1.04) between device features. Power<br />
analysis provides an estimate of the sample sizes needed to demonstrate that<br />
this one-trial difference (experimental mean = 4.61, SD = 1.83; control<br />
mean = 5.65, SD = 2.07) is reliable (Pfeiffer, Evans & Ford, 1985).<br />
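The report does not state its power formula, but a standard normal-approximation sketch using the reported means and standard deviations gives a feel for the numbers: the one-trial difference amounts to roughly half a pooled standard deviation, so dozens of subjects per group would be needed. The alpha, power, and z values below are conventional choices, not figures from the paper.<br />

```python
import math

# Summary statistics reported for the modeled data.
mean_exp, sd_exp = 4.61, 1.83   # experimental group
mean_ctl, sd_ctl = 5.65, 2.07   # control group

# Pooled standard deviation and standardized effect size (Cohen's d).
sd_pooled = math.sqrt((sd_exp**2 + sd_ctl**2) / 2)
d = abs(mean_ctl - mean_exp) / sd_pooled

# Normal-approximation sample size per group for a two-sided test at
# alpha = .05 with power = .80 (z values are the standard constants).
z_alpha, z_beta = 1.960, 0.842
n_per_group = math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)
print(round(d, 2), n_per_group)
```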
196
The linear model indicates that the smallest amount of variance is<br />
accounted for by device features (.07). The combined other sources of<br />
variance (instructor leniency, student ability, and task difficulty:<br />
.21 + .27 + .42 = .90) are predicted to mask out the variance due to the device<br />
features. Evaluators could also artificially change their ratings to<br />
reflect the impact of anticipated evaluation design changes. A<br />
reexamination of summary statistics would permit evaluators to assess the<br />
impact of hypothetical design modifications on the anticipated outcome of<br />
the device evaluation.<br />
DISCUSSION<br />
Using data from a simulation model, the training effectiveness analysis<br />
estimated that the one-trial difference between training under the visual<br />
plus motion condition and motion alone would not be statistically<br />
significant with a reasonable sample size (Ott, 1977). This outcome of the<br />
model was confirmed through analysis of actual field data (Evans, Scott &<br />
Pfeiffer, 1984). With this insight from the model, the evaluator of a<br />
device would know in advance that control of task difficulty, student<br />
ability, and instructor leniency in a field experiment would be necessary to<br />
increase statistical power. True training effects attributable to the<br />
device features are more likely to be revealed when extraneous errors are<br />
controlled. Cochran and Cox (1957) have presented a theoretical discussion<br />
of this problem. Instructors' rating variance, for example, may be<br />
controlled by utilizing a standardized method for identifying when the<br />
student has achieved mastery (Rankin & McDaniel, 1980). Some criterion<br />
measure other than instructors' ratings could also be employed. A specific<br />
example is automated performance measurement on the tactical range, which<br />
unfortunately is not widely available for scientific measurement of aircraft<br />
in free flight. However, performance measurement is available in flight<br />
simulators. Computer-aided techniques for providing operator performance<br />
measures have been provided by Connelly, Bourne, Loental and Knoop (1974).<br />
CONCLUSION<br />
This study shows that flight instructors who have knowledge of a<br />
training situation, but who are not necessarily proficient with the<br />
intricacies of research design and statistics can provide data useful for<br />
planning a field experiment (device evaluation). The programs described<br />
herein are "user-friendly" and resident in a portable microcomputer. Should<br />
the computer be unavailable, a questionnaire could be used (Appendix). The<br />
utility of this approach depends, in part, on asking the right questions for<br />
a particular training environment and in part on developing the responses to<br />
such questions into meaningful information. The model just described has<br />
provided that utility for the present situation. Additionally, this model<br />
may be easily adapted to other training problems involving expert ratings<br />
(see Pfeiffer and Horey, 1988).<br />
197<br />
REFERENCES<br />
Cochran, W. G., & Cox, G. M. (1957). Experimental designs. New<br />
York: John Wiley.<br />
Connelly, E. M., Bourne, F. J., Loental, D. G., & Knoop, P. A.<br />
(1974). Computer-aided techniques for providing operator<br />
performance measures (AFHRL-TR-74-87). Dayton, OH:<br />
Wright-Patterson Air Force Base.<br />
Dawes, R. M. (1979). The robust beauty of improper linear models<br />
in decision making. American Psychologist, 34, 571-582.<br />
Evans, R. M., Scott, P. G., & Pfeiffer, M. G. (1984). SH-3<br />
helicopter flight training: An evaluation of visual and<br />
motion simulation in Device 2F64C (Technical Report 161).<br />
Orlando: Training Analysis and Evaluation Group, Naval<br />
Training Equipment Center.<br />
Lipsey, M. W. (1983). A scheme for assessing measurement<br />
sensitivity in program evaluation and other applied research.<br />
Psychological Bulletin, 94, 152-165.<br />
Luce, R. D., & Tukey, J. W. (1964). Simultaneous conjoint<br />
measurement: A new type of fundamental measurement. Journal<br />
of Mathematical Psychology, 1, 1-27.<br />
McDaniel, W. C., Scott, P. G., & Browning, R. F. (1983).<br />
Contribution of platform motion simulation in SH-3 helicopter<br />
pilot training (Technical Report 153). Orlando: Training<br />
Analysis and Evaluation Group, Naval Training Equipment<br />
Center.<br />
Ott, L. (1977). An introduction to statistical methods and data<br />
analysis. North Scituate, MA: Duxbury Press.<br />
Pfeiffer, M. G., & Browning, R. F. (1984). Field evaluations of<br />
aviation trainers (Technical Report 157). Orlando:<br />
Training Analysis and Evaluation Group, Naval Training<br />
Equipment Center.<br />
Pfeiffer, M. G., Evans, R. M., & Ford, L. H. (1985). Modeling<br />
field evaluations of aviation trainers (Technical Note 1-85).<br />
Orlando: Training Analysis and Evaluation Group, Naval<br />
Training Equipment Center.<br />
Pfeiffer, M. G., & Horey, J. D. (1988). Forecasting training<br />
device effectiveness: Three devices (Technical Report<br />
88-028). Orlando: Naval Training Systems Center.<br />
Rankin, W. C., & McDaniel, W. C. (1980). Computer aided training<br />
evaluation and scheduling (CATES) system: Assessing flight<br />
task proficiency (Technical Report 94). Orlando: Training<br />
Analysis and Evaluation Group, Naval Training Equipment<br />
Center.<br />
198
Cost-Effectiveness of Home Study using Asynchronous<br />
Computer Conferencing for Reserve Component Training 1,2<br />
Ruth H. Phelps, Ph.D.<br />
Major Robert L. Ashworth, Jr.<br />
U.S. Army Research Institute for the<br />
Behavioral and Social Sciences<br />
Heidi A. Hahn, Ph.D.<br />
Idaho National Engineering Laboratory<br />
Abstract<br />
The resident U.S. Army Engineer Officer Advance<br />
Course was converted for home study via asynchronous<br />
computer conferencing (ACC). Students and instructors<br />
communicated with each other using computers at home,<br />
thus creating an "electronic classroom". Test scores,<br />
completion rates, student perceptions and costs were<br />
compared to resident training. Results showed that ACC<br />
performance is equal to that of resident training, and<br />
costs are lower.<br />
Geographical dispersion, limited training time and civilian<br />
job and family demands make travel to resident schools for<br />
training and education difficult for the Reserve Component (RC).<br />
Not only is it a hardship for soldiers to leave jobs and family,<br />
but their units are unable to conduct collective training when<br />
soldiers are absent. In addition, training soldiers at resident<br />
schools has become so costly that HQ TRADOC has proposed a 50%<br />
reduction in the number of soldiers traveling to resident<br />
training by 2007 (TRADOC PAM 350-4).<br />
The purpose of this paper is to summarize an investigation<br />
of an alternative means for meeting the educational requirements<br />
of the RC. The goals are to (1) develop and test a new training<br />
option, using asynchronous computer conferencing (ACC), that<br />
1 These data are summarized from Hahn, H., Ashworth, R.,<br />
Wells, R., & Daveline, K. (in preparation). Asynchronous<br />
Computer Conferencing for Remote Delivery of Reserve<br />
Component Training (Research Report). Alexandria, VA: U.S.<br />
Army Research Institute for the Behavioral and Social<br />
Sciences.<br />
2 This paper is not to be construed as an official Department<br />
of the Army document in its present form.<br />
199
would not require soldiers to leave their homes and units and<br />
yet maintain the quality of training typically found at the<br />
branch school; (2) determine the cost-effectiveness of<br />
developing and operating the ACC alternative.<br />
Asynchronous computer conferencing is a means for<br />
communicating from different locations at different times (i.e.,<br />
asynchronously) using a computer network. For training<br />
purposes, an "electronic classroom" is established by connecting<br />
all students with each other and the instructional staff. A<br />
student or instructor can participate in the classroom from any<br />
location using existing telephone lines and a computer equipped<br />
with a modem. Students can work together in groups, ask<br />
questions of the instructors, tutor their classmates or share<br />
their thoughts and experiences. Instructors can direct<br />
individual study, conduct small group instruction, answer<br />
questions, give remedial instruction and provide exam feedback<br />
to the students.<br />
Method<br />
Participants<br />
Fourteen RC officers (13 males, 1 female) took Phase III of<br />
the Engineer Officer Advanced Course (EOAC) by ACC home study.<br />
For comparison purposes, performance data were collected from<br />
RC students taking the same course in residence at the U.S. Army<br />
Engineer School from October, 1986 to June, 1989.<br />
The instructional staff consisted of a civilian full-time<br />
course manager/administrator responsible for the overall<br />
operation of the course and three part-time instructors. The<br />
part-time instructor responsibilities included directing group<br />
discussions, remedial instruction and/or monitoring student<br />
progress.<br />
Course Description<br />
Course materials consisted of Module 6 of the EOAC (66<br />
program hours of instruction). Media used included paper based<br />
readings and problems, computer-aided instruction, video tapes<br />
and computer conferencing discussion. Topics covered were Army<br />
doctrine (e.g., rear operations), technical engineering (e.g.,<br />
bridging, flexible pavements), leadership and presentation<br />
skills. The program of instruction was identical for the ACC<br />
and resident classes.<br />
200<br />
Equipment, Procedure and Data Analysis<br />
Each student was provided with an IBM XT computer with 20<br />
megabyte hard disk, color monitor and printer. Software and<br />
courseware loaded on each computer consisted of: (1) a<br />
specially developed course management system and communications<br />
package; (2) computer-assisted instruction and tests; (3) word<br />
processing package; (4) spreadsheet.<br />
Communication software for asynchronous computer<br />
conferencing was provided through U.S. Army Forum, Office of the<br />
Director of the Army Staff. The host computer was located at<br />
Wayne State University and used the CONFER II conferencing<br />
software system.<br />
The course was conducted from September, 1988 to April,<br />
1989. Students were mailed all their computer equipment with<br />
written assembly and operation instructions and course<br />
materials. In addition they were provided with a toll free<br />
"hot line" telephone number for resolving hardware/software<br />
problems. The first lessons to be completed were self-conducted<br />
and designed to familiarize the student with the operation of<br />
the computer and software. Scores for computer training were<br />
not included in overall course grades.<br />
Part-time instructional staff were provided the same<br />
equipment and software as the students. In addition they were<br />
given a 40 hour training course on operating the hardware/<br />
software, instructional responsibilities and<br />
teaching/motivational techniques. Instructional staff and<br />
researchers met together to conduct this training using a<br />
combination of lecture and hands-on practice with the computer.<br />
There were four types of data collected: (1) test,<br />
practical exercise and homework scores; (2) pre- and post-course<br />
student perceptions of their amount of knowledge on the course<br />
topics; (3) course completion; (4) cost of converting and<br />
executing the course. Comparisons of the resident to the ACC<br />
course were made using multivariate analysis of variance procedures<br />
for a two-group design.<br />
Results<br />
As shown in the top of Table 1, there was no reliable<br />
difference between the test scores of students in residence<br />
versus ACC. A comparison of the students' self ratings of their<br />
level of knowledge before and after the course, showed that the<br />
ACC group had significantly greater gains in their perceived<br />
amount of learning, as shown in the bottom of Table 1.<br />
Completion data showed that 95% of resident students completed<br />
the course compared to 64% of the ACC students.<br />
201<br />
Table 1<br />
Student Scores and Ratings<br />
Scores                       ACC       Resident   Significance<br />
Tests                        92.0%     86.4%      NS<br />
Homework                     88.8%     92.0%      NS<br />
Practical Exercise           90.4%     89.9%      NS<br />
Perceived Amount Learned     33%       12%        p < .05<br />
(% Post-Pre)<br />
Cost data were computed separately for (1) converting an<br />
existing course for delivery by ACC and (2) executing each<br />
iteration of the course. If the conversion were done<br />
by within-government staff, then the cost would be approximately<br />
$296,100. If it were done under contract, then the cost is<br />
estimated at $516,200. Start-up costs of equipment purchase and<br />
instructor training were estimated to be $73,100 for within-<br />
government and $96,000 for contractor. Costs that will recur<br />
with each iteration were estimated at $234,400 for within-<br />
government and $420,900 for contractor.<br />
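The one-time versus recurring structure of these figures can be sketched as a cumulative-cost function over course iterations. Resident per-iteration costs are not reported in this summary, so the break-even comparison cannot be reproduced here; only the two ACC options are computed, using the dollar figures above.<br />

```python
# Reported cost figures for converting and running the EOAC by ACC.
options = {
    "ACC (within-government)": {"conversion": 296_100, "startup": 73_100,
                                "recurring": 234_400},
    "ACC (contractor)":        {"conversion": 516_200, "startup": 96_000,
                                "recurring": 420_900},
}

def cumulative_cost(option, iterations):
    """One-time conversion and start-up costs plus per-iteration costs."""
    c = options[option]
    return c["conversion"] + c["startup"] + c["recurring"] * iterations

# Total cost of each option over 10 course iterations.
for name in options:
    print(name, cumulative_cost(name, 10))
```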
Figure 1. Relative costs of EOAC alternatives over 10 course<br />
iterations.<br />
202
Figure 1 shows the total course conversion, start-up plus<br />
the recurring costs over 10 course iterations. Initially<br />
resident and ACC (within government) are similar with ACC<br />
(contractor) costs being nearly twice as much. However, when<br />
the costs of conversion and execution are amortized, ACC<br />
(contractor) becomes less costly than resident training after<br />
four course iterations. After five iterations ACC (within<br />
government) would save 47% and ACC (contractor) would save 6%.<br />
Cost-effectiveness ratios were computed by combining the<br />
cost and completion rate data. The ratio was greatest for ACC<br />
using government staff (.64), second for resident training<br />
(.41), and lowest for ACC using contractor staff (.36).<br />
Discussion<br />
It has been shown in this report that there is a cost-<br />
effective alternative to sending RC soldiers to branch schools<br />
for resident training. Training by ACC can be conducted just as<br />
effectively and for less money. Thus, this technology appears<br />
to meet the need of the RC to complete educational requirements<br />
from the home or homestation, without long absences from the<br />
unit. The "electronic classroom" could be conducted remotely<br />
from existing educational institutions such as the branch school<br />
and/or the U.S. Army Reserve Forces School in order to maintain<br />
standardized instruction.<br />
Additional research is needed, however, to improve the<br />
completion rate for ACC home study. Reasons for dropping out of<br />
the experimental course were related to limited time due to<br />
competing activities such as civilian jobs and family. A means<br />
of predicting which soldiers are likely to succeed or drop out<br />
of home study will assist Army trainers in both selecting<br />
students and providing assistance for those at high risk.<br />
References<br />
U.S. Army Training and Doctrine Command. (1989). Army Training<br />
2007. (TRADOC Pamphlet 350-4). Ft. Monroe, VA: Author.<br />
203
TEST DESIGN AND MINIMUM CUTOFF SCORES<br />
Sandra Ann Rudolph, Training Appraisal<br />
Chief of Naval Technical Training<br />
INTRODUCTION<br />
It has become increasingly obvious in the last few years that<br />
the United States government cannot continue to operate with little<br />
concern for who will pay the bill. The apparent message is to<br />
do better with less. This means we must become more efficient in<br />
our way of conducting business. For many of us, our business<br />
is training. Being efficient means we must use our resources<br />
wisely for the purpose intended. In training our resources are<br />
numerous--training devices, curriculum, instructors--while our<br />
purpose is solitary--provide the training necessary for graduates<br />
to perform in the fleet. While performance is the key, there is<br />
background knowledge that is necessary for the trainee to grasp the<br />
performance.<br />
BACKGROUND<br />
In the training environment of yesterday, where money was no<br />
object, training was easier. There was little concern for statistical<br />
evaluation, effectiveness, or efficiency. We trained by<br />
the seat of our pants--experience wasn't the best teacher, it was<br />
the ONLY teacher. Today, lack of attention in these areas could<br />
mean loss of training dollars. One of the big areas of concern<br />
deals with attrition--or the dropping of trainees from a designated<br />
training program. While there are many causes for attrition,<br />
recent attrition analysis visits to such schools as Air Traffic<br />
Control School, Music School, and Boiler Technician/Machinist Mate<br />
School, indicate that testing programs may be at the very core of<br />
many of our problems. The following questions were used to<br />
determine how knowledge testing was being used to measure success:<br />
(1) Have critical course objectives been identified with<br />
corresponding emphasis on testing?<br />
(2) Have the knowledge tests been designed to measure the<br />
objectives to the learning level required?<br />
(3) How was the minimum cutoff score for the knowledge<br />
tests determined?<br />
(4) Has the test design and cutoff score been validated?<br />
(5) Have alternate versions of the tests been developed<br />
that are consistent with the valid test design?<br />
It became apparent that testing was a problem. It was<br />
discovered that the emphasis and training had been placed on<br />
individual test-item development and test-item analysis, not on<br />
test development and test analysis. In other words, there was no<br />
assurance that the objectives were being tested nor any evidence<br />
on how the cutoff score was determined. To standardize the<br />
approach to test design, the following process was established:<br />
204
(1) Determine criticality of the objectives.<br />
(2) Determine test design.<br />
(3) Establish a minimum cutoff score.<br />
(4) Validate the test.<br />
DISCUSSION<br />
Criticality of the objectives<br />
The objectives of a course are those behaviors the trainee is<br />
expected to exhibit upon completion of training. Regardless of the<br />
method of development, objectives are established with varying<br />
degrees of importance or criticality. Therefore, determining the<br />
importance of the objectives must occur prior to designing the<br />
test. While there is not an established set of procedures to<br />
determine criticality, the following examples have proven to be<br />
valid.<br />
(1) Rank ordering of objectives. Subject matter experts rank<br />
the objectives from most important to least important. This<br />
method is most useful when courses have a small number of objectives.<br />
(2) Yes or No. Subject matter experts determine criticality<br />
by responding "Yes" or "No". The greatest disadvantage to this<br />
approach is that some critical objectives are more critical than<br />
others and vice versa.<br />
(3) Criticality based on a scale ranking. This method uses<br />
a set of questions to guide in determining criticality.<br />
(a) How important is this behavior to successful<br />
performance in the fleet?<br />
(b) How difficult is the behavior to learn?<br />
(c) How important is the behavior to successful<br />
performance in the course?<br />
A scale is normally established as 0-5 or 0-10. Based on the<br />
above or similar questions, each objective is reviewed by subject<br />
matter experts; a number value is assigned and the average calculated.<br />
The objectives are then ranked. Objectives falling above the<br />
established cutoff are considered critical. The cutoff score is<br />
normally a number based on the scale used. For example, any<br />
objective ranked 3 or above on a O-5 scale might be considered<br />
critical. This number will vary and is based on the individual<br />
course and its mission. This method provides the most complete way<br />
to determine criticality. The disadvantage is that<br />
it may be complicated and time consuming.<br />
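As an illustration, the scale-ranking method can be sketched in a few lines of code. This is a hypothetical sketch only: the objective names, the SME ratings, and the cutoff of 3 on a 0-5 scale are invented for the example, not taken from an actual course.<br />

```python
# Hypothetical sketch of the scale-ranking method. Each subject matter
# expert (SME) rates every objective on a 0-5 scale; objectives whose
# average rating meets the cutoff (here 3) are flagged as critical.
def rate_criticality(ratings, cutoff=3.0):
    """ratings maps objective -> list of SME scores (0-5).
    Returns a dict mapping objective -> (average, is_critical)."""
    result = {}
    for objective, scores in ratings.items():
        avg = sum(scores) / len(scores)
        result[objective] = (avg, avg >= cutoff)
    return result

# Invented example objectives rated by three SMEs:
sme_ratings = {
    "Trace a fuel-system casualty": [5, 4, 4],
    "List boiler safety devices": [3, 3, 4],
    "State the history of the rating": [1, 2, 1],
}
for obj, (avg, critical) in rate_criticality(sme_ratings).items():
    print(f"{obj}: {avg:.2f} {'critical' if critical else 'not critical'}")
```

On the invented data above, the first two objectives average 3 or higher and would be treated as critical; the third would not.<br />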
Test Design<br />
As with any research project, the researcher must have a plan.<br />
Without this plan, the researcher would be looking for information<br />
with little or no direction. The test design is a plan for<br />
ensuring that the objectives are tested and a plan for measuring<br />
the student's success in accomplishing the objectives.<br />
The process of designing a test builds upon the previous step<br />
of determining criticality of the objectives.<br />
There is no proven scientific method to determine the exact<br />
test design. It is an opinion based on experience. This opinion<br />
can be strengthened through consensus. Therefore, the design must be<br />
based on the ideas of several subject matter experts and not one or<br />
two individuals. If a consensus cannot be reached, then an average<br />
should be taken. Consensus should be an underlying concern<br />
throughout the test design process. Consensus of the right persons<br />
improves the chances of producing a valid test.<br />
Step One. Group the objectives in the order in which they<br />
will be tested. Factors to consider are:<br />
(1) The difficulty of the material needed to accomplish the<br />
objective.<br />
(2) The length of the material needed to accomplish the<br />
objective.<br />
For more difficult material, fewer objectives should be<br />
grouped for testing purposes. For example, an objective that is<br />
very difficult to accomplish may require individual testing, while<br />
several simpler objectives may be tested together. The longer it<br />
takes to teach the objective, the fewer objectives should be tested<br />
together. For example, an objective that is taught in three days<br />
may require individual testing while the objective that is taught<br />
in three periods may be tested with other objectives.<br />
Step Two. Determine the number of test items per objective.<br />
The concern is to have enough test items on a test to ensure the<br />
measurement of each objective. Several factors to consider are:<br />
(1) Criticality of the objective. The more critical the<br />
objective, the more items may be required.<br />
(2) Type of objective tested. If the test is comprised of<br />
both critical and noncritical objectives, normally the critical<br />
objectives should contain more items. The more items asked, the<br />
greater the confidence that the trainee has grasped the objective.<br />
(3) Number of objectives tested. If the test contains<br />
several objectives, be aware of the total number of items on the test<br />
and the time constraints.<br />
(4) Length of the material tested. If an objective can<br />
be taught in three periods, it should require fewer test items<br />
than the objective that is taught in three days.<br />
(5) Difficulty of the material. When the material is very<br />
difficult, it may require fewer items written to a much more<br />
difficult level.<br />
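The factors above do not come with a formula, so as a thought experiment, a simple allocation rule might look like the following sketch. The weighting scheme (double items for critical objectives, one extra item per three periods of instruction) is an invented assumption, not an established rule.<br />

```python
# Illustrative item-allocation sketch. The factors (criticality, length)
# come from the text; the specific weights below are invented.
def allocate_items(objectives, base_items=2):
    """objectives: list of dicts with 'name', 'critical' (bool), and
    'periods' (instruction time in class periods). Critical objectives
    get double the base count; one extra item per 3 periods of length."""
    plan = {}
    for obj in objectives:
        items = base_items * (2 if obj["critical"] else 1)
        items += obj["periods"] // 3
        plan[obj["name"]] = items
    return plan

plan = allocate_items([
    {"name": "Light off the boiler", "critical": True, "periods": 9},
    {"name": "Identify gauge types", "critical": False, "periods": 2},
])
print(plan)  # the critical, longer objective receives more items
```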
Step Three. Determine the level of learning of the test<br />
items. Depending on the status of the curriculum, test items may<br />
or may not already be available. While several levels of learning<br />
exist, the following five levels are suggested for use:<br />
(1) Knowledge. Test items that measure a student's<br />
ability to identify or recall specific terms, facts, rules, etc.<br />
as they are taught. Knowledge represents the lowest level of<br />
learning for a test item.<br />
(2) Comprehension. Test items that measure a student's<br />
ability to grasp the meaning of material. This may be done by<br />
interpreting, explaining, or translating information. This is a<br />
higher level of learning than knowledge, but the lowest level of<br />
understanding.<br />
(3) Application. Test items that measure the student's<br />
ability to use learned material in new and concrete situations.<br />
This type of test item requires a higher level of understanding<br />
than comprehension.<br />
(4) Analysis. Test items that measure the student's<br />
ability to break down material into components so that an<br />
organizational structure may be understood. This may require the<br />
identification of parts, analysis of relationships between parts,<br />
and recognition of the organizational principles involved. These<br />
types of test items represent a higher level of learning than<br />
comprehension and application because they require an<br />
understanding of both the content and the structural form of the<br />
material.<br />
(5) Evaluation. Test items that measure the student's<br />
ability to judge the value of material for a given purpose. The<br />
judgements are based on definite criteria. This type of test<br />
item represents the highest learning level because it contains<br />
all the other categories.<br />
When determining the learning level to which the test item<br />
should be written, the item must reflect the objective. The<br />
following factors should be considered:<br />
(1) Test items must be written that support the objective.<br />
This means that if the objective calls for a basic knowledge of<br />
the material, the test items should be written to the knowledge<br />
learning level.<br />
(2) If the objective calls for an understanding of the<br />
material, then the test item should be written to one of the<br />
higher learning levels.<br />
(3) If the objective calls for a higher learning level,<br />
not all test items should be written to the highest level.<br />
Enough must be on the test to ensure that the student has met the<br />
objective to the learning level required.<br />
Step Four. Select appropriate test items from the test bank<br />
or develop test items. If a test bank is already in existence,<br />
each item must be cross-referenced to the objective it<br />
supports and a level of learning identified.<br />
If the test bank<br />
does not have an adequate number of items, new items may be<br />
required. If it appears that new items that support the objective<br />
are difficult to prepare, the plan may need to be altered.<br />
Step Five. Establish a minimum cutoff score. Setting a cutoff<br />
score means that a point must be determined that differentiates<br />
between the student that has achieved the objective and the student<br />
that has not. If the first four steps have been followed, it is<br />
safe to assume that the test has content validity. If there is any<br />
doubt, the test should be reviewed again before attempting to<br />
establish the minimum cutoff score.<br />
Setting a cutoff score, as with the other steps, is a judgemental<br />
process. While several methods of establishing the minimum<br />
cutoff score exist, the following methods are suggested.<br />
METHOD 1<br />
(1) A panel of subject matter experts is selected based on<br />
their current knowledge of the job and the performance required<br />
of the graduate in the fleet.<br />
(2) A discussion should be conducted centering on what<br />
constitutes a minimally competent person. Caution should be taken not to<br />
allow one person to dominate the discussion and that the goal<br />
should be one of consensus. The discussion is designed to get all<br />
members thinking along the same lines.<br />
(3) Next, the technique of establishing the cutoff score<br />
should be explained.<br />
(a) Review each test item on the test.<br />
(b) Check items that the student with minimum<br />
competency should be expected to know. Care<br />
should be taken that this is not what the average<br />
student will know, or what the subject matter<br />
expert would like for them to know.<br />
(c) If there are any items that the student must know,<br />
these items will be noted.<br />
(d) Add the number of checks for each objective.<br />
(e) Average the total responses and this becomes the<br />
minimum cutoff score for the objective.<br />
METHOD 2<br />
(a) With this method, subject matter experts determine<br />
the percentage of the students that should answer<br />
the test item correctly. Again this is dealing<br />
with minimum competency.<br />
(b) An average of the percentages yields the minimum<br />
cutoff score.<br />
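Both suggested methods reduce to simple averages of SME judgments. A minimal sketch, with invented numbers:<br />

```python
# Sketches of the two suggested cutoff-score methods; all data invented.
def cutoff_method_1(checks_per_sme):
    """Method 1: each SME checks the items a minimally competent student
    should answer correctly; the average check count is the cutoff."""
    return sum(checks_per_sme) / len(checks_per_sme)

def cutoff_method_2(item_percentages):
    """Method 2: each entry is the SME-judged percentage of minimally
    competent students expected to answer an item correctly; the
    average percentage is the minimum cutoff score."""
    return sum(item_percentages) / len(item_percentages)

# Three SMEs checked 14, 16, and 15 items on a hypothetical 20-item test:
print(cutoff_method_1([14, 16, 15]))      # 15.0 items
# SME-judged percentages for four hypothetical items:
print(cutoff_method_2([90, 70, 80, 60]))  # 75.0 percent
```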
Regardless of the method used, there is never any hard and<br />
fast criterion for what is competent and what is not. Some<br />
students are clearly competent based on their scores. Some<br />
students are clearly not competent based on their scores. There<br />
is a certain group of students that may meet the cutoff score and<br />
not be competent. There is normally an equal number of students<br />
that do not meet the cutoff score that are competent.<br />
In the final analysis of the cutoff score, it comes to a<br />
decision concerning which is the greater danger: to fail a<br />
qualified person or to pass an unqualified person. For progress<br />
tests, it is probably acceptable to pass an unqualified person. For<br />
exit exams, particularly when safety is a factor, it is probably<br />
better to fail a qualified person than to pass an unqualified<br />
person.<br />
Step Six. Validation process. Content validity has already<br />
been established. Validating the minimum cutoff score is a process<br />
achieved over time by administering the test and plotting the<br />
scores. If the scores indicate that almost all students are passing,<br />
the cutoff score may be too low. This is true only if<br />
noncompetent students are passing. If all the students who pass are<br />
competent, then the cutoff score may be acceptable. If the scores<br />
indicate that most students fail, the cutoff score may be too high.<br />
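The Step Six review can be mocked up as a simple pass-rate check. The 90% and 50% trigger points below are invented for illustration; the text only says "most" students.<br />

```python
# Minimal sketch of the Step Six review: tally scores from successive
# administrations and flag a cutoff that may be mis-set. The 90%/50%
# thresholds are assumptions, not prescribed values.
def review_cutoff(scores, cutoff, high=0.90, low=0.50):
    """Return an advisory string based on the passing rate."""
    passing = sum(1 for s in scores if s >= cutoff) / len(scores)
    if passing >= high:
        return "passing rate high - cutoff may be too low"
    if passing <= low:
        return "passing rate low - cutoff may be too high"
    return "cutoff appears acceptable"

# Invented scores from eight students against a cutoff of 60:
print(review_cutoff([62, 70, 85, 90, 58, 77, 81, 66], cutoff=60))
```

As the text cautions, a high passing rate is a problem only if noncompetent students are among those passing, so such a flag is advisory, not conclusive.<br />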
SUMMARY<br />
In conclusion, the process is being tested. Training has been<br />
provided to all the sites where attrition analysis visits have been<br />
conducted. Since the training is recent, it is difficult to assess<br />
what impact the process has had on attrition. While attrition has<br />
been lowered in each case, it is not possible to pinpoint any<br />
specific cause. One thing we are confident of is that this<br />
process leads to better test validity and that the objectives are<br />
being measured to the degree specified.<br />
Subjective and Cognitive Reactions to Atropine/2-PAM,<br />
Heat, and BDU/MOPP-IV<br />
John L. Kobrick, Richard F. Johnson, and Donna J. McMenemy<br />
US Army Research Institute of Environmental Medicine<br />
Natick, Massachusetts 01760-5007<br />
The current US armed forces nerve agent antidote is a<br />
combination of 2 mg atropine sulfate (atropine) and 600 mg<br />
pralidoxime chloride (2-PAM) administered by paired intramuscular<br />
injections. Although these drugs provide good physical<br />
protection, they have side effects which could lead to adverse<br />
subjective reactions and impaired performance (Taylor, 1980).<br />
The major physiological reactions to atropine alone<br />
(Marzulli & Cope, 1950), and to atropine in combination with heat<br />
stress (Kolka, Stephenson, Bruttig, Cadarette, & Gonzalez, 1987)<br />
have been identified. Effects on psychological, perceptual, and<br />
cognitive behavior are less clear, although some performance-<br />
oriented studies have been reported (Baker et al., 1983; Moylan-<br />
Jones, 1969; Penetar & Henningfield, 1986; Wetherell, 1980). The<br />
physiological effects of 2-PAM alone and in combination with<br />
atropine have also been studied (Holland, Kemp, & Wetherell,<br />
1978). Much less is known about associated psychological and<br />
perceptual effects (Headley, 1982), although such knowledge is<br />
essential in view of their paired use as the standard nerve agent<br />
antidote.<br />
Chemical warfare in tropic and desert areas also creates<br />
problems due to heat stress, especially when troops must wear<br />
MOPP-IV chemical protective clothing, since the total<br />
encapsulation of that ensemble traps heat and body moisture.<br />
This paper reports subjective symptoms, mood changes, and<br />
cognitive performance observed during a research project on the<br />
effects of heat exposure, atropine/2-PAM administration, and<br />
wearing of both the BDU and MOPP-IV ensembles. The overall<br />
project consisted of two separate studies which were identical<br />
except that the BDU ensemble was worn in one of the studies, and<br />
the MOPP-IV ensemble was worn in the other study.<br />
Study 1. Effects of Atropine/2-PAM and Heat on Symptomatic, Mood,<br />
and Cognitive Reactions While Wearing the BDU Ensemble<br />
Method<br />
Fifteen male soldiers, ages 18-32 years, were screened<br />
medically and were tested for normal vision and hearing. They<br />
were trained intensively 6 hours daily for 5 consecutive days on<br />
a battery of performance tasks and then performed the tasks on 4<br />
separate test days, each day corresponding to one of the<br />
following experimental test conditions: (a) control (saline<br />
placebo, 70°F [21.1°C], 30% RH); (b) drug only (2 mg atropine,<br />
600 mg 2-PAM, 70°F [21.1°C], 30% RH); (c) ambient heat only<br />
(saline placebo, 95°F [35°C], 60% RH); and (d) drug and ambient<br />
heat (2 mg atropine, 600 mg 2-PAM, 95°F [35°C], 60% RH). On each<br />
test day, the soldiers received either atropine/2-PAM or<br />
equivalent volumes of saline placebo, injected into the thigh<br />
muscle by 22-gauge syringes. Drug conditions were double-blind;<br />
however, the study medical monitor knew the identities of both<br />
drug and placebo participants. Test days were separated by at<br />
least three days for recovery from the preceding drug conditions.<br />
Daily testing began 30 min after drug administration.<br />
Participants attempted to complete three cycles of the<br />
performance tasks each testing day, and performed until either<br />
they withdrew voluntarily or were removed by the medical monitor.<br />
Cycles began at standard 2-hr intervals to maintain uniformity of<br />
daily heat exposure. Participants were allowed to drink water ad<br />
lib from standard military canteens; lunch and snacks were<br />
omitted.<br />
Three subjective tests were administered periodically during<br />
each experimental session: (a) the US Army Research Institute of<br />
Environmental Medicine Environmental Symptoms Questionnaire (ESQ;<br />
Kobrick & Sampson, 1979), as modified by Kobrick, Johnson, and<br />
McMenemy (1988); (b) the Profile of Mood States (POMS; McNair,<br />
Lorr, & Droppelman, 1981); and (c) the Brief Subjective Rating<br />
Scale (BSRS; Johnson, 1981). The ESQ is a self-rating inventory<br />
for sampling subjective reactions and medical symptomatology<br />
during exposure to environmental and other stressors. The POMS<br />
is a rating scale of 65 items to assess 6 mood states (tension,<br />
depression, anger, vigor, fatigue, confusion). The BSRS<br />
appraises subjective feelings of warmth, discomfort, and<br />
tiredness on separate rating scales by selection of descriptive<br />
words or phrases. The ESQ and POMS were given once at the end of<br />
each daily session. The BSRS was given once at the beginning of<br />
each session (30 min post-injection) and once at the end of each<br />
cycle (150, 270 and 390 min post-injection).<br />
Participants performed the following cognitive tasks in each<br />
2-hour testing cycle: (1) verbal reasoning - judging the<br />
correctness of grammatical transformations (Baddeley Grammatical<br />
Reasoning Test, 1968); (2) simple reaction time - pressing a key<br />
to the onset of a signal light; (3) choice reaction time -<br />
pressing one of two keys to the onset of one of two signal<br />
lights; (4) digit-symbol substitution - substituting code symbols<br />
for their symbol counterparts (Digit Symbol Substitution Test,<br />
Wechsler, 1955); (5) speech intelligibility - correctly<br />
identifying spoken words among other similar words (Modified<br />
Rhyme Test, House et al., 1965).<br />
Results<br />
The group mean ratings for each of the 68 ESQ items in each<br />
of the four test conditions showed the fewest severe symptoms in<br />
the control condition. The two heat conditions generated more<br />
symptoms, and different patterns of incidence related to heat.<br />
Atropine/2-PAM generated high ratings on symptoms usually<br />
attributed to those drugs (dry mouth, thirst). Drug/heat, the<br />
most severe test condition, generated the greatest number of high<br />
ratings. Headache and lightheadedness were reported only under<br />
drug/heat.<br />
Two-way (Temperature x Drug) analyses of variance (ANOVAs)<br />
on the POMS ratings showed significant main effects for both<br />
temperature and drug, acting to increase tension (F(1,14) =<br />
5.36, p < .05).<br />
5.36,E
On the POMS, two-way ANOVAs for repeated measures on each of<br />
the scores showed significant drug and temperature main effects<br />
and significant Drug x Temperature interactions, indicating that<br />
the drug led to feelings of tension (F(1,7) = 7.06, p < .05)<br />
performed, to elicit early reactions prior to withdrawal. The ESQ<br />
and POMS could not be analyzed in this manner because they were<br />
given only once at the end of each test day. Significant<br />
temperature main effects were found for warmth (F(1,7) = 37.19,<br />
p < .05).<br />
Journal of Clinical Pharmacology, 2, 367-368.<br />
House, A. S., Williams, C. E., Hecker, M. H. L., & Kryter,<br />
K. (1965). Articulation testing methods: Consonantal<br />
differentiation with a closed response set. Journal of<br />
the Acoustical Society of America, 37, 158-166.<br />
Johnson, R. F. (1981). The effects of elevated ambient<br />
temperature and humidity on mental and psychomotor<br />
performance. In Handbook of the Thirteenth Commonwealth<br />
Defense Conference on Operational Clothing and Combat<br />
Equipment (pp. 152-153). Kuala Lumpur, Malaysia:<br />
Government of Malaysia.<br />
Kobrick, J. L., Johnson, R. F., & McMenemy, D. J. (1988).<br />
Nerve agent antidotes and heat exposure: Summary of<br />
effects on task performance of soldiers wearing BDU and<br />
MOPP-IV clothing systems (Technical Report T1-89).<br />
Natick, MA: US Army Research Institute of Environmental<br />
Medicine. (DTIC Accession No. A 206-222)<br />
Kobrick, J. L., Johnson, R. F., & McMenemy, D. J. (1990).<br />
Effects of nerve agent antidote and heat exposure on<br />
soldier performance in the BDU and MOPP-IV ensembles.<br />
<strong>Military</strong> Medicine, 155, 159-162.<br />
Kobrick, J. L., & Sampson, J. B. (1979). New inventory for<br />
the assessment of symptom occurrence and severity at<br />
high altitude. Aviation Space and Environmental<br />
Medicine, 50, 925-929.<br />
Kolka, M. A., Stephenson, L. A., Bruttig, S. P., Cadarette,<br />
B. S., & Gonzalez, R. R. (1987). Human thermoregulation<br />
after atropine and/or pralidoxime administration.<br />
Aviation Space and Environmental Medicine, 58, 545-549.<br />
Marzulli, F. N., & Cope, O. P. (1950). Subjective and<br />
objective study of healthy males injected<br />
intramuscularly with 1, 2 and 3 mg atropine sulfate<br />
(Medical Division Research Report No. 24). Aberdeen,<br />
MD: US Chemical Corps, Army Chemical Center.<br />
McNair, D. M., Lorr, M., & Droppelman, L. F. (1982). EITS<br />
manual for the Profile of Mood States. San Diego, CA:<br />
Education and Industrial <strong>Testing</strong> Service.<br />
Moylan-Jones, R. J. (1969). The effect of a large dose of<br />
atropine upon the performance of routine tasks. British<br />
Journal of Pharmacology, 37, 301-305.<br />
Penetar, D. M., & Henningfield, J. E. (1986). Psychoactivity<br />
of atropine in normal volunteers. Pharmacology and<br />
Biochemistry of Behavior, 24, 1111-1113.<br />
Taylor, P. (1980). Anticholinesterase agents. In A. G.<br />
Gilman, L. S. Goodman, & A. Gilman (Eds.), The<br />
pharmacological basis of therapeutics (6th ed., pp. 100-<br />
119). New York: Macmillan.<br />
Wetherell, A. (1980). Some effects of atropine on short-term<br />
memory. British Journal of Clinical Pharmacology, 10,<br />
627-628.<br />
GUTS: A BELGIAN GUNNER TESTING SYSTEM<br />
F. LESCREVE<br />
W. SLOWACK<br />
CRS - Belgian Armed Forces<br />
Brussels<br />
1. Introduction<br />
To fulfil the need for expert selection of gunners, the Belgian Army<br />
has developed a selection simulator. First a job analysis of different<br />
weapon systems was completed. This was the basis for the construction of<br />
GUTS. Different physical and psychological stressors are important.<br />
2. Theoretical Background<br />
GUTS is constructed from a holistic point of view. We chose to put<br />
the candidate gunners in a complete, real-life-like situation instead of<br />
confronting them with different subtasks from the gunner's job, one at<br />
a time.<br />
3. Job Analysis<br />
The following weapon systems were carefully observed to extract the<br />
crucial task components:<br />
- Leopard-tank<br />
- CVRT (Combat Vehicle Reconnaissance Tracked)<br />
- GEPARD (Anti-Aircraft tank)<br />
- HAWK (Anti-Aircraft Missile)<br />
- JPK (Jacht Panzer Kanone)<br />
- AIFV (Armored Infantry Fighting Vehicle)<br />
- MILAN (Missile Leger Antichar)<br />
a. Tasks<br />
The following tasks were common to practically all the weapon<br />
systems.<br />
1. Knowledge of procedures.<br />
2. Ranging and target recognition.<br />
3. Target engagement and acquisition.<br />
4. Target identification.<br />
5. Choice of ammunition.<br />
6. Loading of ammunition.<br />
7. Tracking and firing.<br />
The working space of the gunner was measured. For the construction<br />
of the simulator we took the average of the measures of the different<br />
weapon systems.<br />
216
b. Stressors<br />
An analysis was made of the possible physical and psychological<br />
stressors.<br />
1. Physical Stressors<br />
1. Limited working space.<br />
2. Heat caused by instruments, engine, clothing.<br />
3. Vibration due to the vehicle movements.<br />
4. Noise, especially in a war situation.<br />
5. Darkness.<br />
2. Psychological Stressors<br />
1. Overload of information, visual and auditory.<br />
2. Permanent concentration needed.<br />
3. Time pressure.<br />
4. Unexpected events.<br />
5. Feeling of isolation.<br />
6. Feeling of claustrophobia.<br />
c. Ability and Aptitude Requirements<br />
Based on the task analysis and the inventory of the different<br />
stressors, it is clear that several ability and aptitude requirements<br />
are needed for being a good gunner. As we shall see later, the<br />
different requirements are also needed for a good performance in GUTS.<br />
Requirements :<br />
- Learning ability.<br />
- Memory.<br />
- Reaction time.<br />
- Motor coordination.<br />
- Stress management.<br />
- Concentration.<br />
4. Construction of GUTS<br />
The aim was to incorporate the different tasks and stressors in the<br />
selection simulator. We did this in the following design.<br />
a. The Cabin<br />
For the size of the working space we used the average of the<br />
measurements of the different weapon systems. The following sizes were taken<br />
into consideration: the depth of the working space, the space for the<br />
head movements, size of the seat, distance between look-hole and the<br />
handle, distance between head and top of the cabin, space for the legs,<br />
and distance between seat and top of the cabin.<br />
b. The Instruments<br />
The instruments in the simulator are life-like copies of real<br />
instruments. We discuss here only the most important ones, which are<br />
indicated in the figure.<br />
1. Lookhole : Through the lookhole you can see the screen of the<br />
computer on which you see a landscape with the<br />
different targets. You also see the circle for the<br />
engagement of the targets and the reticle for the<br />
tracking of the targets.<br />
2. Identification box :This box is used for identifying the targets<br />
: 6 possibilities.<br />
3. Control box : With this box you start the whole procedure.<br />
4. Radio box : Here are the headphones for the candidate connected.<br />
In this way the candidate receives his weapon<br />
control orders. There are also a lot of disturbing<br />
sounds and non important conversations on the radio.<br />
5. Ammunition box : To choose the type of ammunition depending on<br />
distance and sort of the target : 4 possibilities.<br />
6. Heating device : By means of a thermostat the temperature in the<br />
simulator rises to 30°C.<br />
11. Loudspeakers : The loudspeakers at the bottom of the cabin<br />
produce war-sounds. By making low frequency sounds<br />
they produce a disturbing vibration.<br />
15. Handle : The candidate must use the handle in order to engage<br />
a target with the circle he has on his screen,<br />
tracking a target by means of the reticle and firing<br />
with the fire buttons. The handle can move in all<br />
directions.<br />
The candidate gunner has to wear a gasmask. This is connected by a<br />
tube to a valve. Every five minutes the air supply is cut off for five<br />
seconds.<br />
5. The Test Session<br />
a. Learning the Test Instructions<br />
The goal of the test session is explained to the candidate. He gets a<br />
description of all the instruments. He has to learn the different kinds<br />
of tanks, the ammunition, the identification procedures and the<br />
engagement procedures. Special attention is paid to the weapon control<br />
orders (WCO).<br />
218
b. The Test<br />
After a short demonstration of the instruments of the simulator, the<br />
candidate puts on a gasmask and a battle dress and climbs into the cabin.<br />
The actual test consists of 3 identical cycles. Every cycle has 4<br />
periods, one period for each WCO. The test takes 30 minutes.<br />
c. The Engagement-firing Cycle<br />
To eliminate a target, a candidate must follow a strict procedure.<br />
1. Engagement of the target : steering the circle on the target<br />
and pushing the engagement buttons on the handle.<br />
2. Selection of ammunition, depending upon the kind of target and<br />
the distance of the target.<br />
3. Loading the ammunition<br />
4. Confirmation of the ammunition.<br />
5. Aiming by putting the reticle on the target.<br />
6. Firing, by pressing the firing buttons on the steering gear.<br />
The engagement-firing cycle must be repeated for each target. The<br />
computer (APPLE MAC II) records all the actions of the candidate.<br />
d. Discussion<br />
A test session in GUTS is a demanding experience. This is caused by the<br />
physical and the psychological stressors. The candidates come out<br />
sweating.<br />
6. Results<br />
The candidates are scored in the following categories:<br />
a. Time between appearance of a target and firing at a target.<br />
b. Decision errors : engaging the wrong target.<br />
c. Manipulation errors : for example choosing the wrong ammunition.<br />
d. Procedure errors : errors made in the engagement-firing cycle.<br />
e. Number of hits.<br />
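Since the computer records every candidate action, the error categories above could be scored automatically. The sketch below is hypothetical: the log format and field names are invented for illustration, not taken from the actual GUTS software.<br />

```python
# Hypothetical scoring sketch for a GUTS-style action log. The required
# order mirrors the engagement-firing cycle described in the text.
REQUIRED_SEQUENCE = ["engage", "select_ammo", "load_ammo",
                     "confirm_ammo", "aim", "fire"]

def score_engagement(log):
    """log: list of (action, correct) tuples for one target. Returns
    counts of procedure errors (actions out of the required sequence)
    and manipulation errors (wrong choices, e.g. wrong ammunition)."""
    procedure_errors = sum(
        1 for i, (action, _) in enumerate(log)
        if i >= len(REQUIRED_SEQUENCE) or action != REQUIRED_SEQUENCE[i])
    manipulation_errors = sum(1 for _, correct in log if not correct)
    return {"procedure": procedure_errors,
            "manipulation": manipulation_errors}

# The candidate aimed before confirming and chose the wrong ammunition:
log = [("engage", True), ("select_ammo", False), ("load_ammo", True),
       ("aim", True), ("confirm_ammo", True), ("fire", True)]
print(score_engagement(log))  # {'procedure': 2, 'manipulation': 1}
```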
7. Validation of GUTS<br />
In 1991 a study will be carried out concerning the reliability and<br />
validity of GUTS.<br />
CHARACTERIZING RESPONSES TO STRESS UTILIZING DOSE EQUIVALENCY METHODOLOGY<br />
Robert S. Kennedy, Essex Corporation; William P. Dunlap, Tulane University;<br />
Janet J. Turnage, University of Central Florida;<br />
Jennifer E. Fowlkes, Essex Corporation*, Orlando, FL<br />
INTRODUCTION<br />
One of the chief problems in quantifying the effects of stressors on<br />
operational performance, such as heat and combat stress, is the lack of<br />
reliability in the criterion tasks. To circumvent the problems which hinder<br />
development of a quantitative definition of workforce performance decrement,<br />
we offer two methodologies: surrogate measurements and dose equivalency<br />
testing.<br />
Surrogate Measurement<br />
Insufficient attention to reliability can lead to attenuated validities,<br />
reduction of statistical power, higher sample size requirements, increased<br />
cost of experiments, and when hazard or discomfort is involved, human use<br />
problems. These problems focused us on development of highly reliable measure<br />
sets such as may be obtained with microcomputer-based mental acuity tests<br />
(Kennedy, Baltzley, Lane, Wilkes, & Smith, 1989; Kennedy, Wilkes, Lane, &<br />
Homick, 1985). We recognized these are separate from the operational<br />
criteria, but highly similar to the criteria in skill requirements. We<br />
reasoned that, if the measures correlate well with the criteria and behave<br />
similarly under changing task conditions, perhaps they could be used as a<br />
surrogate in place of the criteria. We called this approach “surrogate<br />
measurement” (Lane, Kennedy, & Jones, 1986) and listed requirements for<br />
surrogate tests as those which are related to or predictive of real-world<br />
performances but are not actual measures of the performance per se. In our<br />
plan, surrogate measures are composed of tests or batteries that exhibit five<br />
characteristics:<br />
1. Stable, so that the “what is being measured” is constant;<br />
2. Correlated with the operational performance;<br />
3. Sensitive to the same factors that would affect performance as the<br />
performance variable would;<br />
4. More reliable than field measures; and<br />
5. Involving minimal training time.<br />
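Two of these requirements, stability and correlation with the operational criterion, can be checked directly from score data. A minimal sketch, with invented scores and an assumed 0.70 acceptance threshold:<br />

```python
# Screen a candidate surrogate against two of the listed requirements:
# test-retest stability and correlation with the operational criterion.
# Pearson's r computed by hand; data and the 0.70 threshold are invented.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def acceptable_surrogate(session1, session2, criterion, threshold=0.70):
    """Stable (test-retest r) and correlated with the criterion."""
    return (pearson_r(session1, session2) >= threshold and
            pearson_r(session2, criterion) >= threshold)

session1 = [10, 12, 14, 18, 20]      # surrogate battery, first session
session2 = [11, 12, 15, 17, 21]      # same battery, retest
field_scores = [30, 33, 36, 41, 44]  # operational criterion measure
print(acceptable_surrogate(session1, session2, field_scores))
```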
An obvious candidate for a surrogate to measure military performance would<br />
be the Armed Services Vocational Aptitude Battery (ASVAB). ASVAB scores are<br />
used to determine eligibility for various military occupational specialties<br />
based on construct validity and continuing programs of empirical studies. The<br />
tests of the ASVAB also have considerable content and criterion-related<br />
validity, including training performances at military formal schools as well<br />
as operational performance studies. In at least one case (Wallace, 1982),<br />
performances during war games with tank forces were correlated with subtest<br />
scores from the ASVAB better than with any other variable in the study. But<br />
the ASVAB is not meant to be administered repeatedly. If it could be shown<br />
that the ASVAB was highly correlated with a repeated measurement test battery,<br />
*Dr. Fowlkes is now employed at Engineering and Economics Research, Orlando, FL<br />
then by the principle of transitivity (things equal to the same thing are<br />
equal to each other), one might link changes in the surrogate with changes in<br />
the operational performance about which one wished to make statements.<br />
Dose Equivalency<br />
A second methodology that can be employed in studies of real-world<br />
performance is called “dose equivalency.” Dose Equivalency is a strategy used<br />
in conjunction with surrogate measures in order to quantify degradation of<br />
operational performance by selecting an indexing agent(s) or treatment and a<br />
set of target performance tasks. Then graded “dosages” of the indexing agent<br />
are administered and performance decrements as a function of the indexing<br />
agent are marked or calibrated against the various dosages. One is left with<br />
a functional relationship between an agent and performance(s).<br />
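The calibration step described above can be sketched as an ordinary least-squares fit of performance decrement against graded dosage; all values and names below are hypothetical:<br />

```python
def fit_line(doses, decrements):
    """Ordinary least-squares slope and intercept for a linear
    dose-response calibration of decrement against dosage."""
    n = len(doses)
    mx = sum(doses) / n
    my = sum(decrements) / n
    sxx = sum((d - mx) ** 2 for d in doses)
    sxy = sum((d - mx) * (y - my) for d, y in zip(doses, decrements))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical calibration data: percent performance decrement
# observed at graded dosages of an indexing agent (here, BAL).
doses = [0.000, 0.050, 0.075, 0.100, 0.125, 0.150]
decr = [0.0, 2.0, 5.0, 9.0, 13.0, 17.0]
slope, intercept = fit_line(doses, decr)

def decrement_to_dose(decrement):
    """Invert the fitted line: express any later observed decrement
    in units of the indexing agent's dosage."""
    return (decrement - intercept) / slope
```

The inversion in `decrement_to_dose` is what makes the functional relationship usable as an index: a decrement produced by some other stressor can be read off in dosage units.<br />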
This strategy has been applied in a study we conducted using different<br />
dosages of orally-administered alcohol (Kennedy, Baltzley, Lane, Wilkes, &<br />
Smith, 1989). Alcohol was selected as the indexing agent for several<br />
reasons: 1) alcohol is known to be a global depressant, having wide-ranging<br />
and well-documented impacts on performance and operational readiness; 2) much<br />
research has been directed to the identification and calibration of what are<br />
to be considered “safe” and “unsafe” doses of this agent; 3) equipment and<br />
assay procedures are readily available for calibrating both blood alcohol<br />
levels (BAL) and alcohol detected in expired breath (breathalyzer); and 4)<br />
because alcohol is widely<br />
used, it is feasible to administer to male subjects who, by self-report, use<br />
alcohol to a moderate degree, thereby obviating potential threat to volunteers<br />
and meeting requirements for ethical treatment of subjects in human<br />
experimental research.<br />
EXPERIMENT 1 - APTS AS SURROGATE FOR ASVAB SUBTEST<br />
In this analysis, 16 women and 11 men ranging in age from 18 to 38 were<br />
tested with a synthetic Armed Services Vocational Aptitude Battery (ASVAB)<br />
(Steinberg, 1986), and the microcomputer-based assessment used was the<br />
Automated Performance Test System (APTS), which is fully described in Kennedy<br />
et al. (1989). Seven of the tests used were from the original APTS (Pattern<br />
Comparison: Two-Finger and Nonpreferred Hand Tapping: Code Substitution:<br />
Simple Reaction Time: Grammatical Reasoning: and Manikin) and four additional<br />
subtests were selected from the Unified Tri-Service Committee Performance<br />
Assessment Battery (UTC-PAB) (Englund, Reeves, Shingledecker, Thorne, Wilson,<br />
& Hegge, 1987).<br />
The most dramatic findings were the consistently high reliabilities of the<br />
battery subtests; the smallest reliability was 0.85, which in our judgment is<br />
sufficient for statistical power and differential purposes.<br />
Scores on the performance battery were averaged across the seven trials<br />
and then correlated with the subscales and total score from the ASVAB.<br />
Multiple regression was used to examine the predictive power of the battery as<br />
a whole on the total ASVAB criterion. The multiple R was 0.94 (R2 = 0.88),<br />
and, even when corrected for shrinkage, the multiple R was 0.88. This<br />
indicates that when shrinkage owing to the particular sample used is taken<br />
into account, 77% of the ASVAB variance is explained by performance on the<br />
battery subtests.<br />
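The shrinkage correction mentioned above is conventionally computed with a Wherry-type adjustment; whether the authors used this exact variant is an assumption, though the sample size (27 subjects) and subtest count (11) are as reported above:<br />

```python
import math

def wherry_adjusted_r(r, n, k):
    """Wherry-type shrinkage correction: adjusts a sample multiple R
    for the number of predictors k and sample size n, estimating the
    population multiple correlation.  Clamped at zero if the adjusted
    R-squared goes negative."""
    r2_adj = 1.0 - (1.0 - r ** 2) * (n - 1) / (n - k - 1)
    return math.sqrt(max(r2_adj, 0.0))

# With a small sample and many predictors, a multiple R of 0.94
# shrinks noticeably, as in the analysis reported above.
print(round(wherry_adjusted_r(0.94, 27, 11), 2))
```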
A second multiple regression analysis was conducted including the<br />
candidate surrogate performance subtests - those that would be used in the<br />
second study - Code Substitution, Pattern Comparison, Grammatical Reasoning,<br />
Manikin, Math Processing, Two-Finger Tapping, Non-Preferred Hand Tapping, and<br />
Reaction Time. The multiple R was .92, indicating that we lost very little<br />
common variance with the ASVAB by using the shortened surrogate battery.<br />
EXPERIMENT 2 - INDEXING AGENT (ALCOHOL) ADMINISTERED<br />
TO EXPERIMENTAL SAMPLE*<br />
Method<br />
Subjects. Male students, ranging in age from 21 to 42, were recruited as<br />
subjects. Acceptable candidates were those indicating some, but not<br />
excessive, experience with alcohol, no past history of chronic dependency of<br />
any type, good general health, and indications of low risk for future<br />
alcohol-based problems. Students indicating problem family histories of<br />
chemical abuse/dependency and/or past personal histories of chemical<br />
abuse/dependency were advised not to participate. Various paper-and-pencil<br />
and computer software materials were employed in screening and assessing the<br />
individual subjects and are discussed in detail elsewhere along with the<br />
criteria employed in Kennedy, Baltzley, Lane, Wilkes, and Smith (1989).<br />
Microcomputer testing was conducted with eight identical NEC PC-8201A<br />
microcomputers, and the Intoximeter Model 3000 breath analyzer was used to<br />
estimate alcohol concentrations in the blood.<br />
Procedure<br />
Alcohol was consumed in a group setting, with subjects completing the<br />
drinking in times ranging from several minutes to slightly more than one hour. Each subject<br />
was brought to 0.15 BAC and monitored as the descending limb of the BAL curve<br />
was achieved.<br />
Upon completing data collection, subjects were returned to supervised<br />
housing where they were required to stay for the remainder of the evening and<br />
abstain from further consumption of alcohol. Upon wakening the following<br />
morning, subjects self-administered one battery of the APTS. This “hangover”<br />
measure was completed within one hour of wakening and all measures were<br />
finalized by 9:30 A.M. The hangover measure typically occurred within 13 to<br />
17 hours of the pre-alcohol APTS measure taken the previous day.<br />
Results<br />
The means for each of the APTS performance measures were monotonically<br />
related to blood alcohol levels and all were significant (p < .001). Figure 1<br />
depicts the performance measures for a sample subtest (Code Substitution) in<br />
*Many of the technical details regarding methods, procedures, and safeguards<br />
in studying the effects of orally administered alcohol on APTS performance<br />
were worked out in previous research and are described extensively elsewhere<br />
(Kennedy, Wilkes, and Rugotzke, 1989).<br />
the order they were obtained. Following the alcohol challenge, performance<br />
dropped dramatically on all subtests, then recovered, in most cases in a<br />
monotonic or near-monotonic function, as determined by BAL during the period<br />
of alcohol metabolism. If one were to choose a single subtest to index BAL,<br />
Code Substitution would be a likely candidate. For this test it can be seen<br />
that each change of one hundredth of a percent BAL is indexed by a change of<br />
approximately 1.5 points on the Code Substitution task.<br />
Figure 1. Code Substitution Number Correct for<br />
Baseline and Blood Alcohol Levels As Shown<br />
Formulation of the Quantitative Dose Equivalency Model<br />
Multiple regression was used to develop scores that maximally predicted<br />
BAL. The multiple R for BAL as predicted from all nine surrogate battery<br />
subtests was 0.77. Subsequent stepwise regression analysis revealed<br />
that an optimally selected subset of only four of the subtests produced a<br />
multiple R of 0.765; therefore, virtually no loss in predictive power resulted<br />
from use of the shortened battery. When this latter coefficient is corrected<br />
for shrinkage, R equalled 0.75; therefore, 57% of the variance in blood<br />
alcohol is predictable from the four subtest battery. The resulting<br />
regression equation (simplified by rounding to whole numbers) is shown in<br />
Equation (1):<br />
BAL = 0.3 - (9CS + 2GR + 5MP + 6TFT)/1000, (1)<br />
where CS, GR, MP, and TFT refer to percent decrement from baseline of Code<br />
Substitution, Grammatical Reasoning, Math Processing, and Two-Finger Tapping,<br />
respectively.<br />
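Equation (1) can be applied directly; a minimal sketch, with hypothetical decrement values (the Math Processing coefficient is read here as 5):<br />

```python
def predicted_bal(cs, gr, mp, tft):
    """Equation (1): predicted blood alcohol level from percent
    decrement relative to baseline on Code Substitution (cs),
    Grammatical Reasoning (gr), Math Processing (mp), and Two-Finger
    Tapping (tft), using the rounded coefficients given in the text."""
    return 0.3 - (9 * cs + 2 * gr + 5 * mp + 6 * tft) / 1000

# Hypothetical decrement profile for one testing session:
print(round(predicted_bal(cs=12, gr=8, mp=10, tft=15), 3))
```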
To further demonstrate how the four-test surrogate battery identified by the<br />
above research can serve as a bridge between alcohol (the indexing agent) and<br />
military performance readiness (the synthetic ASVAB), we computed one further<br />
regression equation from the synthetic ASVAB data described above. Equation<br />
(2) predicts the ASVAB (scaled with mean = 100 and SD = 15) from Code<br />
Substitution, Grammatical Reasoning, and Math Processing. Two-Finger Tapping<br />
was not used as its beta weight in the equation was quite low. The equation<br />
is:<br />
ASVAB = .92CS + .42MP + .15GR + 26 (2)<br />
where CS, MP, and GR are raw scores for Code Substitution, Math Processing,<br />
and Grammatical Reasoning, respectively. Using this equation to fit the data<br />
from the alcohol study, we can represent the performance decrements produced<br />
by the various BAL levels relative to a metric based on a standardized ASVAB<br />
as follows. These relationships are shown in Table 1.<br />
Table 1. Predicted Standardized ASVAB Means and Standard Deviations<br />
from Surrogate Battery Performance as a Function of Blood Alcohol Level<br />
BAL Mean ASVAB SD<br />
Baseline 103.6 12.3<br />
0.050 104.7 13.4<br />
0.075 101.8 12.7<br />
0.100 96.5 13.9<br />
0.125 90.3 15.9<br />
0.150 85.9 13.4<br />
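Equation (2) can likewise be applied directly; the raw scores below are hypothetical, not values from Table 1:<br />

```python
def predicted_asvab(cs, mp, gr):
    """Equation (2): synthetic ASVAB score (mean 100, SD 15) predicted
    from raw scores on Code Substitution (cs), Math Processing (mp),
    and Grammatical Reasoning (gr)."""
    return 0.92 * cs + 0.42 * mp + 0.15 * gr + 26

# Hypothetical raw scores at baseline and after an alcohol challenge;
# the difference expresses the decrement in ASVAB units, which is the
# bridging computation behind Table 1.
baseline = predicted_asvab(cs=60, mp=45, gr=55)
dosed = predicted_asvab(cs=52, mp=40, gr=50)
print(round(baseline - dosed, 2))
```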
CONCLUSION<br />
The objective of the effort reported herein was to provide a quantitative<br />
methodology to permit assessment of performance degradation in humans which<br />
may result from exposure to toxic or stressor agents encountered on the<br />
battleEield. The scientific literature has shown that performance on the<br />
ASVAB is correlated with military job performance, and tests from a<br />
microcomputer test battery have been developed which are sensitive to gases<br />
like halon and to various toxic agents. Using these relations, the present<br />
analyses were performed, the results of which are:<br />
o Performances on APTS subtests are correlated with subtests of a synthetic<br />
ASVAB.<br />
o Increasing dosages of alcohol result in monotonically greater performance<br />
decrements on APTS subtests.<br />
o The performance decrements can be indexed to percent blood alcohol via a<br />
linear regression equation.<br />
o Performance decrement on APTS can be indexed to performance decrement on<br />
aptitude tests via a linear regression equation.<br />
o Performance equivalency and dose equivalency relationships were<br />
successfully demonstrated so that:<br />
- a regression equation can be created which translates reductions in<br />
APTS performance due to any treatment (such as an irritant gas or<br />
psychological stress) into ASVAB-equivalent performances, and<br />
- a regression equation can be created which translates reductions in<br />
performance due to such agents or treatments into units of percent<br />
blood alcohol.<br />
ACKNOWLEDGMENTS<br />
Support for this research was from the U.S. Army Medical Research<br />
Acquisition Activity under Contract DAMD17-89-C-9135. The views, opinions,<br />
and/or findings contained in this report are those of the authors and should<br />
not be construed as an official Department of the Army position, policy, or<br />
decision unless so designated by other documentation. The authors are<br />
indebted to Gene G. Rugotzke for conducting the blood alcohol analysis and to<br />
Robert L. Wilkes for collection of APTS data.<br />
REFERENCES<br />
Englund, C. E., Reeves, D. L., Shingledecker, C. A., Thorne, D. R., Wilson,<br />
K. P., & Hegge, F. W. (1987). Unified Tri-Service Cognitive Performance<br />
Assessment Battery (UTC-PAB): I. Design and specification of the battery,<br />
Report No. 87-10. San Diego, CA: Naval Health Research Center.<br />
Kennedy, R. S., Baltzley, D. R., Lane, N. E., Wilkes, R. L., & Smith, M. G.<br />
(1989). Development of microcomputer-based mental acuity tests: Indexing<br />
to alcohol dosage and subsidiary problems (Final Report, Grant No. ISI-<br />
8521282). Washington, DC: National Science Foundation.<br />
Kennedy, R. S., Wilkes, R. L., Lane, N. E., & Homick, J. L. (1985). Preliminary<br />
evaluation of a microbased repeated measures testing system,<br />
Technical Report (EOTR 85-1). Orlando: Essex Corporation.<br />
Kennedy, R. S., Wilkes, R. L., & Rugotzke, G. G. (1989). Cognitive performance<br />
deficit regressed on alcohol dosage. Proceedings of the 11th International<br />
Conference on Alcohol, Drugs, and Traffic Safety (p. C-27).<br />
Chicago, IL.<br />
Lane, N. E., Kennedy, R. S., & Jones, M. B. (1986). Overcoming unreliability in<br />
operational measures: The use of surrogate systems. Proceedings of the<br />
30th Annual Meeting of the Human Factors Society. Santa Monica, CA: Human<br />
Factors Society.<br />
NEC Home Electronics (USA). (1983). NEC PC-8201A Users Guide. Tokyo: Nippon<br />
Electric Co., Ltd.<br />
Steinberg, E. P. (1986). Practice for the Armed Services test. New York: Arco<br />
Publishing Co.<br />
Wallace, J. R. (1982). The Gideon criterion: The effects of selection criteria<br />
on soldier capabilities and battle results, Research Memorandum 82-1.<br />
Fort Sheridan: U.S. Army Recruiting Command, Research, Studies and<br />
Evaluation Division, Program Analysis and Evaluation Directorate. (NTIS<br />
No. AD A127 975)<br />
Job Sets for Efficiency in Recruiting and Training (JSERT)’<br />
Jane M. Arabian and Amy C. Schwartz’<br />
U.S. Army Research Institute<br />
for the Behavioral and Social Sciences<br />
Alexandria, VA<br />
The Army is facing radical changes brought about by the reduction in the size of its<br />
force. The challenges encountered by the Army will require different and more efficient<br />
ways of going about the business of recruiting, selecting and classifying young men and<br />
women as they enter the service. Changes in enlisted end strength will have a dynamic<br />
impact on, for example, MOS fill and training seat utilization. In the past, changes in<br />
authorizations have caused a manpower surplus or shortage in various MOS. The delayed<br />
entry program (DEP) has not been able to provide enough flexibility to compensate for<br />
such near-term authorization changes. Therefore, the Army has begun to evaluate the<br />
potential for a “job sets” concept to improve manpower and personnel management by<br />
fostering more timely, accurate personnel classification.<br />
This paper will describe the rationale and tailoring of the JSERT concept to the<br />
particulars of the Army’s current manpower and personnel environment. The general<br />
approach was to devise two parallel tracks: 1) the pragmatic identification of occupations<br />
(MOS) which would comprise a given “Job Set” and 2) an empirical research program for<br />
confirming the “Job Sets”, devising a means for selecting an appropriate classification test<br />
battery, and developing a feedback/appraisal system for the JSERT concept.<br />
Currently, in the vast majority of cases each recruit receives a contract for a specific<br />
occupation, such as M-1 turret mechanic. This contract is a legal commitment by the Army<br />
to provide the individual with the specific training for M-1 turret mechanics. This means<br />
that if the Army finds that it doesn’t need as many M-1 turret mechanics as it had estimated,<br />
or that it needs more Bradley turret mechanics than expected, contracts must be renegotiated<br />
and the individuals involved may decide not to enlist. This may be costly, both<br />
in terms of dollars and loss of desirable individuals for service.<br />
The Army has been able to accommodate small discrepancies in its estimates for<br />
personnel by tapping into the pool of recruits in the Delayed Entry Program (DEP).<br />
However, this does not always provide a satisfactory solution; individuals’ contracts still<br />
need to be honored. Given the anticipated changes in the size of the force and its<br />
composition, it is expected that it will become even more difficult to estimate accurately<br />
’ Paper presented at the 32nd Annual Conference of the Military Testing<br />
Association, November 1990, Orange Beach, AL.<br />
’ All statements expressed in this paper are those of the authors and do not<br />
necessarily express the official opinions or policies of the U.S. Army Research Institute<br />
or the Department of the Army.<br />
the Army’s near-term personnel needs and to make up for the differences with the DEP.<br />
More flexibility in manpower management and personnel assignment is needed. The<br />
development of “Job Sets”, as described below, would give the Army such flexibility.<br />
Grouping Jobs<br />
Many MOS have the same or very similar entry requirements (i.e., Armed Services<br />
Vocational Aptitude Battery (ASVAB) Aptitude Area [AA] composite score cut-offs and<br />
physical [e.g., vision] profiles). This is especially true of MOS in the same CMF or Career<br />
Management Field (such as Mechanical Maintenance). It would therefore seem possible<br />
to group such MOS into “sets” for recruiting and enlistment purposes. The Army would<br />
then be able to enlist soldiers as turret mechanics, for example, without specifying, at the<br />
time of enlistment, which type of turret mechanic training they would receive. This would<br />
give the Army just that much more manpower management flexibility. Much closer to the<br />
point of actually filling training seats, the Army would be able to determine which<br />
individual would receive which specific course of training.<br />
As with any change to an established system, implementation of this JSERT concept<br />
will cause disruptions and periods of awkward adjustment. However, the concept does fit<br />
well with other current Army cost-saving initiatives (e.g., consolidating MOS and reducing<br />
the number of reception battalions) and appears to offer important benefits. This is not<br />
to trivialize the adjustments that will need to be made by, for example, the recruiting<br />
and especially the training communities. Therefore, several steps have been taken to<br />
minimize the potential down-side of JSERT-related changes. These measures are described<br />
below.<br />
Identification and Coordination With Key Players<br />
Working closely with the Army’s Manpower and Personnel Management/Enlisted<br />
Accessions Policy office, key agencies and functions that would be affected by JSERT were<br />
identified. A “strawman” concept was circulated, briefed and discussed with each key player<br />
over a four-month period. The concept was refined and modifications were made based<br />
on the inputs received. One key refinement was the addition of parallel tracks, one for<br />
testbed implementation and one for research and development. The tracks will be<br />
described shortly.<br />
In addition to exploring the concept with Army personnel, the Air Force classification<br />
system was also examined. While many of the manpower and personnel issues faced by the<br />
two services are quite different, the JSERT concept is not drastically different from the<br />
Air Force’s current classification system.<br />
After these information gathering and coordination efforts, a key players research and<br />
planning meeting was held. This provided the opportunity for further explication of the<br />
JSERT concept, exchange of concerns, identification of roles and responsibilities, selection<br />
of candidate job sets for a testbed, and the joint determination of milestones for the<br />
implementation of the testbed.<br />
Parallel Tracks<br />
Given the general desire for a swift remedy to the manpower management difficulties,<br />
the development and execution of a comprehensive R & D program to address the issues<br />
raised by changing the Army’s recruiting, enlisting, and training systems was simply not<br />
feasible. Therefore, two tracks have been devised for the JSERT concept.<br />
Track One. The key feature of this track is that it is driven by practical considerations.<br />
In order to put the JSERT concept into practice as quickly as possible, jobs can be formed<br />
into sets based on “face validity”. A primary concern is to minimize disruption to the CMF<br />
structure, and to take into account logistic, training and cost considerations. Therefore, at<br />
least the initial job groupings would involve MOS that currently use the same aptitude area<br />
composites and cut scores, have the same proponents and are trained in the same location.<br />
The candidate MOS identified at the key players meeting represent a key milestone of<br />
the Track One effort. The candidate MOS have been circulated among the appropriate<br />
proponents for review and comment. Their input will be used to make the final<br />
determination of job sets for the testbed implementation.<br />
Track Two. While Track One is getting under way, the Track Two efforts have begun.<br />
Track Two efforts form an empirical, applied research program characterized by three<br />
primary features: 1) Within job set validity confirmation, 2) Classification battery<br />
selection, and 3) System appraisal/feedback.<br />
The job set validity confirmation consists of developing analytic tools or models to<br />
determine the fit of jobs with any given set. For example, attribute (skill/ability) or job task<br />
taxonomies can be used by individuals familiar with the jobs to provide “job profiles” ( e.g.,<br />
identification of the tasks making up a job and their importance or criticality). These<br />
profiles can then be compared across jobs and a judgement made as to the acceptability of<br />
the similarities or dissimilarities. If the profiles of the jobs appear too dissimilar or if one<br />
job stands out as too different from the other jobs then there would be a basis for<br />
eliminating some job(s) from the set.<br />
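A minimal sketch of this profile-comparison step, using cosine similarity over a shared attribute taxonomy; the jobs, importance ratings, and threshold are invented for illustration:<br />

```python
import math

def cosine(p, q):
    """Cosine similarity between two job profiles (importance ratings
    over a common attribute/task taxonomy)."""
    dot = sum(a * b for a, b in zip(p, q))
    np = math.sqrt(sum(a * a for a in p))
    nq = math.sqrt(sum(b * b for b in q))
    return dot / (np * nq)

def flag_outliers(profiles, threshold=0.90):
    """Flag jobs whose best similarity to any other job in the
    candidate set falls below the threshold, i.e., candidates for
    elimination from the job set."""
    flagged = []
    for name, p in profiles.items():
        best = max(cosine(p, q) for n, q in profiles.items() if n != name)
        if best < threshold:
            flagged.append(name)
    return flagged

# Hypothetical importance profiles over five shared task dimensions.
profiles = {
    "M-1 turret mechanic": [5, 4, 4, 2, 1],
    "Bradley turret mechanic": [5, 4, 3, 2, 1],
    "Supply clerk": [1, 1, 2, 5, 5],
}
print(flag_outliers(profiles))
```

In practice, a formal clustering algorithm of the kind mentioned under the SynVal-based effort would replace this single-threshold rule.<br />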
The job descriptions or profiles described above can also be used to help the Army<br />
identify additional classification tests. The elements of the profiles can be linked to tests<br />
of individual abilities (i.e., predictor tests). The tests may then be used to help place (or<br />
classify) individuals into jobs where they are most likely to perform well.<br />
In fact, as part of Project A, the Army’s comprehensive project to improve the selection<br />
and classification system, a new battery of predictor tests was developed. So now, in<br />
addition to ASVAB which measures primarily cognitive ability, the Army has the<br />
opportunity to assess an individual’s spatial and psychomotor abilities, temperament and<br />
vocational interests. The additional information provided by these tests can help the Army<br />
make better use of its human resources by improving the match between a soldier’s abilities<br />
and the job’s requirements.<br />
For this part of the JSERT program, research will be conducted to develop a<br />
methodology for building tailored classification batteries. These special batteries would<br />
be used for assigning soldiers within a job set to a particular job. The first requirement,<br />
however, would be to determine whether or not special classification testing is warranted.<br />
Since job performance can frequently be improved by selecting individuals with particular<br />
skills and/or by training particular skills, the trade-offs implied by testing for selection<br />
versus training would have to be considered.<br />
Two research efforts are currently planned to develop the tailored classification<br />
batteries. The first effort will expand upon the Cognitive Requirements Model (CRM)<br />
developed by Hay Systems, Inc. to include spatial and psychomotor elements. This new<br />
model, CRM+, will employ decision flow diagrams to guide job experts through the<br />
elements of the model. Attributes identified in this manner will be compared across jobs<br />
to determine the similarity and differences in job requirements. The same information will<br />
also be used to identify classification tests most likely to improve the person-job match<br />
within the job set.<br />
The second effort to develop tailored test batteries will build upon the research<br />
conducted in the Army’s Synthetic Validation Project (SynVal) by the American Institutes<br />
for Research with Personnel Decisions Research Inc. and the Human Resources Research<br />
Organization. Subject matter experts will be asked either to identify directly the importance<br />
and level of attributes for jobs or to identify job tasks from the Army Task Taxonomy.<br />
Either visual inspection of the resulting profiles or more formal clustering algorithms will<br />
then be used to compare the profiles. The profile elements may then be matched up with<br />
the predictor tests. The identified tasks can be linked, using the results of SynVal, to<br />
attributes and to the tests which measure those attributes (or the directly identified<br />
attributes may themselves be used to identify the appropriate predictor tests).<br />
An important consideration of both these efforts will be to develop valid procedures<br />
that are credible and can be employed by non-scientists. Indeed, it is expected that any<br />
additional testing that may be adopted by the Army will be administered, scored and used<br />
in the assignment decision-making process by Army personnel during basic training, prior<br />
to the start of Advanced Individual Training. Therefore, the procedures must be<br />
straightforward, non-technical, and cost-efficient. A demonstrable value for administering<br />
any additional tests (i.e., improved job performance, reduced training time, lower attrition)<br />
to off-set the costs and inconvenience of specialized testing must be clearly evidenced.<br />
The third JSERT research focus is on the development of an appraisal feedback system<br />
for the JSERT “system” itself. The goal of the feedback system is to monitor the<br />
performance of JSERT, not the job performance of individuals per se. Although ratings<br />
of performance or training needs may be solicited from supervisors and individual soldiers,<br />
the ratings would be used for research purposes or for operational changes to the JSERT<br />
system, not to affect the careers of the rated individuals. The concern is to set up a<br />
monitoring system so that if jobs change over time or there is a shift in the overall abilities<br />
of soldiers being enlisted, the Army would have some consistent means of evaluating the<br />
change, documenting the impact on job performance and notifying the system that some<br />
action is needed.<br />
It may be, for example, that initial job analyses did not include some ability which,<br />
although not currently measured by the Army, turns out to be important for job<br />
performance. The Army may wish to specifically select individuals with this ability, but<br />
presently there is no mechanism in place that would even uncover the requirement for that<br />
ability. [The closest “system” the Army has for modifying the classification requirements is<br />
to notice there is some problem, such as high attrition, and then request technical advice<br />
from ARI to identify the problem and suggest solutions.]<br />
The JSERT feedback system would provide a more formal, standardized mechanism.<br />
The feedback system must be proven to be scientifically valid and reliable for not only<br />
ensuring that the classification system is working satisfactorily, but also to provide a<br />
framework for intervention. Feedback results indicating a gap between soldier abilities, for<br />
example, and job requirements may indicate that a modification of the training curriculum<br />
is needed and/or that the classification test battery should be altered. The Army will have<br />
the results of the feedback, in addition to any cost-benefit analyses, upon which to base its<br />
correction strategy. Basically, the feedback system will create a means for getting<br />
information about how well the classification battery is working from the field (end-users)<br />
back to the classifiers.<br />
Conclusions<br />
While there will be disruptions to the recruiting, selection/classification and training<br />
systems, changes in roles and responsibilities (e.g., a shift in responsibility from recruiting<br />
to training commands for managing MOS fill), and modifications to computer programs<br />
(ATRRS, REQUEST), the potential benefits of the JSERT concept are considerable. The<br />
concept will: increase the opportunity for MOS consolidation and CMF restructuring,<br />
reduce MOS codes for recruiting, fit well with efforts to consolidate Basic Training sites,<br />
increase the potential for improved soldier-job matches (classification), and increase much<br />
needed manpower management flexibility.<br />
The project has the support of the Office of the Deputy Chief of Staff for Personnel<br />
and the selection of MOS for three potential testbeds (Infantry, Quartermaster, and<br />
Ordnance) is currently being finalized. Although a target start date for the testbed has<br />
been selected (July 1991), it is not clear how the testbed will proceed. The downsizing of<br />
the Army together with high recruiting levels means that approximately 30% of the FY91<br />
accession mission is already in the DEP with contracts for specific MOS training. It is<br />
conceivable that if this pattern keeps up, it will be very difficult to change over fairly<br />
smoothly to the more general enlistment contracts needed to implement the JSERT<br />
concept. Nevertheless, the portions of the project that can proceed (the research elements)<br />
are underway.<br />
Development of a New Language Aptitude Battery<br />
The Defense Language Institute Foreign Language Center (DLIFLC) is the<br />
proponent for the current Defense Language Aptitude Battery (DLAB) and is<br />
also the primary agency with the mission of providing language training for<br />
DOD military personnel. DLAB currently exists in one form. The range of<br />
correlations between DLAB and post-training measures of language<br />
proficiency across different language courses and skill modalities is from<br />
.25 to .55. DLIFLC is seeking to develop an improved aptitude test that<br />
would predict the degree to which a potential student will develop language<br />
proficiency in speaking, reading, and listening skills, and also determine<br />
the language or languages to which a potential student is best suited.<br />
This development effort builds upon an extensive database gathered on<br />
DLIFLC students in a major ongoing project to identify predictors of<br />
success in language training and factors associated with the presence,<br />
direction, and extent of language skill change after training.<br />
Background<br />
At initial screening, candidates for language training must attain a<br />
minimum score on a specified composite of the Armed Services Vocational<br />
Aptitude Battery (ASVAB) in order to be eligible to take the DLAB. There<br />
is some variation in the definition of required ASVAB composites across the<br />
Services, and certain variations in composite cut scores contingent on<br />
eventual training assignment.<br />
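The two-stage screen described above can be sketched in code. The Service names, composite values, and cut scores below are invented placeholders for illustration only, since the actual composites and cuts vary by Service and eventual assignment:

```python
# Hypothetical minimum ASVAB composite scores by Service (invented values).
ASVAB_CUTS = {"Army": 95, "Navy": 100}
# Hypothetical minimum DLAB score (invented value).
DLAB_CUT = 85

def may_take_dlab(service, asvab_composite):
    """Stage 1: the ASVAB composite gates eligibility to take the DLAB."""
    return asvab_composite >= ASVAB_CUTS[service]

def selected_for_training(service, asvab_composite, dlab_score):
    """Stage 2: a qualifying DLAB score is required on top of the ASVAB screen."""
    return may_take_dlab(service, asvab_composite) and dlab_score >= DLAB_CUT
```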
Approximately forty different language courses are taught either at<br />
the DLIFLC Monterey campus or through contract arrangements at other<br />
training locations. The length of basic foreign language courses varies<br />
from 24 to 47 weeks.<br />
DLI concentrates on general foreign language skill training with only<br />
relatively modest specialized training oriented toward specific job<br />
applications. After graduating from DLIFLC and prior to job assignment,<br />
military linguists typically receive advanced individual training (AIT)<br />
building on prerequisite basic language skills. Linguists perform a<br />
variety of sensitive jobs in signal intelligence, human intelligence, and<br />
in a liaison capacity with foreign governments and military forces.<br />
DLIFLC maintains significant contacts with other government and<br />
non-government language training schools and universities. These contacts<br />
have been helpful to DLI in developing instructional systems and measures<br />
of training success that are relatively general in nature, while allowing<br />
more specialized training to benefit from the generally high positive<br />
transfer from basic language skills to more specialized training.<br />
Previous Research<br />
Since 1985, DLIFLC has actively participated in a joint research effort<br />
under the sponsorship of the U.S. Army Intelligence Center and School<br />
(USAICS), with support from the Army Research Institute for the Behavioral<br />
and Social Sciences (ARI). This project, known as the Language Skill Change<br />
Project (LSCP), investigated the following factors:<br />
1. Optimal predictors of success in language training available at<br />
initial screening prior to assignment of language training.<br />
2. Predictors of training success available during training.<br />
3. Variables related to change in language skills after DLI language<br />
training.<br />
The research design involved the collection of an extensive data base<br />
on 1903 Army linguists in four linguist military occupational specialties<br />
(MOS) who had received language training in Spanish, German, Korean, and<br />
Russian at DLIFLC. Data collected at several points in the career cycle of<br />
these linguists included the following elements:<br />
1. ASVAB and DLAB scores at time of selection.<br />
2. Personality measures, interest inventories, and supplementary<br />
cognitive measures collected prior to the beginning of language training.<br />
3. Measures of the extent and nature of motivation to learn foreign<br />
languages collected prior to and during language training.<br />
4. Inventories of student learning strategies collected at two<br />
different times during their language courses.<br />
5. The Defense Language Proficiency Test (DLPT), a series of measures<br />
of foreign language proficiency in listening, reading, and speaking skills<br />
collected immediately after graduation from language training, after<br />
subsequent AIT, and at subsequent annual administrations as mandated by<br />
Army regulations.<br />
DLIFLC and ARI coordinated with the Office of Personnel Management<br />
(OPM) to obtain contractor support to build, collect, manage, and analyze<br />
the LSCP data base. In order to build upon information derived from the<br />
LSCP analyses, DLIFLC requested the contractor, Advanced Technologies<br />
Incorporated (ATI), to submit a plan for the design and development of a<br />
revised DOD language aptitude battery. The remainder of this paper draws<br />
heavily from that plan.<br />
Proposed Development Efforts<br />
The following conclusions drawn from the LSCP data analysis were<br />
relevant to the design of the new battery:<br />
1. Substantial prediction of success in language training, as measured<br />
by the DLPT, was afforded by factors not presently considered in language<br />
selection.<br />
2. The relationship of predictors to criterion performance differed<br />
across languages represented in the study, and within individual languages<br />
across the criterion skills of listening, reading, and speaking.<br />
Consequently, the ATI management plan recommended two approaches for<br />
improving linguist selection and subsequent military linguist performance:<br />
1. to expand the range of factors considered in predicting success<br />
beyond those presently reflected in ASVAB composites and the current DLAB.<br />
2. to attempt to tailor prediction by language and language skill.<br />
From the very beginning, certain constraints on the development of a<br />
new language aptitude instrument were recognized.<br />
First of all, although the new aptitude test will attempt to provide a<br />
more exhaustive assessment of the potential military linguist's<br />
capabilities, the large-scale nature of its use, the time and resources<br />
that are likely to be available for its administration, and concerns about<br />
fatigue on the part of those taking the instrument necessitate that the<br />
time allotted for the new test not greatly exceed that of the current<br />
DLAB. A possible mechanism for achieving maximal efficiency in measurement<br />
would be to use adaptive testing techniques; however, careful consideration<br />
would have to be given to the nature and interrelation of the traits<br />
underlying the abilities to be measured (as yet underspecified), and to the<br />
hardware requirements of such a system and their implications for test<br />
administration.<br />
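As a purely illustrative sketch of the adaptive-testing idea mentioned above (not a description of any planned DLAB II design), a simple staircase procedure can select each next item near the current ability estimate and shrink its adjustments as evidence accumulates; all names and parameters here are assumptions:

```python
def next_item(difficulties, ability, used):
    # Select the unadministered item whose difficulty is closest to the
    # current ability estimate (roughly where item information peaks).
    candidates = [i for i in range(len(difficulties)) if i not in used]
    return min(candidates, key=lambda i: abs(difficulties[i] - ability))

def adaptive_test(difficulties, answer, n_items=10):
    """Administer n_items adaptively with a shrinking-step staircase.
    `answer(d)` returns True if the examinee answers an item of
    difficulty d correctly."""
    ability, step, used = 0.0, 0.5, set()
    for _ in range(n_items):
        i = next_item(difficulties, ability, used)
        used.add(i)
        ability += step if answer(difficulties[i]) else -step
        step = max(step * 0.7, 0.1)  # smaller adjustments later in the test
    return ability
```

With a deterministic examinee who answers correctly whenever difficulty is below 1.0, the estimate settles near that threshold.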
Secondly, it would not be desirable to test different abilities or use<br />
different prediction and scoring formulae for every one of the forty<br />
languages taught in the Defense Foreign Language Program (DFLP). It would<br />
be preferable to group languages into a small number of categories sharing<br />
similar ability requirements and prediction characteristics.<br />
Strategy for Development Effort<br />
The sequence of activities proposed in designing the new aptitude<br />
battery is depicted graphically in Figure 1.<br />
[Figure 1. Proposed sequence of activities, including: Define Measurement Options; Identify Language Ability Requirements; Group Languages by Ability Requirements; Produce DLAB II.]<br />
The first activity listed is to define the components of the criterion<br />
to be predicted--foreign language proficiency. A first cut might be the<br />
traditionally identified skill modalities of listening, reading,<br />
speaking, and writing; these traditional categories will need to be further<br />
analyzed into much more specific task components.<br />
The next three activities planned are to identify how languages differ<br />
on proficiency dimensions, to identify abilities required to develop each<br />
type of proficiency, and to identify language differences in ability<br />
requirements. Note that the arrows in Figure 1 do not all point in a<br />
forward direction toward higher numbered activities. As explained below,<br />
these activities are expected to be interactive and iterative processes and<br />
to involve the synthesis of several types of information into a final<br />
product based on consensus of project team members.<br />
The contractor and DLIFLC decided that the accomplishment of Activities<br />
1 through 6 would be best facilitated by an interdisciplinary approach<br />
involving a DLI expert in language proficiency testing, a cognitive<br />
psychologist specializing in the area of foreign language learning, and<br />
foreign language curriculum specialists with expertise in a wide variety of<br />
foreign languages. The intent was to combine insights from the traditional<br />
perspective of linguistic analysis with a cognitively oriented analysis of<br />
the language learning process. The interdisciplinary team is expected<br />
to develop a comprehensive list of abilities involved in learning foreign<br />
languages, including abilities that may be required in some languages but<br />
not all. On the one hand, this involves ensuring that the definition of<br />
language proficiency is detailed enough that the cognitive abilities<br />
required to develop each category of proficiency can be specified. On the<br />
other hand, it involves reaching a consensus that the list of abilities is<br />
broadly applicable across foreign languages and across diverse training<br />
programs in these foreign languages.<br />
It is anticipated that the team would draft a preliminary taxonomy of<br />
abilities as a baseline list of skills which would be iteratively modified<br />
and improved as attempts were made to apply it to successive individual<br />
languages. At the same time the team would highlight any features of each<br />
successive language that experience had shown were relatively hard or<br />
relatively easy for native English speakers to learn. This process would<br />
serve both to validate the taxonomy and to identify differences between<br />
languages in cognitive abilities required.<br />
In an effort parallel to the validation of
Implementation of Content Validity Ratings<br />
in Air Force Promotion Test Construction<br />
Carlene M. Perry<br />
United States Air Force Academy<br />
John E. Williams<br />
Paul P. Stanley II<br />
USAF Occupational Measurement Squadron<br />
The USAFOMS has implemented a procedure in which subject-matter experts<br />
(SMEs) rate the content validity of individual test questions on Air Force<br />
promotion tests. This paper describes that procedure and assesses its impact<br />
upon test content and the perceptions of those involved in test development.<br />
Specialty Knowledge Tests (SKTs) are 100-item multiple-choice tests which<br />
measure airmen's knowledge in their assigned Air Force specialties. Promotion<br />
to the ranks of staff sergeant (E-5) through master sergeant (E-7) is<br />
determined solely by relative ranking under the Weighted Airman Promotion<br />
System (WAPS). Each airman receives a single WAPS score, the sum of six com-<br />
ponent measures, with the SKT accounting for up to 22% of the total. Since<br />
the other components generally yield little variance among individuals, the<br />
SKT is often the deciding factor in determining who gets promoted.<br />
SKTs are written by teams of senior NCOs with the assistance of USAFOMS psychologists.<br />
They are constructed using the content validity strategy of validation.<br />
The following components ensure a direct and close relationship<br />
between test items and important facets of the specialty being tested: 1)<br />
the specialty training standard, an Air Force document which identifies the<br />
primary duties and responsibilities in a specialty, 2) CODAP-based occupational<br />
analysis data collected by USAFOMS, and 3) the expertise of the SMEs<br />
themselves.<br />
Content validity is thoroughly documented for the more than 400 SKTs in the<br />
USAF inventory. However, the documentation is predominantly qualitative<br />
rather than quantitative in nature, as is the norm with tests based on this<br />
strategy. Test developers at USAFOMS felt that a quantitative means of assessing<br />
content validity would be useful to improve their feedback to test<br />
writers and to make it possible to study test results longitudinally to help<br />
identify problem tests.<br />
Lawshe (1975) developed just such an approach, the first to focus on content<br />
validity in a quantitative, rather than a qualitative manner. His method<br />
called for a panel of subject-matter experts (SMEs) to independently rate<br />
test items using the following scale:<br />
Is the skill (or knowledge) measured by this test question:<br />
- Essential (2),<br />
- Useful but not essential (1), or<br />
- Not necessary (0),<br />
for successful performance on the job?<br />
He then used a test of statistical significance with the content validation<br />
panel’s ratings as a basis for eliminating items from a test item pool.<br />
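Lawshe's quantitative index, the content validity ratio (CVR), is computed from the panel counts; this sketch follows the formula given in Lawshe (1975), with the significance test (comparing CVRs against tabled minimum values) omitted:

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's (1975) content validity ratio:
    CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists
    rating the item "essential" and N is the panel size. CVR ranges
    from -1 (no panelist says essential) to +1 (all do); 0 means
    exactly half the panel rated the item essential."""
    half = n_panelists / 2
    return (n_essential - half) / half
```

For example, 8 of 10 panelists rating an item essential yields CVR = (8 − 5)/5 = 0.6.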
Lawshe’s procedure has been applied by a number of investigators in a variety<br />
of situations. Distefano, Pryer, and Craig (1980) used his content valida-<br />
tion procedure to assess a job knowledge test being used as a criterion measurement<br />
of psychiatric aide training success. They stated, "It is evident<br />
that the content validity could be improved in subsequent revisions if the<br />
method is used as part of the test construction process.”<br />
Kesselman and Lopez (1979) developed an accounting job knowledge test using<br />
the procedure which they found to be superior to a commercially available<br />
mental ability instrument in predicting two criteria: supervisor assessment<br />
of subordinate job knowledge and supervisor assessment of overall job perfor-<br />
mance.<br />
Distefano, Pryer, and Erffmeyer (1983) showed that a variation of the Lawshe<br />
method could be used in the development of a behavioral rating scale of job<br />
performance, while providing quantitative content validity evidence of, the<br />
criterion scale.<br />
Finally, Carrier, Dalessio, and Brown (1990) used Lawshe’s three-point scale<br />
to focus on the correspondence between inferences made using content validity<br />
and criterion-related validity strategies. They found that for experienced<br />
candidates, job experts seemed to be able to identify those items on an interview<br />
guide that predicted the commissions of personnel into the Life Insurance<br />
Marketing and Research Association. They also noted that "...using<br />
content validity as the sole evidence for test validity should be limited to<br />
situations where test developers are working with well-defined constructs,<br />
such as acquired skills or specific knowledge.”<br />
Method<br />
Two content validity rating (CVR) forms were developed using a Lawshe-type<br />
scale, one form fo.r development of the SKT taken for promotion to E-5 and one<br />
for the development of the SKT taken for promotion to E-6 and E-7. The<br />
USAFOMS forms incorporated minor adjustments to the Lawshe approach. In par-<br />
ticular, it was necessary to reference the grade level of the test, since<br />
knowledges required for successful performance of the E-5 duties may be considerably<br />
different from those required to perform E-6 and E-7 duties. In<br />
addition, the USAFOMS forms focused on successful performance in the specialty,<br />
not just in the job, a much broader view, since a specialty encompasses a<br />
family of related jobs.<br />
Whereas Lawshe used a test of statistical significance with the panel's ratings,<br />
this was not practical at USAFOMS because of the small number of individual<br />
raters being used. Only two to six SMEs normally participate in a<br />
test development project. To require statistical significance with such a<br />
small sample would require unanimous agreement of item essentiality. Rather<br />
than impose strict statistical criteria with the new ratings, the USAFOMS<br />
policy was stated as follows: "CVRs will be used to encourage SMEs to focus<br />
first on the appropriateness of test item content as it relates to successful<br />
performance in the specialty." SMEs were not required to take special actions<br />
as a result of the various ratings. In essence, SMEs could retain (ei-<br />
ther reuse on the next revision of the test or designate as an alternate) or<br />
deactivate (designate as unacceptable for reuse) an item without regard to<br />
their ratings on the CVR forms. There are, however, other requirements such<br />
as clear reference support and acceptable item statistics that must be met if<br />
an item is to be retained.<br />
This research examines how Lawshe's procedure, with the noted modifications,<br />
was employed at USAFOMS -- an organization whose promotion tests impact most<br />
Air Force enlisted specialties. The first objective was to determine the<br />
extent to which SME ratings on the CVR forms impact subsequent identification<br />
of items as acceptable or unacceptable for reuse. The second is to determine<br />
how SMEs and project psychologists perceive the value and usefulness of the<br />
forms. Ninety-four SMEs, representing 25 AFSs, assigned to USAFOMS for SKT<br />
rewrite duties were asked to rate test items from their respective E-5 and<br />
E-6/7 grade-level SKTs using the CVR forms. USAFOMS test development proce-<br />
dures require the completion of this step prior to the SMEs' designation of<br />
an item as either acceptable for continued use on subsequent SKTs or as unac-<br />
ceptable for reuse. Once again, these test items were rated using Lawshe’s<br />
3-point scale. A rating of 2 was given to items whose content was essential.<br />
A rating of 1 was assigned to those items whose content was useful, but not<br />
essential, and a rating of 0 was assigned to those items whose content was<br />
not necessary for successful performance in the AFS. In all, 19,700 ratings<br />
were obtained from 94 raters for 2 SKT levels (E-5 and E-6/7).<br />
Results<br />
Intraclass correlations for each of the 25 AFSs were computed to determine<br />
the interrater reliabilities for the group of SMEs from each specialty. All<br />
but two of the calculated values had p < .05. (The higher reliability values<br />
obtained seemed to be associated with the more technologically specialized<br />
fields where there is little room for variance of procedures across the<br />
Air Force, thus leading to more agreement among SMEs on items which test essential<br />
knowledge. Lower values seemed to be associated with broader specialties<br />
where there is more variance in day-to-day jobs performed and hence,<br />
less agreement and lower values of reliability.)<br />
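The paper does not state which intraclass correlation variant was computed; as one common choice, a one-way random-effects ICC for an items-by-raters matrix might be computed as follows (an illustration only, not the USAFOMS procedure):

```python
def icc_oneway(ratings):
    """One-way random-effects intraclass correlation, ICC(1),
    for an items x raters matrix of ratings:
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    item_means = [sum(row) / k for row in ratings]
    # Between-items and within-items mean squares.
    msb = k * sum((m - grand) ** 2 for m in item_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, item_means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Perfect agreement across raters yields an ICC of 1.0; pure disagreement yields a negative value.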
The average CVR for items chosen as acceptable and for those designated for<br />
deactivation were also calculated for each test project. The mean CVR value<br />
for all deactivated items was 1.28 and the mean CVR value for all acceptable<br />
items was 1.43. These results conformed to our expectations that on the<br />
whole, items selected for deactivation would have lower content validity ratings<br />
than those chosen as acceptable. The average CVR value for all 19,700<br />
ratings obtained was 1.40. This average reflects the fact that on the whole,<br />
Air Force SKTs are viewed as being relatively high in content validity.<br />
To determine the actual impact, if any, of the content validity ratings on<br />
the subsequent identification of an item for reuse, a chi-square test of statistical<br />
significance was computed. The null hypothesis (H0) for this test<br />
states that there is no difference between the proportion of items selected<br />
as acceptable and unacceptable in each rating category. The alternative hypothesis<br />
(H1) is that the distribution of items in each rating category differs<br />
from the hypothesized one. The results, as shown in Table 1, indicate<br />
significant differences between expected and observed values for acceptable<br />
and deactivated items in each rating category, with the largest differences<br />
occurring between ratings of 2 and 0. As shown, 203 more ratings of 0 were<br />
observed for deactivated items than was expected, while 234 more ratings of 2<br />
were observed for acceptable items than was expected. A chi-square value of<br />
199.7 (df=2) was obtained, indicating significance at the .01 level. On<br />
this basis, the null hypothesis was rejected, indicating a disproportional<br />
representation of items selected as acceptable and unacceptable in each rating<br />
category. This shows that item content validity did impact subsequent<br />
identification of item acceptability.<br />
A point-biserial correlation coefficient relating identification of an item<br />
as either acceptable or deactivated with the item's average content validity<br />
rating was also computed. The resulting correlation coefficient, .0894, is<br />
significant at the .01 level, yet is rather low. This can be attributed to<br />
the fact that of the 19,700 total ratings, 8,387 were ratings of 1. Since<br />
the largest differences in proportional representations were found to be in<br />
the rating categories of 0 and 2 and the number of ratings of 1 were within<br />
31 ratings of the expected value, it appears that the correlation coefficient<br />
may have been decreased by the large number of ratings of 1.<br />
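The attenuating effect of a mass of middle ratings on a point-biserial correlation can be demonstrated with invented data (not the SKT ratings themselves); the helper below is a generic population-SD point-biserial:

```python
from math import sqrt

def point_biserial(ratings, accepted):
    """Point-biserial correlation between item ratings and a 0/1
    acceptable/deactivated indicator (population-SD form)."""
    n = len(ratings)
    g1 = [r for r, a in zip(ratings, accepted) if a == 1]
    g0 = [r for r, a in zip(ratings, accepted) if a == 0]
    mean = sum(ratings) / n
    sd = sqrt(sum((r - mean) ** 2 for r in ratings) / n)
    return ((sum(g1) / len(g1) - sum(g0) / len(g0)) / sd
            * sqrt(len(g1) * len(g0) / n ** 2))
```

Adding ratings of 1 split evenly across both groups leaves the group difference intact but shrinks the correlation, mirroring the attenuation discussed above.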
Table 1. Comparison of Acceptable and Deactivated Test Items<br />
OBSERVED<br />
           Acceptable   Deactivated   Total<br />
Rating 0       1213          515       1728<br />
Rating 1       6840         1547       8387<br />
Rating 2       8086         1499       9585<br />
Total         16139         3561      19700<br />
EXPECTED<br />
           Acceptable   Deactivated   Total<br />
Rating 0     1415.6        312.4       1728<br />
Rating 1       6871         1516       8387<br />
Rating 2     7852.4       1732.6       9585<br />
Total         16139         3561      19700<br />
X² = 199.7   df = 2   p < .01<br />
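The chi-square statistic reported in Table 1 can be reproduced directly from the observed counts with a generic test of independence (a sketch, not USAFOMS code):

```python
def chi_square(observed):
    """Chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (o - expected) ** 2 / expected
    return chi2

# Observed (acceptable, deactivated) counts by rating category, from Table 1.
table = [[1213, 515],   # rating 0
         [6840, 1547],  # rating 1
         [8086, 1499]]  # rating 2
# chi_square(table) comes out at about 199.7, matching the reported value.
```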
Through the analysis of the data, a number of unusual cases surfaced. The<br />
data contained in these cases was contrary to expectations and evoked further<br />
analysis. For instance, 10 of the 5,200 test items evaluated were given rat-<br />
ings of 0 (not necessary) by all SMEs, yet still retained as acceptable<br />
items. Perhaps the SMEs reconsidered the item content and found it essential<br />
to successful job performance. Furthermore, 117 of the 940 deactivated items<br />
were given ratings of 2 (essential) by all SMEs. After obtaining the reasons<br />
for deactivation, these items were broken down into six categories:<br />
REASON FOR DEACTIVATION                   # ITEMS   % OF TOTAL<br />
Poor Statistics/No Acceptable Revision       57        49%<br />
Inadequate Reference                         37        32%<br />
Obsolete Item                                11         9%<br />
Two or More (or No) Correct Answers           8         7%<br />
Low Content Validity                          3         3%<br />
Inadvertent Duplication of Item               1         1%<br />
As stated earlier, there are other requirements not directly related to content<br />
validity that must be met if an item is to be retained. These include<br />
clear reference support and acceptable item statistics, thus most of the 117<br />
items were deactivated for valid reasons. It was surprising, however, that<br />
3% of the items in question had "low content validity" cited on the item<br />
record card as the reason for deactivation when originally all SMEs had felt<br />
the item content was essential for successful job performance. These results<br />
could be due to administrative error, SME reconsideration of the item con-<br />
tent, or a number of other possible reasons.<br />
The second objective of this research was to determine how SMEs and project<br />
psychologists perceived the value and usefulness of the CVR forms. A three-<br />
question survey was administered to 21 project psychologists at USAFOMS. A<br />
similar survey was administered to 151 SMEs upon completion of the rating<br />
forms and selection of items for deactivation. A four-point rating scale was<br />
used for the responses: strongly agree, agree, disagree, and strongly dis-<br />
agree. The questions and summary of responses are as follows:<br />
(1) When selecting previously used items to be reused, the Content Validity<br />
Rating forms helped identify those items most essential to successful<br />
performance in the specialty.<br />
49% (74) SMEs answered positively (agree or strongly agree)<br />
52% (11) project psychologists answered positively<br />
(2) The Content Validity Rating forms helped bring out different points<br />
of view for discussion.<br />
56% (85) SMEs answered positively<br />
76% (16) project psychologists answered positively<br />
(3) The Content Validity Rating forms were a valuable tool in selecting<br />
items to be reused.<br />
39% (59) SMEs answered positively<br />
43% (9) project psychologists answered positively<br />
Discussion<br />
Through the analysis previously described, it became apparent that there was<br />
a significant impact of content validity ratings on subsequent identification<br />
of an item as acceptable or deactivated. For instance, with the 25 projects<br />
sampled, there were 52 individual SKTs examined. Of these 52 SKTs, 37 had<br />
higher average CVR values for acceptable items than the average CVR values of<br />
the deactivated items which is what would be expected -- a positive differ-<br />
ence between the two. It is also important to note that 6 of the 52 SKTs are<br />
not applicable in this analysis since in these cases all 100 items were designated<br />
as acceptable and thus, there was no average CVR value for deactivated<br />
items. The 9 remaining SKTs had higher average CVR values for the deactivated<br />
items than the average CVR values for the acceptable items. Of these 9<br />
SKTs, 2 were from projects with insignificant interrater reliabilities. Even<br />
though the expected effect did not hold true for every case, overall, items<br />
higher in content validity had a greater chance of being acceptable while<br />
items lower in content validity were more likely to be deactivated. This<br />
shows that on the whole, item content validity does play a role in the SMEs’<br />
evaluation of an item's testworthiness.<br />
The second objective of the research was to determine how SMEs and project<br />
psychologists perceived the usefulness of the CVR forms. It became evident<br />
that there was no universal agreement on the usefulness of the forms. Additionally,<br />
any project psychologist biases, either for or against the use of<br />
the forms, may have influenced how the psychologist administered the forms to<br />
the SMEs. This in turn may have biased the SMEs' ratings on the CVR forms<br />
and on the surveys as well.<br />
The results of this study suggested several areas for future research.<br />
First, to the extent that the forms are used, one would expect the content<br />
validity of SKTs to improve over time since ideally, the forms would help<br />
SMEs identify and retain those items most essential to successful performance<br />
in the specialty. This could be observed by charting the average content<br />
validity rating for all SKTs over a period of years.<br />
Also, with the imminent manpower cutbacks, the USAFOMS mission may be directly<br />
affected. By charting these average content validity rating values, it<br />
would be possible to see whether the content validity of the tests declined.<br />
This would be helpful in illustrating the impact of cutbacks on the test development<br />
mission at USAFOMS.<br />
Finally, it would be interesting to examine the relationship between project<br />
psychologist and SME responses to the survey questions and to investigate the<br />
possibility of psychologist biases affecting the SMEs’ use of the forms.<br />
Although the attitudinal portion of the research showed some disagreement as<br />
to the value of this quantitative procedure, statistically, the content validity<br />
of the items has a significant impact on the subsequent identification<br />
of items for reuse.<br />
References<br />
Carrier, M. R., Dalessio, A. T., and Brown, S. H. (1990). Correspondence<br />
between estimates of content and criterion-related validity values. Personnel<br />
Psychology, 43, 85-100.<br />
Distefano, M. K., Jr., Pryer, M. W., and Craig, S. H. (1980). Job-relatedness<br />
of a posttraining job knowledge criterion used to assess validity and<br />
test fairness. Personnel Psychology, 33, 785-793.<br />
Distefano, M. K., Jr., Pryer, M. W., and Erffmeyer, R. C. (1983). Application<br />
of content validity methods to the development of a job-related performance<br />
rating criterion. Personnel Psychology, 36, 621-631.<br />
Fitzpatrick, A. R. (1983). The meaning of content validity. Applied Psychological<br />
Measurement, 7, 3-13.<br />
Kesselman, G. A. and Lopez, F. E. (1979). The impact of job analysis on em-<br />
ployment test validation for minority and nonminority accounting personnel.<br />
Personnel Psychology, 32, 91-108.<br />
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel<br />
Psychology, 28, 563-575.<br />
Interpreting Rating Scale Results:<br />
What does a Mean Mean?<br />
Barbara Jezior<br />
U.S. Army<br />
Natick Research, Development, and Engineering Center<br />
Natick, MA.<br />
Larry Lesher<br />
GEO-CENTERS, Inc.<br />
Newton Centre, MA.<br />
Richard Popper<br />
Ocean Spray Cranberries Inc.<br />
Lakeville-Middleboro, MA.<br />
Charles Greene<br />
U.S. Army<br />
Natick Research, Development, and Engineering Center<br />
Natick, MA.<br />
Vanessa Ince<br />
U.S. Army<br />
Natick Research, Development, and Engineering Center<br />
Natick, MA.<br />
How well soldiers like items they use in garrison or the field is often measured on<br />
Likert scales, and the mean ratings obtained from these scales are then used as<br />
indicators of user acceptance. In examining data contributing to 176 mean ratings<br />
of various Natick products we found that the means accurately predict the acceptor<br />
set, i.e. the percentage of soldiers who rated a product on the positive end of a scale.<br />
Knowing the percentage who find a product acceptable provides a more intuitive<br />
and concrete basis for product development or improvement decisions. For<br />
example, the product developer can operate from the knowledge that 66% find the<br />
product acceptable, instead of a mean rating that deems the product “slightly good.”<br />
Introduction<br />
Natick is deeply involved in consumer acceptability<br />
issues. We develop basic subsistence items for servicemen<br />
- rations, protective clothing, shelters, and airdrop<br />
equipment. These products support an annual procurement<br />
of over 3 billion dollars, making consumer (soldier)<br />
acceptance critical. Items that are unacceptable could sit<br />
in warehouses or never be used, and the soldier would be<br />
lacking necessary equipment as well.<br />
To obtain additional quantitative information on how<br />
soldiers felt about our products, we started a large-scale<br />
systematic survey program six years ago. Like many, we<br />
operated under the assumption that one of the best ways to<br />
measure and describe how well the soldier liked the<br />
products was to use the mean and other parameters derived<br />
from verbal rating scales.<br />
After analyzing over 7,000 questionnaires throughout<br />
the six years and writing many reports for managers and<br />
product project officers we began to question this assumption.<br />
We ourselves began to get curious about what means<br />
were saying in respect to the measure of product accepta-<br />
bility. For instance, while we felt that a mean of 5 on a 7-point<br />
scale should denote a relatively acceptable product,<br />
we found that we usually had many more negative ratings<br />
than expected. Over time we also began to feel, on an<br />
intuitive level, that a mean of 6 on a 7-point scale indicated<br />
a “very” good product but our verbal anchor was labelling<br />
such a product as “moderately” good.<br />
Moreover, in describing survey results to product<br />
managers, we found that while the concept of an average<br />
is rather commonly understood, the accompanying parameters<br />
of standard deviations, skewness, etc., are not<br />
understood outside the research communities, nor should<br />
we expect them to be. The problem here is that a mean in<br />
isolation, which is what a manager is grappling with when<br />
not understanding its accompanying parameters, can be<br />
very misleading. A manager who makes product decisions<br />
without some sense of what a rating distribution is all<br />
about may make the wrong decisions.<br />
Another problem with means for many is that they<br />
don’t provide a good intuitive feel for what relative<br />
differences are in regard to measuring products, any<br />
statistically significant differences notwithstanding. For<br />
instance, if means differ by one scale point, some don’t<br />
think that difference especially disconcerting, while others<br />
think it’s monumental - and those viewpoints can hold<br />
irrespective of whether there is an understanding of the<br />
underlying distribution or not.<br />
These issues led us to the literature to see what had<br />
been reported on rating scale distributions in respect to<br />
product acceptability.<br />
The literature showed that, in recent years, a new<br />
objective measure for determining level of product acceptability<br />
labelled the “acceptor set” has been described<br />
in marketing research literature, especially that of the food<br />
industry (Gordon and Norback, 1985). The measure has<br />
been used in conjunction with food product optimization<br />
techniques and market positioning (Lagrange and Norback,<br />
1987).<br />
While the acceptor set can be determined by a simple<br />
binary method (dichotomous question), both Gordon<br />
(1985) and Choi and Kosikowski (1985) described creating<br />
an “acceptor set” from scaled data by splitting the<br />
sample group into two percentages, the percentage who<br />
found a product acceptable and those who did not. For<br />
example, respondents to our 7-point scale (1=“dislike<br />
very much” to 7=“like very much”) could be split into two<br />
groups - either the 5-7 group or the 1-4 group, with the<br />
5-7 group constituting the acceptor set. Product optimization<br />
then means finding methods to increase the acceptor<br />
set percentages (however derived) as measures of product<br />
improvement.<br />
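The binary split described above is simple to operationalize. A minimal sketch in Python (the function name, cutoff argument, and sample ratings are ours, for illustration only):<br />

```python
from typing import Sequence

def acceptor_set_pct(ratings: Sequence[int], cutoff: int = 5) -> float:
    """Percent of ratings at or above the cutoff (the 'acceptor set').

    For the 7-point scales described here, ratings of 5-7 count as
    acceptors; for a 9-point scale, pass cutoff=6.
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    acceptors = sum(1 for r in ratings if r >= cutoff)
    return 100.0 * acceptors / len(ratings)

# Hypothetical ratings: 6 of 10 raters fall in the 5-7 range.
print(acceptor_set_pct([7, 5, 6, 4, 2, 5, 3, 6, 5, 1]))  # 60.0
```

Product optimization then amounts to tracking this percentage across product variants.<br />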
Given those findings, we decided to look at the acceptor set<br />
concept in respect to our survey data base to see how<br />
we could add to the definition of product acceptability for<br />
both the manager and researcher.<br />
Methodology<br />
Data base description<br />
The rating scale data were obtained on questionnaires<br />
administered to approximately 7,500 combat arms soldiers<br />
who rated products such as field rations, protective<br />
clothing, tents, and airdrop equipment. Data collectors<br />
went to the survey sites after these soldiers had returned<br />
from major training exercises where they had used one or<br />
more of the products. Entire units were tasked to participate<br />
in the surveys; soldiers in these units could refuse to<br />
fill out questionnaires if they chose, but few did. The<br />
sample size at each site ranged from 200 to 400. The<br />
soldier population was male, with over 90% between the<br />
ages of 19 and 23 and serving in the enlisted ranks<br />
E-2 to E-4.<br />
The verbal rating scales were either 7-point or 9-point<br />
scales. The 9-point scale, which has also been called a<br />
hedonic scale (Peryam and Pilgrim, 1957), has been a<br />
scale traditionally used in military and civilian food research<br />
for more than 30 years (Maller and Cardello, 1984).<br />
It was only used for rating the taste of specific ration (food)<br />
items. Acceptability ratings for ration attributes other<br />
than taste (e.g. acceptability of portion sizes) were obtained<br />
on 7-point scales.<br />
The verbal anchors for the 7-point scales were: good-bad,<br />
satisfied-dissatisfied, easy-difficult, comfortable-uncomfortable,<br />
and like-dislike. The 9-point scale anchors<br />
were like-dislike. Each scale had adverb modifiers for the<br />
anchors that graduated in intensity. For example, the<br />
7-point good-bad scale was:<br />
VERY BAD (1), MODERATELY BAD (2), SLIGHTLY BAD (3), NEITHER BAD NOR GOOD (4), SLIGHTLY GOOD (5), MODERATELY GOOD (6), VERY GOOD (7).<br />
Each of the scales had a neutral point, and the positive<br />
verbal anchors for the scales were at the high ends, i.e. 5-<br />
7 for the 7-point scale and 6-9 for the 9-point scale. The<br />
product acceptance issues covered a wide range of variables such as durability,<br />
appearance, comfort, taste, weight,<br />
compatibility (with other pieces of equipment), weatherproofing,<br />
warmth, and “overall” acceptability.<br />
Analysis<br />
We based our analysis on randomly selected mean<br />
ratings from our survey data. The number of means<br />
selected was 155 for the 7-point scale and 21 for the<br />
9-point. The largest sample contributing to any particular<br />
mean numbered 347, and the smallest 34. The lowest<br />
mean rating on the 7-point scale was 2.94 and the highest<br />
was 6.53; the lowest for the 9-point was 3.01 and the<br />
highest 6.35. The mean of the means obtained on the<br />
7-point scales was 4.71 (SD=.75), while the mean of the<br />
means obtained on the 9-point scale was 4.59 (SD=.91).<br />
The distributions for all the selected means were unimodal.<br />
We explored the relationships of the means to the size<br />
of the acceptor sets through regression analyses. The<br />
acceptor set definition was the percent of ratings falling in<br />
the entire positive range for either scale, i.e. 5-7 for the<br />
7-point and 6-9 for the 9-point scale.<br />
Results<br />
The results show extremely good fits with linear<br />
regression models for both 7- and 9-point scales. Figure<br />
1 shows the scatter plot for the relationship of the means<br />
to the acceptor sets for the 7-point scale. The R² in this case<br />
is .97 with a regression equation of:<br />
y = - 54.45 + 24.13x.
Figure 2’s scatterplot shows the relationship of means<br />
to acceptor set for the 9-point scale; the R² is .98 and the<br />
regression line is:<br />
y = - 26.04 + 14.71x.<br />
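Either reported equation turns a mean rating into a predicted acceptor-set size with simple arithmetic; a sketch (the function name is ours):<br />

```python
def acceptor_set_from_mean(mean: float, scale_points: int = 7) -> float:
    """Predicted acceptor-set size (%) from a mean rating, using the
    regression equations reported in the text."""
    if scale_points == 7:
        return -54.45 + 24.13 * mean
    if scale_points == 9:
        return -26.04 + 14.71 * mean
    raise ValueError("equations were fit only for 7- and 9-point scales")

print(round(acceptor_set_from_mean(5.0)))     # 66
print(round(acceptor_set_from_mean(6.0)))     # 90
print(round(acceptor_set_from_mean(7.0, 9)))  # 77
```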
Figure 1. Scatter plot of acceptor set sizes against mean ratings (7-point scale).<br />
Figure 2. Scatter plot of acceptor set sizes against mean ratings (9-point scale).<br />
Discussion<br />
When we derive acceptor set sizes from the regression<br />
equations for both the 7- and 9-point scales, the size of the<br />
acceptor sets affirms what we were seeing in our data for<br />
individual products. For instance, the acceptor set for a<br />
mean of 5 on the 7-point scale corresponds to an acceptor<br />
set size of 66% of the population, whereas a mean of 6 corresponds<br />
with 90%. A far greater number of negative<br />
ratings are seen for a mean of 5 as opposed to a 6: the<br />
negative and neutral population is decreased by 25%<br />
between means of 5 and 6.<br />
As mentioned earlier, the 9-point scale has been used<br />
for rating food items in military rations since 1957. Senior<br />
researchers who have spent many years in ration acceptability<br />
feel sure they have a very good item if a rating is a<br />
7 (“like moderately”). That is, the 7 is not a good rating<br />
by default or relative stature of the item in the ratings list;<br />
the feeling is that an item with a rating of 7 is very<br />
acceptable in an absolute sense. Correspondingly, the<br />
acceptor set picture for the 9-point scale regression is: a<br />
mean rating of 7 shows an acceptor set size of 77%, 7.5<br />
shows 84%, and 8 shows 91%.<br />
The situations described above point to how acceptor<br />
sets can aid the definition of product acceptability. If we<br />
unshackle the description of product acceptance from the<br />
scale verbal anchors, which can make it appear that products<br />
are falling somewhat short because they are not<br />
achieving perfect scores, it may facilitate definition of<br />
product norms that are easier to deal with both intellectually<br />
and at gut level.<br />
For instance, if you tell product developers that a<br />
product is top of the line if 90% of the populace rates it<br />
positively, the statement has an intuitive logic to it. Product<br />
developers assume that no one product can please<br />
100% of the population. Even if there were such a product,<br />
it would still probably not achieve a perfect rating on any<br />
scaled measure because there is a lot at play in the rating<br />
game, e.g., raters tend to avoid end points on scales no<br />
matter how they feel about a product, frames of reference<br />
can be different among raters in regard to a product, and<br />
even the mood the rater is in that day can affect his or her<br />
rating.<br />
What the norms for products should be, as defined by<br />
the size of the acceptor set, i.e., excellent, good, average,<br />
or poor, are to be determined. One approach might be to<br />
determine the cumulative distribution frequencies for<br />
acceptor sets and think in terms of percentiles. Figure 3<br />
shows the application of this concept to the 7-point scale<br />
data; the graph shows that an acceptor set of 45% falls in<br />
the 25th percentile, while a set of 74% falls in the 75th<br />
percentile. To achieve a product that scores better than<br />
80% of all products tested, an acceptor set size of about<br />
77% is needed.<br />
Figure 3. Cumulative percent distribution of acceptor set sizes (7-point scale).<br />
Other qualitative or quantitative data<br />
could also be used in conjunction with acceptor set size to<br />
establish breakpoint criteria.<br />
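The percentile idea can be sketched directly from a batch of observed acceptor-set sizes (the numbers below are hypothetical, not the survey data):<br />

```python
def percentile_rank(value: float, observed: list) -> float:
    """Percent of observed acceptor-set sizes at or below `value`."""
    return 100.0 * sum(1 for v in observed if v <= value) / len(observed)

# Hypothetical acceptor-set sizes from ten product surveys.
sets = [30, 38, 45, 52, 58, 61, 66, 70, 74, 81]
print(percentile_rank(66, sets))  # 70.0
```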
Going one step further, norms could, and should, be<br />
established for different product groups. Some types of<br />
products by their nature will never have large acceptor<br />
sets, so they should not have to be measured against<br />
products that do.<br />
Our findings obviously reinforce the research attesting<br />
to the value of the acceptor set to managers in the<br />
commercial world concerned with market positioning and<br />
product optimization. The military world can also benefit.<br />
Although our consumer is a captive consumer so to speak,<br />
there may be some bottom line applications of the acceptor<br />
set that means can’t address.<br />
For instance, the U.S. Army can spend around<br />
$31,000,000 a year on its standard operational ration. For<br />
the sake of hypothesis, assume those who didn’t like it<br />
didn’t eat it. What would that mean in terms of dollars? If<br />
you were to assume further the ration overall had a mean<br />
rating of 5 (7-point scale), 66% would be eating it, and if<br />
it had a mean rating of 6, 90% would be eating it. The<br />
differential of that one scale point amounts to $7,440,000<br />
in uneaten rations. This type of accountability would<br />
behoove the developer to improve acceptability in a way<br />
that simply looking at means couldn’t.<br />
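The dollar arithmetic in the ration example works out as follows (the $31,000,000 figure and the 66%/90% acceptor sets come from the text; the rest is the stated hypothesis that non-acceptors don't eat the ration):<br />

```python
ANNUAL_COST = 31_000_000  # yearly spend on the standard operational ration

def uneaten_dollars(acceptor_pct: float) -> float:
    """Dollars of ration going uneaten if non-acceptors don't eat it."""
    return ANNUAL_COST * (100 - acceptor_pct) / 100

waste_at_mean_5 = uneaten_dollars(66)  # mean of 5 -> 66% acceptor set
waste_at_mean_6 = uneaten_dollars(90)  # mean of 6 -> 90% acceptor set
print(int(waste_at_mean_5 - waste_at_mean_6))  # 7440000
```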
Overall, we recommend using an acceptor set to<br />
communicate levels of acceptability and to use this measure<br />
in tandem with traditional scale statistics. Scale<br />
parameters still convey information that dichotomous or<br />
other qualitative data cannot. For the manager, however,<br />
the acceptor set will provide a far more intuitive grasp of<br />
the product findings and a firmer footing for product<br />
optimization or market positioning decisions.<br />
The excellent fit of the linear regression model is<br />
especially gratifying because of the simplicity it offers.<br />
The acceptor set grows linearly with product acceptance<br />
means. One integer of improvement in a mean translates<br />
into a constant change in the acceptor set of about<br />
24 percentage points on a 7-point scale.<br />
References<br />
Choi, H.S., and Kosikowski, F.V. (1985). Sweetened<br />
plain and flavored carbonated yogurt beverages.<br />
Journal of Dairy Science, 68, 913.<br />
Gordon, N.M. (1985). A product development, positioning<br />
and optimization process guided by organizational<br />
objectives. Master of Science Thesis. University<br />
of Wisconsin, Madison.<br />
Gordon, N.M. and Norback, J.P. (1985). Choosing objective<br />
measures when using sensory methods for optimization<br />
and product positioning. Food Technology,<br />
39(11), 96.<br />
Lagrange, V. and Norback, J.P. (1987). Product optimization<br />
and the acceptor set size. Journal of Sensory<br />
Studies, 2, 119-136.<br />
Maller, O. and Cardello, A. (1984). Ration acceptance<br />
methods: measuring likes and their consequences.<br />
Nederlands Militair Geneeskundig<br />
Tijdschrift, 37(79/110), 91-96.<br />
Peryam, D.R. and Pilgrim, F. (1957). Hedonic scale<br />
method of measuring food preferences. Food Technology,<br />
11(9), Supplement 9.
Joint-Service Computerized Aptitude <strong>Testing</strong><br />
W. A. Sands*<br />
Director, <strong>Testing</strong> Systems Department<br />
Navy Personnel Research and Development Center<br />
San Diego, California 92152-6800<br />
INTRODUCTION<br />
The Armed Services Vocational Aptitude Battery (ASVAB) is used by all the U.S.<br />
military services for both enlistment screening and classification into entry-level<br />
training. The current battery includes ten tests. The eight power tests are: General<br />
Science, Arithmetic Reasoning, Word Knowledge, Paragraph Comprehension, Auto<br />
and Shop Information, Mathematics Knowledge, Mechanical Comprehension, and<br />
Electronics Information. The two speeded tests are: Numerical Operations and Coding<br />
Speed. Administration of this conventional, paper-and-pencil test battery takes<br />
between 3 and 3 1/2 hours.<br />
The U.S. <strong>Military</strong> Entrance Processing Command (USMEPCOM) administers<br />
ASVAB under two Department of Defense testing programs. In the Enlistment<br />
<strong>Testing</strong> Program, ASVAB is administered to over 800,000 applicants each year, in<br />
approximately 70 <strong>Military</strong> Entrance Processing Stations (MEPS) and 970 Mobile<br />
Examining Team Sites (METS) nationwide. In the Student <strong>Testing</strong> Program, ASVAB is<br />
administered to over 1,000,000 students annually, in over 15,000 schools.<br />
CAT-ASVAB Program<br />
Roles. The U.S. Department of Defense initiated a Joint-Service research<br />
program to develop a Computerized Adaptive <strong>Testing</strong> (CAT) version of the battery<br />
(CAT-ASVAB) in FY 1979. At that time, the Department of the Navy was designated as<br />
Executive Agent, with the Marine Corps as Lead Service. Subsequently, the Lead<br />
Service responsibility was assigned to the Navy. The Navy Personnel Research and<br />
Development Center (NPRDC) was designated as the Lead R&D Laboratory. The Air<br />
Force was assigned responsibility for the development of the large banks of test items<br />
needed for CAT-ASVAB. The Army was assigned responsibility for the procurement,<br />
deployment, and implementation of the full-scale operational testing system.<br />
Objectives. The Joint-Service CAT-ASVAB Program has three objectives: (1)<br />
develop a CAT version of the ASVAB, (2) develop a computer-based delivery system<br />
that will support the new test battery, and (3) evaluate CAT-ASVAB as a potential<br />
replacement for the paper-and-pencil version of the battery (P&P-ASVAB).<br />
* The opinions expressed in this paper are those of the author, are not official, and do not<br />
necessarily represent those of the Navy Department.<br />
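As background on what "adaptive" means here: a CAT selects each next item to be maximally informative at the examinee's current ability estimate, which is why far fewer items are needed than on a fixed form. The sketch below uses a two-parameter logistic (2PL) IRT model for illustration only; it is not the CAT-ASVAB item-selection algorithm, and the item parameters are invented:<br />

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT probability of a correct response at ability theta."""
    return 1 / (1 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information the item contributes at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1 - p)

def next_item(theta: float, pool: list) -> tuple:
    """Pick the unadministered item most informative at theta --
    the core idea of adaptive item selection."""
    return max(pool, key=lambda item: item_information(theta, *item))

# Invented item pool of (discrimination a, difficulty b) pairs.
pool = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.5)]
print(next_item(0.1, pool))  # (1.2, 0.0)
```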
Purpose<br />
ACCELERATED CAT-ASVAB PROJECT<br />
The Accelerated CAT-ASVAB Project (ACAP) is designed to develop and field-test<br />
CAT-ASVAB in the shortest time possible. The idea is to collect “lessons learned”<br />
information about the new test battery, the delivery system (including both<br />
hardware and software), and the testing environment. The information obtained will<br />
be used to specify the functional requirements for the full-scale, operational system.<br />
Delivery System<br />
The Hewlett-Packard Integral Personal Computer (HP-IPC) was selected for the<br />
ACAP System. The HP-IPC is a powerful microcomputer system which, for the<br />
examinee testing station, includes: a Motorola 68000, 16/32 bit microprocessor; a<br />
graphics co-processor; 1.5 megabytes of Random Access Memory (RAM); a built-in,<br />
3.5-inch, 710K byte microfloppy disk drive; a 9-inch amber, high-contrast<br />
electroluminescent flat screen display, supporting a 255 by 512 pixel, bit-mapped<br />
display, and a standard window size of 24 lines by 80 characters; a 90-key, low-profile,<br />
detachable keyboard (which was modified by a template, leaving only the<br />
necessary keys exposed); and a built-in inkjet printer. The entire computer has a 7<br />
by 16 inch footprint, requiring less than one square foot of desk space. The software,<br />
developed at NPRDC, is written in the C programming language, running under a<br />
UNIX operating system (HP-UX).<br />
Field Research Activities<br />
The Accelerated CAT-ASVAB Project (ACAP) involves six field research<br />
activities:<br />
Pre-Test. The purpose of this research was to insure that examinees could<br />
easily use the CAT-ASVAB System (including hardware and software). <strong>Military</strong><br />
recruits and students from high school special education classes were administered<br />
CAT-ASVAB. In the aggregate, they represented the full range of mental ability.<br />
Results were very encouraging. The examinees found CAT-ASVAB easier and faster<br />
than paper-and-pencil tests that they had taken. They liked the fact that it was self-paced,<br />
and involved little writing. Some examinees expressed concern that they<br />
could not skip over items, nor go back to previous items and change their answers.<br />
Some examinees indicated that their eyes became tired, which emphasized the<br />
importance of avoiding glare on the screens. Administration instructions were<br />
revised, based upon information from questionnaires and interviews. This revision<br />
reduced the reading grade level from the eighth to the sixth grade. The Pre-Test was<br />
completed in November 1986.<br />
Medium of Administration. The purpose of this research was to evaluate the<br />
effect of the calibration medium of administration on score precision. The subjects<br />
were from the Navy Recruit Training Center in San Diego. Forty-item conventional<br />
tests were constructed for General Science, Arithmetic Reasoning, Word Knowledge,<br />
Shop Information, and Paragraph Comprehension. Subjects were randomly assigned<br />
to one of three groups. The first group took the tests on computer; these data were<br />
used to obtain a computer-based calibration of items. The second group took the same<br />
tests in a paper-and-pencil mode; these data were used to obtain paper-and-pencil<br />
calibration information. Each of these calibrations was used to estimate the ability of<br />
examinees assigned to the third group, who took the tests on computer. Lengthy test<br />
administration time required splitting the study into two phases. General Science,<br />
Arithmetic Reasoning, Word Knowledge, and Shop Information were addressed in the<br />
first phase. Results from this phase showed: (a) no practical differences in the<br />
estimation of abilities; (b) small, but statistically significant differences in different<br />
tests; and, (c) no significant differences in test reliabilities. The second phase<br />
involved the administration of the Paragraph Comprehension test. Data for this<br />
second phase have been collected and analyses are underway.<br />
Cross-Correlation. The purpose of this research was to compare the<br />
measurement precision of CAT-ASVAB and P&P-ASVAB. Subjects were from the Navy<br />
Recruit Training Center, San Diego. Each recruit had taken an operational form of<br />
P&P-ASVAB which was used for enlistment purposes. The total sample was split into<br />
two groups. The first group took CAT-ASVAB Form 1, then CAT-ASVAB Form 2. The<br />
second group took P&P-ASVAB Form 9B, then P&P-ASVAB Form 10B. The second test<br />
for each group was administered about five weeks after the first test. Results indicate<br />
that, despite using substantially fewer items, CAT-ASVAB exhibits significantly<br />
higher alternate form reliability than P&P-ASVAB for most tests, while no P&P-<br />
ASVAB test demonstrates significantly higher reliability than the comparable CAT-<br />
ASVAB test.<br />
Preliminary Operational Check. The purpose of this research was to<br />
demonstrate the communications interface between the ACAP System and USMEPCOM<br />
computer system. The testing procedures were performed jointly by NPRDC and<br />
USMEPCOM personnel at the Seattle MEPS. Data from examinees were loaded onto the<br />
Data Handling Computer at the MEPS, then transferred to the USMEPCOM System-80<br />
minicomputer. Comparison of the data before and after the transfer showed the<br />
procedure was completed with perfect accuracy.<br />
Score Equating Development. The purpose of this research was to equate CAT-<br />
ASVAB with P&P-ASVAB. Equating is essential to insure that the two forms of the<br />
battery are on the same metric, and that the scores are interchangeable. Subjects<br />
were applicants for enlistment at six MEPS (San Diego, Richmond, Seattle, Boston,<br />
Omaha, and Jackson) and their satellite MET sites. These six MEPS/METS complexes<br />
were selected because, in the aggregate, their applicants are representative of the<br />
nation. The operational measures included P&P-ASVAB Forms 10A, 10B, 11A, 11B, 13A,<br />
and 13B. There were two forms of the CAT-ASVAB (both non-operational). Finally,<br />
P&P-ASVAB Form 8A was used as the non-operational reference battery. Subjects<br />
were randomly assigned to one of three groups. The first group took CAT-ASVAB<br />
Form 1, then the operational P&P-ASVAB. The second group took CAT-ASVAB Form 2,<br />
then the operational P&P-ASVAB. The last group took the reference battery (P&P-<br />
ASVAB Form 8A), then the operational P&P-ASVAB. In each case, the testing was<br />
done on the same day or on successive days. Data collection, editing, and equating<br />
analyses have been completed. New equating procedures have been developed and<br />
applied. Analyses indicated that composite equatings were unnecessary. Provisional<br />
equating tables for operational use in<br />
the subsequent Score Equating Verification<br />
study were developed. The ACAP microcomputer delivery system has performed<br />
satisfactorily, exhibiting fewer problems than anticipated. Finally, the logistics of<br />
testing in the numerous, heterogeneous MEPS/MET sites nationwide has presented no<br />
insurmountable problems.<br />
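The study's actual equating procedures are not described here, but the general idea of putting two forms "on the same metric" can be illustrated with a bare-bones equipercentile mapping (purely a sketch; no smoothing, and all data invented):<br />

```python
def equipercentile_equate(new_scores: list, ref_scores: list, x: float):
    """Map score x on the new form to the reference-form score holding
    the same percentile rank (illustrative only)."""
    new_sorted = sorted(new_scores)
    # percentile rank of x among scores on the new form
    rank = sum(1 for s in new_sorted if s <= x) / len(new_sorted)
    # reference-form score at that same rank
    ref_sorted = sorted(ref_scores)
    idx = min(int(rank * len(ref_sorted)), len(ref_sorted) - 1)
    return ref_sorted[idx]

# Invented example: the reference form runs on twice the score scale,
# so a new-form 49 maps to roughly a reference-form 100.
print(equipercentile_equate(list(range(100)), [2 * i for i in range(100)], 49))  # 100
```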
Score Equating Verification. The Score Equating Verification study is designed<br />
to evaluate the effect of examinee motivation upon item calibration and equating.<br />
The examinees are applicants for military service who are processing through the<br />
same six MEPS/METS complexes used in the Score Equating Development study. The<br />
measures include two forms of CAT-ASVAB and one form of P&P-ASVAB (8A). The<br />
CAT-ASVAB scores are based on the provisional equating tables developed in the<br />
Score Equating Development study, and count as scores of record for enlistment. Data<br />
collected during this study will be used to develop final equating tables for<br />
subsequent operational use. Data collection began on 3 September 1990. This was the<br />
first time that CAT-ASVAB test results have counted as scores of record and, therefore,<br />
determined enlistment eligibility and subsequent training opportunities for<br />
applicants to the military services. Plans call for data collection to be completed in<br />
June 1992, analyses to be completed in November 1992, and results documented by<br />
May 1993.<br />
Technical Base R&D<br />
ENHANCED CAT-ASVAB<br />
During the past several years, each of the Service R&D laboratories has been<br />
investigating computer-administered tests which measure abilities not measured by<br />
the current ASVAB tests. These include measures of psychomotor ability, spatial<br />
ability, and working memory.<br />
Technical Advisory Selection Panel<br />
A Joint-Service Technical Advisory Selection Panel (TASP) was established to<br />
evaluate new computerized tests which showed promise and to nominate <strong>Military</strong><br />
Occupational Specialities (MOSS) for a Joint-Service Enhanced CAT-ASVAB @CAT)<br />
validation study. This committee was chaired by a representative of the Defense<br />
Manpower Data Center (DMDC) and included a technical representative from each of<br />
the Services and USMEPCOM. General criteria employed in evaluating the alternative<br />
tests included the theoretical development of the underlying construct, measurement<br />
precision, validity, equating, and operational feasibility.<br />
Joint Service ECAT Validation Study<br />
The TASP recognized that the amount of testing time available in the field was<br />
limited, and that not all promising tests could be administered. Therefore, the tests<br />
are grouped into primary and secondary categories. The primary group, to be<br />
administered to all examinees, includes: (1) Integrating Details, (2) Target<br />
Identification, (3) Figural Reasoning, (4) Two-Hand Tracking, and (5) Sequential<br />
Memory. The secondary group includes: (1) Assembling Objects, (2) Orientation, (3)<br />
One-Hand Tracking, and (4) Mental Counters. These secondary tests will be<br />
administered only in those situations where time permits.<br />
The <strong>Military</strong> Occupational Specialties (MOSs) involved in the Army include:<br />
Infantryman (11H), Cannon Crewman (13F), and Tank Crewman (19K). Air Force jobs<br />
include: Air Traffic Controller (27230) and Personnel Specialist (73230). The Marine<br />
Corps MOSs will include: Motor Transportation (35XX) and Aircraft Maintenance<br />
(61XX). Finally, the Navy ratings will include: Air Traffic Controller (AC),<br />
Operations Specialist (OS), Fire Controlman (FC), Electronics Technician/Advanced<br />
Electronics Field (ET (AEF)), Radioman (RM), Engineman (EN), Aviation Structural<br />
Mechanic - Structures (AMS), Aviation Electrician’s Mate (AE), Aviation Electronics<br />
Technician (AT), Aviation Fire Control Technician (AQ), Aviation Antisubmarine<br />
Warfare Technician (AX), Aviation Ordnanceman (AO), Gunner’s Mate - Phase I<br />
(GMG), Machinist’s Mate (MM), and Electrician’s Mate (EM).<br />
Data collection for the Joint-Service ECAT Validation study began in February<br />
1990 and will continue through August 1991, with analyses, documentation, and<br />
reviews scheduled for completion in July 1992.<br />
Navy ECAT Validation Study<br />
The purpose of this study is to determine the incremental validity of new<br />
predictor tests for augmenting ASVAB for selected Navy ratings. It will provide<br />
additional information to the Joint-Service ECAT Validation study described above for<br />
assessing the cost-effectiveness of computerized testing.<br />
The experimental test battery includes six tests, followed by a seven-item<br />
questionnaire. Average administration time is 2 l/2 hours. The six tests are: (1)<br />
Mental Counters, (2) Sequential Memory, (3) Integrating Details, (4) Space<br />
Perception, (5) Spatial Reasoning, and (6) Perceptual Speed. The short questionnaire<br />
is designed to obtain information on examinee fatigue, motivation, and computer<br />
experience.<br />
Data have been collected from the following Navy schools: Operations<br />
Specialist (OS), Aviation Structural Mechanic - Structures (AMS), Aviation<br />
Ordnanceman (AO), Aviation Electronics Technician (AT), Aviation Fire Control<br />
Technician (AQ), Aviation Antisubmarine Warfare Technician (AX), Gunner’s Mate -<br />
Phase I (GMG), Machinist’s Mate (MM), Propulsion Engineering Basics, Aviation<br />
Machinist’s Mate (AD), Boiler Technician (BT), Hospitalman (HM), and Hull<br />
Maintenance Technician (HT).<br />
Data collection for the Navy ECAT Validation study has been completed.<br />
Analyses, documentation, and review of the results should be completed in December<br />
1990.<br />
CONCEPT OF OPERATIONS<br />
The concept of operations for the CAT-ASVAB System has not been finalized.<br />
In a previous study, four alternative deployment strategies were selected for special<br />
attention: (1) Centralized CAT-ASVAB testing at MEPS, with elimination of all METS<br />
testing; (2) High Volume Site <strong>Testing</strong> (all MEPS and 273 METS); (3) use of a CAT<br />
screening instrument at the military recruiting stations, with subsequent full CAT-<br />
ASVAB testing of screened personnel at MEPS, and (4) administration of CAT-ASVAB<br />
in mobile vans, testing at MEPS and fifty high-volume METS. The current<br />
operational scenario involving the administration of P&P-ASVAB in all MEPS and<br />
METS provided a baseline case for comparison purposes.<br />
ECONOMIC ANALYSES<br />
Department of Defense and Department of the Navy regulations require<br />
performing an economic analysis to assist in determining whether or not a system<br />
is cost-effective. An initial study was conducted by a contractor, whose<br />
representatives visited each of the MEPS in the continental United States to collect<br />
cost information in four areas: (1) development, (2) procurement, (3)<br />
implementation, and (4) operations and support.<br />
The Brogden-Cronbach-Gleser approach to test utility evaluation was<br />
employed. This approach assesses the dollar utility of the incremental validity of a<br />
new instrument (e.g., CAT-ASVAB) over the validity of an existing instrument (e.g.,<br />
P&P-ASVAB) in terms of improved performance. A conventional ten-year economic<br />
life was used, and the net life cycle benefit computed for each alternative concept of<br />
operation. The incremental validity used for CAT-ASVAB was 0.002, a conservative<br />
estimate based upon simulation results assessing the increased precision of CAT-<br />
ASVAB over P&P-ASVAB. The results appear promising for two concepts of operation:<br />
centralized testing, and the recruiter screening approach. The high-volume site and<br />
mobile van concepts were not cost-effective.<br />
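The Brogden-Cronbach-Gleser computation reduces to a product of terms. In the sketch below, only the 0.002 incremental validity comes from the text; the selectee count, economic life, dollar SD of performance, and mean predictor score are hypothetical placeholders, and selection and operating costs are omitted:<br />

```python
def incremental_utility(n_selected: int, years: int, delta_validity: float,
                        sd_y: float, mean_z: float) -> float:
    """Brogden-Cronbach-Gleser gain from the more valid instrument:
    delta_U = N * T * delta_r * SD_y * z_bar (costs omitted)."""
    return n_selected * years * delta_validity * sd_y * mean_z

# Hypothetical: 200,000 selectees/year over a 10-year economic life,
# SD of job performance valued at $10,000, mean predictor z of 1.0.
print(incremental_utility(200_000, 10, 0.002, 10_000, 1.0))  # 40000000.0
```

Even a tiny validity increment scales to a large dollar figure, which is why the actual increment is the pivotal empirical question.<br />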
A pivotal issue in these economic analyses is the actual increment in validity<br />
which can be expected by using CAT-ASVAB instead of P&P-ASVAB. While simulation<br />
results were adequate for initial analyses, empirical data are necessary for any<br />
conclusive evaluation. Therefore the Manpower Accession Policy Steering<br />
Committee (MAPSC) instructed the Executive Agent to evaluate new tests which<br />
offered significant promise for enhancing the predictive effectiveness of the<br />
current battery.<br />
A final economic analysis study will be performed under contract. Results<br />
from this study will be crucial in determining whether or not CAT-ASVAB should be<br />
implemented nationwide.<br />
ASSESSMENT OF APTITUDE REQUIREMENTS FOR<br />
NEW OR MODIFIED SYSTEMS<br />
Lawrence H. O’Brien<br />
Dynamics Research Corporation<br />
Wilmington, MA<br />
INTRODUCTION<br />
Recent Department of Defense initiatives on manpower, personnel, and training<br />
call for an assessment of the “aptitude requirements of new systems.” For example,<br />
AR 602-2, Manpower and Personnel Integration (MANPRINT) in the Materiel<br />
Acquisition Process, requires that “For material with a predominant human<br />
interface, it is critical to collect and evaluate human performance reliability data to<br />
determine whether the proposed system concept will deliver expected<br />
performance with no greater aptitudes and no more training than planned.” DOD<br />
Directive 5000.53, Manpower, Personnel, Training, and Safety in the Defense<br />
Acquisition Process, requires that descriptions of the “quality and quantity of<br />
military personnel” needed to field a system be developed and updated during the<br />
acquisition process. The directive indicates that the descriptions of military<br />
personnel quality requirements “shall include distributions of skill, grade, aptitude,<br />
anthropometric and/or physical attributes, education, and training backgrounds.”<br />
KEY QUESTIONS RELATED TO APTITUDE ASSESSMENT FOR NEW SYSTEMS<br />
Aptitude assessments for new weapon systems seek to address two basic questions:<br />
Question 1: Can the system be successfully operated and maintained by the soldiers who are expected to man it?
To determine if the system is successful, one must (a) identify the functions that<br />
the system is supposed to perform, (b) identify the measures that can be used to<br />
assess performance on these functions, (c) establish criteria for these measures,<br />
(d) either collect "test" data on or estimate system performance, and (e) compare
the system performance with the criteria. If performance exceeds the criteria, the<br />
system is judged successful.<br />
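The five-step logic in (a)-(e) reduces to a comparison of measured values against criteria. A minimal sketch, with metric names and numbers invented purely for illustration:

```python
def system_successful(measured, criteria):
    """Steps (d)-(e): compare measured performance on each function's
    metric against its criterion; the system is judged successful only
    if every criterion is met or exceeded."""
    return all(measured[metric] >= threshold
               for metric, threshold in criteria.items())

# Steps (a)-(c) identify the functions, measures, and criteria up front.
# "Lower is better" metrics are negated so that >= applies uniformly.
criteria = {"target_hit_rate": 0.80, "neg_mean_repair_hours": -4.0}
measured = {"target_hit_rate": 0.85, "neg_mean_repair_hours": -3.2}
ok = system_successful(measured, criteria)   # both criteria met
```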
The term “by the soldiers who are expected to man it” implies detailed<br />
consideration of soldier characteristics such as aptitudes. More specifically, it<br />
assumes that data will be obtained from soldiers who are “representative” of the<br />
soldiers who will actually be assigned to the system.
To identify “representative” soldiers, one must first identify the key personnel<br />
characteristics which impact soldier performance. Aptitudes such as scores on<br />
the Armed Services Vocational Aptitude Battery (ASVAB) are especially important because they
are used by the Army to control entry into the Army or into an MOS. This is accomplished
by setting cut-offs or minimum acceptable scores on these characteristics.<br />
The best way to select “representative” soldiers for inclusion in system testing is to<br />
randomly sample from a population that has the same distribution of these<br />
characteristics as the population of soldiers who are expected to man the system.
However, the future distribution of these characteristics within a particular MOS may
be different than their current distribution. Since most Army systems take 5-10
years to develop, the capability to estimate the future distribution of key personnel<br />
characteristics is a critical prerequisite for describing the soldiers who are likely to<br />
be available to man the system.
Estimating the future distributions of these aptitudes is not simple since these<br />
distributions are impacted by a number of factors. First, the distributions are<br />
impacted by the cutoffs that the Army sets for these aptitude measures. These<br />
cutoffs eliminate soldiers who score below the cutoffs both from accessions and<br />
from distributions of the aptitudes at higher paygrades. However, the cutoffs are<br />
not the only factors determining these distributions. The distributions are also<br />
impacted by the distribution of the aptitudes in different subpopulations of the<br />
general population at a particular point in time, the propensity of those<br />
subpopulations to enlist at various aptitude levels, and the rates (e.g., reenlistment
rates) with which the subpopulations transition through the Army personnel<br />
system.<br />
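The truncation effect described in this paragraph is easy to illustrate: imposing a cutoff removes the lower tail, which both shrinks the eligible pool and raises the mean of the surviving group. The sketch below assumes normally distributed scores with an illustrative mean of 100 and SD of 20; it is not part of any Army estimation methodology.

```python
from math import exp, sqrt, pi, erf

def norm_pdf(x):
    # standard normal density
    return exp(-x * x / 2) / sqrt(2 * pi)

def norm_cdf(x):
    # standard normal cumulative distribution via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def surviving_fraction(cutoff, mean=0.0, sd=1.0):
    """Fraction of the applicant population at or above the cutoff."""
    z = (cutoff - mean) / sd
    return 1 - norm_cdf(z)

def mean_above_cutoff(cutoff, mean=0.0, sd=1.0):
    """Mean aptitude of those who survive the cutoff (truncated normal):
    mean + sd * pdf(z) / (1 - cdf(z))."""
    z = (cutoff - mean) / sd
    return mean + sd * norm_pdf(z) / (1 - norm_cdf(z))

# Illustrative: a cutoff of 85 on a (100, 20) scale keeps ~77% of the
# applicant pool and shifts the surviving group's mean to ~107.8.
keep = surviving_fraction(85, mean=100, sd=20)
new_mean = mean_above_cutoff(85, mean=100, sd=20)
```

Note that this captures only the cutoff mechanism; as the paragraph emphasizes, the actual future distributions also depend on subpopulation mixes, enlistment propensities, and transition rates, which a realistic model would have to represent separately.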
Question 2: Can the system be operated and maintained within available manpower, personnel, and training resource constraints?
This question seeks to assess the “personnel affordability” of the new system. The<br />
resource capabilities of the Army are limited. The total end strength of the Army is<br />
fixed annually by Congress. The Army’s capability to recruit high quality personnel<br />
is restricted by the recruiting budget. To effectively deal with these resource<br />
limitations, the Army must set constraints for critical resources such as personnel.<br />
During the acquisition process, resource requirements for the new system must be<br />
established and compared with the constraints. If the requirements do not exceed<br />
the constraints, the system is affordable; otherwise, it is not. Manpower constraints
describe the maximum number of people who will be available to man the new<br />
system. Personnel constraints describe: (a) expected cutoff values for key<br />
characteristics such as aptitude, and (b) the expected distribution of these<br />
characteristics above the cutoff.<br />
APPROACH FOR ASSESSING APTITUDE IMPACTS ON SYSTEM PERFORMANCE<br />
The relationship between aptitudes and system performance is not a direct one.<br />
Aptitudes impact the performance of the tasks required to operate or maintain the<br />
system. Performance on these individual tasks determines overall system<br />
performance. Assessing the relationships between the performance of individual<br />
system tasks and system performance requires consideration of the complex causal<br />
and sequential relationships among tasks. Task performance will vary as a function<br />
of the conditions under which the tasks will be performed. These conditions will<br />
vary across time and across scenarios.<br />
Measures of System Performance. A number of metrics can be used to quantify
system performance. Typically, two types of measures are developed: operational<br />
effectiveness (e.g. mission performance time or success) and system availability<br />
(e.g. system reliability, availability, or maintainability).<br />
APTITUDE ASSESSMENT TOOLS FOR NEW SYSTEMS
As part of the HARDMAN III program, the Army Research Institute (ARI) has
developed two microcomputer-based tools that can be used to assist Army analysts<br />
in identifying aptitude requirements and constraints for new systems--the<br />
Personnel Constraints Aid or P-CON and the Personnel-Based System Evaluation<br />
Aid or PER-SEVAL.[1]
P-CON. The P-CON Aid estimates personnel quality constraints. More specifically, it
estimates the future distribution of key personnel characteristics. These<br />
distributions describe the numbers and percentages of personnel that will be
available at each level of the personnel characteristics. The P-CON Aid also<br />
provides guidance to help Army analysts and contractors understand the impacts of
setting constraints at different personnel characteristic levels. For example, the<br />
P-CON Aid will display the levels of performance that can be expected at each of<br />
these levels. An analyst can use the information on expected performance to set
personnel constraint levels for each characteristic.<br />
The P-CON Aid first estimates what the future distribution of the personnel<br />
characteristics will be. Then, it uses results from analyses of the Project A data<br />
base to show what levels of performance are achievable at different characteristic<br />
levels. The user may then use the information on both personnel availability and<br />
performance to identify minimum acceptable levels for each personnel<br />
characteristic.<br />
PER-SEVAL. The PER-SEVAL Aid determines what level of personnel<br />
characteristics is needed to meet system performance requirements given a<br />
particular contractor’s design, fixed amounts of training, and the specific<br />
conditions of performance under which the system tasks will be performed.<br />
The PER-SEVAL Aid has three basic components. First, PER-SEVAL has a set of
performance shaping functions that predict performance as a function of ASVAB<br />
area composite and training. Separate functions are provided for different types of<br />
tasks. The primary data source for developing the functions was regression
analyses of the Project A data base. Second, the PER-SEVAL Aid has
a set of stressor degradation algorithms that degrade performance to reflect the<br />
presence of critical environmental stressors. Third, the PER-SEVAL Aid has a set<br />
of operator and maintainer models that aggregate the performance estimates of<br />
individual tasks and produce estimates of system performance.<br />
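The three components can be pictured as a pipeline: shape, degrade, aggregate. The sketch below mirrors that flow only in outline; the functional forms, coefficients, and the multiplicative aggregation are invented for illustration and are not PER-SEVAL's actual algorithms.

```python
def shaped_performance(aptitude, training_hours, a=0.4, b=0.02, base=0.2):
    """Hypothetical performance-shaping function: probability of correct
    task performance as a function of a standardized aptitude score and
    training time. Coefficients are illustrative, not PER-SEVAL's."""
    p = base + a * aptitude + b * training_hours
    return min(max(p, 0.0), 1.0)          # clamp to a valid probability

def degrade(p, stressor_factors):
    """Multiplicative stressor degradation (e.g., heat, protective gear).
    Each factor in (0, 1] scales the undegraded task probability down."""
    for f in stressor_factors:
        p *= f
    return p

def system_performance(task_probs):
    """Simplest possible aggregation model: the mission succeeds only if
    every task in the sequence is performed correctly (independence
    assumed). Real operator/maintainer models are far richer."""
    result = 1.0
    for p in task_probs:
        result *= p
    return result

# Three identical tasks, shaped by aptitude and training, then degraded
# by two stressors, then aggregated to a system-level estimate.
tasks = [degrade(shaped_performance(0.5, 20), [0.95, 0.9]) for _ in range(3)]
mission_p = system_performance(tasks)
```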
RECONCILING THE JOB-BASED AND SYSTEM-BASED APPROACHES TO
APTITUDE REQUIREMENTS ASSESSMENT<br />
Assessment of aptitude requirements requires consideration of the impact of<br />
aptitudes on “performance”. The personnel and the system development<br />
communities have different conceptualizations of performance. The personnel<br />
community tends to focus on “job performance” while the system development<br />
[1] HARDMAN III is a major developmental effort of ARI's System Research
Laboratory. Its objective is to develop a set of automated aids to assist Army analysts<br />
in conducting MANPRINT assessments during the Materiel Acquisition Process<br />
(MAP).<br />
community tends to focus on "system performance." Traditionally, most of the
previous work on assessing aptitude requirements has been based on the job
performance perspective. Yet, aptitude requirements (i.e., ASVAB area composite
cutoffs) are set for occupational specialties, not weapon systems. The tasks
associated with a particular weapon system may constitute only a subset of the total
number of tasks assigned to a particular occupational specialty.
Figure 1 displays a strategy for linking the job-based and system-based approaches<br />
for aptitude assessment. Prior to the development of the new system, the<br />
personnel community will set an ASVAB area composite cut-off for each MOS. It is assumed
that the process for setting this cut-off will include consideration of the impact of<br />
the cut-off on "job performance." During the system development process, the P-
CON and PER-SEVAL tools can be applied to determine the impact of the cut-off on<br />
system performance. P-CON can be used to project what the future distribution of<br />
personnel will be at or above the cut-off and PER-SEVAL can be used to determine<br />
if this population can successfully meet system performance requirements. If
system performance is adequate, no change in aptitude cut-off is needed. If system<br />
performance is not adequate, the possibility of using higher cut-offs can be<br />
examined. The P-CON tool can be used to examine the impact of higher cut-offs on<br />
personnel availability (i.e., the numbers of people at or above the cut-off). P-CON
outputs can be used to assess the impact of personnel availability on the Army’s<br />
ability to provide the manpower to successfully man the new system. Another ARI<br />
tool, the Army Manpower Cost System or AMCOS, can be used to assess the
personnel costs associated with recruiting higher aptitude personnel. The<br />
information on system performance, personnel availability, and personnel costs can<br />
then be used by the personnel community in reassessing the MOS cut-off. It is<br />
assumed that this assessment will consider the impact of the aptitude change on<br />
total “job performance.”<br />
[Figure 1 is a flow diagram: the personnel community sets the MOS cut-off; the impact of higher cut-offs on system performance, personnel availability, and cost is assessed; and the personnel community then reassesses the cut-off.]
Figure 1. Potential relationships between job and system perspectives
Currently, most personnel psychologists view job performance as a multidimensional
construct. For example, using data obtained from the Project A study, Campbell,
McHenry, and Wise (1990) have developed a model of Army job performance that
has five factors: core technical proficiency, general soldier proficiency, effort and<br />
leadership, personal discipline, and physical fitness and military bearing. Clearly,<br />
performance on system performance tasks is closely related to one of these<br />
components--technical proficiency. As Sadacca, Campbell, DiFazio, Schultz, and
White (1990) have pointed out, the utility of the different job components may vary
across jobs. The need to raise a particular MOS ASVAB cut-off will depend on how<br />
much importance Army decision makers attach to technical proficiency vice the<br />
other job components for the particular MOS being investigated.<br />
REFERENCES

Army Regulation 602-2, 19 April 1990, Manpower and Personnel Integration (MANPRINT) in the Materiel Acquisition Process.

Campbell, J. P., McHenry, J. J., & Wise, L. L. (1990). Modeling job performance in a population of jobs. Personnel Psychology, 43, 313-333.

DOD Directive 5000.53, 30 December 1988, Manpower, Personnel, Training, and Safety in the Defense Acquisition Process.

Sadacca, R., Campbell, J. P., DiFazio, A. S., Schultz, S. R., & White, L. A. (1990). Scaling performance utility to enhance selection/classification decisions. Personnel Psychology, 43, 367-378.

System Research and Applications Corporation. (1990). Army Manpower Cost System Active Component Life Cycle Cost Estimation Model Information Book. Arlington, VA.
The Practical Impact of Selecting TOW Gunners<br />
with a Psychomotor Test<br />
Amy Schwartz and Jay Silva<br />
The U.S. Army Research Institute for the<br />
Behavioral and Social Sciences<br />
The ongoing reduction in defense forces has focused the<br />
interest of Army management on how to maintain current deterrent<br />
and combat power with fewer soldiers. One approach is to improve<br />
the person-to-job match in entry positions. A better match may<br />
lead to lowered attrition and better performance among those who
are selected. New selection tests, developed through the Army's<br />
Project A (Campbell, 1990), have been shown to contribute<br />
significantly to the prediction of training performance in a<br />
variety of MOS (e.g., Busciglio, 1990; Busciglio, Silva & Walker,<br />
1990). If these tests were used to classify recruits who have<br />
been selected into a family of MOS, an increase in assignment<br />
efficiency into specific MOS could result.<br />
One application of a newly developed psychomotor test is the<br />
prediction of 11H TOW (Tube-launched, Optically-tracked, Wire-guided)
gunner performance. Currently, recruits are accessioned<br />
into the generic MOS 11X (Infantryman) using the Combat (CO)<br />
composite of the Armed Services Vocational Aptitude Battery<br />
(ASVAB). They are later classified into one of four Infantry MOS
including 11H TOW gunners. Previous research found that<br />
psychomotor tests, especially one which required two-hand<br />
tracking of a target (Two-hand Tracking test), accounted for a<br />
significant amount of variance of simulated gunnery performance<br />
beyond that explained by the ASVAB Combat composite for TOW<br />
gunners (Silva, 1989).<br />
The present analyses examined the practical benefits of<br />
using scores on a psychomotor test to select TOW gunners. First,<br />
the potential performance gains that can be accomplished with the<br />
additional test were examined. However, performance gains for<br />
11H's may result in decreases in the quality of recruits in the<br />
remaining Infantry MOS. Determining the overall effect of<br />
implementing the new test ideally would require criteria<br />
performance data for all Infantry MOS. Since these data were not<br />
available, the impact of the additional test was examined by<br />
comparing general quality of recruits selected into the 11H MOS<br />
with that of the remaining recruits who would be assigned to the<br />
other MOS in the 11 series. Armed Forces Qualifications Test<br />
(AFQT) scores, which are currently an accepted measure of<br />
quality, were used for this comparison. Thus, the purpose of<br />
this research is to demonstrate the contribution of Two-hand<br />
Tracking to predicting TOW gunner performance, while considering<br />
the general impact of implementing the new tests for<br />
classification purposes.<br />
METHOD

Sample
The sample consisted of 911 recruits initially selected as<br />
11X Infantrymen based on a minimum CO composite score of 85 who<br />
were then classified as 11H TOW gunners. For the present<br />
purposes, the 11H's were assumed to have been randomly chosen
from 11X's and therefore contain the same properties as the 11X<br />
population. In order to test this assumption, t-tests were<br />
conducted comparing the AFQT and CO mean scores of the current<br />
sample (AFQT M=56.66, CO M=109.71) with those of a sample of
17,000 11X's (AFQT M=57.82, CO M=110.22), and there were no
significant differences in the means. Because of this<br />
comparability, the current sample of 11H's were considered to be<br />
representative of the total 11X population.<br />
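The representativeness check can be reproduced from summary statistics using Welch's two-sample t. The means and sample sizes below come from the text; the standard deviations (here 20) are assumed, since the paper reports only means, so the resulting statistic is illustrative.

```python
from math import sqrt

def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's two-sample t statistic computed from summary statistics
    (unequal variances, unequal ns)."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se

# AFQT means from the paper: 56.66 for the 911 11H's vs. 57.82 for the
# 17,000 11X's. The SDs of 20 are an assumption.
t = welch_t(56.66, 20.0, 911, 57.82, 20.0, 17_000)
```

Under these assumed SDs, |t| falls below the 1.96 two-tailed criterion, consistent with the reported non-significance of the mean difference.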
Procedure<br />
Recruits were given the Two-hand Tracking test along with<br />
other psychomotor measures during in-processing at the Reception<br />
Battalion. Classification of the examinees into specific MOS was<br />
not based on Two-hand Tracking scores. The procedure for<br />
assignment appears to be based on demand from each of four<br />
possible assignment MOS. During TOW gunnery training, gunnery<br />
data were collected using high-fidelity gunnery simulators.<br />
Measures<br />
Two-hand Tracking. This test measures two-hand coordination on a<br />
scale of distance from target accuracy. This score has been<br />
standardized (T distribution) and inverted such that a higher<br />
score indicates better performance than a lower score.<br />
Combat. This composite is the sum of four standardized ASVAB<br />
subtests: Arithmetic Reasoning (AR), Auto and Shop Information<br />
(AS), Coding Speed (CS), and Mechanical Comprehension (MC). A
score of at least 85 is needed to qualify for the 11X MOS.<br />
Combined Score. This is the optimally weighted predicted<br />
composite of both Two-hand Tracking and Combat.<br />
Training course performance. The criterion scores indicate<br />
performance on a TOW anti-tank gunnery simulator which requires<br />
the gunner to track a moving target (i.e., a target mounted on a<br />
moving vehicle) through an infrared optical device. The two<br />
measures of interest in the present study include Event 3, the<br />
trainee's score on the first qualifying set (an index of time on<br />
target) and Pass 3, whether the trainee passed or failed on the<br />
first qualifying set.<br />
RESULTS AND DISCUSSION<br />
Table 1 shows the correlations among the predictors and<br />
criteria. The joint effect of using both CO and Two-hand<br />
Tracking scores has been included under the heading 'combined.'<br />
The correlations of 'combined' with Combat and Two-hand Tracking<br />
are provided as an indication of the weights of each predictor in<br />
the optimal linear combination.<br />
To evaluate the practical significance of this method it<br />
must first be demonstrated that the proposed predictors will
improve the prediction of training performance. This is<br />
supported by the multiple regression results (see Table 2). The<br />
predictors significantly explain variance in performance on Event<br />
3 when they are used alone [CO F(1,909)=49.84; Two-hand Tracking F(1,909)=94.06; both p < .0001].
Table 1

Correlation Matrix of Predictors and Criteria

                    Criteria               Predictors
              Event 3    Pass 3       Combat    Two-hand
Pass 3         .76**
Combat         .23**      .12**
Two-hand       .31**      .23**        .34**
Combined       .33**      .23**        .68**      .92**

Note. **p < .0001. "Combined" represents the correlation between the predictor and the predicted values based on a linear combination of both predictors.
A second practical concern is the quality of the remaining<br />
recruits to be assigned to the MOS in the 11X series. If all of<br />
the recruits who score high on the additional tests are placed<br />
into one MOS, the remaining MOS will receive less qualified<br />
individuals. It has already been demonstrated that Two-hand<br />
Tracking is as good (if not better) a predictor of TOW gunner<br />
performance as CO. It remains to be shown that selection based<br />
on Two-hand Tracking will lead to less of a decrease in the<br />
quality of the remaining recruits than CO.<br />
Table 2

Predicting Performance on Event 3 Using CO and Two-Hand Tracking

Model                     R square     df       F
CO                        .052         1/909    49.84**
Two-Hand Tracking         .094         1/909    94.06**
CO & Two-Hand Tracking    .111         2/908    56.69**

Note. **p < .0001.
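The R-square values in Table 2 can be recovered, to rounding, from the correlations in Table 1 with the standard two-predictor multiple-R formula; the small gap between the computed .114 and the reported .111 reflects rounding of the published correlations.

```python
def r2_one(r):
    """R-squared for a single predictor."""
    return r * r

def r2_two(r1, r2, r12):
    """R-squared for two predictors from their validities (r1, r2) and
    their intercorrelation r12, via the standard two-predictor formula:
    (r1^2 + r2^2 - 2*r1*r2*r12) / (1 - r12^2)."""
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)

# Correlations from Table 1: CO and Two-hand with Event 3, and their
# intercorrelation.
r_co, r_track, r_inter = 0.23, 0.31, 0.34
r2_co = r2_one(r_co)                        # ~.053 (reported .052)
r2_track = r2_one(r_track)                  # ~.096 (reported .094)
r2_both = r2_two(r_co, r_track, r_inter)    # ~.114 (reported .111)
```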
Table 3

Mean Performance at Cutoffs for AFQT, Actual and Predicted Scores on Event 3 and Passing Rate at Event 3

Predictor             SR     AFQT     Actual Score    Pred. Score    Pass/fail
                                      at Event 3      at Event 3     Event 3
Combat                20%    78.14    646.95          650.96         .879
                      50%    68.40    633.38          633.52         .850
                      80%    61.38    620.44          619.49         .834
Two-hand Tracking     20%    63.75    650.66          659.04         .912
                      50%    60.79    640.39          640.35         .890
                      80%    59.51    627.41          624.20         .853
Combat & Two-Hand     20%    71.10    654.57          664.02         .907
  Tracking            50%    64.97    643.38          643.15         .881
                      80%    60.17    625.28          625.40         .848
No Selection                 56.66    609.94                         .814
Raising the cutoff on CO would lead to higher mean AFQT scores
for 11H, especially at lower SRs. This would lead to a depletion
of high ASVAB quality recruits for the other Infantry MOS.<br />
Selection based on Two-hand Tracking scores also increases the mean<br />
AFQT score for 11H, but to a much lesser extent than either CO or
the two predictors combined. For example, a 50% SR on Two-hand<br />
Tracking would produce a higher pass rate on Event 3 than a 20% SR<br />
using CO, yet it would lead to much less of an increase in mean<br />
AFQT scores. Therefore, Two-hand Tracking, compared to CO, is<br />
better able to minimize AFQT impact while improving outcomes for<br />
11H's.<br />
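The depletion pattern in Table 3 can be reproduced in miniature with synthetic data: selecting the top of the distribution on a test strongly correlated with AFQT (like CO) pulls the selected group's mean AFQT up far more than selecting on a weakly correlated test (like Two-hand Tracking). The correlations and score scales below are illustrative, not the study's.

```python
import random

def select_top(scores, other, sr):
    """Select the top fraction `sr` on `scores`; return the mean of the
    paired `other` measure among those selected."""
    paired = sorted(zip(scores, other), reverse=True)
    k = max(1, int(len(paired) * sr))
    return sum(o for _, o in paired[:k]) / k

random.seed(0)
n = 10_000
afqt = [random.gauss(50, 10) for _ in range(n)]
# One predictor strongly correlated with AFQT (~.8, like CO) and one
# weakly correlated (~.2, like Two-hand Tracking); both have SD ~10.
strong = [0.8 * a + random.gauss(0, 6) for a in afqt]
weak = [0.2 * a + random.gauss(0, 9.8) for a in afqt]

# Mean AFQT of the top 20% selected on each predictor: selecting on the
# strongly correlated test depletes far more high-AFQT recruits from
# the remaining pool than selecting on the weakly correlated one.
m_strong = select_top(strong, afqt, 0.20)
m_weak = select_top(weak, afqt, 0.20)
```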
While the current research demonstrates a smaller depletion of
AFQT scores in remaining MOS when Two-hand Tracking is used as a<br />
classification test, future research must be conducted to evaluate<br />
the potential impact of this system on training or on-the-job<br />
performance criteria. Classification is most efficient when<br />
different skills are required for the jobs being filled. If two-hand
tracking is equally important for all Infantry positions,
selecting TOW gunners with this test may not be appropriate and<br />
will result in a depletion of necessary tracking skills of recruits<br />
from the 11-series MOS. Follow-up work can examine this by collecting
performance data from several Infantry MOS and examining the<br />
results of using a battery of tests to determine assignment.<br />
These results assume that training performance reflects on-the-job
performance. Some initial results in the field indicate
that this is true. In addition, there is some limitation in the
effectiveness of Two-hand Tracking as a predictor in this context,<br />
since the sample was already preselected based on a CO cutoff of<br />
85. More gains would most likely be found if psychomotor tests are<br />
given before recruits are assigned to even a family of MOS.<br />
However, the present results suggest that with only a slight<br />
modification of the present system, the addition of a psychomotor<br />
test can lead to improved selection without greatly depleting the<br />
quality of the recruits remaining for assignment in the remaining<br />
MOS.<br />
References

Busciglio, H. H. (1990). The Incremental Validity of Spatial and Perceptual-Psychomotor Tests Relative to the Armed Services Vocational Aptitude Battery (ARI Technical Report 883). Alexandria, VA: U.S. Army Research Institute.

Busciglio, H. H., Silva, J., & Walker, C. (1990). The Potential of New Army Tests to Improve Job Performance. Paper presented at the 1990 Army Science Conference.

Campbell, J. P. (1990). An overview of the Army selection and classification project (Project A). Personnel Psychology, 43, 231-239.

Silva, J. M. (1989). Usefulness of Spatial and Psychomotor Testing for Predicting TOW and UCOFT Gunnery Performance (ARI Working Paper WP-RS-89-21). Alexandria, VA: U.S. Army Research Institute.
VALIDATION OF A NAVAL OFFICER SELECTION BOARD<br />
Captain J.P. Bradley<br />
Canadian Forces Personnel Applied Research Unit<br />
Willowdale, Ontario, Canada M2N 6B7<br />
Introduction<br />
In 1976, the Canadian Navy established the Naval Officer
Interview Board (NOIB) for the purpose of selecting applicants for<br />
the Maritime Surface and Sub-surface (MARS), and Maritime Engineer<br />
(MARE) officer occupations. There were two components to the NOIB,<br />
a selection interview, conducted by a panel of senior naval
officers, and an orientation program, consisting of tours of naval<br />
facilities and briefings by naval officers. The purpose of the<br />
orientation component was to ensure that candidates would be able<br />
to make an informed decision to join the Navy if selected by the<br />
NOIB.<br />
By 1983, the NOIB had not reduced attrition among MARS and<br />
MARE trainees; therefore, the Naval Officer Selection Board (NOSB)<br />
was developed. The NOSB retained the orientation component and<br />
interview of the former NOIB but incorporated other assessment<br />
instruments to achieve a multi-method approach to the assessment<br />
of naval officer potential. In 1989, the NOSB was renamed the<br />
Naval Officer Assessment Board (NOAB).<br />
To become qualified MARS officers, candidates must complete
four phases of training: the Basic Officer Training Course (BOTC),
required of all Canadian Forces (CF) officer applicants regardless<br />
of military occupation, and three phases of MARS occupation<br />
qualification training. An evaluation of the NOAB's ability to<br />
predict success on BOTC by Okros, Johnston, and Rodgers (1988)<br />
demonstrated that: (a) the NOAB predicted BOTC performance better<br />
than CF recruiting centre (CFRC) measures; (b) the optimal<br />
combination of predictors produced a multiple correlation of .40;<br />
and (c) the file review was identified as the best single NOAB<br />
predictor of BOTC with a correlation of .31. The present study<br />
complements the BOTC validation study and examines the ability of<br />
the NOAB to predict MARS occupation training success.<br />
Method

Subjects
Of the 743 MARS candidates who have attended the NOAB, the 95<br />
who have gone on to complete all phases of MARS training comprised<br />
the sample for this validation study. The subjects in this study<br />
were male. Female applicants have attended the NOAB since 1988,<br />
but none have completed MARS occupational training to date.<br />
Variables<br />
Criteria. Two measures of success on MARS occupation training<br />
were used: (a) grades on the third phase (MARS III); and (b) grades<br />
on the fourth phase (MARS IV) of MARS training.
Predictors. Operational predictors used by the NOAB to assess<br />
MARS candidates included: (a) an interview; (b) a file review (an
evaluation of the biographical data collected by the CFRCs); (c)<br />
a conducting officer's assessment; (d) performance in a practical<br />
leadership exercise; (e) performance in a leaderless group<br />
discussion; and (f) a NOAB merit score (a weighted combination of
NOAB measures). Experimental predictors included: (a) the Problem<br />
Sensitivity Test (PST); and (b) the Passage Planning Test (PPT).<br />
CFRC predictors included: (a) a military potential score provided
by CFRC staff; and (b) a measure of tested learning ability based<br />
on the CF General Classification (GC) Test. The relations between<br />
BOTC performance and MARS training success were also evaluated.<br />
Results

Predicting MARS III Performance
Although Table 1 shows statistically significant correlations<br />
between MARS III results and three NOAB predictors -- file review,<br />
leadership stands, and the NOAB merit score -- multiple regression<br />
analyses revealed that the leadership stands did not provide any<br />
incremental prediction beyond that contributed by the file review<br />
(R = .20). In essence, the prediction afforded by the merit score<br />
is that provided by the file review. MARS III performance was<br />
unrelated to the following measures: (a) the interview; (b) the<br />
conducting officer's assessment; (c) the leaderless group<br />
discussion; (d) the CFRC military potential score; (e) tested<br />
learning ability; and (f) performance on BOTC.<br />
Predicting MARS IV Performance
As shown in Table 1, performance on MARS IV was related to the<br />
file review, NOAB merit score, BOTC performance, and MARS III<br />
performance. Of all the predictors, the file review accounted for<br />
the most variance in MARS IV performance. The NOAB merit score<br />
also correlated with MARS IV performance; however, the predictive<br />
contribution of the merit score was actually that provided by the
file review. Multiple regression analyses also showed that neither<br />
BOTC nor MARS III performance could account for variance of MARS<br />
IV beyond that already predicted by the file review (R = .28). The<br />
following variables were unrelated to MARS IV performance: (a) the<br />
interview; (b) the conducting officer's assessment; (c) the<br />
leaderless group discussion; (d) leadership stands; (e) military<br />
potential; and (f) tested learning ability.<br />
Table 1

Correlation Matrix of Potential Predictors and Training Criteria

Variables: 1. CO, 2. INT, 3. FR, 4. LS, 5. LGD, 6. MS, 7. GC, 8. MP, 9. BOTC, 10. M-3, 11. M-4, 12. PPT, 13. PST

[The individual correlation values in the matrix are not legible in the scanned original and are not reproduced here.]
Note. Only correlations significant at the .05 level are reported<br />
in this table. Correlations between NOAB operational predictors<br />
are based on the population of NOAB candidates (n = 743).<br />
Correlations between the NOAB predictors and training criteria are<br />
uncorrected correlations based on the sample of NOAB candidates<br />
attending MARS training (n = 95). CO = conducting officer; INT =<br />
interview; FR = file review; LS = leadership stands; LGD =<br />
leaderless group discussion; MS = merit score; GC = general<br />
classification test; MP = military potential; BOTC = BOTC grade;<br />
M-3 = MARS III training results; M-4 = MARS IV training results;<br />
PPT = passage planning test; PST = problem sensitivity test.<br />
Experimental Predictors<br />
Because the PST and PPT have been incorporated into the NOAB<br />
only recently, there is not yet a sufficient number of candidates<br />
who have completed the two experimental tests at the NOAB and then<br />
gone on to complete MARS occupation training to evaluate the<br />
predictive validity of these tests. In the interim, the concurrent<br />
validity of the tests was evaluated by administering them to a<br />
small group of MARS candidates (n = 43 to 122) already in the<br />
training system. The results of this preliminary research indicate<br />
that the PPT is related to both MARS III (r = .21) and MARS IV<br />
performance (r = .30); however, the PST is not related to either<br />
MARS III or MARS IV training success. As shown in Table 1, the PPT<br />
is unrelated to the file review, suggesting the potential for<br />
contributing incremental criterion prediction beyond that provided<br />
by the file review.<br />
Psychometric Properties of NOAB Predictors<br />
As a result of the inability of some NOAB predictors to<br />
provide criterion prediction, an evaluation of the psychometric<br />
properties of the NOAB exercises was conducted using two<br />
approaches.<br />
Factor analytic approach. The 30 dimensions measured by the<br />
five NOAB exercises were submitted to a principal components<br />
analysis (varimax rotation) which produced a seven-factor solution<br />
shown in Table 2. As illustrated in Table 2, conceptually<br />
independent dimensions underlying the first four NOAB exercises<br />
loaded on exercise factors rather than on factors with conceptually<br />
similar dimensions. It appears that these four exercises are<br />
producing global measures of overall performance on each exercise<br />
and not measuring exercise dimensions, thereby raising doubt about<br />
the construct validity of the dimension ratings that comprise each<br />
of the exercises. Candidates' scores on these exercises may be<br />
more attributable to the procedures followed by the NOAB than to the<br />
candidates' abilities with respect to the dimensions the four<br />
exercises are supposed to be measuring. Table 2 shows that the<br />
file review was the only NOAB exercise that appeared as a<br />
multidimensional construct (it measures three different constructs<br />
-- personal background, military experience, and intelligence).<br />
In addition, the file review was the only NOAB measure that<br />
predicted MARS training performance. The fact that the dimensions<br />
underlying the file review loaded on factors with other<br />
conceptually similar dimensions and did not simply load on a file<br />
review factor provides evidence of construct validity for the<br />
dimension ratings comprising the file review score and may account<br />
for the file review's success as an NOAB predictor.<br />
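The exercise-factor finding is easiest to see in a two-factor miniature (the study itself rotated 30 dimensions to seven factors). For two factors, varimax has Kaiser's closed-form rotation angle. The loading matrix below is hypothetical: six dimension ratings, three per exercise, whose unrotated components show the classic general-plus-bipolar pattern that rotation resolves into two exercise ("method") factors.

```python
import math

def varimax_angle(loadings):
    """Kaiser's closed-form varimax angle for a two-factor loading
    matrix; rows are variables, columns are the two factors."""
    n = len(loadings)
    u = [a * a - b * b for a, b in loadings]   # difference of squared loadings
    v = [2 * a * b for a, b in loadings]       # cross-product term
    A, B = sum(u), sum(v)
    C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    D = sum(2 * ui * vi for ui, vi in zip(u, v))
    return math.atan2(D - 2 * A * B / n, C - (A * A - B * B) / n) / 4

def rotate(loadings, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

# Six hypothetical dimension ratings, three per exercise: a general
# first component plus a bipolar exercise contrast.
unrotated = [(0.5, 0.5)] * 3 + [(0.5, -0.5)] * 3
rotated = rotate(unrotated, varimax_angle(unrotated))
for a, b in rotated:
    print(f"{a:.2f} {b:.2f}")
# After rotation, each variable loads strongly (about .71 in absolute
# value) on its own exercise's factor and near zero on the other --
# exercise factors rather than dimension factors.
```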
Multitrait-multimethod matrix approach. To investigate<br />
further the notion that the interview, leadership stands,<br />
conducting officer's assessment and leaderless group discussions<br />
are actually producing one global measure for each exercise without<br />
regard to the dimensions contained in the exercise, the<br />
correlations of conceptually similar across-exercise dimensions<br />
(similar dimensions measured by different selection exercises) were<br />
evaluated using the method described by Campbell and Fiske (1959).<br />
As reported in Bradley (1990), the correlations between<br />
conceptually similar across-exercise dimensions were lower than<br />
correlations between conceptually independent within-exercise<br />
dimensions, thereby lending further support to the notion that<br />
method variance is contaminating the measurement of NOAB exercise<br />
dimensions.<br />
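The Campbell and Fiske (1959) comparison can be sketched directly: with variables labelled by (dimension, exercise), convergent validities (same dimension, different exercises) should exceed correlations between different dimensions measured by the same exercise. The correlation values below are hypothetical, chosen only to show the reversed pattern reported for the NOAB.

```python
# Hypothetical correlations among four (dimension, exercise) variables:
# 0 self-confidence/interview, 1 communication/interview,
# 2 self-confidence/LGD,       3 communication/LGD.
r = {(0, 1): .62, (2, 3): .58,   # different dimension, same exercise
     (0, 2): .18, (1, 3): .21,   # same dimension, different exercise
     (0, 3): .12, (1, 2): .15}   # different dimension and exercise

def mean_r(pairs):
    return sum(r[p] for p in pairs) / len(pairs)

convergent = mean_r([(0, 2), (1, 3)])     # same-dimension correlations
within_method = mean_r([(0, 1), (2, 3)])  # same-exercise correlations
# Convergent validities well below within-exercise correlations are the
# Campbell-Fiske signature of method variance dominating the ratings.
print(round(convergent, 3), round(within_method, 3))  # 0.195 0.6
```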
Discussion<br />
The results of this validation research can be summarized as<br />
follows: (a) the file review is the only NOAB measure that predicts<br />
both training criteria; (b) the leadership stand assessment<br />
Table 2<br />
Factor Structure of NOAB Dimensional Measures<br />
Interview (exercise factor):<br />
self-confidence .83; presence/bearing .78; verbal expression .69;<br />
enthusiasm .82; desire for MARS .79; suitability for naval role .81;<br />
ability to become naval officer .80<br />
Leadership task (exercise factor):<br />
initiative/decisiveness .88; seeking/accepting advice .50;<br />
preparation and planning .81; communicating effectively .85;<br />
directing others .86; creating team performance .72<br />
Leaderless group discussion (exercise factor):<br />
persuasiveness/forcefulness .84; self-confidence/bearing .80;<br />
communication skills .80; leadership/maintaining the aim .79;<br />
alertness .67<br />
Conducting officer's assessment (exercise factor):<br />
supporting/cooperating with others .77; effectiveness of leadership<br />
behaviour .75; individual effort and drive .81; desire for MARS .74;<br />
suitability for naval environment .80<br />
File review (dimensions load on three factors: personal background,<br />
military experience, and intelligence):<br />
family background; military/para-military experience; military<br />
potential; employment history; educational achievement; tested<br />
learning ability; other activities/interests<br />
[Loadings for the file review dimensions range from .44 to .83; their<br />
assignment to the three factors is not legible in the source scan.]<br />
Note. Only factor loadings greater than .44 are included in this<br />
table.<br />
predicts MARS III training success, but does not improve the<br />
prediction of MARS III beyond that already provided by the file<br />
review, and the leadership stand assessment does not predict MARS<br />
IV training success; (c) the NOAB merit score predicts MARS III and<br />
MARS IV performance, but all of this criterion prediction actually<br />
originates with the file review; (d) neither the interview, the<br />
conducting officer's assessment, the leaderless group discussion, nor the two CFRC<br />
measures -- tested learning ability and military potential -- provides<br />
any prediction of MARS III or MARS IV training success; (e)<br />
the file review is the only NOAB measure that appears to be<br />
psychometrically sound; (f) the other four operational measures<br />
(interview, leadership stands, conducting officer, and leaderless<br />
group discussion) require a psychometric overhaul (or replacement);<br />
and (g) of the two experimental NOAB measures, the PPT has the most<br />
potential for use as an operational NOAB predictor.<br />
Based on this study and the earlier BOTC validation by Okros<br />
et al. (1988), it has been recommended that the NOAB be retained<br />
as the assessment method for selecting MARS candidates and that<br />
efforts be made to improve the board's predictive efficacy by: (a)<br />
increasing the construct validity of exercise dimensions; (b)<br />
investigating the potential for applying situational interview<br />
methods and patterned behavioural interview techniques; (c)<br />
improving the predictive efficacy of the leadership stands; (d)<br />
evaluating the predictive efficacy of the General Classification<br />
(GC) test; and (e) developing new selection measures to replace<br />
the leaderless group discussion and conducting officers'<br />
assessments.<br />
References<br />
Bradley, J.P. (1990). A validation study on the Naval Officer<br />
Assessment Board's ability to predict MARS Officer training<br />
success (Working Paper 90-7). Willowdale, Ontario: Canadian<br />
Forces Personnel Applied Research Unit.<br />
Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant<br />
validation by the multitrait-multimethod matrix.<br />
Psychological Bulletin, 56, 81-105.<br />
Okros, A.C., Johnston, V.W., & Rodgers, M.N. (1988). An evaluation<br />
of the effectiveness of the Naval Officer Selection Board as<br />
a predictor of success on the Basic Officer Training Course<br />
(Working Paper 88-1). Willowdale, Ontario: Canadian Forces<br />
Personnel Applied Research Unit.<br />
A Situational Judgment Test of Supervisory Knowledge in the U.S. Army1<br />
Mary Ann Hanson<br />
Personnel Decisions Research Institutes, Inc.<br />
Walter C. Borman<br />
The University of South Florida and Personnel Decisions Research Institutes, Inc.<br />
A situational judgment test involves presenting respondents with realistic job situations, usually<br />
described in writing, and asking them to respond in a multiple choice format regarding what should be<br />
done in each situation. Situational judgment tests have been developed by other researchers to predict<br />
job performance, especially for management and supervisory positions (e.g., Motowidlo, Carter, & Dunnette,<br />
1989; Tenopyr, 1969).<br />
This paper describes the development, field test, and preliminary construct validation of a situational<br />
judgment test designed to measure supervisory skill for non-commissioned officers (NCOs) in the U.S.<br />
Army. In contrast with most previous research, the Situational Judgment Test (SJT) is a criterion measure<br />
of job performance. It is targeted at first-line supervisors (rank E-5), and is intended to evaluate<br />
the effectiveness of their judgments about what to do in difficult supervisory situations. Thus, the SJT is<br />
somewhat like a job knowledge test for the supervisory part of the job. Although no research is available<br />
on the use of situational judgment tests as criterion measures, there is research available on the usefulness<br />
of written simulations - which are similar to situational judgment tests - as measures of professional<br />
knowledge in fields such as law and medicine. Researchers have found that scores on written simulations<br />
differentiate between groups with differing levels of experience or training and are often related to other<br />
measures of professional knowledge or performance (see Smith, 1983 for a review).<br />
Method<br />
Development of the SJT<br />
Development of the SJT involved asking groups of soldiers similar to the target NCOs (i.e., E-4s and<br />
E-5s) to describe a large number of difficult but realistic situations that Army first-line supervisors face<br />
on their jobs. Once a large number of these situations had been generated, a wide variety of possible<br />
actions (i.e., response alternatives) for each situation were gathered, and ratings of the effectiveness of<br />
each of these actions were collected from both experts (senior NCOs) and the target group (E-5 NCOs in<br />
beginning supervisory positions). These effectiveness ratings were used to select situations and response<br />
alternatives to be included in the SJT. The effectiveness ratings from the senior NCOs (i.e., experts) were<br />
also the basis for the development of SJT scoring procedures. Each of these steps is described in more<br />
detail below.<br />
Participants in the workshops to develop situations and response alternatives were 52 NCOs from<br />
nine different Army posts. Some were NCOs from the target sample and some supervised target NCOs<br />
(ranks ranging from E-5 to E-6). A variation of the critical incident technique (Flanagan, 1954) was used<br />
to collect situations to be used as the item stems. Workshop participants were asked to write descriptions<br />
of difficult supervisory situations that they or their peers had experienced as first-line supervisors in the<br />
Army. This resulted in a pool of about 300 situations. Response alternatives were primarily generated by<br />
presenting participants in later workshops with the situations that had been collected and asking them to<br />
write, in two or three sentences, what they would do to respond effectively in that situation. This resulted<br />
in about 15 possible responses for each situation. These responses were content analyzed and grouped to<br />
reduce redundancies. The final result was four to ten response alternatives for each situation, with a<br />
mean of about six response alternatives.<br />
1 This research was funded by the U.S. Army Research Institute for the Behavioral and Social Sciences,<br />
Contract No. MDA903-82-C-0531. All statements expressed in this paper are those of the authors and do not<br />
necessarily reflect the official opinions or policies of the U.S. Army Research Institute or those of the Department<br />
of the Army.<br />
One-hundred and eighty of the most promising situations were then chosen based on their content<br />
(e.g., appropriately difficult, realistic, etc.) and the number of plausible response alternatives available.<br />
For each of these 180 situations retained, information concerning the effectiveness of the various response<br />
alternatives was collected from two groups: a group of expert NCOs and a group of<br />
NCO job incumbents from the target population. The expert NCOs were 90 students and instructors at the United States<br />
Army Sergeants Major Academy. These NCOs were among the highest ranking enlisted soldiers in the<br />
Army (rank of E-8 to E-9), and all had extensive experience as supervisors in the Army. The target<br />
NCOs were 344 second-tour soldiers (rank of E-4 to E-5) who were participating in a field test of a group<br />
of job performance measures at several Army posts in the United States and Europe. For each SJT situation,<br />
these respondents were asked to rate the effectiveness of each response alternative on a seven-point<br />
scale (1 = least and 7 = most effective). Because there were 180 situations and testing time was limited, each<br />
soldier responded to only a subset of the situations. This resulted in about 25 expert NCO and 45<br />
incumbent NCO responses per situation.<br />
Items (situations) for the field test version of the SJT and response alternatives for these items were<br />
then selected based on these data. The following criteria were used to select 35 of these situations and<br />
from 3-5 response alternatives for each situation: 1) the expert group had high agreement concerning the<br />
most effective response for the item; 2) the item was difficult for the incumbents (i.e., agreement was<br />
substantially lower than for the expert group); 3) the difference between the expert and the incumbent<br />
responses for each situation was judged to reflect an important aspect of supervisory knowledge; and 4)<br />
the content of the final group of situations was as representative as possible of the first-line supervisory<br />
job in the Army.<br />
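The first two selection criteria lend themselves to a simple screen. The agreement cutoffs and the choice data below are hypothetical (the paper reports no numeric thresholds); the sketch only shows the shape of the filter: keep items where experts converge on one alternative but incumbents scatter.

```python
def modal_agreement(choices):
    """Proportion of raters whose choice matches the modal choice."""
    return max(choices.count(c) for c in set(choices)) / len(choices)

# Hypothetical (expert_choices, incumbent_choices) per candidate item;
# each entry is the index of the alternative a rater picked as best.
items = {
    "item_a": ([1] * 22 + [3] * 3, [1] * 15 + [2] * 15 + [3] * 15),
    "item_b": ([2] * 13 + [0] * 12, [2] * 30 + [0] * 15),
    "item_c": ([0] * 24 + [1], [0] * 40 + [1] * 5),
}
EXPERT_MIN, INCUMBENT_MAX = 0.80, 0.50   # assumed cutoffs, for illustration
selected = [name for name, (experts, incumbents) in items.items()
            if modal_agreement(experts) >= EXPERT_MIN
            and modal_agreement(incumbents) <= INCUMBENT_MAX]
print(selected)  # ['item_a']: experts agree (.88) but incumbents scatter (.33)
```

Item b fails because experts disagree; item c fails because incumbents also find it easy, so it cannot separate more from less knowledgeable supervisors.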
Field Test of the SJT<br />
The field test of the SJT had three major objectives. The first objective was to explore different<br />
methods of scoring the SJT. The second objective was to examine and evaluate the psychometric properties<br />
of this instrument. The final objective was to obtain preliminary information concerning the construct<br />
validity of the SJT as a criterion measure of supervisory job knowledge.<br />
The SJT was administered as part of a larger data collection effort to a sample of 1049 NCOs (most<br />
were E-4s and E-5s) at a variety of posts in the United States and Europe. For each of the 35 SJT items,<br />
these soldiers were asked to place an “M” next to the response alternative they thought was the most<br />
effective and an “L” next to the response alternative they thought was the least effective.<br />
Scoring Procedures. Several different procedures for scoring the SJT were explored. The most<br />
straightforward was a simple number correct score. For each item, the response alternative that had been<br />
given the highest mean effectiveness rating by the experts (senior NCOs) was designated the “correct”<br />
answer. Respondents were then scored based on the number of items for which they indicated that this<br />
“correct” response alternative was the most effective. The second scoring procedure involved weighting<br />
each response alternative chosen by soldiers as the most effective by the mean effectiveness rating given<br />
to that response alternative by the expert group. This gives respondents more credit for choosing<br />
“wrong” answers that are relatively effective than for choosing wrong answers that are very ineffective.<br />
These item level effectiveness scores were then averaged to obtain an overall effectiveness score for each<br />
soldier. Averaging these item level scores instead of simply summing them placed respondents’ scores<br />
on the same 1 to 7 effectiveness scale as the experts’ ratings and ensured that respondents were not penalized<br />
for any missing data (up to 10% missing responses were allowed).<br />
Scoring procedures based on respondents’ choices for the least effective response to each situation<br />
were also explored. The ability to identify the least effective response alternatives might be seen as an<br />
indication of respondents’ ability to avoid these very ineffective responses or in effect to avoid “screwing<br />
up”. As with the choices for the most effective response, a simple number correct score was computed:<br />
the number of times each respondent correctly identified the response alternative that the experts rated<br />
the least effective. In order to differentiate this score from the number correct score based on choices for<br />
the most effective response, this score will be referred to as the L-Correct score, and the score based on<br />
choices for the most effective response (described previously) will be referred to as the M-Correct score.<br />
Another score was computed by weighting respondents’ choices for the least effective response alternative<br />
by the mean effectiveness rating for that response, and then averaging these item level scores to<br />
obtain an overall effectiveness score based on choices for the least effective response alternative. This<br />
score will be referred to as L-Effectiveness, and the parallel score based on choices for the most effective<br />
responses (described previously) will be referred to as M-Effectiveness.<br />
Finally, a scoring procedure that involved combining the choices for the most and the least effective<br />
response alternative into one overall score was also explored. For each item, the mean effectiveness of<br />
the response alternative each soldier chose as the least effective was subtracted from the mean effectiveness<br />
of the response alternative they chose as the most effective. Because it is actually better if<br />
respondents indicate that less effective response alternatives are the least effective, this score can be seen<br />
as a sum or composite of the two effectiveness scores described previously (i.e., subtracting a negative<br />
number from a positive number is the same as adding the absolute values of the two numbers). These<br />
item level scores were then averaged together for each soldier to generate yet another score, and this<br />
score will be referred to as M-L Effectiveness.<br />
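The five scoring procedures can be summarized in a short sketch. The expert mean ratings and the soldier's answers below are hypothetical, but the arithmetic follows the procedures just described (number-correct counts, expert-mean weighting, and the M-L difference); the per-item missing-data handling is omitted.

```python
# expert_means[i][a]: experts' mean 1-7 effectiveness rating of
# alternative a on item i (hypothetical three-item test).
expert_means = [
    [6.1, 3.9, 2.2, 1.5],
    [2.4, 5.8, 4.0, 1.9],
    [3.1, 2.7, 6.3, 4.4],
]
best = [m.index(max(m)) for m in expert_means]    # experts' "correct" answers
worst = [m.index(min(m)) for m in expert_means]

def score_sjt(responses):
    """responses[i] = (most_idx, least_idx) marked by one soldier."""
    n = len(responses)
    m_eff = sum(expert_means[i][m] for i, (m, _) in enumerate(responses)) / n
    l_eff = sum(expert_means[i][l] for i, (_, l) in enumerate(responses)) / n
    return {
        "M-Correct": sum(m == best[i] for i, (m, _) in enumerate(responses)),
        "L-Correct": sum(l == worst[i] for i, (_, l) in enumerate(responses)),
        "M-Effectiveness": m_eff,   # averaged onto the experts' 1-7 scale
        "L-Effectiveness": l_eff,   # lower is better
        "M-L Effectiveness": m_eff - l_eff,
    }

s = score_sjt([(0, 3), (1, 3), (2, 0)])   # this soldier misses one "least"
print(s["M-Correct"], s["L-Correct"], round(s["M-L Effectiveness"], 2))  # 3 2 3.9
```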
Descriptive Statistics. Descriptive statistics and internal consistency reliability estimates (KR-20)<br />
were computed for each of the five scoring procedures. Intercorrelations were also computed among the<br />
five scores generated by the five different scoring procedures.<br />
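For dichotomous scores such as M-Correct and L-Correct, the KR-20 estimate has a direct formula. A minimal sketch with toy 0/1 data (not the field-test responses):

```python
def kr20(item_scores):
    """KR-20 internal consistency for dichotomous (0/1) item scores:
    (k/(k-1)) * (1 - sum(p_i * q_i) / var(total)), population variance."""
    k = len(item_scores[0])
    n = len(item_scores)
    totals = [sum(row) for row in item_scores]
    mean_t = sum(totals) / n
    var_total = sum((t - mean_t) ** 2 for t in totals) / n
    p = [sum(row[i] for row in item_scores) / n for i in range(k)]
    return k / (k - 1) * (1 - sum(pi * (1 - pi) for pi in p) / var_total)

# Toy data: four soldiers by three items (1 = matched the experts' answer).
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data))  # 0.75
```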
Preliminary Information Concerning Construct Validity<br />
The data from this field test were also used to obtain preliminary information concerning the construct<br />
validity of the SJT as a criterion measure of supervisory job knowledge. As mentioned previously, collecting<br />
the field test data for the SJT was a part of a larger data collection effort. Several other job performance<br />
measures were administered concurrently with the SJT, including job knowledge tests, a self-report<br />
administrative information survey, and supervisory simulation exercises (involving training a subordinate,<br />
disciplinary counseling, and personal counseling). Performance ratings were also collected from<br />
peers and supervisors using behavior-based rating scales. If the SJT is a valid measure of supervisory job<br />
knowledge, certain relationships would be expected with these other measures. For example, it should<br />
have at least moderate correlations with the scores on the supervisory simulations and performance ratings<br />
on supervisory dimensions. Correlations of SJT scores with several of these other job performance<br />
measures were examined.<br />
Another type of information that was used to assess the construct validity of the SJT was the extent<br />
to which the knowledges assessed by the SJT are learned on the job. If the SJT is a valid measure of job<br />
knowledge, soldiers who have more experience or training would be expected, on average, to obtain<br />
higher scores than soldiers with less experience or training. Self report information was collected from<br />
the soldiers in this field test sample concerning whether or not they had attended any supervisory training<br />
and how regularly they were required to supervise other soldiers. Mean SJT scores for soldiers with<br />
different levels of training and experience were also examined.<br />
Results<br />
Field Test Results<br />
Table 1 presents the mean score for each of the five scoring procedures. The maximum possible for<br />
the M-Correct scoring procedure is 35 (i.e., all 35 items answered correctly), but the mean score obtained<br />
by soldiers in this sample was only 16.25. The maximum score obtained was only 27. The mean number<br />
of least effective response alternatives correctly identified by this group was only 14.86. Clearly the SJT<br />
was difficult for this group of soldiers.<br />
Table 1 also presents the standard deviation for each of the five scoring procedures, and all of the<br />
scoring procedures resulted in a reasonable amount of variability in scores obtained by the soldiers in this<br />
sample. Table 1 also shows that the internal consistency reliabilities for all of these scoring procedures<br />
are quite high. The most reliable score is M-L Effectiveness, probably because this score contains more<br />
information than the other scores (i.e., choices for both the most and least effective response).<br />
Table 2 presents the intercorrelations among scores obtained using the five different scoring procedures.<br />
These intercorrelations range from moderate to very high. Correlations between scores that are<br />
based on the same set of responses (e.g., M-Correct with M-Effectiveness) are higher than correlations<br />
between scores that are based on different sets of responses (e.g., M-Correct with L-Correct). The correlations<br />
between L-Effectiveness and the other scores are negative, because lower L-Effectiveness scores<br />
are actually better. The high (negative) correlation between M-Effectiveness and L-Effectiveness seems<br />
to indicate that these two scores measure similar or related constructs.<br />
Table 1<br />
Situational Judgment Test Means, Standard Deviations, and Internal Consistencies<br />
Scoring Procedure        N      Mean    SD    Internal Consistency Reliability¹<br />
M-Correct                1025³  16.52   4.29  .50<br />
M-Effectiveness          1025³   4.91    .34  .68<br />
L-Correct                1007³  14.86   3.86  .57<br />
L-Effectiveness          1007³   3.54²   .31  .68<br />
M-L Effectiveness        1007³   1.36    .61  .75<br />
¹ KR-20.<br />
² Low scores indicate higher performance.<br />
³ Soldiers with more than 10% incomplete or invalid data were omitted from these analyses.<br />
Table 2<br />
Situational Judgment Test Score Intercorrelations for the Five Scoring Procedures<br />
           M-Eff.  L-Correct  L-Eff.  M-L Eff.<br />
M-Correct   .94      .52      -.44     .86<br />
M-Eff.      --       .59      -.70     .93<br />
L-Correct   --       --       -.86     .78<br />
L-Eff.      --       --        --     -.92<br />
Note. Sample sizes range from 1007 to 1025.<br />
The M-Correct and L-Correct scores have less desirable psychometric properties than the scores<br />
obtained using the other three scoring procedures. In addition, these two scores contain information that<br />
is very similar to the information provided by the M-Effectiveness and L-Effectiveness scores respectively,<br />
because they are based on the same sets of responses. Thus, results reported for the remainder of the<br />
analyses will not include these two scores.<br />
Preliminary Information Concerning Construct Validity<br />
Table 3 shows the correlations of the three remaining SJT scores with scores from the other job<br />
performance measures. The SJT scores correlate moderately with a composite of scores on the three<br />
supervisory simulations. The SJT scores also have moderate correlations with the performance rating<br />
composite called Leading/Supervising. Correlations with the other performance rating composites are<br />
slightly lower. Correlations with scores on the job knowledge tests are quite high, but this is not surprising<br />
in view of the fact that these are also paper-and-pencil tests. Finally, the SJT scores have moderate<br />
correlations with a variable called “grade deviation score”, which is essentially promotion rate. Promotion<br />
rate might be seen as an overall measure of success as a soldier.<br />
Table 3<br />
Correlations Between SJT Scores and Other Job Performance Measures<br />
Measure                            M-Eff.  L-Eff.  M-L Eff.<br />
Leading/Supervising³                .24    -.18     .22<br />
Technical Performance³              .21    -.17     .21<br />
Personal Discipline³                .20    -.15     .18<br />
Effort/Military Bearing³            .11    -.06     .10<br />
Job Knowledge¹                      .40    -.34     .40<br />
Grade Deviation Score²              .20    -.20     .22<br />
Supervisory Simulation Composite⁴   .20    -.16     .20<br />
¹ Weighted mean across nine MOS; sample size per MOS ranges from 38 to 146.<br />
² This variable is essentially promotion rate; sample sizes range from 849 to 919.<br />
³ Performance rating composites, based on pooled peer and supervisor ratings. Sample sizes range from 855 to 907; a correlation of .07 is significant at the .05 level.<br />
⁴ Composite of scores from three simulations: personal counseling, disciplinary counseling, and training. Sample<br />
sizes range from 873 to 909; a correlation of .07 is significant at the .05 level.<br />
Table 4 shows the mean SJT scores of soldiers who reported various levels of supervisory training.<br />
Soldiers who had attended no supervisory school at all scored almost half a standard deviation lower<br />
than those who had attended one or more supervisory schools. One potential confound in this comparison<br />
is that the opportunity to attend supervisory schools varies, and decisions concerning which soldiers<br />
are given the opportunity to attend these schools may be influenced by their effectiveness as soldiers or<br />
as supervisors. As a result, it is possible that these mean SJT score differences were obtained because the<br />
more effective soldiers were given the opportunity to attend supervisory training. However, regardless of<br />
whether these differences are the result of differential opportunities or training in the relevant supervisory<br />
skills, these mean score differences provide some support for the construct validity of the SJT as a measure<br />
of supervisory skill.<br />
Mean SJT scores are also reported in Table 4 for subgroups of soldiers identified by how frequently<br />
they reported supervising other soldiers. For all three SJT scoring procedures the expected pattern was<br />
found; soldiers who reported that they supervised other soldiers more frequently obtained better SJT<br />
scores. The largest difference is for the L-Effectiveness score. Soldiers who reported that they regularly<br />
supervise other soldiers obtained L-Effectiveness scores almost half a standard deviation better (i.e.,<br />
lower) than those of soldiers who reported that they never supervise other soldiers. These results for<br />
supervisory experience are slightly different from those obtained for supervisory training, where the<br />
largest mean differences were found for the M-Effectiveness score. Perhaps this is because supervisory<br />
experience sometimes involves making mistakes and learning from the consequences of these mistakes<br />
(i.e., learning to identify ineffective responses), but supervisory training is more likely to focus on the<br />
identification of effective supervisory responses.<br />
Table 4<br />
Mean Situational Judgment Test Scores for Soldiers With Different Levels of<br />
Supervisory Training and Experience<br />
                                          N        M-Eff.  L-Eff.  M-L Eff.<br />
Attended one or more supervisory schools  560-603  4.97    3.50    1.47<br />
Attended no supervisory school            327-371  4.81    3.62    1.20<br />
How often required to supervise other soldiers:<br />
Never                                     87-199   4.87    3.63    1.23<br />
Sometimes fill in for regular supervisor  294-327  4.86    3.58    1.29<br />
Often fill in for regular supervisor      125-135  4.90    3.53    1.38<br />
Regularly supervise other soldiers        391-415  4.96    3.49    1.47<br />
Conclusions<br />
The results of the field test of the SJT indicate that this test is appropriately difficult for the target<br />
sample. The five scoring procedures that were explored all resulted in scores with a reasonable amount<br />
of variance among the soldiers in this sample. Internal consistency reliabilities were also quite high.<br />
Based on all of the psychometric properties examined, the most promising score appears to be M-L Effectiveness,<br />
which has an internal consistency reliability of .75.<br />
The preliminary information obtained concerning the construct validity of the SJT provides evidence<br />
that the SJT is a valid measure of supervisory job knowledge. The correlations of SJT scores with the<br />
other job performance measures provide some support for this interpretation, although the<br />
SJT also has moderate correlations with several measures of technical performance and with promotion<br />
rate. Mean SJT scores for soldiers with different levels of supervisory experience and training indicate<br />
that the knowledge or skill measured by the SJT is, to some extent, learned on the job and in supervisory<br />
training.<br />
REFERENCES<br />
Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 51, 327-358.<br />
Motowidlo, S. J., Dunnette, M. D., & Carter, G. W. (in press). An alternative selection procedure: The<br />
low-fidelity simulation. Journal of Applied Psychology.<br />
Smith, I. L. (1983). Use of written simulations in credentialing programs. Professional Practice of<br />
Psychology, 4, 21-50.<br />
Tenopyr, M. L. (1969). The comparative validity of selected leadership scales relative to success in<br />
production management. Personnel Psychology, 22, 77-85.<br />
Context Effects on Multiple-Choice Test Performance<br />
Lawrence S. Buck*<br />
Planning Research Corporation, System Services<br />
Introduction<br />
It has long been a tenet of test construction theory and practice that test items<br />
measuring the same content or behavioral objectives should be grouped within a<br />
test. For example, Tinkelman (1971) stated:<br />
If items measuring different content objectives or different behavioral<br />
objectives are included in the same test, consideration should be given to<br />
grouping the items by type. Usually the continuity of thought that such<br />
grouping allows on the part of the examinee is found to enhance the<br />
quality of his/her performance.<br />
Other rationales for grouping similar items include such viewpoints as: test anxiety<br />
may be reduced by grouping items on a test, examinees will concentrate better if<br />
they do not jump from subject to subject, and examinees might glean information<br />
from certain questions in a set of questions that will facilitate the answering of other<br />
questions in the set (Gohmann & Spector, 1989).<br />
A majority of the studies addressing item positioning have centered on the effects of<br />
ordering questions by difficulty level rather than by content. (For a representative<br />
sample, see: Hodson, 1984; Sax & Cromack, 1966; Leary & Dorans, 1985; and Plake,<br />
1980.) Numerous other studies, primarily in the educational arena, have addressed<br />
the effects of randomizing items in tests rather than presenting the items in the<br />
order that the information is covered in the classroom or in the textbook(s). (For a<br />
representative sample, see: Gohmann & Spector, 1989; Taub & Bell, 1975; and<br />
Bresnock, Graves, & White, 1989).<br />
The primary focus of this study is the effect on part and total test performance of<br />
randomizing the items on multiple-choice tests normally constructed with the items<br />
grouped by content areas or domains. A secondary objective was to evaluate the<br />
effects on the individual item statistics. The items in the tests in question are<br />
normally presented from easiest to most difficult within each domain.<br />
Two tests were selected for this study, Rigging and Weight <strong>Testing</strong> (BM-0110) and<br />
Outside Electrical (EM-4613). These tests are part of a testing program which<br />
develops, administers, and maintains Journeyman Navy Enlisted Classification (JNEC)<br />
exams for the Navy’s Intermediate Maintenance Activity (IMA) community. The tests<br />
are part of the qualification process for special classification codes. Both the BM-<br />
0110 and EM-4613 examinations consist of 120, four-choice, multiple-choice test<br />
questions spread across six domains as indicated in Table I below.<br />
Table I<br />
Test Item-Domain Breakdown<br />
Domains<br />
Test # of Items 1 2 3 4 5 6<br />
BM-0110 120 18 30 14 12 30 16<br />
EM-4613 120 10 6 14 55 22 13<br />
*The author wishes to thank Norma Molina-Jaggard for her able assistance with the data analyses.<br />
For each administration, the tests were generated with a total test and each domain<br />
mean difficulty index (p-value) of .60. The tests are essentially power tests with<br />
three hours allowed. The cutting score for each test is based on 62.5% of the<br />
number of test questions (a score of 75) or the group mean, whichever is higher. The<br />
cutting score was 75 for each of the tests for each administration. The test items<br />
were selected in accordance with the following parameters: p-values between .25<br />
and .90 and biserials between .15 and .99.<br />
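The cutting-score rule and item-selection window just described can be expressed directly. This is a sketch under stated assumptions -- the function names and the item-screening logic are illustrative, not part of the actual TPS:

```python
# Item-selection window quoted above: p-values between .25 and .90,
# biserials between .15 and .99. (Hypothetical helper, not TPS code.)
def eligible(p_value, biserial):
    return 0.25 <= p_value <= 0.90 and 0.15 <= biserial <= 0.99

# Cutting score: 62.5% of the item count, or the group mean, whichever is higher.
def cutting_score(n_items, group_mean):
    return max(0.625 * n_items, group_mean)

assert eligible(0.60, 0.40)
assert not eligible(0.95, 0.40)   # too easy to be selected
print(cutting_score(120, 71.3))   # -> 75.0 (62.5% of 120 dominates)
```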
The tests are administered twice yearly, in the spring and fall, to enlisted Navy<br />
personnel in pay grades E-5 through E-9, with a minimum of nine months experience<br />
in an IMA activity. BM-0110 was developed in the summer of 1987 and placed into<br />
operational use in the fall of 1987. EM-4613 was developed in the fall of 1987 and<br />
placed into operational use in the spring of 1988. All tests in this program were<br />
developed by subject-matter experts from each trade under the tutelage of a testing<br />
specialist. All of the tests are computer generated by an automated test processing<br />
system (TPS) that includes item banking, scoring, and analysis and updating of all<br />
test and item data.<br />
Procedure<br />
Three different administrations -- Spring 1989 (1-89), Fall 1989 (2-89), and Spring<br />
1990 (1-90) -- were used for this study for both the BM-0110 and EM-4613 tests. Both<br />
the 1-89 and 1-90 tests were constructed under normal procedures, i.e., with items<br />
grouped by domain and presented from easiest to most difficult within each<br />
domain. For the 2-89 administrations, the test items were randomized without<br />
regard for content area or difficulty level.<br />
The items for each administration were generated by the TPS from the total item<br />
pool available for each test and therefore the items were not identical across<br />
administrations. Table II presents the number of items common to each pair of test<br />
administrations.<br />
Table II<br />
Common Items Between Administrations<br />
              1-89 - 2-89    1-89 - 1-90    2-89 - 1-90<br />
BM-0110            71             77             89<br />
EM-4613            66             67             67<br />
Under ideal conditions, the research design would have used the same items for each<br />
administration and both forms of the test would have been administered at the<br />
same time. However, due to a number of factors including fairly small N’s and<br />
numerous repeat candidates from one test administration to another, the ideal<br />
design was not possible. The test populations do tend to be quite stable from one<br />
administration to another, however, in terms of trade experience and numbers from<br />
each paygrade.<br />
The test results and item statistics from each administration for each test were<br />
compared with the other administrations from four different perspectives -- total<br />
test results, part test scores, common item comparisons, and individual item<br />
statistics. As previously stated, the objectives were to determine if randomizing the<br />
items would have any effect on total test performance, part (domain) test<br />
performance, and individual item statistics. A variety of statistical procedures were<br />
employed to analyze the data including Z-tests, two-tailed t-tests, and ANOVAs.<br />
Results<br />
Total Test Performance. With respect to total test performance, the test results<br />
were quite consistent from administration to administration as reflected in Table III.<br />
The 2-89 administration seems to be a little easier for both the BM-0110 and<br />
EM-4613 tests although the differences are small. The test reliabilities also remained<br />
reasonably consistent across test administrations.<br />
Table III<br />
Summary Test Statistics<br />
BM-0110 EM-4613<br />
A Z-test was applied to the mean test scores between paired comparisons, i.e., 1-89<br />
with 2-89, etc., and all results were nonsignificant at the .05 level. In this respect, we<br />
were unable to reject the null hypothesis for any comparison. An ANOVA was also<br />
calculated across each of the three administrations and the results were not<br />
significant at the .05 level for either the BM-0110 (F[2,359] = 1.183) or EM-4613<br />
(F[2,359] = .028).<br />
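A paired-comparison Z-test of this kind can be reproduced in a few lines. The score arrays below are invented stand-ins for two administrations, not the actual BM-0110 or EM-4613 data:

```python
import math

# Two-sample Z-test on mean test scores (large-sample approximation),
# as applied to each pair of administrations. Scores below are invented.
def z_test(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)   # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

spring_89 = [74, 78, 71, 80, 76, 69, 77, 73]
fall_89   = [75, 79, 72, 81, 78, 70, 76, 74]
z = z_test(spring_89, fall_89)
# |z| < 1.96 -> nonsignificant at the .05 level (two-tailed)
print(round(z, 2), abs(z) < 1.96)
```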
Table IV below presents another way of comparing the overall test results as the<br />
passing rates by paygrade are presented for each administration. The passing rates<br />
are reasonably consistent across test administrations with somewhat higher<br />
percentages passing for the 2-89 test. These results are not inconsistent with the<br />
test results from other tests in the program where some fluctuations occur but the<br />
passing rates remain fairly consistent for each paygrade.<br />
Table IV<br />
Test Results by Paygrade<br />
BM-0110<br />
                 1-89                2-89                1-90<br />
Paygrade    N  Passing  %      N  Passing  %      N  Passing  %<br />
E-5        34    13    38     26    11    42     21     7    33<br />
E-6         9     5    56     10     6    60     12     5    42<br />
E-7         5     2    40      2     1    50      1     1   100<br />
E-8 & E-9   0     -     -      0     -     -      0     -     -<br />
TOTALS     48    20    42     38    18    47     34    13    38<br />
Table IV cont.<br />
EM-4613<br />
                 1-89                2-89                1-90<br />
Paygrade    N  Passing  %      N  Passing  %      N  Passing  %<br />
E-5        43    16    37     42     9    21     25     6    24<br />
E-6        44    18    41     42    22    52     31    12    39<br />
E-7        11     5    45      8     7    88      4     2    50<br />
E-8 & E-9   1     0     0      1     1   100      2     2   100<br />
TOTALS     99    39    37     93    39    42     62    22    35<br />
Part Test Performance. In addition to evaluating any effects on total test<br />
performance of randomizing the items, it was also considered prudent to consider<br />
any effects on domain performance. As indicated in Table V below, the results are<br />
similar to those reported in Table III for total test performance. That is, the average<br />
domain scores are quite consistent across test administrations with the 2-89<br />
administration being somewhat easier for almost all domains across the three<br />
administrations.<br />
Table V<br />
Average Domain Scores<br />
BM-0110 EM-4613<br />
Randomized complete block design ANOVAs were computed for the domain scores<br />
across the three administrations of each test and the results were not significant for<br />
either the BM-0110 or EM-4613, (F[2,17] = 2.36) and (F[2,17] = .015) respectively.<br />
Common Item Comparisons. Since it was not possible to use the same items in total<br />
for each of the three test administrations, it was also necessary to evaluate the<br />
effect, if any, on the subset of common items for each paired comparison. A two-tailed<br />
t-test was used to analyze the items common to each pair of administrations<br />
and all results for both the BM-0110 and EM-4613 were nonsignificant at the .05<br />
level. In addition, ANOVAs were calculated for each of the three administrations of<br />
the BM-0110 and EM-4613 tests and the results failed to reveal any significant<br />
differences at the .05 level of significance, (F[2,74] = .044) and (F[2,146] = .720)<br />
respectively.<br />
Individual Item Statistics. The issue of any effect on item statistics of varying the<br />
item’s position was investigated by comparing the item difficulty indexes (p-values)<br />
of common items in each pair of test administrations as well as the item<br />
discrimination values (biserials). That is, does presenting the items in other than<br />
their normal domain and without regard to difficulty level, have an effect on the<br />
items’ statistics? Table VI presents the average p-value changes for the common<br />
items between the paired test administrations. The first test in each pair served as<br />
the base item position for comparative purposes. As indicated in Table VI, the<br />
average item p-values showed a somewhat greater tendency to increase (items<br />
easier) than to decrease, although the differences are small. The average overall<br />
change in the items’ p-values remained quite consistent across the three pairs of test<br />
administrations for both the BM-0110 and the EM-4613.<br />
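The common-item comparison amounts to differencing p-values for items shared by two administrations. A minimal sketch, with hypothetical item IDs and p-values (not drawn from the actual item bank):

```python
# Average p-value change for items common to two administrations.
# The item dictionaries below are invented; keys are hypothetical bank IDs.
base = {"BM017": 0.62, "BM042": 0.55, "BM108": 0.71, "BM111": 0.48}
new  = {"BM017": 0.70, "BM042": 0.51, "BM108": 0.78, "BM130": 0.60}

common = sorted(set(base) & set(new))
changes = [new[i] - base[i] for i in common]
avg_abs_change = sum(abs(c) for c in changes) / len(changes)
increases = [c for c in changes if c > 0]   # item became easier
decreases = [c for c in changes if c < 0]   # item became harder
print(common, round(avg_abs_change, 3), len(increases), len(decreases))
```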
Table VI<br />
Comparison of Common Items’ P-Values<br />
Average P-Value Change By Relative Position<br />
EM-4613<br />
Paired              Average    Average      Average<br />
Administrations     Change     Increase*    Decrease*<br />
1/89 with 1/90       .095      .084 .075    .107 .098<br />
2/89 with 1/89       .091      .105 .079    .092 .121<br />
2/89 with 1/90       .099      .095 .117    .087 .112<br />
BM-0110<br />
1/89 with 1/90       .222      .199 .055    .199 .071<br />
2/89 with 1/89                 .246 .091    .198 .068<br />
2/89 with 1/90       .213      .227 .087    .190 .082<br />
*The first column represents the average for the first test of each pair;<br />
the second column represents the second test.<br />
With respect to the items’ biserials, Table VII presents the average biserials for the<br />
common items in each pair of test administrations. As was the case with the<br />
p-values, the average biserials were quite consistent between paired test<br />
administrations, with the differences quite small.<br />
Table VII<br />
Average Biserials for Common Items<br />
of Paired Test Administrations<br />
                    EM-4613        BM-0110<br />
1/89 with 2/89     .29  .25       .30  .34<br />
1/89 with 1/90     .32  .25       .32  .28<br />
2/89 with 1/90     .21  .22       .32  .26<br />
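A biserial of the kind tabled above is the point-biserial item-total correlation rescaled through the normal ordinate. The sketch below assumes dichotomous (1/0) item responses and uses invented data, not items from either examination:

```python
from statistics import NormalDist, mean, pstdev

# Biserial item discrimination: (M1 - M0)/s * (p*q)/phi(z), where phi(z) is
# the normal density at the p-th quantile. Responses and totals are invented.
def biserial(item, totals):
    p = mean(item)                    # item p-value (proportion correct)
    q = 1 - p
    m1 = mean(t for i, t in zip(item, totals) if i == 1)
    m0 = mean(t for i, t in zip(item, totals) if i == 0)
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (m1 - m0) / pstdev(totals) * (p * q) / y

item   = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
totals = [82, 64, 70, 88, 65, 59, 90, 58, 71, 75]
print(round(biserial(item, totals), 2))
```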
Discussion<br />
This study failed to show that randomizing the items in a multiple-choice test would<br />
have a deleterious effect on examinees with respect to test performance. If<br />
anything, the randomized tests were somewhat easier although the differences<br />
were small and were not significant. The effects on item statistics were minimal as<br />
the item difficulty indices (p-values) showed no clear trend of increasing or<br />
decreasing when comparing randomized vs. nonrandomized tests, and the item<br />
discrimination values (biserials) remained quite consistent across test<br />
administrations.<br />
Within the confines of this study, it was not possible to assess examinee reaction to<br />
the different test formats to discern whether the different item presentations were<br />
perceived differently by the examinees. Nor was it possible to determine whether<br />
examinees answer questions in order or tend to skip around and group like<br />
questions even though they are not grouped on the test. Studies by Tuck (1978) and<br />
Allison and Thomas (1986) have suggested that few examinees answer questions in<br />
order and that there is a tendency to group similar items.<br />
The study supports the stability of item statistics across different test formats and<br />
administrations and the lack of any significant contextual or item position effects on<br />
test performance. The implication of these findings is that, to preclude the possibility<br />
of cheating, randomized versions of the same tests could be administered without<br />
fear of creating an unfair advantage or disadvantage.<br />
References<br />
Allison, D.E., and D.C. Thomas. 1986. Item-difficulty sequence in achievement<br />
examinations: Examinees’ preferences and test taking strategies. Psychological<br />
Reports 59, 867-870.<br />
Bresnock, A.E., P.E. Graves, and N. White. 1989. Multiple-choice testing: Questions<br />
and response position. Journal of Economic Education, (Summer), 239-245.<br />
Gohmann, S.F., and L.C. Spector. 1989. Test scrambling and student performance.<br />
Journal of Economic Education, (Summer), 235-238.<br />
Hodson, D. 1984. The effect of changes in item sequence on student performance<br />
in a multiple-choice chemistry test. Journal of Research in Science Teaching, Vol.<br />
21, No. 5, 489-495.<br />
Leary, L.F., and N.J. Dorans. 1985. Implications for altering the context in which test<br />
items appear: A historical perspective on an immediate concern. Review of<br />
Educational Research 55, (Fall), 387-413.<br />
Plake, B.S. 1980. Item arrangement and knowledge of arrangement on test scores.<br />
Journal of Experimental Education 49, (Fall), 56-58.<br />
Sax, G., and T.A. Cromack. 1966. The effects of various forms of item arrangements<br />
on test performance. Journal of Educational Measurement 3, 309-311.<br />
Taub, A.J., and E.B. Bell. 1975. A bias in scores on multiple-form exams. Journal of<br />
Economic Education 7, (Fall), 58-59.<br />
Tinkelman, S.N. Planning the objective test. In R.L. Thorndike (Ed.), Educational<br />
Measurement (2nd. ed.). Washington, D.C.: American Council on Education,<br />
1971.<br />
Tuck, J.P. 1978. Examinee’s control of item difficulty sequence. Psychological<br />
Reports 42, 1109-1110.<br />
ABSTRACT<br />
DIETARY EFFECTS ON TEST PERFORMANCE<br />
Charles A. Salter<br />
Laurie S. Lester<br />
Susan M. Luther<br />
Theresa A. Luisi<br />
U.S. Army Natick Research, Development & Engineering Center<br />
Natick, MA<br />
Previous research suggests that meal composition may affect performance<br />
on the automated Memory and Search Task (MAST). The purpose of this study<br />
was to determine if lunch protein or carbohydrate would interact with caffeine to<br />
affect performance and mood as assessed by the MAST, the Automated Portable<br />
Test System (APTS), and visual-analogue mood scales. Male subjects were<br />
assigned either to a protein lunch (5 g/kg turkey breast) or a carbohydrate lunch (5<br />
g/kg sorbet) group so that normal caffeine intakes were equivalent. Within each<br />
group, subjects rotated through two caffeine conditions in a counterbalanced order,<br />
drinking two cups of either caffeinated or decaffeinated coffee with lunch. Caffeine<br />
use was prohibited at other times during the study. The APTS was consistent with<br />
the MAST in showing no performance effects of protein and caffeine, though<br />
protein did correlate with some self-reported moods. The protein group reported<br />
increased hunger over time (p=.002) and felt less dejected (p=.04) than did the<br />
carbohydrate group, while caffeine produced no significant effects. Greater<br />
carbohydrate intake was associated with lower MAST scores, though the direction<br />
of causation is unclear, and it had no effect on the APTS. It is concluded that<br />
performance on the MAST and APTS are relatively unaffected by dietary<br />
differences of this type and magnitude.<br />
INTRODUCTION<br />
The automated Memory and Search Task (MAST) uses a hand-held<br />
computer to present stimuli consisting of randomized sequences of 16 alphabetic<br />
characters each along with randomized targets of 2, 4, or 6 letters that the subject<br />
identifies as being present within or absent from each stimulus (Salter et al, 1988).<br />
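As described, each MAST trial pairs a 16-letter stimulus with a 2-, 4-, or 6-letter target that is either present in or absent from it. The following is an illustrative reconstruction of such a trial under that description, not the original hand-held software:

```python
import random
import string

# Illustrative MAST-style trial (an assumption-laden sketch, not the Natick
# implementation): a 16-letter stimulus plus a 2-, 4-, or 6-letter target
# whose letters are either all drawn from the stimulus or all absent from it.
def make_trial(target_len, present, rng):
    stimulus = rng.sample(string.ascii_uppercase, 16)
    if present:
        target = rng.sample(stimulus, target_len)
    else:
        unused = [c for c in string.ascii_uppercase if c not in stimulus]
        target = rng.sample(unused, target_len)
    return "".join(stimulus), "".join(target)

rng = random.Random(7)
stim, targ = make_trial(4, present=True, rng=rng)
# Correct response: "present" iff every target letter occurs in the stimulus.
print(stim, targ, all(c in stim for c in targ))
```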
The first studies with the MAST used it as a tool to assess dietary effects on<br />
performance (Salter et al, 1988). These studies indicated a significant post-lunch<br />
slump in MAST scores followed by recovery later in the afternoon. Salter and Rock<br />
(1989) did not find a post-meal decrement in performance on the MAST when<br />
slightly different times were used for testing. This latter study did find, however, that<br />
the more protein subjects ate at lunch, the better they scored on the MAST<br />
afterwards. The purpose of the current study was to determine whether such<br />
nutrients/food ingredients as protein, carbohydrate, and caffeine would affect<br />
performance not only on the MAST but also on several subtests (pattern<br />
recognition, reaction time, symbolic reasoning, and hand-tapping) of the<br />
Automated Portable Test System or APTS (Bittner et al, 1985) and visual-analogue<br />
mood scales.<br />
Previous research has demonstrated that protein can enhance performance<br />
because it contains tyrosine, the amino acid precursor to norepinephrine, which<br />
helps the body function in states of arousal or stress (Lieberman et al, 1984). On<br />
the other hand, carbohydrate leads to insulin release which helps clear the blood<br />
of amino acids except tryptophan, resulting in greater passage of this serotonin<br />
precursor into the brain. Serotonin can induce a drowsy quiescent state capable of<br />
suppressing performance (Lieberman et al, 1982/83). Caffeine is a commonly<br />
used performance enhancer demonstrated to increase alertness and vigilance<br />
(Sawyer, Julia, and Turin, 1982). We particularly wanted to test whether caffeine<br />
would interact with either protein or carbohydrate in affecting performance.<br />
METHOD<br />
The subjects were military and civilian employees, males only, at the US<br />
Army Natick Research, Development & Engineering Center. All potential subjects<br />
were screened for previous caffeine use. Only those who normally consumed<br />
between 2 and 4 caffeinated beverages (coffee, tea, or soda) per day were<br />
retained. The subjects then filled out a questionnaire regarding their typical<br />
caffeine use, from which their total daily caffeine ingestion was estimated. The<br />
subjects were then split into a protein-lunch group (16 subjects) and a<br />
carbohydrate-lunch group (18 subjects) so that the average daily caffeine intake<br />
was equivalent in both groups.<br />
On the first day of testing, the subjects were trained in the use of the<br />
automated MAST, the APTS (using the pattern recognition, reaction time, symbolic<br />
reasoning, and hand-tapping subtests), and visual-analogue mood scales<br />
(indicating on a 100-mm line how relatively tense, hungry, dejected, tired, angry,<br />
vigorous, and confused they felt). On the following two days of testing, all subjects<br />
were fed the same standard, mixed-nutrient breakfast at 0730 hours, tested at 1000<br />
hrs, fed the experimental lunch at 1130, given a math exercise immediately after,<br />
then tested shortly after noon and finally at 1430. The timed math exercise (30<br />
minutes maximum) was used because previous studies (Morse et al, 1989) found<br />
that it served as an effective stressor to mobilize norepinephrine use. The protein<br />
lunch group was served 5 g/kg turkey breast, while the carbohydrate lunch group<br />
was provided with 5 g/kg sorbet. These two foods were chosen because previous<br />
research had demonstrated them capable of having behavioral effects (Spring,<br />
Lieberman, Swope, and Garfield, 1986). Subjects were instructed to eat as much<br />
of their test meals as they could, but there was a wide variation in the proportion<br />
consumed. Within each group, subjects rotated through two caffeine conditions in<br />
a counterbalanced order, drinking two cups of either caffeinated or decaffeinated<br />
coffee with lunch. Caffeine use was prohibited at other times during the study.<br />
RESULTS AND DISCUSSION<br />
Analysis of variance tests indicated no significant differences on MAST<br />
performance as a function of group (protein vs. carbohydrate), caffeine (or its<br />
absence), or the interaction of group and caffeine. Salter and Rock (1989) similarly<br />
found no major group effects due to nutrient type, but did find significant<br />
correlations between the proportion of protein actually consumed and<br />
performance. In Table 1 can be seen the correlations in the current study between<br />
the percent of nutrient consumed and MAST performance. Whereas Salter and<br />
Rock (1989) found a positive correlation for protein, the current study found a<br />
negative correlation for carbohydrate. Previous studies have found both types of<br />
effects (Lieberman et al, 1984). However, consideration of the time factor indicates<br />
that the significant negative correlation occurred even in the morning before<br />
Table 1<br />
Correlations Between Percent of Test Food Consumed<br />
and automated Memory and Search Task (MAST) scores<br />
Time:            Task Level:            Protein   Carbohydrate<br />
                                        (N=16)    (N=18)<br />
1 (1000 hrs)     2-character target      -.28       -.58*<br />
                 4-character target      -.36       -.58*<br />
                 6-character target      -.38       -.58*<br />
2 (1200 hrs)     2-character target      -.03       -.65**<br />
                 4-character target      -.07       -.37<br />
                 6-character target      -.13       -.66**<br />
3 (1430 hrs)     2-character target      -.06       -.79***<br />
                 4-character target      -.02       -.x5*<br />
                 6-character target      -.15       -.47*<br />
* p<.05   ** p<.01   *** p<.001<br />
consuming the carbohydrate. This study, then, is a clear example of correlation not<br />
implying causation. If anything, it appears that people who score lower on the<br />
MAST are inclined to eat more carbohydrate rather than the other way around.<br />
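Correlations like those in Table 1 are ordinary Pearson coefficients between percent of food consumed and score. A brief sketch with fabricated pairs, illustrating only the sign pattern, not the study's data:

```python
import math

# Pearson correlation between percent of test food consumed and a test score,
# of the kind reported in Table 1. The paired values below are invented.
def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

pct_consumed = [95, 80, 60, 100, 70, 85, 50, 90]
mast_score   = [210, 250, 280, 200, 260, 240, 300, 220]
r = pearson(pct_consumed, mast_score)
print(round(r, 2))   # negative: higher consumption pairs with lower score here
```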
The proportion of test meals consumed was also correlated with APTS<br />
performance, and there were no significant effects on the tasks of pattern<br />
recognition (stating whether two patterns of asterisks were the same or different),<br />
reaction time (pressing the number of whichever of the four boxes lights up), or symbolic<br />
reasoning (indicating whether each of several statements is true, for example, “A is<br />
in front of B--BA”). Data from several trials of the hand-tapping test were not recorded<br />
properly, so this variable could not be analyzed. See Table 2.<br />
Table 2<br />
Correlations Between Percent of Test Food Consumed<br />
and Automated Portable Test System (APTS) scores<br />
Test:         Time:       Protein   Carbohydrate<br />
                          (N=16)    (N=18)<br />
Pattern       1000 hrs     -.05      -.40<br />
Recognition   1200 hrs      .41      -.33<br />
              1430 hrs      .14      -.40<br />
Reaction      1000 hrs     -.22      -.27<br />
Time          1200 hrs     -.05      -.23<br />
              1430 hrs     -.03      -.37<br />
Symbolic      1000 hrs     -.25      -.25<br />
Reasoning     1200 hrs     -.07      -.44<br />
              1430 hrs      .21      -.27<br />
Correlations between percent consumption and various moods, however,<br />
were more often significant. Table 3 has the results, including just the moods with<br />
significant effects. Moods like dejection, fatigue, and vigor are not included in this<br />
table because none of the correlations were significant. Protein consumption was<br />
positively related to tension and anger, a finding confirming earlier reports<br />
(Banderet et al, 1986). However, the association held also in the morning before<br />
the protein meal, again damaging the case for causation. In addition to the<br />
Table 3<br />
Correlations Between Percent of Test Food Consumed<br />
and Visual-Analogue Mood scores<br />
Mood:         Time:<br />
Tense         1000 hrs<br />
              1200 hrs<br />
              1430 hrs<br />
Hungry        1000 hrs<br />
              1200 hrs<br />
              1430 hrs<br />
Angry         1000 hrs<br />
              1200 hrs<br />
              1430 hrs<br />
Confused      1000 hrs<br />
              1200 hrs<br />
              1430 hrs<br />
* p<.05<br />
REFERENCES<br />
Banderet, L. E., Lieberman, H. R., Francesconi, R. P., Shukitt, B. L., Goldman, R. F.,<br />
Schnakenberg, D. D., Rauch, T. M., Rock, P. B., and Meadors, G. F. (1986).<br />
Development of a paradigm to assess nutritive and biochemical substances<br />
in humans: A preliminary report on the effects of tyrosine upon altitude- and<br />
cold-induced stress responses. Presented at and published as Proceedings<br />
of the AGARD Aerospace Medical Panels Symposium, Biochemical<br />
Enhancement of Performance, Lisbon, Portugal, 30 Sep-2 Oct, 1986.<br />
Bittner, A. C., Smith, M. G., Kennedy, R. S., Staley, C. F., and Harbeson, M. M.<br />
(1985). Automated Portable Test (APT) System: Overview and prospects.<br />
Behavior Research Methods, Instruments, & Computers, 17, 217-221.<br />
Lieberman, H. R., Corkin, S., Spring, B. J., Garfield, G. S., Growdon, J. H., and<br />
Wurtman, R. J. (1984). The effects of tryptophan and tyrosine on human<br />
mood and performance. Psychopharmacology Bulletin, 20, 595-598.<br />
Lieberman, H. R., Corkin, S., Spring, B. J., Growdon, J. H., and Wurtman, R. J.<br />
(1982/83). Mood, performance, and pain sensitivity: Changes induced by<br />
food constituents. Journal of Psychiatric Research, 17, 135-145.<br />
Morse, D. R., Schacterle, G. R., Furst, L., Zaydenberg, M., and Pollack, R. L. (1989).<br />
Oral digestion of a complex-carbohydrate cereal: effects of stress and<br />
relaxation on physiological and salivary measures. American Journal of<br />
Clinical Nutrition, 49, 97-105.<br />
Salter, C. A., Lester, L. S., Dragsbaek, H., Popper, R. D., and Hirsch, E. (1988). A<br />
fully automated memory and search task. In A. C. F. Gilbert (Ed.),<br />
Proceedings of the 30th Annual Conference of the <strong>Military</strong> <strong>Testing</strong><br />
<strong>Association</strong>. Arlington, Virginia: <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>. Pp. 515-<br />
520.<br />
Salter, C. A., and Rock, K. L. (1989). Using the memory and search task to assess<br />
dietary effects. Proceedings of the 31st Annual Conference of the <strong>Military</strong><br />
<strong>Testing</strong> <strong>Association</strong>. San Antonio, Texas: <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>. Pp.<br />
701-706.<br />
Sawyer, D. A., Julia, H. L., and Turin, A. C. (1982). Caffeine and human behavior:<br />
Arousal, anxiety, and performance effects. Journal of Behavioral Medicine,<br />
5, 415-439.<br />
Spring, B. J., Lieberman, H. R., Swope, G., and Garfield, G. S. (1986). Effects of<br />
carbohydrates on mood and behavior. Nutrition Reviews/Supplement, 44,<br />
51-60.<br />
WHAT MAKES BIODATA BIODATA ?<br />
Fred A. Mael<br />
US Army Research Institute<br />
Interest in the use of biodata in personnel selection<br />
continues to grow in all branches of the armed services. Various<br />
researchers have advanced legal, moral, and conceptual criteria<br />
that define biodata items and differentiate them from those that<br />
appear in temperament, attitude, or interest measures. These<br />
stated criteria are often disputed by other researchers or<br />
ignored in practice, even by those who proposed them. Moreover, in practice,<br />
many items termed "biodata" are indistinguishable from other<br />
self-report items. The result has been a continued blurring of<br />
what constitutes biodata.<br />
The confusion is especially problematic in light of the<br />
claimed advantages of biodata. For example, biodata scales have<br />
been shown to be more resistant to social desirability faking<br />
than temperament scales (Telenson et al., 1983). However, this<br />
may be true only of certain types of biodata, such as verifiable<br />
items. Similarly, reviews of selection measures (Reilly & Chao,<br />
1982) stating that biodata generally achieve higher validities<br />
than temperament measures are uninterpretable without knowing<br />
what, other than empirical keying, differentiates biodata from<br />
other measures.<br />
The purpose of this paper is to review criteria that have<br />
been used to define biodata and differentiate it from other self-report<br />
measures. Drawing upon the work of previous researchers,<br />
the qualities that may uniquely define biodata across all<br />
applications are enumerated. Then, additional characteristics<br />
which may be desirable or legally required under certain<br />
circumstances are discussed. In the course of the discussion,<br />
differences between biodata and temperament scales are clarified,<br />
with the two viewed as potentially complementary, though not<br />
mutually exclusive, domains.<br />
The essence of biodata<br />
Biodata items attempt to measure previous and current life<br />
events which have shaped the behavioral patterns, dispositions,<br />
and values of the person. It is presumed that a person's outlook<br />
is affected by life experiences and that each experience has the<br />
potential to make subsequent life choices more or less desirable,<br />
palatable, or feasible. One possible reason is that the focal<br />
experience reinforces a pattern of behavior. Alternatively, the<br />
focal experience may be partly or wholly determined by earlier<br />
causal determinants - genetic, dispositional, or learned - which<br />
account for variations in both earlier and current behavior. A<br />
complete biodata measure should provide "a reasonably<br />
comprehensive description of the relevant behavioral and<br />
experiential antecedents" (Mumford & Owens, 1987, p. 3).<br />
Virtually all life experiences are potentially "job relevant",<br />
provided that they empirically differentiate better and poorer<br />
performers on a consistent basis.<br />
Biodata Item Attributes<br />
Historical versus Hypothetical. Conceptually, biodata<br />
should pertain solely to historical events, activities which have<br />
taken place, or continue to take place. This attribute would<br />
exclude behavioral intentions or expected behavior in a<br />
hypothetical situation.<br />
External versus Internal. Some have argued that biodata<br />
items should deal with external, though not necessarily publicly<br />
seen, actions. These criteria would exclude items about<br />
thoughts, attitudes, opinions, and unexpressed reactions to<br />
events. An item about what one tvnicallv does in situations<br />
could satisfy the historical/external criterion.<br />
Numerous biodata researchers have utilized non-external<br />
events in their biodata measures, and conceptually, non-external<br />
aspects of events are also capable of having significant impact<br />
on subsequent behavior. Nevertheless, the external event<br />
criterion may be crucial if claiming greater validity for biodata<br />
compared to temperament scales. Temperament scales require<br />
assessments of personal tendencies, often in areas in which<br />
people not only portray themselves favorably ("impression<br />
management"), but actually see themselves in an unrealistically<br />
favorable light ("self-deception") (Paulhus, 1984). For example,<br />
most employees overrate their work performance compared to that<br />
of peers. Nondepressed persons consistently overrate their<br />
performance, so much that realistic self-evaluation may be<br />
indicative of depression (Mischel, 1979). Similarly, negative<br />
and positive affect orientations have been shown to be correlated<br />
with response patterns on temperament and related scales. Thus,<br />
the "normal" tendency to overrate successes and underestimate<br />
failings can lead to self-deception and could possibly inflate<br />
responses to some temperament scales. By contrast, biodata<br />
scales dealing with external events purport to force the<br />
respondent to either answer honestly or consciously distort<br />
answers, with the assumption that fewer people will choose the<br />
latter.<br />
Objective and First-hand versus Subjective. Some who<br />
prefer that biodata be descriptions of external events also feel<br />
that biodata should be objective recollections, requiring only<br />
the faculty of recall. Subjective interpretation of events, such<br />
as assessing if one was "disappointed", "angry", or "depressed"<br />
in a given situation, would not fit this criterion. Evaluation<br />
of one's qualities or performance relative to that of others<br />
would also be considered subjective. A corollary would be that<br />
biodata items ask only for the first-hand knowledge of the<br />
respondent. Estimation of how others (peers, parents, teachers)<br />
would evaluate one's performance or temperament involves an<br />
additional level of speculative subjectivity. Subjective items<br />
would appear to increase the chance of self-deception. Although<br />
subjective corroboration from others is feasible, subjective<br />
items are never objectively verifiable, and hence the chance for<br />
social desirability faking is increased.<br />
Conversely, a number of biodata researchers have made<br />
frequent use of interpretive items. In some studies, subjective<br />
items have actually been shown to have higher predictive<br />
validities than objective ones. An advantage to subjective items<br />
that address self-perceptions is that they can better focus on<br />
unitary theoretical constructs. By contrast, performance of<br />
objective behaviors is often determined by multiple causes and<br />
dispositions, making it difficult to isolate the role of any one.<br />
Barge (1987) has provided evidence that homogeneous items,<br />
tapping a single disposition or tendency, are more predictive<br />
than heterogeneous items such as school or work performance.<br />
Construct-based items are also easier to use to develop<br />
rationally-based biodata scales. It would thus appear that the<br />
use of some subjective items may provide some countervailing<br />
advantages as well.<br />
Discrete versus Summarv Actions. Methodologically, it may<br />
be preferable to focus on discrete actions, dealing with a<br />
single, unique behavior (e.g. age when received driver's<br />
license), as opposed to summary responses (e.g. average time<br />
spent studying). Responses to discrete items only require memory<br />
retrieval, while summary items also require computation or<br />
estimation, thus increasing the chance of inaccuracy. However,<br />
the above preference for discrete actions would obtain only when<br />
the event is unique or singularly memorable. With a regularly<br />
performed behavior, summary recall could be more realistic and<br />
accurate than recall of a single, arbitrarily chosen instance.<br />
Verifiable. A verifiable item is an item that can be<br />
corroborated from an independent source. Item verifiability thus<br />
goes beyond both the external event and objective criteria. The<br />
optimal source of verification is archival data, such as school<br />
transcripts or work records. Alternatively, the testimony of<br />
knowledgeable persons, such as a teacher, employer, or coach, is<br />
also considered verification by most researchers. Asher (1972)<br />
and Stricker (1987) have advocated exclusive use of verifiable<br />
items, though others utilize non-verifiable items, and some<br />
advocate interleaving verifiable and non-verifiable items<br />
(Mumford et al., 1990).<br />
One reason to use verifiable items is to reduce social<br />
desirability faking and outright falsification. However,<br />
Shaffer, Saunders, and Owens (1986) have shown that social<br />
desirability distortion is not a serious concern with biodata.<br />
Previous research on false or inaccurate responding to verifiable<br />
biodata items has shown mixed results (Cascio, 1975; Goldstein,<br />
1971) which may be due partly to methodological factors (Mumford<br />
& Owens, 1987). Merely warning respondents that answers will be<br />
verified can reduce faking (Schrader & Osburn, 1977).<br />
Verifiability should be less necessary with discrete and publicly<br />
witnessed items for which "faking good" would require conscious<br />
lying. When developing biodata, obscuring the "right" answers<br />
and deleting transparent items should also discourage socially<br />
desirable responses, even without the threat of verification.<br />
Paradoxically, items which fit the narrowest definitions of "job<br />
relevant" and show the greatest point-to-point correspondence<br />
with future job performance would be most transparent and elicit<br />
the greatest need for verification.<br />
The issue of control. From the aforementioned perspective,<br />
that all life events have the potential to shape and affect later<br />
behavior, there is no reason to differentiate between experiences .<br />
that a person has consciously chosen to undertake and those that<br />
were components of the person's environment. In the same way<br />
that a decision to join ROTC or study chemistry may lead a person<br />
in a behavioral direction, personal characteristics or the<br />
climate in a person's home and community could also affect<br />
subsequent behavior. Moreover, even optional decisions and<br />
behaviors, such as smoking or amount of time spent studying, are<br />
partially shaped by noncontrollable influences. This view is<br />
reflected in the instruments of biodata researchers who freely<br />
utilize both "controllable" and "noncontrollable" biodata items<br />
(Glennon, Albright, & Owens, 1966). Stricker (1987), on the other<br />
hand, argues that it is unethical to evaluate people based on<br />
noncontrollable items pertaining to parental behavior, geographic<br />
background, or socioeconomic status. He also considers items<br />
dealing with skills and experiences not equally accessible to all<br />
applicants, such as tractor-driving ability or playing varsity<br />
football, to be unfair. Similarly, the developers of the Armed<br />
Services Applicant Profile (ASAP), a biodata measure of<br />
adaptability to the military, also attempted to delete all<br />
noncontrollable items from their instrument (Trent, Quenette, &<br />
Pass, 1989).<br />
In practice, however, consistent adherence to the control<br />
criterion would exclude all items pertaining to physical<br />
characteristics and educational level; behaviors, values, or<br />
interpersonal styles influenced by parental genetics or<br />
nurturing; and vocational interests and behavioral preferences<br />
partially shaped by one's environment. Strict adherence would<br />
thus lead to exclusion of most life experiences likely related to<br />
later behavior. It would also exclude many items typically found<br />
on school and job application blanks. This would present a<br />
severe constraint when sampling applicant pools without extended<br />
job histories, such as military applicants. It is not surprising<br />
that even some advocates of this criterion have been forced to<br />
violate it in their scales.<br />
Invasion of privacy<br />
A final concern involves invasion of privacy. Intrusive<br />
questions are mainly problematic with background checks that<br />
focus on previous criminal and aberrant behavior. In contrast,<br />
most biodata deal with behaviors whose revelation would not harm<br />
respondents. Some questions, such as those pertaining to marital<br />
status, age, and physical handicaps, may be invasive if the<br />
responses were to be placed in the employee's personnel folder,<br />
but not if the responses were used only by researchers to<br />
generate applicant scores. An additional reason not to reveal<br />
individual responses and their implications to decision-makers is<br />
in order to maintain biodata key confidentiality.<br />
Summary<br />
This paper proposes that the core attribute of a biodata<br />
item is that it addresses an historical event or experience. The<br />
rationale is that previous events shape the behavioral patterns,<br />
attitudes, and values of the person, and combine with individual<br />
temperaments to define the person's identity. Other attributes,<br />
though not defining biodata, may have methodological advantages.<br />
These include limiting items to those regarding external events,<br />
those that only require objective recollection of events, and<br />
those asking only for first-person recollections. Items<br />
involving discrete, unique events, and events that are verifiable<br />
are also favored by some for these reasons. However, these<br />
latter attributes may have their own limitations. Limiting<br />
biodata to controllable life events is seen as overly<br />
restrictive. Exclusive use of verifiable and especially<br />
controllable items may hamper efforts to cover the domain of<br />
relevant life events, as well as reduce validity. While clearly<br />
intrusive items are offensive and hence undesirable, definitions<br />
of and concerns about invasion of privacy will vary, depending on<br />
the situation.<br />
By attempting to measure historical events and experiences<br />
that may have impacted on behavioral tendencies, it should be<br />
possible to focus on a unique realm of individual differences not<br />
exhausted by temperament and other self-report measures. Perhaps<br />
biodata measures, as presently defined, could be used in tandem<br />
with temperament measures for optimal results. However,<br />
researchers should be exceedingly careful about making claims<br />
extolling biodata's virtues over other self-report measures.<br />
REFERENCES<br />
Asher, J. J. (1972). The biographical item: Can it be improved?<br />
Personnel Psychology, 25, 251-269.<br />
Barge, B. N. (1987). Characteristics of biodata items and their<br />
relationship to validity. Paper presented at the 95th annual<br />
meeting of the American Psychological Association, New York, NY.<br />
Cascio, W. F. (1975). Accuracy of verifiable biographical<br />
information blank responses. Journal of Applied Psychology,<br />
60, 767-769.<br />
Glennon, J. R., Albright, L. E., & Owens, W. A. (1966). A catalog<br />
of life history items. Greensboro, NC: Creativity Research<br />
Institute of the Richardson Foundation.<br />
Goldstein, I. L. (1971). The application blank: How honest are<br />
the responses? Journal of Applied Psychology, 55, 491-492.<br />
Mischel, W. (1979). On the interface of cognition and<br />
personality: Beyond the person-situation debate. American<br />
Psychologist, 34, 740-754.<br />
Mumford, M. D., & Owens, W. A. (1987). Methodology review:<br />
Principles, procedures, and findings in the application of<br />
background data measures. Applied Psychological Measurement,<br />
11, 1-31.<br />
Mumford, M. D., Owens, W. A., Stokes, G. S., Sparks, C. P., &<br />
Hough, L. (1990). Developmental determinants of individual<br />
action: Theory and practice in the application of background<br />
data measures. Unpublished manuscript.<br />
Paulhus, D. L. (1984). Two-component models of socially desirable<br />
responding. Journal of Personality and Social Psychology,<br />
46, 598-609.<br />
Reilly, R. R., & Chao, G. T. (1982). Validity and fairness of<br />
some alternative employee selection procedures. Personnel<br />
Psychology, 35, 1-62.<br />
Schrader, A., & Osburn, H. G. (1977). Biodata faking: Effects of<br />
induced subtlety and position specificity. Personnel<br />
Psychology, 30, 395-405.<br />
Shaffer, G. S., Saunders, V., & Owens, W. A. (1986). Additional<br />
evidence for the accuracy of biodata: Long-term retest and<br />
observer ratings. Personnel Psychology, 39, 791-809.<br />
Stricker, L. J. (1987). Developing a biographical measure to<br />
assess leadership potential. Presented at the Annual Meeting<br />
of the Military Testing Association, Ottawa, Ontario.<br />
Telenson, P. A., Alexander, R. A., & Barrett, G. V. (1983).<br />
Scoring the biographical information blank: A comparison of<br />
three weighting techniques. Applied Psychological<br />
Measurement, 7, 73-80.<br />
Trent, T., Quenette, M. A., & Pass, J. J. (1989). An<br />
old-fashioned biographical inventory. Paper presented at the<br />
97th Annual Convention of the American Psychological<br />
Association, New Orleans, LA.<br />
JOB SAMPLE TEST FOR NAVY FIRE CONTROLMAN<br />
Susan Van Hemel, PhD<br />
Frank Alley, PhD<br />
Syllogistics, Inc.<br />
Springfield, VA 22151<br />
Herbert George Baker, PhD<br />
Laura E. Swirski<br />
Navy Personnel Research and Development Center<br />
San Diego, CA 92152-6800<br />
ABSTRACT<br />
The Navy has developed job sample tests for a number of its<br />
enlisted occupations (or ratings) as part of the<br />
Joint-Service Job Performance Measurement Program. One of<br />
those ratings is Fire Controlman (FC). This paper details<br />
the development of hands-on tests for first-term FC data and<br />
radar personnel, and their administration to a sample of FCs<br />
(N=103). The results of testing are discussed, showing the<br />
relationship of test scores to several criteria.<br />
INTRODUCTION<br />
Several reports detail the research strategy and<br />
purposes of the Joint-Service Job Performance Measurement<br />
(JPM)/Enlistment Standards Project (Office of the Assistant<br />
Secretary of Defense, 1982), and the origin and scope of the<br />
Navy JPM Program (Laabs & Berry, 1987). In the Navy's<br />
effort, performance measures are being developed for several<br />
ratings, one of which is that of fire controlman (FC).<br />
The MK 86 Gun Fire Control System (GFCS) is used to<br />
control various surface ship-mounted guns, which are used<br />
against both surface and airborne targets. First-term MK 86<br />
GFCS FCs are currently trained and deployed in two different<br />
specialties. Both NECs operate the MK 86 GFCS, but NEC 1125<br />
specializes in maintenance of the radar subsystem<br />
while NEC 1129 specializes in maintenance of the data<br />
processing subsystem. Both types of MK 86 FCs go through a<br />
training pipeline which includes Basic Electricity and<br />
Electronics, a Fire Control A-school, and a C-school for<br />
either data or radar. MK 86 FCs are usually<br />
cross-trained on the second subsystem at the end of their<br />
first or beginning of their second tour of duty.<br />
APPROACH<br />
to develop materials to the best level of detail possible,<br />
then returned to another SME panel for critique. Use of SMEs<br />
from all three MK 86 FC C-schools ensured that all of the<br />
sites would have input into the test development process.<br />
Tryout was conducted on actual equipment to be used in<br />
testing. Final SME review and item refinement followed the<br />
tryout, and preceded the field test.<br />
Verification Of Tasks Selected For Testing<br />
The first-term MK 86 GFCS FC is fully trained on only<br />
one of the subsystems, and although he (the rating is closed<br />
to women) may work on the other subsystem, he is not<br />
qualified on it through training or experience. Because of<br />
this, a single set of tasks could not be used; separate test<br />
items had to be developed for each subspecialty.<br />
The first step in the development process was to verify<br />
the task list. Panels of MK 86 GFCS SMEs convened at the<br />
three MK 86 GFCS C-schools reviewed the list, and suggested<br />
substitutes for tasks found unsuitable. Each task was<br />
evaluated as to its appropriateness for hands-on testing,<br />
according to the following criteria: (1) representativeness<br />
of the first-termer's job, (2) mission criticality, (3)<br />
frequency of performance, (4) sufficient variability in<br />
performance, and (5) practicality for testing at the<br />
C-school sites. In addition, the SMEs were asked to consider<br />
the need for comprehensive task coverage, and equivalent<br />
test difficulties for the two NECs.<br />
The SMEs provided detailed information on the 7 test<br />
items for each test. Specific subtasks were confirmed for<br />
each major task, along with specific faults for the<br />
diagnostic and troubleshooting items. Additional technical<br />
documentation was provided for use in developing scoring<br />
sheets for proceduralized tasks. The test tasks were<br />
sequenced to provide the smoothest and quickest possible<br />
progression through the test, and equipment requirements<br />
were verified and refined. The information gathered enabled<br />
the test development team to begin writing draft test items,<br />
and to prepare for the first SME panel.<br />
After the final SME review, the information and<br />
revisions were incorporated into draft test items. Plans<br />
were made for trying out the test items on a small sample of<br />
first-term MK 86 GFCS FCs.<br />
Test Item Tryout<br />
Four first-term MK 86 GFCS FCs, stationed on ships at<br />
Norfolk, VA, served as test subjects. Two were data FCs (NEC<br />
1129) and two radar (NEC 1125). The equipment used for the<br />
tryout was the Dam Neck MK 86 GFCS training system, which<br />
includes a full set of actual equipment equivalent to the<br />
MOD 10 Capability Expanded shipboard system. Except for an<br />
added simulation capability for the interface to the<br />
equipment controlled by the system, and a "fan-out" version<br />
of the UYK-7(V) computer (an extra UYK-7 with the circuit<br />
card planes exposed to permit easy access), the system<br />
replicates a ship-mounted system, and is composed entirely<br />
of actual equipment, housed in two connecting rooms in the<br />
school building, with a full set of system technical<br />
documentation available in the training laboratory, along<br />
with all required tools and test equipment.<br />
The purposes of the tryout were to: (1) verify that the<br />
test items would perform properly with the equipment; (2)<br />
ensure that instructions were clear and accurate; (3)<br />
determine whether the suggested item time limits were<br />
realistic; (4) verify that there would be some variability<br />
among subjects in performance on the items; and (5) reveal<br />
unanticipated problems of any sort. Because of the small<br />
sample, there was no attempt to gather statistical<br />
information.<br />
Because one of the purposes of the tryout was to<br />
determine whether the time limits were reasonable, a subject<br />
was allowed to continue working on a task until it was<br />
completed if he was making progress on the task, and the<br />
completion time was recorded. All subjects were able to<br />
complete the test within four hours or less, but there was<br />
considerable variability in the completion time on most of<br />
the items. This small sample did not permit confident<br />
prediction of the best time limits for all items, but did<br />
suggest some changes to suggested time limits.<br />
The results of the tryout were positive. No major<br />
problems arose during the tryout. The test items performed<br />
well on the equipment, with only minor adjustments in<br />
procedure required (some improvements in the techniques of<br />
fault insertion, to ensure that prefaulted modules and<br />
grounding straps were not visible to examinees). The<br />
instructions were understandable, with a few areas to be<br />
clarified. The final items included these changes.<br />
Final SME Review<br />
The revised test items were reviewed by SMEs (two data<br />
instructors and two radar instructors) at Great Lakes. The<br />
SMEs clarified some technical issues (e.g., documentation<br />
nomenclature and fault insertion techniques).<br />
Field Testing<br />
Site Preparation<br />
considerable time preparing for the tryout, rehearsing each<br />
item, verifying that the faults to be inserted would produce<br />
the desired indications, and ensuring that the training<br />
equipment was in good condition for testing. Test<br />
administrator training consisted of review and practice of<br />
the test items and procedures.<br />
Testing Personnel<br />
The test administrator was a retired E-9 MK 86 GFCS<br />
technician, who had served as a MK 86 Course Director for<br />
the three years preceding his retirement. One of the school<br />
senior staff (E-7/8) was available throughout. He<br />
participated in the preparation for testing, helped with<br />
equipment setup and was able to solve the few equipment<br />
problems which occurred. Two observers were on site, with<br />
one present in the testing area during all test periods.<br />
Test Subjects<br />
The sample consisted of 103 individuals engaged in<br />
their first term of military service. There were 45<br />
individuals tested in Dam Neck and 53 individuals tested in<br />
San Diego. All individuals in this sample were male. The<br />
majority of the FCs in this sample were in the third,<br />
fourth, and fifth years of their military service<br />
obligations. Sixty-one individuals were classified in the<br />
radar subrating and 42 individuals were classified in the<br />
data subrating. All (100%) of the FCs were high school<br />
graduates who either earned diplomas or GED equivalents.<br />
Equipment<br />
The equipment used for the field test was the same as<br />
that used for the tryout, plus equivalent equipment at the<br />
San Diego C-school.<br />
Procedure<br />
When the subjects arrived at the testing site, they<br />
were given a brief introduction to the project, with an<br />
explanation that their performance would in no way affect<br />
their service records, and would not be reported to anyone<br />
but project staff.<br />
At the beginning of each testing session, the subject<br />
was given oral and written instructions on the testing<br />
procedure and the ground rules for the testing. Some<br />
biographical data were collected, and then the first item<br />
was administered. For each item, the test administrator<br />
gave oral and written instructions on the task requirements<br />
and the time allowed for completion. The subject was<br />
encouraged to ask questions before beginning the task. When<br />
the subject indicated that he was ready to begin, the test<br />
administrator instructed him to start and began timing.<br />
At several points in the testing sequence it was<br />
necessary for the test administrator to insert or remove<br />
fault conditions or otherwise prepare the equipment for the<br />
next item. At these times (3 or 4 per test) the subject was<br />
excused and given a break of approximately five minutes.<br />
Throughout testing, the test administrator observed the<br />
subject's actions, checking off steps performed in<br />
procedures on the scoring sheets provided, and recording and<br />
evaluating troubleshooting (non-procedural) actions on other<br />
forms. When necessary, the test administrator queried the<br />
subject to determine what he was doing or attempting. Time<br />
to complete each task was also recorded. Upon completion of<br />
testing, each subject was asked how frequently he performed<br />
each of the tested tasks on the job, and when he had most<br />
recently performed each task.<br />
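The checked-off procedural steps described above reduce naturally to a task score. The paper states only that each task yielded a 0-100 score, not the exact scoring rule, so the percent-of-steps computation below is an assumed illustration:<br />

```python
def task_score(steps_checked):
    """steps_checked: one boolean per procedural step on the scoring
    sheet (True = step performed correctly). Returns the percentage
    of steps performed, on a 0-100 scale."""
    return 100.0 * sum(steps_checked) / len(steps_checked)
```

For example, a proceduralized task with three of four steps checked off would score 75 under this sketch.<br />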
RESULTS<br />
Each of the FC hands-on performance tests consisted of<br />
7 tasks, each of which yielded a single, overall score<br />
ranging from 0 to 100. Correlations were computed for the<br />
radar and data subratings, combined and separately, and are<br />
shown in Table 1. The correlation between overall<br />
performance on the hands-on test with AFQT for the radar and<br />
data subratings combined was -.03. Correcting for<br />
restriction in range resulted in a correlation of .12. The<br />
correlation between overall performance on the hands-on test<br />
with AFQT for the data subrating was .30. The correlation<br />
between overall performance on the hands-on test with AFQT<br />
for the radar subrating was -.10. Correcting for<br />
restriction in range resulted in a correlation of .17 for<br />
the data subrating and .14 for the radar subrating. None of<br />
the correlations was significant.<br />
Combined (Data and Radar)<br />
Hands-On and AFQT: -.03<br />
Hands-On and AFQT, corrected for restriction in range: .12<br />
Data<br />
Hands-On and AFQT: .30<br />
Hands-On and AFQT, corrected for restriction in range: .17<br />
Radar<br />
Hands-On and AFQT: -.10<br />
Hands-On and AFQT, corrected for restriction in range: .14<br />
Table 1. Correlations Between Hands-On Performance with AFQT<br />
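Range-restriction corrections of the kind reported above are commonly computed with Thorndike's Case 2 formula. The paper does not state which formula or which standard deviations were used, so the sketch below, with hypothetical AFQT standard deviations, is illustrative only:<br />

```python
import math

def correct_restriction(r, sd_unrestricted, sd_restricted):
    """Thorndike Case 2 correction for direct range restriction on
    the predictor: given the observed correlation r in the restricted
    sample and the predictor's restricted and unrestricted standard
    deviations, estimate the unrestricted correlation."""
    k = sd_unrestricted / sd_restricted  # ratio > 1 under restriction
    return (r * k) / math.sqrt(1.0 - r * r + r * r * k * k)
```

With hypothetical standard deviations of 25 in the applicant population and 15 in the selected sample, an observed r of .30 would correct upward to about .46; note that Case 2 preserves the sign of the observed correlation and increases its magnitude when the variance ratio exceeds one.<br />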
REFERENCES<br />
Laabs, G. J., & Berry, V. M. (1987, August). The Navy<br />
job performance measurement program: Background,<br />
inception, and current status (NPRDC TR 87-34).<br />
San Diego: Navy Personnel Research and Development<br />
Center.<br />
Office of the Assistant Secretary of Defense (MRA&L). (1982,<br />
December). Joint-Service efforts to link enlistment<br />
standards and job performance: First annual report<br />
to the House Committee on Appropriations. Washington,<br />
DC: Author.<br />
ASVIP: AN INTEREST INVENTORY USING<br />
COMBINED ARMED SERVICES JOBS<br />
Herbert George Baker, PhD<br />
Marjorie M. Sands<br />
Navy Personnel Research and Development Center<br />
San Diego, CA 92152-6800<br />
Arnold R. Spokane, PhD<br />
Spokane Career Associates<br />
Allentown, PA 18104<br />
ABSTRACT<br />
A number of vocational interest inventories have<br />
been developed by the Armed Services for use in<br />
guiding enlisted personnel into military<br />
occupations. The instruments have used<br />
occupational activities, job titles, recreational<br />
activities, and so forth -- or, a combination of<br />
such elements. Test subjects then indicate their<br />
interests or preferences for each item. Although<br />
great efforts have been made to cross-code<br />
military jobs between the Services and with<br />
civilian occupations, until now no interest<br />
measure has used the combined Armed Services jobs.<br />
This paper describes the development and<br />
administration of the Armed Services Vocational<br />
Interest Profile (ASVIP). The instrument uses the<br />
job titles (officer and enlisted) found in the<br />
Military Career Guide, the jobs also having been<br />
assigned three-letter Holland Codes. In scoring,<br />
results indicate the most preferred Holland Code,<br />
plus an indication of high or low preferred<br />
occupational level. This paper reports on a study<br />
to measure the endorsement of the<br />
combined-Services jobs. Suggestions are made for<br />
use and for further research.<br />
INTRODUCTION<br />
Vocational interests have long been recognized as one of the<br />
many individual characteristics that affect occupational<br />
exploration, job acquisition, work satisfaction, and,<br />
perhaps, performance. There are many theories of vocational<br />
interests and job preferences, and a great number of<br />
instruments have been developed to identify and measure<br />
vocational interests. One of the major uses of these<br />
instruments is in guiding young people into the types of<br />
work for which their interests best suit them.<br />
Similarly, a number of vocational interest inventories have<br />
been developed by the Armed Services for use in guiding<br />
enlisted personnel into military occupations. Examples<br />
include the Vocational Interest Career Examination (Alley,<br />
1978), developed by the Air Force, and the Navy Vocational<br />
Interest Inventory (Abrahams, Lau, & Neumann).<br />
Although research has shown promise to enhance the selection<br />
and classification processes through the incorporation of a<br />
formal, measured interest component, with the exception of<br />
the Air Force, interests have remained an experimental as<br />
opposed to an operational consideration.<br />
The various vocational interest instruments developed by the<br />
Armed Services have used occupational activities, job<br />
titles, recreational activities, and so forth -- or, a<br />
combination of such elements. Test subjects are asked to<br />
indicate their interests or preferences for each item.<br />
Scoring systems then report out an interest type, match the<br />
subject with an occupational area, or in some other way<br />
indicate the interests of the individual.<br />
A few years ago, great efforts were made to cross-code<br />
military jobs between the Services and with civilian<br />
occupations, in a project sponsored by the Office of the<br />
Assistant Secretary of Defense (FM&P) (Dale, Wright, Haven,<br />
Pavlak, & Lancaster, 1989). The result was a taxonomy of<br />
what may be called combined-Services jobs -- identical to no<br />
specific job, but incorporating occupational information<br />
from each Service that has a similar job (plus the Coast<br />
Guard). It should be noted here that some jobs (e.g.,<br />
infantrymen, dentists) are not represented in the<br />
occupational structure of all the Services.<br />
The combined-Services job taxonomy offers a number of<br />
research opportunities using occupational information<br />
specific to DOD jobs. However, to date, no interest measure<br />
has used the combined Armed Services jobs.<br />
APPROACH<br />
The combined-Services jobs, both enlisted (N=134) and<br />
officer (N=71), listed in the Military Career Guide<br />
(Department of Defense, 1988) were merged and alphabetized<br />
into a numbered list of 205 items. While there has been much<br />
controversy over the wisdom of using job titles in interest<br />
measurement, substantial research supports their use. More<br />
recently, Holland, Gottfredson, and Baker (1990), in<br />
research with Navy recruits, found that use of job titles<br />
was both feasible and meaningful. Arguably, the Navy's job<br />
titles are the most esoteric and potentially confusing to a<br />
young person, yet there were few, if any, problems in their<br />
use with young male sailors. Consequently, the even more<br />
understandable combined-Services job titles were considered<br />
fully suitable for use as items on an interest instrument.<br />
The 205 job titles thus became items on an inventory, the<br />
Armed Services Vocational Interest Profile (ASVIP), as shown<br />
in Figure 1. The answer sheet lists, for each item, three<br />
answer options, L, I, and D (for Like, Indifferent, and<br />
Dislike, respectively). Typical scoring strategies using<br />
these options call for simply disregarding the I responses,<br />
and subtracting the number of Ds from the number of Ls.<br />
44. Construction Equipment Operators<br />
45. Corrections Specialists<br />
46. Court Reporters<br />
47. Data Entry Specialists<br />
48. Data Processing Equip. Repairers<br />
49. Data Processing Managers<br />
50. Dental Laboratory Technicians<br />
51. Dental Specialists<br />
52. Dentists<br />
53. Detectives<br />
54. Dietitians<br />
55. Dispatchers<br />
56. Divers<br />
Figure 1. Examples of Items<br />
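The L/I/D scoring strategy just described can be sketched in a few lines of Python; the function name and response encoding here are illustrative assumptions, not part of the ASVIP itself.

```python
def lid_score(responses):
    """Score a set of L/I/D interest responses.

    Following the typical strategy described above: 'I'
    (Indifferent) responses are disregarded, and the number of
    'D' (Dislike) responses is subtracted from the number of
    'L' (Like) responses.
    """
    likes = sum(1 for r in responses if r == "L")
    dislikes = sum(1 for r in responses if r == "D")
    return likes - dislikes

# Example: 4 Likes, 2 Dislikes, 1 Indifferent -> score of 2
print(lid_score(["L", "L", "D", "I", "L", "D", "L"]))  # 2
```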
The ASVIP was administered to samples of male (N=150) and<br />
female (N=150) Navy recruits, at the Great Lakes, Illinois<br />
and Orlando, Florida Navy Recruit Training Commands. There<br />
was no time limit for the test. Completion times ranged from<br />
18 to 32 minutes, with a mean of 24 for males and 23 for<br />
females.<br />
This pilot study, in addition to assessing the testing<br />
logistics for the instrument, was designed to reveal the<br />
levels of endorsement for the 205 jobs. Thus, for the effort<br />
reported herein, only the L responses were considered in the<br />
scoring process.<br />
RESULTS<br />
Each of the 205 jobs received some endorsement from this<br />
sample of Navy recruits, even though not all of the jobs can<br />
be found in the Navy. The range of endorsement (out of a<br />
possible 300) is from 204 for Photographers to 25 for<br />
Clothing and Fabric Repairers. Table 1 shows the cumulative<br />
frequencies with which each of the officer and enlisted jobs<br />
was endorsed. That is, the table shows the number of times<br />
the LIKE response option was chosen for each item.<br />
CONCLUSIONS<br />
The results of this pilot study show that there is<br />
endorsement for all combined-Services jobs, and that there<br />
is a reasonable dispersion across all of the job titles.<br />
This suggests that the ASVIP might be useful as an accession<br />
tool with which to begin acquainting young people with the<br />
occupational opportunities offered by the military, that<br />
is, as a guide for occupational exploration into the<br />
military working community. Administration to other-Service<br />
and civilian samples is an obvious necessity before any firm<br />
conclusions could be drawn as to the feasibility for use of<br />
the ASVIP in job exploration, counseling, and<br />
classification.<br />
RECOMMENDATIONS FOR FURTHER RESEARCH<br />
A number of research opportunities suggest themselves at<br />
this point. One would be to compare the strength of<br />
endorsement across the 205 job titles with the endorsement<br />
of similar civilian job titles, the latter being information<br />
already available in the research literature.<br />
At the time of data collection, gender information was also<br />
collected. This enables a study of differential response<br />
patterns between male and female Navy recruits. Also, data<br />
were collected with an alternative instrument using the same<br />
items, but listing them within the categories used in the<br />
Military Career Guide, rather than in an alphabetical<br />
listing. This makes it possible to study the effects of<br />
presenting items in simple alphabetization versus presenting<br />
them in ways that make possible the influences of category<br />
names on response patterns.<br />
Furthermore, each of the 205 combined-Services jobs was<br />
coded using the Holland three-letter occupational coding<br />
system (Holland, 1985). Several studies suggest themselves:<br />
(1) comparisons between male and female endorsements across<br />
the six Holland primary codes; (2) assessment of individual<br />
response consistency within each Holland primary code; and<br />
(3) tracking the subjects and comparing performance<br />
evaluations in light of the congruence between interests and<br />
actual job assignments in the Navy.<br />
Other possibilities include studying the differential<br />
response patterns for high and low aspiration levels (i.e.,<br />
officer and enlisted jobs), in terms of both male-female<br />
differences and intra-individual consistency.<br />
Finally, the much-discussed impact of forward area<br />
assignment on women's job aspirations can be addressed in a<br />
small way by using the ASVIP. The instrument should be<br />
administered to another Navy female recruit population a few<br />
years hence to assess the impact of Operation Desert Shield<br />
on women's job aspirations.<br />
REFERENCES<br />
Alley, W. E. (1978). Vocational Interest Career Examination:<br />
Use and Application in Counseling and Job Placement<br />
(AFHRL-TR-78-62). Brooks Air Force Base, TX: Personnel<br />
Research Division.<br />
Abrahams, N. M., Lau, A. W., & Neumann, I. (1968). An<br />
Analysis of the Navy Vocational Interest Inventory as a<br />
Predictor of School Performance and Rating Assignment<br />
(NPRDC-SRR-69-11). San Diego: Navy Personnel Research<br />
Activity.<br />
Dale, C., Wright, G., Haven, R., Pavlak, M., & Lancaster, A.<br />
(1989). The DOD Military/Civilian Master Crosswalk Project.<br />
Proceedings of the 31st Annual Conference of the Military<br />
Testing Association. San Antonio, TX: Air Force Human<br />
Resources Laboratory and USAF Occupational Measurement<br />
Center, pp. 250-255.<br />
Department of Defense (1988). Military Career Guide<br />
1988-1989. Washington, DC: Author.<br />
Holland, J. L., Gottfredson, G. D., & Baker, H. G. (1990).<br />
Validity of Vocational Aspirations and Interest Inventories:<br />
Extended, Replicated, and Reinterpreted. Journal of<br />
Counseling Psychology, 37, 3, pp. 337-342.<br />
Holland, J. L. (1985). Manual for the Vocational Preference<br />
Inventory. Odessa, FL: Psychological Assessment Resources.<br />
PREDICTING PERFORMANCE WITH BIODATA<br />
Morris S. Spier, Ph.D.<br />
Somchai Dhammanungune, Ph.D.<br />
U.S. <strong>International</strong> University<br />
Herbert George Baker, Ph.D.<br />
Laura E. Swirski<br />
Navy Personnel Research and Development Center<br />
ABSTRACT<br />
A scored biographical questionnaire was developed and<br />
administered to a sample of Navy Fire Controlmen in two subratings:<br />
radar operations and data processing. The subjects<br />
were subsequently administered an extensive, hands-on test<br />
of technical proficiency. A correlational analysis<br />
identified 15 items that may predict proficiency for the<br />
radar subrating, and 20 items which may predict job<br />
performance for the data processing subrating. Cross-validation<br />
is needed to confirm the findings.<br />
INTRODUCTION<br />
The notion that past behavior is the best predictor of<br />
future behavior both supports and receives support from the<br />
use of scored autobiographical questionnaires. Biodata has<br />
demonstrated its usefulness in predicting a range of factors<br />
in the employment setting, including: (1) career<br />
progression; (2) turnover/job tenure; (3) job satisfaction;<br />
and (4) trainability. The convergence of the findings to<br />
date supports the notion that biodata approaches tend to be<br />
excellent predictors of a wide range of employment-related<br />
criteria.<br />
The Armed Services, in cooperation with the Department of<br />
Defense (DOD), are currently engaged in a Joint-Services Job<br />
Performance Measurement (JPM) Project of which the present<br />
research is a subtask. The larger project is investigating<br />
the feasibility of measuring on-the-job performance with an<br />
aim toward using the measures to set military enlistment<br />
standards. As a part of its contribution to the Joint-<br />
Services Project, the Navy (Laabs & Berry, 1987) is<br />
developing performance measures for a number of occupational<br />
specialties (ratings), including that of Fire Controlman<br />
(FC).<br />
There are, thus, separate proficiency tests for radar and<br />
data processing personnel. Scoring the test is done using a<br />
scoring sheet to grade steps in the process as having been<br />
completed either "correctly" or "incorrectly," and to grade<br />
any products produced as a part of the process as either<br />
"acceptable" or "unacceptable." The final score is a tally<br />
of the correct and acceptable actions and products.<br />
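As a minimal sketch, the tally described above (correct steps plus acceptable products) might be implemented as follows; the boolean encoding of the scoring sheet is an assumption made here for illustration.

```python
def hands_on_score(steps, products):
    """Tally a hands-on proficiency test.

    steps:    list of bools, True where a process step was graded
              "correctly" completed
    products: list of bools, True where a product was graded
              "acceptable"
    The final score is the count of correct steps plus the count
    of acceptable products.
    """
    return sum(steps) + sum(products)

score = hands_on_score(steps=[True, True, False, True],
                       products=[True, False])
print(score)  # 4
```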
METHOD<br />
The purpose of the present research was to develop an<br />
autobiographical questionnaire and to determine the<br />
relationship between scores on the biodata instrument and<br />
performance on the hands-on tests.<br />
Biodata Questionnaire Development<br />
A 124-item draft version of the Personal Activities<br />
Inventory was developed, based on a review of the relevant<br />
literature, and on the nature of the critical tasks to be<br />
performed during the job performance test. Emphasis was<br />
placed on biodata factors associated with mechanical<br />
interests, abilities, and experience, numerical and<br />
technical/scientific interests and abilities, past<br />
experience with computers, and on work, academic, and<br />
personal experiences that might be reasonably expected, on<br />
an "armchair" basis, to be related to task performance.<br />
Attention was similarly given to the development of items<br />
that might reflect the cognitive (e.g., attention to detail)<br />
and social (e.g., working alone or with others) processes<br />
that might be reflected in task proficiency. The 124 items<br />
were classified into 24 broader Biodata Factors. The draft<br />
version of the Inventory was reviewed and, following minor<br />
refinements, was pretested on a small sample (N=15) to<br />
determine ease of administration. No problems were found.<br />
Subjects<br />
Subjects for the biodata testing were first-term FCs. The<br />
103 sailors who were scheduled to be administered the hands-on<br />
job performance measurement test were, thus, a sample of<br />
convenience for the present study. While predictor<br />
(biodata) scores were collected for all 103 subjects, both<br />
predictor and criterion (job performance) data were<br />
available for only 56 of the total sample tested, 25<br />
(44.6%) radar and 31 (55.4%) data processing.<br />
Administration of the "Personal Activities Inventory"<br />
The final version of the Inventory was administered at Dam<br />
Neck and San Diego. Subjects were logged-in, given the test<br />
booklet and answer sheet, and instructed to begin. There was<br />
no time limit.<br />
Analysis of the Data<br />
Hands-on test data were entered into the computer. The raw<br />
scores for each of the seven critical tasks were summed for<br />
each subject in the form of a standard score. Data were<br />
analyzed separately for the two subratings. The response<br />
format of each item was the determining factor in the<br />
analysis. For biodata items in which the response options<br />
represented a continuum, the biodata scores were related to<br />
the job proficiency scores using the Pearson Product Moment<br />
Correlation. Items with dichotomous or discontinuous<br />
response options were analyzed using the Point Bi-Serial<br />
Correlation.<br />
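Both analyses can be sketched with one routine, since a point-biserial correlation is numerically a Pearson r computed with the dichotomous variable coded 0/1; the sample data below are invented purely for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# A point-biserial correlation is the same quantity with the
# dichotomous item coded 0/1, so one routine serves both the
# continuous and the dichotomous biodata items.
continuous_item = [1, 2, 3, 4, 5, 6]        # e.g., a 1-6 rating scale
dichotomous_item = [0, 0, 1, 0, 1, 1]       # e.g., yes/no item
proficiency = [50.0, 52.0, 61.0, 55.0, 64.0, 66.0]

print(round(pearson_r(continuous_item, proficiency), 3))
print(round(pearson_r(dichotomous_item, proficiency), 3))
```

In practice a statistics package would be used, but the arithmetic is no more than this.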
RESULTS<br />
FC-Radar Operations Personnel<br />
Table 1 shows that nine (9) Biodata Factors contained items<br />
which correlated at a statistically significant level with<br />
the job performance data of the radar personnel. It is<br />
interesting to note that four of the items fall within the<br />
Adjustment/Emotional Maturity Factor; an additional four<br />
items deal with some aspect of Technical/Scientific,<br />
Mechanical, or Numerical Factors. Overall, 13 separate<br />
items validated against the criterion data. Table 2<br />
presents the results of the Pearson Product Moment<br />
Correlations (continuous to continuous variables) for radar.<br />
The validity coefficients range from .322 (p < .05) to .575 (p<br />
Table 6 presents the results of the Point Bi-Serial<br />
Correlations (dichotomous to continuous variables) for the<br />
data processing subrating. Note, again, that Item #61 and<br />
Item #75 each have two response foils that reach statistical<br />
significance. As a result, the 12 Biodata Factors that<br />
validated for data processing contain 20 statistically<br />
significant validity coefficients.<br />
DISCUSSION<br />
The data from the present study, while based on relatively<br />
small samples and still needing cross-validation, suggest<br />
optimism. Among radar persons, 15 validity coefficients<br />
reached levels of statistical significance across 9 Biodata<br />
Factors. Among data processing people, 20 validity<br />
coefficients reached statistically significant levels.<br />
Moreover, the size of the coefficients is consistent with<br />
those reported in the literature for job proficiency in<br />
relation to biodata predictors (Mumford & Owens, 1987). In<br />
fact, the correlations are larger than those reported for<br />
other uses of biodata to predict military proficiency where<br />
only ratings, rankings, and archival data were used as the<br />
criterion (Barge & Hough, 1986).<br />
CONCLUSIONS<br />
A correlational analysis identified 15 items that may predict<br />
proficiency for the radar subrating and 20 for the data<br />
processing subrating. The data suggest that a biodata test may be a<br />
useful surrogate for job proficiency tests. However, the<br />
limitations of the study, for example, the restricted sample<br />
size, make it essential that these findings be cross-validated<br />
to confirm and establish the predictive factors.<br />
It is further recommended that the emergent "profile" of the<br />
ratings be used to generate hypotheses about factors that<br />
may be predictive and thus lead to a higher proportion of<br />
discriminating items. Lastly, thought should be given to<br />
extending the biodata approach to other Navy ratings.<br />
REFERENCES<br />
Barge, B.N., and Hough, L.M. (June, 1986). Utility of<br />
biographical data for predicting job performance. In<br />
Leaetta M. Hough (Ed.), Literature review: Utility of<br />
temperament, biodata, and interest assessment for<br />
predicting job performance. Alexandria, VA: U.S. Army<br />
Research Institute for the Behavioral and Social Sciences.<br />
Mumford, M.D., and Owens, W.A. (March, 1987). Methodology<br />
review: Principles, procedures, and findings in the<br />
application of background data measures. Applied<br />
Psychological Measurement.<br />
DEVELOPMENT OF EQUATIONS FOR PREDICTING<br />
TESTING IMPORTANCE OF TASKS<br />
Walter G. Albert<br />
William J. Phalen<br />
Air Force Human Resources Laboratory<br />
INTRODUCTION<br />
The Specialty Knowledge Test (SKT) is an important component<br />
of the Weighted Airman Promotion System (WAPS). SKTs are 100-item<br />
multiple-choice achievement tests designed to measure job<br />
knowledge in various Air Force Specialties (AFSs). They are<br />
written annually for each AFS by teams of four to eight subject<br />
matter experts (SMEs). The SMEs are senior NCOs in the AFS for<br />
which a particular test is being written. A psychologist<br />
experienced in test construction procedures is assigned to each<br />
team to serve as a group facilitator.<br />
A critical part of the test construction process for any SKT<br />
is the preparation of the test outline, which guides the SMEs in<br />
determining how many questions they should write for each<br />
knowledge or duty area of the AFS. The outline used in test<br />
construction is generated in one of two ways. For many years,<br />
the SMEs created their own outline, which is referred to as the<br />
Conventional Test Outline (CTO). Recently, an automated process<br />
has been used to develop outlines for some AFSs. With this<br />
process, the Automated Test Outline (ATO) is available for use<br />
when the test development team arrives. The ATO is generated<br />
from information gathered from testing importance (TI) surveys,<br />
where senior NCOs are asked to rate the importance of each task<br />
as to whether the knowledge(s) required to perform it should be<br />
covered by the SKT.<br />
An important advantage of the ATO procedure over the CTO<br />
procedure is the direct link established between important tasks<br />
performed by incumbents in the AFS and test questions which<br />
address the knowledges required to perform those tasks. The ATO<br />
process has been implemented in several AFSs, but currently it is<br />
regarded as an experimental procedure and is being evaluated<br />
against the CTO. This paper investigates whether information<br />
routinely collected from occupational surveys can be used to<br />
generate accurate TI values for each task. The resulting<br />
prediction equations could then be used to select tasks for<br />
inclusion in testing importance surveys of previously unsurveyed<br />
AFSs or to serve as a surrogate for TI, when a TI survey cannot<br />
be accomplished.<br />
OCCUPATIONAL SURVEYS<br />
An occupational inventory containing up to 2,000 task<br />
statements is administered to a large number of incumbents in<br />
each AFS. These tasks are grouped into seven to twenty duty<br />
areas. Each duty area is comprised of a group of tasks that form<br />
a major activity associated with the job specialty. Each<br />
surveyed job incumbent is requested to estimate the relative<br />
amount of time that he/she spends in performing each task on a<br />
nine-point scale that ranges from "very small amount of time" to<br />
"very large amount of time." No response means that the<br />
incumbent does not perform the task. Each of these ratings is<br />
divided by the sum of the relative time spent values for all of<br />
the tasks in the inventory to get a percentage of time spent<br />
value for the incumbent on each task. From these responses, the<br />
following values are computed for each task: (a) the percentage<br />
of incumbents performing the task (PMP), (b) the percentage of<br />
time spent by incumbents performing the task (PTM), and (c) the<br />
average pay grade of incumbents performing the task (AG).<br />
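The three task-level indices just described (PMP, PTM, AG) could be computed from a matrix of survey responses roughly as below; the data structures and function names are assumptions made for illustration, not the actual survey-processing software.

```python
def task_indices(ratings, pay_grades):
    """Compute per-task occupational-survey indices.

    ratings:    one dict per incumbent mapping task -> relative time
                rating (1-9); tasks not performed are simply absent
                (no response means the task is not performed).
    pay_grades: one numeric pay grade per incumbent, same order.

    Returns, for each task:
      PMP - percentage of incumbents performing the task
      PTM - mean percentage of time spent by incumbents who
            perform the task
      AG  - average pay grade of incumbents performing the task
    """
    tasks = {t for r in ratings for t in r}
    n = len(ratings)
    out = {}
    for t in sorted(tasks):
        pct_times, grades = [], []
        for resp, grade in zip(ratings, pay_grades):
            if t in resp:
                # each rating is divided by the sum of the incumbent's
                # relative-time ratings to give percent time spent
                pct_times.append(100.0 * resp[t] / sum(resp.values()))
                grades.append(grade)
        out[t] = {
            "PMP": 100.0 * len(grades) / n,
            "PTM": sum(pct_times) / len(pct_times),
            "AG": sum(grades) / len(grades),
        }
    return out

ix = task_indices(
    [{"A": 9, "B": 1}, {"A": 5, "B": 5}, {"B": 8}],
    [3, 4, 5],
)
print(ix["A"]["PMP"])  # 2 of 3 incumbents perform task A
```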
Another survey containing the same task list as the<br />
occupational inventory is administered to a large sample of<br />
senior NCOs in each job specialty, who use a nine-point scale to<br />
estimate the difficulty in learning to perform each task<br />
successfully (TD) and the emphasis that should be given in formal<br />
training on each task for newly-hired employees (TE). Raters are<br />
asked to respond to all tasks they are familiar with, even if<br />
some of them are not part of their current job. The TD and TE<br />
values for each task are the means of the responses.<br />
CONVENTIONAL TEST OUTLINE DEVELOPMENT<br />
The CTO is organized according to broad job knowledge areas.<br />
The test development teams spend one to two days creating CTOs<br />
by specifying and weighting knowledge categories based on their<br />
own expertise and their review of appropriate personnel<br />
classification and training documents, such as the Specialty<br />
Training Standard, which describes important duties and tasks for<br />
each job specialty; the Position Classification, which describes<br />
all duties and responsibilities for each job specialty; and the<br />
SKT abstract, which furnishes the following information for each<br />
task in the AFS: PMP, PTM, TE, AG, and TD. The SMEs decide on<br />
the number of test questions to be written on each knowledge<br />
area, based on their determination of the relative testing<br />
importance of that area.<br />
AUTOMATED TEST OUTLINE DEVELOPMENT<br />
The first step in the ATO process is to select those tasks<br />
from the inventory that are performed by at least 50% of the<br />
incumbents or have TE values at least one standard deviation<br />
above the mean. The screening process selects approximately 150<br />
to 250 tasks for each AFS. A survey containing the selected<br />
tasks is administered to approximately 70 senior NCOs to obtain<br />
their opinions on the importance of including a question on the<br />
SKT concerning the knowledge required to successfully perform<br />
each task. The rating scale for testing importance is a seven-point<br />
scale that ranges from "no importance" to "extremely high<br />
importance." The interrater reliability of these ratings is<br />
estimated and deviant raters are eliminated (Lindquist, 1953).<br />
The testing importance (TI) value for each task is the mean of<br />
the ratings after deviant raters have been eliminated.<br />
An ATO is organized by duties and tasks within duties. All<br />
tasks on the TI survey are listed under the appropriate duty.<br />
The TI values are used to weight the duties and tasks. To<br />
accomplish this weighting, the TI values for each task are<br />
squared and summed within a duty. The weight for each duty is<br />
the sum of the squared TI values across all tasks within the duty<br />
divided by the sum of the squared TI values across all duties.<br />
These weights are the percentages of test questions to be<br />
selected to cover the required knowledges to successfully perform<br />
the tasks within each duty.<br />
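The duty-weighting computation above can be sketched as follows; the duty names and TI values are invented for illustration.

```python
def duty_weights(ti_by_duty):
    """Weight duties by summed squared testing-importance (TI) values.

    ti_by_duty maps each duty name to the list of TI values for its
    tasks. The weight for a duty is the sum of its squared TI values
    divided by the sum of squared TI values across all duties, i.e.,
    the fraction of test questions to be allotted to that duty.
    """
    duty_sums = {d: sum(ti * ti for ti in tis)
                 for d, tis in ti_by_duty.items()}
    total = sum(duty_sums.values())
    return {d: s / total for d, s in duty_sums.items()}

w = duty_weights({"Maintenance": [6.0, 5.5], "Admin": [3.0]})
# squared sums are 66.25 and 9.0, giving weights of about .88 and .12
```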
The TI value of each task within a duty is reflected by a<br />
letter from A to D. Tasks are designated as "A" tasks if their<br />
TI values are at least one standard deviation above the mean of<br />
the TI values or if their TI values are at least 6.0. Similarly,<br />
tasks are designated as "D" tasks if their TI values are more<br />
than one standard deviation below the mean of the TI values;<br />
however, all tasks with TI values of at least 4.00 are designated<br />
as "C" tasks. Of the remaining tasks, the upper 50% are<br />
designated "B" tasks and the lower 50% are designated "C" tasks.<br />
SMEs are required to write at least one item to test the job<br />
knowledge required for every "A" task and to write no more than<br />
three items for a single task. Procedures are available to<br />
override these restrictions; however, they require written<br />
justification. Items can be written on "D" tasks only with the<br />
group facilitator's approval.<br />
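The category-assignment rules above can be sketched as follows; the paper does not specify how ties or the exact standard-deviation convention are handled, so those details here are assumptions.

```python
from statistics import mean, pstdev

def categorize_tasks(ti):
    """Assign A-D testing-importance categories to tasks.

    ti maps task -> TI value. "A": at least one SD above the mean TI,
    or TI >= 6.0. "D": more than one SD below the mean, unless
    TI >= 4.0 (such tasks become "C"). The remaining tasks are split
    at the median into "B" (upper half) and "C" (lower half).
    Using the population SD (pstdev) is an assumption here.
    """
    m, sd = mean(ti.values()), pstdev(ti.values())
    cats = {}
    for task, v in ti.items():
        if v >= m + sd or v >= 6.0:
            cats[task] = "A"
        elif v < m - sd:
            cats[task] = "C" if v >= 4.0 else "D"
    # split the remaining tasks: lower half "C", upper half "B"
    rest = sorted((v, t) for t, v in ti.items() if t not in cats)
    half = len(rest) // 2
    for i, (_, task) in enumerate(rest):
        cats[task] = "C" if i < half else "B"
    return cats
```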
PROCEDURE<br />
Tasks for each of 26 AFSs for which testing importance<br />
indices were available (914X0, 753X0, 423X3, 791X0, 423X4, 792X1,<br />
915X0, 908X0, 392X0, 231X2, 542X2, 674X0, 552X0, 324X0, 542X1,<br />
427X3, 321X1E, 112X0, 121X0, 274X0, 321X1G, 241X0, 431X0, 275X0,<br />
566X0, and 231X0) were randomly divided into two samples--one<br />
sample designated the "validation sample" and the other sample<br />
designated the "cross-validation sample." First, the "A" tasks<br />
were randomly split between the two samples, such that each<br />
sample contained approximately an equal number of "A" tasks. The<br />
"B," "C," and "D" tasks were split between the two samples in the<br />
same manner. Regression equations were computed separately for<br />
each validation and cross-validation sample with TI as the<br />
criterion and PMP, PTM, AG, TD, and TE as the predictor<br />
variables. The two sets of regression weights computed for each<br />
AFS were applied to the predictor scores for the cross-validation<br />
sample to generate predicted testing importance (PTI) values.<br />
The predictive efficiency of each set of weights can be measured<br />
by the Pearson coefficient of correlation (r) between TI and PTI.<br />
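The validation scheme just described (fit regression weights on one sample, apply them to the other sample's predictor scores to generate PTI, then correlate TI with PTI) can be sketched with a small ordinary-least-squares routine. A real analysis would use a statistics package; the tiny data below are purely illustrative.

```python
def ols_weights(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y.

    An intercept column of 1s is prepended to X. Gaussian
    elimination with partial pivoting is used, which is adequate
    for a handful of predictors (here: PMP, PTM, AG, TD, TE).
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for i in range(k):                      # forward elimination
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for c in range(i, k):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    w = [0.0] * k
    for i in range(k - 1, -1, -1):          # back substitution
        w[i] = (b[i] - sum(A[i][c] * w[c] for c in range(i + 1, k))) / A[i][i]
    return w

def predict(w, X):
    """Apply fitted weights to predictor scores to produce PTI values."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]

# fit on the validation half, generate PTI for the cross-validation half
val_X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
val_y = [2.0, 2.0, 5.0, 5.0]
w = ols_weights(val_X, val_y)
pti = predict(w, [[2.0, 2.0], [3.0, 3.0]])
```

The Pearson r between the cross-validation sample's observed TI and these PTI values then measures predictive efficiency.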
If the shrinkage in r using the two sets of weights on the<br />
cross-validation sample is statistically nonsignificant (Walker &<br />
Lev, 1953), then the data for both samples can be combined for a<br />
hierarchical clustering analysis. In this procedure, the number<br />
of regression equations is reduced by one at each stage of the<br />
clustering by combining AFSs into groups and combining their<br />
corresponding regression data. The two most similar groups are<br />
combined at each stage, as measured by the resulting loss of<br />
overall predictive efficiency (i.e., the reduction in r between<br />
TI and PTI). The process continues until all data are combined<br />
into a single equation. Analysis of the r losses at each stage<br />
allows identification of the fewest number of regression<br />
equations that can accurately generate PTI values across all<br />
AFSs. In order to measure how well each set of weights would<br />
reproduce an ATO, the weights were used to classify tasks into<br />
the "A-D" categories. PTI values were classified into importance<br />
categories of A through D using a procedure identical to the one<br />
for TI values. Classification accuracy (CA) was measured by<br />
computing the table and formula shown in Figure 1.<br />
Predicted Classification<br />
A B C D<br />
A F11 F12 F13 F14 R1<br />
Actual B F21 F22 F23 F24 R2<br />
Classification C F31 F32 F33 F34 R3<br />
D F41 F42 F43 F44 R4<br />
C1 C2 C3 C4 N<br />
Fij is the frequency in the i,jth cell.<br />
Figure 1. Classification Table and Formula<br />
CA has been weighted such that misclassifications result in<br />
larger penalties as the "distance" between predicted<br />
classification and correct classification becomes greater. This<br />
weighting strategy is reasonable, in that testing importance<br />
differences associated with categories in the table become<br />
greater as the "distance" between the categories increases. The<br />
range of CA values is 0% (every classification has maximum<br />
distance from the correct classification) to 100% (every<br />
classification is correct).<br />
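The exact CA formula from Figure 1 did not survive reproduction, so the sketch below implements one distance-weighted accuracy that matches the stated properties (100% when every classification is correct, 0% when every classification is at maximum distance); treat it as an assumed reading rather than the authors' own formula.

```python
def classification_accuracy(table):
    """Distance-weighted classification accuracy for an A-D table.

    table[i][j] is the number of tasks whose actual category is i
    and predicted category is j (0=A .. 3=D). Each task's penalty
    is the category distance |i - j|, so CA is 100% when every
    prediction is correct and 0% when every prediction is at the
    maximum possible distance.
    NOTE: this is a plausible reconstruction, not the paper's
    exact formula from Figure 1.
    """
    n = sum(sum(row) for row in table)
    max_dist = len(table) - 1
    penalty = sum(f * abs(i - j)
                  for i, row in enumerate(table)
                  for j, f in enumerate(row))
    return 100.0 * (1.0 - penalty / (n * max_dist))
```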
RESULTS<br />
The r's using weights from the cross-validation samples<br />
ranged from .44 (908X0) to .92 (914X0) and the r's using weights<br />
from the validation samples ranged from .42 (566X0) to .91<br />
(121X0). Therefore, there is a great amount of variability among<br />
the AFSs in the ability of a linear function of the five<br />
predictors to account for the variance in TI. Because the<br />
shrinkage in r using the weights from the validation and cross-validation<br />
samples was nonsignificant (α = .05) across all AFSs,<br />
the validation and cross-validation samples were combined for<br />
subsequent analyses.<br />
Classification tables and CA's were also computed for each<br />
set of weights. The CA's using weights from the cross-validation<br />
samples ranged from 70% (908X0) to 92% (121X0); and for the<br />
validation samples, from 68% (231X0) to 92% (121X0). Of the<br />
4,104 tasks classified within the 26 AFSs, only four "D" tasks<br />
were classified as "A" tasks and only four "A" tasks were<br />
classified as "D" tasks. Although it is desirable to have zero A<br />
to D or D to A misclassifications (because the test development<br />
team is being advised incorrectly to write or not write an item),<br />
infrequent misclassifications of this type should not adversely<br />
affect the construction of a valid SKT. The team can rectify<br />
these discrepancies with the permission of the group facilitator.<br />
CA's computed for the combined data ranged from 71% (908X0) to<br />
90% (112X0). In general, the predictive accuracies using<br />
combined samples were higher than those for the validation<br />
samples referred to earlier, but all differences were small (less<br />
than 6%). Only two "D" tasks were classified as "A" tasks and<br />
two "A" tasks were classified as "D" tasks. Squared and<br />
interactive predictor terms were added to the model for each AFS<br />
in an attempt to increase classification accuracy, but only small<br />
increases in accuracy were observed. In fact, for some AFSs,<br />
classification accuracy decreased.<br />
What is adequate classification accuracy in the context of<br />
generating an ATO? The table having the lowest CA value (68%) is<br />
shown in Figure 2. It was generated by applying the 112X0<br />
weights from the validation sample to the cross-validation<br />
sample. The impact of the misclassifications in the table is<br />
probably not too severe when it is recalled that the AT0 is a<br />
guide for SMEs to use in developing an SKT, and they are free to<br />
select tasks from any of the importance categories within the<br />
restrictions delineated above.<br />
Figure 2. Classification Table with Lowest CA Value<br />
The r's for the combined data for each AFS ranged from .51<br />
(908X0) to .91 (112X0). A hierarchical clustering of the<br />
regression equations for ail 26 AFSs showed small decreases in r<br />
throughout most of the clustering process. For example, the<br />
overall r dropped from .84 at the 26-group stage (i.e., a<br />
separate regression equation for each of the 26 AFSs) to .79 at<br />
the 5-group stage. Thereafter, the drops in r to the 1-group<br />
stage were .02, .02, .04, and .12, respectively.<br />
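Chaining the reported drops recovers the overall r at each stage. The intermediate stage labels (4, 3, and 2 groups) are assumed; the text says only that the drops run from the 5-group stage down to the 1-group stage:

```python
# Overall r at each clustering stage, reconstructed from the values
# reported in the text: .84 at 26 groups, .79 at 5 groups, then
# successive drops of .02, .02, .04, and .12 down to 1 group.
r = {26: 0.84, 5: 0.79}
drops = [0.02, 0.02, 0.04, 0.12]
stages = [4, 3, 2, 1]          # assumed labels for the later stages
value = r[5]
for stage, drop in zip(stages, drops):
    value = round(value - drop, 2)
    r[stage] = value

print(r)  # {26: 0.84, 5: 0.79, 4: 0.77, 3: 0.75, 2: 0.71, 1: 0.59}
```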
The gradual drop in r's until the clustering at the 1-group<br />
stage makes identification of an "optimal clustering stage"<br />
difficult. Therefore, classification was also examined at<br />
various stages. The CA's of equations at the 1-group stage<br />
ranged from 62% (908X0) to 89% (112X0); however, the second<br />
smallest CA was 71% (321X1E). In comparison, the CA's at the<br />
26-group stage ranged from 71% (908X0) to 90% (112X0), with the<br />
second smallest CA being 72% (231X0). Therefore, the range of<br />
the CA's doesn't change much between the two extremes of the<br />
clustering process. At the 26-group stage, only 2 "A" tasks were<br />
classified as "D" tasks and only 2 "D" tasks were classified as<br />
"A" tasks. With the exception of one AFS (908X0), where six "D"<br />
tasks were classified as "A" tasks and one "A" task was classified<br />
as a "D" task, there were only three "A" tasks classified as "D"<br />
tasks and two "D" tasks classified as "A" tasks over all AFSs at<br />
the 1-group stage.<br />
A Wilcoxon matched-pairs signed-ranks test (Siegel, 1956)<br />
was used to compare the differences in CA's between the 26-group<br />
stage and the 5-group stage and between the 5-group stage<br />
and the 1-group stage. There was a statistically significant<br />
difference (α = .05) between the 26-group and 5-group stages, but<br />
not between the 5-group and 1-group stages. Although<br />
significantly better classifications result from the use of 26<br />
equations, 20 of 26 AFSs had differences of 5% or less (max = 13%).<br />
If generalized equations are to be used to classify tasks in<br />
other AFSs where TI data are not available, it appears promising<br />
that a single prediction equation could generate adequate testing<br />
importance values. Further analyses are being conducted to<br />
identify the highest and lowest stages that are significantly<br />
different from the 26-group and 1-group stages, respectively.<br />
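The Wilcoxon matched-pairs signed-ranks statistic used above can be computed with the standard library alone. The paired CA values below are invented, and in practice a library routine such as scipy.stats.wilcoxon would normally be used; this sketch computes only the T statistic, not its significance:

```python
# Wilcoxon matched-pairs signed-ranks statistic T: rank the absolute
# paired differences (zeros dropped, ties share the mean rank), then
# take the smaller of the positive-rank and negative-rank sums.
def wilcoxon_T(x, y):
    diffs = [a - b for a, b in zip(x, y) if a != b]
    absd = sorted((abs(d), i) for i, d in enumerate(diffs))
    ranks = [0.0] * len(diffs)
    j = 0
    while j < len(absd):
        k = j
        while k < len(absd) and absd[k][0] == absd[j][0]:
            k += 1
        mean_rank = (j + 1 + k) / 2.0      # mean of ranks j+1 .. k
        for m in range(j, k):
            ranks[absd[m][1]] = mean_rank
        j = k
    pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(pos, neg)

# Hypothetical CA percentages at the 26-group vs. 5-group stage:
ca_26 = [90, 88, 84, 80, 78, 75, 73, 71]
ca_5  = [89, 85, 84, 78, 77, 70, 74, 68]
print(wilcoxon_T(ca_26, ca_5))  # 2.0 for these invented pairs
```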
CONCLUSIONS<br />
A large amount of the variance in TI was accounted for by<br />
linear combinations of the task-level predictors. The stability<br />
of least squares weights within each of the 26 AFSs was<br />
demonstrated. Prediction equations adequately classified tasks<br />
according to testing importance with very few A to D or D to A<br />
misclassifications. Use of squared and interactive predictor<br />
terms added little to predictive efficiency. A hierarchical<br />
clustering of the regression equations developed for each AFS<br />
showed small decreases in predictive efficiency throughout most<br />
of the clustering process. Preliminary results indicate that a<br />
single prediction equation may do an adequate job of classifying<br />
tasks on testing importance across all AFSs.<br />
REFERENCES<br />
Lindquist, E. F. (1953). Design and analysis of experiments in<br />
psychology and education. Boston: Houghton Mifflin Company.<br />
Walker, H. M., & Lev, J. (1953). Statistical inference. New York:<br />
Henry Holt and Company.<br />
Siegel, S. (1956). Nonparametric statistics for the behavioral<br />
sciences. New York: McGraw-Hill.<br />
ESTIMATING TESTING IMPORTANCE OF TASKS<br />
BY DIRECT TASK FACTOR WEIGHTING<br />
Authors:<br />
William J. Phalen, Air Force Human Resources Laboratory<br />
Walter G. Albert, Air Force Human Resources Laboratory<br />
Darryl K. Hand, Metrica, Inc.<br />
Martin J. Dittmar, Metrica, Inc.<br />
INTRODUCTION<br />
This paper is one of a series of presentations delivered at the current and previous two <strong>Military</strong> <strong>Testing</strong><br />
<strong>Association</strong> Conferences to document R&D of an automated, task-data-based outline development procedure<br />
for Air Force Specialty Knowledge Tests (SKTs). A companion paper to this one (Albert & Phalen, 1990)<br />
provides a brief description of the automated test outline (ATO) procedure. This paper will focus on that part<br />
of the ATO procedure having to do with the selection process by which 150 to 250 tasks are selected from a job<br />
inventory containing up to 2,000 tasks for inclusion in a <strong>Testing</strong> Importance Survey booklet. Up to now, rule-based<br />
screening procedures have been used to identify potentially important tasks to include in the survey, with<br />
cutoffs on percent of members performing each task at the E-5 and E-6/7 paygrade levels and on the<br />
recommended training emphasis index being the primary selection criteria. A little over a year ago, research<br />
was initiated to derive and validate a minimal subset of regression equations for predicting the SME-furnished<br />
testing importance ratings in 28 AFSs with linear combinations of five task-level predictor variables, i.e., percent<br />
of members performing (PMP), percent time spent by members performing (PTM), average paygrade of<br />
members performing (AG), task learning difficulty (TD), and field-recommended task training emphasis for first-termers<br />
(TE). So far, it appears that possibly one, but not more than three, generalized regression equations<br />
may adequately classify tasks into their appropriate testing importance categories. These equations will,<br />
hopefully, perform several important functions. First of all, they should provide a more accurate and defensible<br />
task selection procedure for surveying AFSs that have not been previously surveyed. Secondly, the predicted<br />
testing importance (PTI) values generated by the equations should be able to serve as surrogate testing<br />
importance indices when time or budget constraints prevent the administration of testing importance surveys.<br />
Thirdly, when a new job inventory is developed and administered in an AFS whose testing importance data are<br />
based on the old job inventory tasks, the new data for the predictor variables should be available to use in<br />
conjunction with one of the generalized regression equations to generate PTI values for all the tasks in the new<br />
job inventory.<br />
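The least-squares step described above can be sketched as follows. The task values and TI criterion are invented, and the solver is a plain normal-equations implementation rather than whatever package the authors used:

```python
# Least-squares fit of testing importance (TI) on the five task-level
# predictors (PMP, PTM, AG, TD, TE). All data are invented for
# illustration only.
def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Intercept plus weights minimizing squared prediction error."""
    Z = [[1.0] + row for row in X]          # prepend intercept column
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)

# One row per task: PMP, PTM, AG, TD, TE (hypothetical values).
X = [[60, 4.0, 4.2, 5.1, 6.8], [20, 1.0, 5.5, 3.0, 1.2],
     [75, 6.5, 4.0, 6.2, 5.9], [40, 2.2, 4.8, 4.0, 4.6],
     [55, 3.1, 4.5, 5.5, 3.2], [10, 0.5, 6.0, 2.5, 2.1],
     [85, 7.0, 3.8, 6.8, 7.7], [30, 1.8, 5.2, 3.5, 3.9]]
y = [5.0, 2.1, 6.8, 3.5, 4.9, 1.2, 7.5, 2.9]    # SME-rated TI
beta = ols(X, y)
pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]
```

A generalized equation would be the same fit run on tasks pooled across AFSs, with the resulting beta applied to the predictor values of a previously unsurveyed AFS.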
But the application of these PTI equations also raises several pertinent questions: (1) How can we<br />
determine which PTI equation should be used to generate PTI values for a previously unsurveyed AFS? (2) Can<br />
SMEs provide direct estimates of AFS-specific weights for the five predictor variables that are nearly as accurate<br />
for an AFS as the generalized regression weights? (3) Is it possible that the need for regression-generated or<br />
SME-derived weighting is obviated by simple unit weighting of the five predictor variables? The potential value<br />
of direct estimation of predictor weights by SMEs was anticipated back in 1987; accordingly, an SKT Task Factor<br />
<strong>Testing</strong> Importance Survey booklet was developed and administered to the SMEs in all AFSs for which SKTs<br />
were developed in 1988, 1989, and 1990 (to date). The booklet used in 1988 contained seven factors, the two<br />
additional ones being “consequences of inadequate performance” (CIP) and “requirement for prompt<br />
performance” (RPP), the latter being a rewording of the old “task delay tolerance” factor in order to reverse the<br />
direction of the scale and make it consistent for all factors. In 1989, it was decided to limit the task factors<br />
surveyed to the five which were routinely surveyed by the USAF Occupational Measurement Squadron<br />
(USAFOMS); thus, CIP and RPP were dropped. The elimination of the CIP and RPP factors also made it<br />
possible to assess the effect of their presence or absence on the other five factors. In 1990, the CIP and RPP<br />
factors were restored to the survey in order to introduce more variance into the profiles of the SME-furnished<br />
factor weights and thus eliminate some fuzziness from the clustering solution. The availability of data on the<br />
same seven factors for the same AFSs in 1988 and 1990 made it possible to assess the stability of factor weights<br />
over a two-year period, assuming, of course, that the SMEs in both periods were equally representative of their<br />
AFSs.<br />
THE SURVEY INSTRUMENT<br />
The SKT Task Factor <strong>Testing</strong> Importance Survey is administered to all SMEs who have been sent by<br />
their respective commands to participate in the development of SKTs in their AFSs. To date, approximately<br />
1,000 SMEs have been surveyed. The survey is group-administered by a member of the USAFOMS test<br />
development staff immediately following the SKT in-briefing. It takes about 10 minutes to read the instructions,<br />
fill in the background section, and provide ratings on the seven listed factors (1 to 7 scale). In order to clearly<br />
communicate what the SKT task factor rating process is all about, the rating instructions, scale, and factor<br />
definitions as they appear in the survey booklet are shown in Figure 1.<br />
RESULTS<br />
A. Reliability Analysis. There were 35 AFSs in which the SKT Task Factor <strong>Testing</strong> Importance Survey<br />
was administered in 1988 and again in 1990. In most instances, no SMEs appeared in both survey<br />
samples. As shown in Table 1, the average number of raters per AFS in 1988 was 3.50, and in 1990,<br />
the average was 3.59. The average correlation between the mean factor profiles (across seven<br />
factors) for the 35 AFSs was .4841 (correlations averaged through z). A value this high was<br />
considered very acceptable, especially since it involved a two-year time interval between<br />
administrations and small numbers of different raters per AFS at both points in time. This value<br />
compares very well with the average test-retest reliability of .5835 that was obtained on task-level<br />
testing importance ratings for 26 raters in 20 AFSs with a 3-to-4-month interval (Weissmuller,<br />
Dittmar, & Phalen, 1988). These raters were surveyed by mail and were later surveyed again when<br />
they were selected to serve on an SKT development team. The difference between the two<br />
reliability coefficients was found to be nonsignificant (p = .4337). As a further test, the 1988-to-1990<br />
factor profile correlation (r = .4841) was treated as a group measure of interrater reliability (Rkk)<br />
with no time interval involved, and the Rkk was reduced to a single-rater reliability value (R11) for<br />
comparison with the mean R11 value for task-level testing importance ratings across all 28 AFSs that<br />
had been surveyed. The computed R11 value for a composite reliability (Rkk) of .4841 based on an<br />
overall average of 3.54 raters per factor profile was .2649. The average R11 for the task-level testing<br />
importance ratings across the 28 surveyed AFSs was .2640, an almost identical value. Yet, the<br />
former involved a two-year interval and the latter is a concurrent measure of internal consistency.<br />
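Stepping a composite reliability for several raters down to a single-rater value is conventionally done with the Spearman-Brown formula run in reverse. The paper does not state which formula it used, and the textbook version below does not exactly reproduce the reported single-rater value with these inputs, so treat this as an illustration of the relationship rather than a re-derivation:

```python
# Spearman-Brown: step a composite reliability R_kk (k raters) down
# to the implied single-rater reliability R_11, and back up. The
# inputs are the composite reliability and mean rater count quoted
# in the text; the paper's own computation may have differed.
def step_down(R_kk, k):
    return R_kk / (k - (k - 1) * R_kk)

def step_up(R_11, k):
    return (k * R_11) / (1 + (k - 1) * R_11)

r11 = step_down(0.4841, 3.54)
print(round(r11, 4))  # 0.2095 with these inputs
```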
B. Effect of Adding or Removing the CIP and RPP Factors. Two tests were<br />
applied to determine whether the relative weights of the common five factors were affected by<br />
adding or removing the additional two factors (i.e., CIP and RPP). In the first test, each factor was<br />
given an overall rank in terms of its mean rating in 1989 (five-factor survey) and its mean rating in<br />
1988 and 1990 separately (seven-factor surveys). The Mann-Whitney test was applied to assess the<br />
differences in the sums of ranks. The mean ratings of the PTM, AG, and TD factors were relatively<br />
unaffected by the presence or absence of the additional factors, but PMP and TE showed significant<br />
shifts in their mean ratings (p < .01). Both were significantly higher when CIP and RPP were<br />
absent (or significantly lower when CIP and RPP were present). A test was also applied to<br />
determine whether the sizes of the differences between the PMP and TE means in the five-factor<br />
vs. the seven-factor environment were related to the sizes of the mean CIP and RPP values.<br />
Regression equations of the form PMP(5) - PMP(7) = W1 CIP + W2 RPP were applied. None of the<br />
regression results were found to be significant. Thus, while it can be said that PMP and TE were<br />
affected in a given direction by the presence or absence of CIP and RPP, there was no indication<br />
that the level of difference was proportional to the level of CIP and RPP.<br />
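The rank-sum comparison can be sketched as follows. The sample values are invented stand-ins for mean PMP ratings under the two survey formats, and this computes only the Mann-Whitney U statistic, without the significance lookup:

```python
# Mann-Whitney U from two independent samples: pool the values, rank
# them (ties share the mean rank), sum the ranks of the first sample,
# and convert the rank sum to U.
def mann_whitney_U(x, y):
    data = [(v, 0) for v in x] + [(v, 1) for v in y]
    data.sort(key=lambda t: t[0])
    n = len(data)
    ranks = [0.0] * n
    j = 0
    while j < n:
        k = j
        while k < n and data[k][0] == data[j][0]:
            k += 1
        mean_rank = (j + 1 + k) / 2.0
        for m in range(j, k):
            ranks[m] = mean_rank
        j = k
    R1 = sum(r for r, (v, src) in zip(ranks, data) if src == 0)
    n1, n2 = len(x), len(y)
    U1 = R1 - n1 * (n1 + 1) / 2.0
    return min(U1, n1 * n2 - U1)

# Hypothetical mean PMP ratings: five-factor vs. seven-factor survey.
five  = [6.1, 5.9, 6.3, 6.0, 5.8]
seven = [5.2, 5.5, 5.0, 5.4, 5.6]
print(mann_whitney_U(five, seven))  # 0.0: the samples do not overlap
```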
SECTION II. INSTRUCTIONS<br />
Imagine that you have been asked to review the job-task statements in the most recent USAF Job Inventory administered<br />
in the career field for which you are developing SKTs. This survey could contain anywhere from 500 to 1,200 or more task<br />
statements. Next, assume that you have been asked to rate each task statement indicating how important it is to include the job<br />
knowledges needed to perform that task on a Specialty Knowledge Test. A task would be rated high in testing importance if it<br />
requires knowledges that are critical to successful job performance within the career field.<br />
You are in luck, however. You are not being asked to provide these 500 or more ratings. Instead, seven factors (or types<br />
of information) have been proposed as possible factors in determining the testing importance level of a task. These seven factors,<br />
along with their descriptions, are shown in Section II, SKT TASK FACTOR TESTING IMPORTANCE RATING SCALE. You<br />
are asked to rate each task factor on how important it is to consider this factor when assigning a testing importance rating to the<br />
tasks performed by airmen in the Air Force Specialty for which you are developing SKTs. Using the scale provided, determine the<br />
most appropriate rating and record your rating in the column provided.<br />
Rating Factor<br />
SECTION II: SKT TASK FACTOR TESTING IMPORTANCE RATINGS<br />
RATING SCALE FOR FACTORS IN TESTING IMPORTANCE<br />
This factor has:<br />
7 = Extremely High Importance<br />
6 = High Importance<br />
5 = Above Average Importance<br />
4 = Average Importance<br />
3 = Below Average Importance<br />
2 = Low Importance<br />
1 = No Importance<br />
- 1. Percent Members Performing: a measure of the proportion of all airmen who perform the task<br />
- 2. Average Percent Time Spent: a measure of the proportion of the total work time that airmen in the AFS spend<br />
performing the task<br />
- 3. Average Grade: the average grade of all airmen who perform the task.<br />
- 4. Learning Difficulty: a measure of the relative length of time required to learn to perform the task properly.<br />
- 5. Consequences of Inadequate Performance: a measure of the probable seriousness of failing to perform the task<br />
properly. The impact is measured in terms of possible injury or death, damage to equipment, wasted supplies or lost<br />
work-hours, etc.<br />
- 6. Requirement for Prompt Performance: a measure of the length of time from the moment that an airman is aware<br />
that a task will need to be done up to the point at which the task MUST be performed. In other words, does the<br />
airman have to be able to perform the task immediately, or does he or she have time to consult a manual or seek<br />
guidance?<br />
- 7. Field-Recommended Entry-Level Training Emphasis: a measure of how strongly NCOs in the field have<br />
recommended the task for inclusion in formal, structured training programs for entry-level airmen. Structured<br />
training may include resident technical school, on-the-job training (OJT), field training detachments (FTDs), or<br />
career development courses (CDCs).<br />
Figure 1. SKT Rating Form<br />
C. Clustering of Factor Profiles vs. Clustering of PTI Regression Equations. One objective of<br />
gathering task factor ratings from SMEs was to provide a means of determining which one of<br />
several generalized regression equations should be applied to previously unsurveyed AFSs to<br />
select the appropriate set of tasks for inclusion in a Task <strong>Testing</strong> Importance Survey. If AFS<br />
factor profiles produced a clustering of AFSs that corresponded to the clustering of AFSs on<br />
similarity of regression equations, then regression equation group membership could be defined<br />
for task factor clusters of AFSs for which there were no regression equations. Various attempts<br />
were made to produce corresponding clustering solutions, but no adequate match could be<br />
generated. A major impediment was the fact that even in the case in which the input sample<br />
of factor profiles contained the maximum amount of variance (1988, 1989, and 1990 combined),<br />
the “between” overlap for the last two groups to merge was 86.3% and the total sample “within”<br />
overlap was 93.2%. On the other hand, the clustering of regression equations did not seem to<br />
indicate a need for more than one equation. Thus, a lack of variance was present in these data,<br />
as well. If additional research indicates that only one overall regression equation is needed for<br />
all AFSs, then the need for a procedure to select the appropriate regression equation for a<br />
previously unsurveyed AFS vanishes.<br />
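Had the two clusterings corresponded, the selection rule could have looked something like the nearest-centroid sketch below. The centroids, profiles, and group names are all invented; the paper proposes no specific matching algorithm:

```python
# Nearest-profile assignment: pick, for an unsurveyed AFS, the
# regression-equation group whose centroid factor profile correlates
# most highly (Pearson r) with the AFS's own 7-factor profile.
# All numbers below are hypothetical.
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

centroids = {                      # mean 7-factor profiles per group
    "group1": [6.2, 5.8, 3.1, 5.5, 6.4, 4.9, 6.0],
    "group2": [4.0, 4.2, 5.0, 6.3, 5.1, 6.2, 4.4],
}
new_afs = [6.0, 5.5, 3.4, 5.2, 6.1, 5.0, 5.8]   # unsurveyed AFS profile
best = max(centroids, key=lambda g: pearson(new_afs, centroids[g]))
print(best)  # group1
```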
D. Comparison of Regression- vs. Factor-Weighted Equations for Predicting Testing Importance<br />
of Tasks. Table 1 shows the predictive efficiency of the AFS-specific PTI regression equations<br />
for 25 AFSs for which task-level testing importance indices were available and for which SMEs<br />
had provided factor weights in 1988, 1989, or 1990. Since the derivation and validation of the<br />
regression equations and their predictive efficiency are discussed in detail in a companion paper<br />
(Albert & Phalen, 1990), the correlations of predicted and actual testing importance values for<br />
the 25 AFSs are reported here only for their comparison with the correlations produced by the<br />
SME-based factor-weighting approach (which standardizes each task factor before applying the<br />
factor weights and sums the cross-products into a testing importance composite). In Table 1,<br />
only the highest correlations computed for the 1988, 1989, and 1990 factor weights and all<br />
possible combinations thereof are reported in order to show the highest correlations this<br />
approach can hope to produce for comparison against the best alternative, i.e., the least-squares<br />
fit of task-level indices for the five task factors (predictors) to the indices of task-level testing<br />
importance (criterion).<br />
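The factor-weighting computation described in parentheses above (standardize each factor across tasks, apply the SME factor weight, and sum the cross-products) can be sketched as follows, with invented task data and weights:

```python
# SME factor-weighted testing-importance composite: z-score each
# factor over tasks, weight by the mean SME factor rating, and sum.
# Task values and weights are illustrative only.
def zscores(col):
    n = len(col)
    m = sum(col) / n
    sd = (sum((v - m) ** 2 for v in col) / n) ** 0.5
    return [(v - m) / sd for v in col]

def composite(task_rows, weights):
    cols = list(zip(*task_rows))              # one column per factor
    zcols = [zscores(list(c)) for c in cols]
    return [sum(w * z[i] for w, z in zip(weights, zcols))
            for i in range(len(task_rows))]

tasks = [[60, 4.0, 4.2, 5.1, 6.0],    # PMP, PTM, AG, TD, TE per task
         [20, 1.0, 5.5, 3.0, 2.0],
         [75, 6.5, 4.0, 6.2, 7.5],
         [40, 2.2, 4.8, 4.0, 3.5]]
sme_weights = [6.0, 5.5, 3.0, 5.0, 6.5]      # mean 1-7 factor ratings
scores = composite(tasks, sme_weights)
ranking = sorted(range(len(tasks)), key=lambda i: -scores[i])
print(ranking)  # [2, 0, 3, 1]
```

Unit weighting, discussed in section E, is the same computation with every weight set to 1.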
For some unexplained reason, the 1990 factor weights uniformly produced<br />
lower correlations than the 1988 weights. Overall, the factor-derived correlations averaged to<br />
a respectable r = .602 at the E-5 level and r = .606 at the E-6/7 level, compared to r = .798<br />
and .786 for the E-5 and E-6/7 regression-derived correlations, respectively. The difference is<br />
significant (p < .01) in both cases, but the real difference is in the lack of uniformity of fit of<br />
the factor-derived approach; i.e., in some cases, it matches the regression-derived correlations<br />
quite well, and in other cases rather poorly. It appears that the SME-furnished factor-weighting<br />
approach is not an acceptable alternative to the regression approach, as long as the regression<br />
alternative remains supportable.<br />
E. Differential vs. Unit Weighting of Factors. Because there was little variance in the SME-<br />
derived factor weights, and substantial positive correlations existed between the five task factors<br />
and the testing importance criterion, with the exception of average grade (Weissmuller, Dittmar,<br />
& Phalen, 1989), there was a distinct possibility that a unit-weighted linear composite of the<br />
standardized task factors might do almost as well as the differentially weighted composite. The<br />
effect of unit weighting on the correlations with testing importance is shown in Table 1 under<br />
the heading “Unit.” The unit weighting approach produced correlations for both the E-5 and<br />
E-6/7 levels that were generally close to the correlations derived from differential weighting by<br />
SMEs, with only two instances showing a substantial drop in correlation (both within the same<br />
AFS); but 14 correlations based on unit weighting were actually higher than those based on<br />
differential weighting. Tests of significance of difference between the r's for differential and<br />
unit weighting at the E-5 and E-6/7 levels (.602 vs. .565, and .606 vs. .582, respectively) yielded<br />
no significant differences. These findings clearly indicate that there is virtually nothing to be<br />
gained by continuing to gather factor importance ratings from SMEs, since unit weighting of<br />
the factors is equally effective.<br />
Table 1. r of TI with PTI<br />
DISCUSSION<br />
The findings of this study suggest one positive conclusion and three negative conclusions. The positive<br />
conclusion is: (1) Factor importance weights display good reliability, even when the interval between<br />
administrations is as long as two years. The negative conclusions are: (1) The factor importance weighting<br />
approach does not yield correlations with task-level testing importance that would permit abandonment of the<br />
more rigorous regression approach, which requires the administration of task-level testing importance surveys<br />
in order to obtain criterion data for generating a least-squares solution. (2) There does not appear to be<br />
sufficient variance in the profiles of factor weights to provide a clustering of AFSs that corresponds sufficiently<br />
well with the clustering of AFS-specific regression equations; therefore, the clustering of profiles of factor weights<br />
is not useful for indicating which generalized regression equation should be used for a particular AFS (assuming<br />
that more than one equation will be needed to adequately cover all AFSs). (3) Since unit weighting of the<br />
testing importance factors is virtually as good as SME-furnished differential weighting, there is little to be gained<br />
by continuing to gather factor importance ratings from SMEs.<br />
RECOMMENDATIONS<br />
Discontinue administration of the <strong>Testing</strong> Importance Factors Survey and concentrate instead on<br />
improving the predictive efficiency and classification accuracy of the regression-based procedure.<br />
REFERENCES<br />
Albert, W.G., & Phalen, W.J. (1990). Development of equations for predicting testing importance of<br />
tasks. Proceedings of the 32nd Annual Conference of the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>,<br />
Orange Beach, AL.<br />
Weissmuller, J.J., Dittmar, M.J., & Phalen, W.J. (1989). Automated test outline development: research<br />
findings (AFHRL-TP-88-70, AD-215 401). Brooks AFB, TX: Manpower and Personnel<br />
Division, Air Force Human Resources Laboratory.<br />
Upper Body Strength and Performance in Army Enlisted MOS<br />
Elizabeth J. Brady and Michael G. Rumsey<br />
Army Research Institute<br />
Introduction<br />
Cognitive testing for selection and classification purposes<br />
has a long and distinguished history in the military services.<br />
The link between cognitive ability and soldier performance has by<br />
now been firmly established, providing a reasonably solid basis<br />
for this type of testing.<br />
The concept of screening on the basis of physical strength<br />
capability is less firmly established. A solid empirical<br />
foundation linking physical strength to overall job performance<br />
does not as yet exist. Yet for those jobs requiring lifting or<br />
moving heavy physical objects, the question naturally arises as<br />
to whether some minimal degree of physical strength might be an<br />
appropriate prerequisite.<br />
This question began to receive special attention in the<br />
1970's, as the number of women serving in the military, as well<br />
as the number of specialties open to women, increased<br />
dramatically. In 1976, the General Accounting Office recommended<br />
that the services develop common physical standards for males and<br />
females in specialties where physical strength attributes were<br />
relevant to effective performance. In 1982, a Women in the Army<br />
policy review evaluated the strength requirements of a variety of<br />
jobs. Then, in 1984, the Army began administering the <strong>Military</strong><br />
Entrance Physical Strength Capacity Test (MEPSCAT) to each<br />
applicant for enlistment at the <strong>Military</strong> Entrance Processing<br />
Stations (MEPS). Results of the test were used for job placement<br />
counseling rather than for determining an individual's<br />
qualification for entering any particular job.<br />
In 1987, the Army's personnel office, the Office of the<br />
Deputy Chief of Staff for Personnel (ODCSPER), determined that it<br />
was time to review its physical strength screening process. The<br />
question of most immediate concern was: are the benefits of<br />
screening worth the effort? The initial approach taken to<br />
answering the question was to explore whether there was any<br />
evidence that physical strength limitations were perceived to<br />
interfere in any substantial way with job performance in the<br />
Army.<br />
Presented at the meeting of the <strong>Military</strong> <strong>Testing</strong><br />
<strong>Association</strong>, November, 1990. All statements expressed in this<br />
paper are those of the authors and do not necessarily reflect the<br />
official opinions or policies of the U.S. Army Research Institute<br />
or the Department of the Army.<br />
The ODCSPER directed that a Physical Requirements<br />
Questionnaire (PRQ) be developed and administered to determine<br />
the extent to which job incumbents were perceived, by themselves<br />
or their supervisors, as having difficulty in performing their<br />
job due to upper body strength limitations. Accordingly, the<br />
U.S. Army Research Institute, in collaboration with the Enlisted<br />
Accessions Division of the ODCSPER and the Exercise Physiology<br />
Division of the Army Research Institute of Environmental<br />
Medicine, developed a 7-item supervisor version and an 11-item<br />
incumbent version of this questionnaire. Only the results from<br />
the incumbent version will be discussed in this paper.<br />
This paper will assess the extent to which insufficient<br />
upper body strength is perceived to interfere significantly with<br />
job performance in a representative sample of Army jobs. These<br />
self-report data will also be related to MEPSCAT scores, an<br />
objective measure of upper body strength.<br />
Method<br />
Subjects. The total sample size consisted of 11,069 (88%<br />
male, 12% female) job incumbents across 21 <strong>Military</strong> Occupational<br />
Specialties (MOS). There were 65% white, 27% black, 4% hispanic,<br />
and 4% other in this sample. The mean age for 86% of the males<br />
was 20, and 60% of the females had a mean age of 21. Due to<br />
missing data, the actual sample sizes used in the following<br />
analyses may be somewhat smaller.<br />
Physical Requirements Questionnaire. The incumbent version<br />
of the PRQ contains 11 items, which consist of 10 multiple choice<br />
and one short answer. This version was pretested in April 1988,<br />
as part of a field test of Project A second tour measures. It<br />
was administered to 79 second tour soldiers (36 to 60 months in<br />
service) in three MOS (13B, cannon crewmember; 88M, motor<br />
transport operator; and 95B, military police). The results of<br />
the pretest indicated that the PRQ was easy to administer, that<br />
the response options were reasonable, and that it could be<br />
completed in less than 10 minutes.<br />
Physical Demand Categories. The purpose of the physical<br />
demand categories is to assign soldiers to jobs for which they<br />
are physically qualified. The categories are based on upper body<br />
strength. According to AR 611-201, the five categories are: (1)<br />
LIGHT - occasionally lift 20 pounds and frequently lift 10<br />
pounds; (2) MEDIUM - occasionally lift 50 pounds and frequently<br />
lift 25 pounds; (3) MODERATELY HEAVY - occasionally lift 80<br />
pounds and frequently lift 40 pounds; (4) HEAVY - occasionally<br />
lift a maximum of 100 pounds and frequently lift 50 pounds; and<br />
(5) VERY HEAVY - occasionally lift over 100 pounds and<br />
frequently lift 50 pounds. As shown in Table 1, the Project A<br />
sample has 14 Very Heavy MOS, 1 Heavy MOS, 4 Moderately Heavy<br />
MOS, 2 Medium MOS, and no Light MOS.<br />
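The AR 611-201 thresholds just listed can be captured as a small lookup table. The qualifies() helper is hypothetical (the text notes that MEPSCAT results were used for counseling, not as a qualification gate), and "over 100 pounds" is encoded here as a 101-pound minimum:

```python
# AR 611-201 physical demand categories as (occasional-lift,
# frequent-lift) pound thresholds, per the text above.
CATEGORIES = {
    "LIGHT":            (20, 10),
    "MEDIUM":           (50, 25),
    "MODERATELY HEAVY": (80, 40),
    "HEAVY":            (100, 50),
    "VERY HEAVY":       (101, 50),   # "over 100" occasionally
}

def qualifies(category, occasional_capacity, frequent_capacity):
    """Hypothetical check: do a soldier's demonstrated lift
    capacities meet a category's occasional and frequent thresholds?"""
    occ, freq = CATEGORIES[category]
    return occasional_capacity >= occ and frequent_capacity >= freq

print(qualifies("MEDIUM", 110, 50))      # True: a 110-lb lift suffices
print(qualifies("VERY HEAVY", 100, 50))  # False: 100 lb is not "over 100"
```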
Table 1<br />
MOS by Physical Demand Categories<br />
VERY HEAVY 11B Infantryman<br />
12B Combat Engineer<br />
13B Cannon Crewmember<br />
19E M48-M60 Armor Crewman<br />
19K M1 Armor Crewman<br />
27E Tow/Dragon Repairer<br />
31C Single Channel Radio Operator<br />
51B Carpentry & Masonry Specialist<br />
54B Chemical Operations Specialist<br />
55B Ammunition Specialist<br />
63B Light Wheel Vehicle Mechanic<br />
67N Utility Helicopter Repairer<br />
88M Motor Transport Operator<br />
94B Food Service Specialist<br />
HEAVY 76Y Unit Supply Specialist<br />
MODERATELY HEAVY 16S MANPADS & PMS Crewmember<br />
29E Radio Repairer<br />
91A Medical Specialist<br />
95B <strong>Military</strong> Police<br />
MEDIUM 71L Administrative Specialist<br />
96B Intelligence Analyst<br />
Data Collection. The objective was to collect questionnaire<br />
responses from a large number of first tour incumbents in a<br />
reasonably representative set of Army MOS, or jobs. It was<br />
determined that the most effective means of achieving this<br />
objective was to administer the PRQ as part of a large-scale data<br />
collection being conducted as one stage in a research effort,<br />
known as Project A, to improve the Army's enlisted selection and<br />
classification system. Between July, 1988 and February, 1989,<br />
the PRQ was administered to 11,069 soldiers in 21 MOS chosen to<br />
reasonably represent the full set of Army MOS for Project A<br />
purposes.<br />
Results<br />
A factor analysis with an orthogonal varimax rotation<br />
yielded two factors, which accounted for 46% of the common<br />
variance. The first factor includes items which deal with the<br />
individual's inability to get the job done; the second factor<br />
includes items which tend to focus more on ways in which to<br />
improve job performance.<br />
Factor 1<br />
For purposes of this paper, one representative item was<br />
selected from each scale to highlight some of the<br />
principal results that emerged from our initial analyses of these<br />
data. From the first factor, the item selected reads as follows:<br />
How many times in the past six months have you had insufficient<br />
upper body strength to complete a task assignment in your MOS?<br />
The response options for question 1, and the proportion of<br />
respondents choosing each option, are shown below:<br />
Proportion<br />
Option Male Female Total<br />
1. 10 or more 7 8 7<br />
2. 5 to 9 3 5 3<br />
3. 2 to 4 8 17 9<br />
4. 1 6 6 6<br />
5. None 76 64 75<br />
Before further analyses were conducted, response options<br />
were grouped into two categories based on the degree of<br />
difficulty experienced by the respondent in performing tasks:<br />
high difficulty (options 1 and 2) and low difficulty (responses<br />
3, 4 and 5). Thus, 10% of the total group (and of the males), and<br />
13% of the females, fell in the high difficulty group.<br />
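The dichotomization described above can be sketched as follows; the response vectors here are illustrative, not the survey data.

```python
# Collapse the five response options into high difficulty (options 1 and 2)
# versus low difficulty (options 3, 4, and 5), then tabulate by group.
# The option lists below are invented for illustration.
responses = {"male":   [5, 5, 3, 1, 2, 5, 4, 5, 5, 3],
             "female": [5, 2, 3, 5, 1, 3, 5, 5, 2, 5]}

def pct_high_difficulty(options):
    # Percent of respondents who chose option 1 or 2 (high difficulty).
    high = sum(1 for o in options if o in (1, 2))
    return 100.0 * high / len(options)

for group, opts in responses.items():
    print(group, pct_high_difficulty(opts))
```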
The next analysis examined whether this type of difficulty<br />
was related to ability to lift as measured by the MEPSCAT score<br />
obtained at the time of enlistment. Individuals were sorted into<br />
two groups based on their MEPSCAT score: one group consisting of<br />
those who were able to lift 110 pounds, and a second group<br />
consisting of those who were not. The difference between the<br />
groups was rather small: 9.7% of those with high MEPSCAT scores<br />
reported high difficulty; 11.5% of those with low MEPSCAT scores<br />
reported such difficulty.<br />
Next, results were compared across MOS. Substantial<br />
differences were found, with motor transport operators having the<br />
largest percentage (16) in the high difficulty group and radio<br />
repairers having the lowest percentage (3).<br />
The next set of analyses examined characteristics which<br />
might at least in part account for MOS differences. It was found<br />
that 11% of the soldiers in MOS with very heavy physical strength<br />
requirements, compared with 7% in the other MOS, fell in the high<br />
difficulty category. In combat MOS, 12% experienced a high<br />
degree of difficulty; in non-combat MOS, 8%.<br />
Some of the results followed no particular pattern,<br />
suggesting the need for further investigation. The greatest<br />
disparity between the sexes was found among light wheel vehicle<br />
mechanics, where 26% of the females, but only 9% of the males,<br />
325
reported high difficulty. While a fair number of males (14%)<br />
experienced difficulty in the motor vehicle transport job, this<br />
was another case where the percentage of females experiencing<br />
difficulty was particularly high (24%). Both male (13%) and<br />
female (18%) food service specialists also placed large numbers<br />
in the high difficulty category.<br />
The pattern of results for other items in the first factor<br />
generally followed the pattern for this item. However, MEPSCAT<br />
made a much greater difference with respect to a second item in<br />
this factor: How many times in the past six months have you been<br />
physically unable to lift an object while working on your Army<br />
job? The response options for this item were the same as those<br />
for the first item, and high difficulty and low difficulty were<br />
defined the same way as for the first item. On this item, 5.3%<br />
of those with high MEPSCAT scores reported high difficulty; 9.8%<br />
with low MEPSCAT scores so reported.<br />
Factor 2<br />
The item in Factor 2 chosen for close examination in this<br />
paper read as follows: How helpful do you think weight/strength<br />
training would be in improving your job performance? The<br />
response options for this item, and the proportion of respondents<br />
choosing each option, are shown below:<br />
Proportion<br />
Option Male Female Total<br />
1. Extremely helpful 31 18 30<br />
2. Helpful 30 26 29<br />
3. Somewhat helpful 18 20 18<br />
4. A little helpful 13 18 13<br />
5. Not at all helpful 8 18 10<br />
Again, for purposes of simplicity, responses were grouped<br />
into two categories. A “more helpful” category consisted of<br />
options 1 and 2; a “less helpful” category consisted of options<br />
3, 4, and 5. As can be seen above, 59% of the total group, 61%<br />
of the males, and 44% of the females, responded in the more<br />
helpful category.<br />
Among those able to lift 110 pounds, 61% were in the more<br />
helpful category, as opposed to 53% of those not able to lift 110<br />
pounds.<br />
A comparison across MOS revealed vast differences. Among<br />
cannon crewmembers, 73% were in the “more helpful” category.<br />
Among administrative specialists, 25% were in this category. In<br />
the MOS with very heavy strength requirements, 64% thought<br />
weight/strength training would be helpful or extremely helpful.<br />
In the other MOS, the percentage was only 49%. In combat MOS,<br />
the percentage was 70%; in non-combat MOS, 53%.<br />
326<br />
Discussion<br />
Certain characteristics of this effort suggest that we<br />
should treat these findings cautiously. We are dealing with<br />
self-report; thus, all limitations associated with self-report<br />
measures must be considered. We have observed a positive<br />
relationship between self-reported difficulty in lifting an<br />
object and performance on a more objective measure, the MEPSCAT,<br />
however, so we feel the results deserve to be taken<br />
seriously. We should also point out that we are dealing here<br />
with but two items on an 11-item scale. Until we can report more<br />
thoroughly the results of all the items, as well as the results<br />
from the supervisor version of the PRQ and from a variety of<br />
additional performance measures administered concurrently with<br />
the PRQ, these results should be considered as just a slice from<br />
a much larger picture.<br />
Having expressed these caveats, what should we make of the<br />
results? The good news is that soldiers do not report widespread<br />
difficulties with the physical demands of their jobs. The<br />
somewhat surprising news is that the overall differences between<br />
self-reported male and female difficulty are not particularly<br />
great.<br />
But when we look beneath the surface, the picture is not all<br />
that simple. There are major job differences, some not terribly<br />
surprising, some perhaps deserving further investigation. Why<br />
are the physical demands of being a mechanic, for example,<br />
apparently so much greater for females than for males? Why is<br />
there a similar disparity for truck drivers?<br />
The item on weight/strength training also revealed some<br />
interesting news. It is those people who are already strongest<br />
(in terms of their MEPSCAT scores) who are most convinced of the<br />
benefits of weight/strength training. Of course, since these<br />
individuals may be concentrated in jobs where the physical<br />
demands are the greatest, the true meaning of this finding awaits<br />
further analysis. While it may not be surprising that clerks see<br />
less need for strength training than do those in combat jobs, the<br />
extent to which clerks seem to view strength training as not<br />
particularly helpful is perhaps beyond what one might expect.<br />
The results reported here are best considered as a preview<br />
of things to come. Further analyses on a data set allowing a<br />
much broader set of comparisons, and at a higher level of<br />
sophistication, than could be completed at this time will follow.<br />
Thus, we will forego the temptation to draw major conclusions<br />
until we have travelled somewhat further along the data analysis<br />
road.<br />
327
Response Distortion on the Adaptability Screening Profile (ASP)¹<br />
Dale R. Palmer, Leonard A. White, and Mark C. Young<br />
U. S. Army Research Institute<br />
Alexandria, VA<br />
INTRODUCTION<br />
The Armed Services are considering the implementation of a biodata/temperament instrument,<br />
the Adaptability Screening Profile (ASP), to supplement education credentials as a predictor of first term<br />
attrition. A key problem in utilizing instruments like the ASP, especially in the “en masse” screening<br />
medium of the Armed Services, concerns the potential for item response distortion of the self-report<br />
information, and consequently, invalidation of the instrument over time (Walker, 1985). Previous research<br />
on the Armed Services Applicant Profile (ASAP) and the Assessment of Background and Life<br />
Experiences (ABLE), both components of the ASP, indicates that these instruments are susceptible to<br />
intentional distortion in the desired direction of the examinee (Hough, 1987; Trent, Atwater, & Abrahams,<br />
1986). Thus, it is possible that widespread distortion could occur in a service applicant setting,<br />
particularly if such distortion is encouraged. Guidelines may be written that “coach” applicants on how to<br />
do well on the test, and recruiters, in order to meet quotas, might encourage or even train applicants to<br />
respond in a particular manner (Hanson, Hallam, & Hough, 1989).<br />
Prior to the research presented in this paper, we used a sample of 324 receptees to conduct a<br />
preliminary analysis of the effects of coaching on the ASP. With the assistance of military personnel, we<br />
developed a short script intended to represent “realistic” coaching that might be given to an applicant.<br />
The coaching taught examinees how to describe themselves in order to score well on the test. They<br />
were also warned that the instrument contained items to detect socially desirable responding and<br />
therefore not to answer in ways that could not possibly be true. As expected, we found that examinees<br />
can, when asked, distort their responses to the ASP in a socially desirable direction. Unexpectedly,<br />
however, the scores of examinees who were coached and warned about faking did not differ significantly<br />
from those who were responding honestly. One explanation for this result is that the warning effectively<br />
counteracted the coaching.<br />
The research reported here was designed to replicate and extend these findings. Specifically, to<br />
separate the effects of coaching and warnings about detection, one group received coaching on<br />
“correct” responding without being warned about possible detection and a second coached group was<br />
warned about faking detection. In addition, we examined the usefulness of the ABLE’s Validity scale to<br />
correctly detect those respondents who were instructed or coached to distort their responses in a<br />
socially desirable direction.<br />
METHOD<br />
Subjects<br />
Five-hundred and two male receptees were administered the ASP at the U.S. Army Reception<br />
Battalion, Ft. Sill, OK. The receptees were tested in eight groups of 14-105. Participants were informed<br />
that the purpose of the research was to learn how different test-taking strategies affect scores on the<br />
ASP.<br />
¹Presented at the meeting of the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>, November, 1990. All statements expressed in this paper are<br />
those of the authors and do not necessarily reflect the official opinions or policies of the U.S. Army Research Institute<br />
or the Department of the Army.<br />
328
Instruments<br />
The ASP is a combination of the ASAP and ABLE. The ASAP consists of 50 multiple choice<br />
items which are combined to yield an overall score. Responses to each item are scored 1-3, with<br />
scoring weights to best predict attrition during the first term of enlistment. The ABLE is a 70-item,<br />
construct-based temperament scale comprised of three subscales to measure Achievement, Adjustment,<br />
and Dependability. These three subscale scores are combined with unit weights to form an overall ABLE<br />
composite. A fourth, the ABLE Validity scale, is used to detect inaccuracy in examinees’ responses<br />
caused by attempts to respond in a socially desirable manner.<br />
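Per the description above, the ABLE composite is a unit-weighted sum of the three subscale scores. As a rough consistency check (the function name is ours, not the operational scoring code), summing the honest-condition subscale means from Table 1 of this paper lands within rounding error of the reported ABLE Total mean.

```python
# Unit-weighted ABLE composite: the three subscale scores are simply summed.
# Sketch for illustration; not the operational scoring software.
def able_composite(achievement, adjustment, dependability):
    return achievement + adjustment + dependability

# Honest-condition subscale means from Table 1 of this paper:
total = able_composite(54.90, 33.06, 54.02)
print(round(total, 2))  # 141.98, close to the reported ABLE Total mean of 142.01
```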
Procedure<br />
The design was a 4 x 2 between-subjects factorial with four levels of instructional condition and<br />
two orders of test administration. One-half of the subjects within each session completed the ABLE prior<br />
to the ASAP and one-half first took the ASAP followed by the ABLE. The four instructional conditions<br />
were as follows:<br />
Honest. Instructions followed those developed for the proposed operational ASP.<br />
Participants were instructed to “pick the response that best describes your attitudes or past<br />
experiences.”<br />
Fake Good. Subjects in this condition were told to “select the answer that describes yourself in<br />
a way that you think will make sure that the Army selects you....Your response should be the choice that<br />
you think would impress the Army the most.”<br />
Coached-With Warning. The instructions in this condition were designed to represent coaching<br />
strategies that might be used to help applicants for the Armed Services score well. Subjects were told<br />
to, “select the answer that describes yourself in a way that you think will make sure that the Army selects<br />
you...to make a good impression...answer so that you look mature, responsible, well-adjusted, hardworking,<br />
and easy to get along with.” In addition, subjects were told to “be aware that there are<br />
questions designed to detect if you are trying to make yourself look too good. So, answer in a way that<br />
makes you look good, but try to avoid answering any of the questions in a way that cannot possibly be<br />
true.”<br />
Coached-Without Warning. Subjects in this condition received the same coaching instruction as<br />
those in the coached-with warning group, except that no warning about items to detect faking was<br />
provided.<br />
RESULTS<br />
Descriptive Statistics<br />
Table 1 presents the means and standard deviations of the six ASP subscales and composites in<br />
the four instructional conditions. Overall, mean ASP scores were highest for examinees who were<br />
coached on the “correct” responses or instructed to fake good. Note that the mean ASP scores for<br />
respondents who were warned about possible detection of faking were most similar to scores in the<br />
honest condition.<br />
Effect of Test Order and Instructional Condition<br />
Six 4 x 2 ANOVAs were used to examine the effects of instructional condition (4 levels), test<br />
order (2 levels), and their interaction on the dependent variables. The main effect of instructional<br />
condition was highly significant (p&lt;.001) for all ASP scales. The highest F value was obtained for the<br />
329<br />
Table 1<br />
Effect of Instructional Conditions on ASP Scales<br />
                                   Instructional Condition<br />
                                                 COACHED-        COACHED-<br />
Scale          HONEST          FAKE GOOD        WITH WARNING    NO WARNING<br />
               (n=126)         (n=148)          (n=109)         (n=100)<br />
ASAP Total     114.58 (10.67)  120.50 (9.35)    117.59 (12.06)  120.80 (9.86)<br />
ABLE Total     142.01 (15.18)  158.33 (14.44)   144.88 (14.54)  155.51 (15.87)<br />
Achievement    54.90 (7.72)    62.13 (6.53)     56.07 (6.30)    60.13 (7.13)<br />
Adjustment     33.06 (4.50)    36.51 (4.13)     33.36 (4.64)    35.18 (4.98)<br />
Dependability  54.02 (5.49)    59.61 (5.39)     54.50 (6.22)    58.99 (5.63)<br />
Fake Validity  15.99 (3.27)    21.40 (5.50)     16.24 (3.75)    20.74 (4.77)<br />
Note. The maximum sample sizes are reported. Sample sizes vary slightly across outcome<br />
measures. Standard deviations are presented in parentheses.<br />
ABLE Validity scale, F(3, 475) = 50.75, p&lt;.001. As shown in Table 1, the honest and coached-with<br />
warning groups had comparable means on the Validity scale, with M = 15.99 and M = 16.24,<br />
respectively. By comparison, the means on this scale were about one standard deviation higher in the<br />
fake good (M = 21.40) and coached (M = 20.74) groups. None of the main effects of test order or the<br />
treatment group by test order interaction was significant (all p&gt;.05).<br />
Effect Sizes for Instructional Group Comparisons<br />
Effect sizes for the 6 possible combinations of instructional comparisons for all ASP scales and<br />
composites are reported in Table 2. Scheffe test significance levels for each comparison are also<br />
shown.<br />
Table 2<br />
Effect Sizes for Instructional Group Comparisons<br />
               Honest v.   Honest v.   Honest v.    Fake Good v.  Fake Good v.  Coached-W v.<br />
Scale          Fake Good   Coached-W   Coached-NW   Coached-W     Coached-NW    Coached-NW<br />
ASAP Total     -0.56*      -0.26       -0.58*       +0.27         -0.03         -0.26<br />
ABLE Total     -0.96*      -0.19       -0.88*       +0.84*        +0.19         -0.73*<br />
Achievement    -0.90*      -0.16       -0.67*       +0.95*        +0.30         -0.64*<br />
Adjustment     -0.74*      -0.06       -0.47*       +0.60*        +0.31         -0.39<br />
Dependability  -0.91*      -0.08       -0.90*       +0.81*        +0.11         -0.72*<br />
Fake Validity  -1.01*      -0.07       -1.45*       +0.94*        +0.12         -1.20*<br />
Note. Coached-W = Coached with a warning about fake detection items in the test.<br />
Coached-NW = Coached without a warning about fake detection items in the test.<br />
Effect size = the difference in group means divided by the pooled group standard deviation.<br />
*p &lt; .05 (Scheffé test).<br />
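The effect-size definition in the table note (difference in group means divided by the pooled group standard deviation) can be sketched as follows. The example plugs in the Table 1 means and SDs for one comparison, so it will not exactly reproduce the corresponding Table 2 entry, which may rest on somewhat different samples.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    # Pooled standard deviation of two independent groups.
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def effect_size(m1, s1, n1, m2, s2, n2):
    # Difference in group means divided by the pooled group SD.
    return (m1 - m2) / pooled_sd(s1, n1, s2, n2)

# ABLE Total: honest (n=126) vs. fake good (n=148), means and SDs from Table 1.
d = effect_size(142.01, 15.18, 126, 158.33, 14.44, 148)
print(round(d, 2))  # about -1.1: honest scores roughly one SD below fake-good
```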
Overall, the results replicate the findings from our previous research. As in the earlier experiment,<br />
the scores of soldiers given the “fake good” instructions were significantly higher than those of soldiers in the<br />
honest condition. This shows that the “fake good’ instructions were effective in producing positive<br />
response distortion.<br />
Also, scores resulting from the honest and coached-with warning conditions were not significantly<br />
different from each other. Thus, response distortion on the ASP was reduced (but not necessarily<br />
eliminated) in the group given the ‘coached-with warning” instructions. The combination of the warning<br />
of fake detection items and instructions not to appear “too perfect” may be responsible for this<br />
suppression of positive response distortion.<br />
In our extension of the research, we also examined the effect of coaching when no warning about<br />
fake detection items is given. As shown in Table 2, soldiers in this condition had significantly higher<br />
scores than those given the “honest” instructions. However, the scores of those in the coached-without<br />
warning group did not differ significantly from those in the fake good condition. Thus, the general<br />
(faking) strategy (i.e., describing oneself in a way that ensures being selected by the Army) and the more<br />
specific (coached) strategy (trying to present oneself as mature, responsible, well-adjusted, hardworking,<br />
well organized, and easy to get along with) were equally effective in producing response<br />
distortion. Finally, in comparison with the “coached” instructions, the addition of a warning about faking<br />
detection items resulted in significantly lower scores on 4 out of the 6 scales. This demonstrates that the<br />
warning was at least partially effective in reducing response distortion.<br />
Group Differences in Correlations Among ASP Scale Scores<br />
Correlations of the Validity scale with the other ASP scales were examined in each of the four<br />
conditions. As expected, the lowest correlations with the Validity scale were found when examinees<br />
were responding honestly (r = .20 to .37, all p&lt;.05). The highest correlations with the Validity scale<br />
were found when subjects were coached or told to fake in the socially desirable direction (r = .30 to .71,<br />
all p&lt;.05). The correlations with the Validity scale within the coached-with warning group (r = .11 to<br />
.50) were generally higher than the correlations found within the honest group, but smaller than the<br />
correlations for the two other groups. This indicates that the coached-with warning group distorted their<br />
responses in a positive direction, but not as much as the faking or coached groups.<br />
Utility of the Validity Scale<br />
for Detecting Response Distortion<br />
The purpose of the ABLE Validity scale is to identify individuals who have distorted their responses in<br />
a socially desirable direction. We examined how effective this scale would be in correctly classifying<br />
persons who were coached or instructed to distort their ASP responses.<br />
Table 3 shows how well the Validity scale discriminates among the groups, for each possible cut<br />
score that might be used to classify distorted responses. For example, with a cut score of 27, no one in<br />
the honest group would be incorrectly classified as faking (i.e., deliberately distorting responses in a<br />
socially desirable direction). However, this cut score would correctly classify 22% of those given the<br />
fake good instructions, 15% of those coached, and 3% of those coached-with warnings. Thus, all<br />
individuals in the fake good or coached conditions who were at or above the cut score would be<br />
correctly classified as fakers. Moreover, this would be done without misclassifying anyone in the honest<br />
group (since no one in this group had a Validity score above 26). The results also show that response<br />
distortion among those given the coached-with warning instructions is most difficult to detect. This is<br />
consistent with the finding that Validity scores between the honest and coached-with warning groups do<br />
not differ significantly.
Table 3<br />
Detection of Response Distortion Among Instructional Groups<br />
Using the Validity Scale<br />
Validity Scale   Percent False Alarms   Percent of    Percent Coached-W   Percent Coached-NW<br />
Cut Score        (in the Honest         All Fakers    Respondents         Respondents<br />
(at or above)    sample)                Detected      Detected            Detected*<br />
                 (n=233)                (n=213)       (n=248)             (n=100)<br />
11               100.0                  100.0         100.0               100.0<br />
12               97.4                   99.1          93.1                100.0<br />
13               91.8                   97.2          89.5                99.0<br />
14               75.5                   93.0          77.4                97.0<br />
15               62.7                   89.7          66.9                93.0<br />
16               49.4                   85.0          53.6                86.0<br />
17               40.3                   79.4          46.3                81.0<br />
18               28.3                   74.2          34.6                72.0<br />
19               19.7                   66.7          23.7                65.0<br />
20               13.7                   59.2          16.0                55.0<br />
21               9.9                    51.6          11.6                46.0<br />
22               6.0                    43.2          9.6                 39.0<br />
23               3.0                    37.1          6.8                 32.0<br />
24               1.7                    29.6          4.8                 24.0<br />
25               1.3                    27.7          4.0                 22.0<br />
26               0.4                    23.5          2.8                 19.0<br />
27               0.0                    21.6          2.8                 15.0<br />
28               0.0                    16.4          2.8                 8.0<br />
29               0.0                    13.1          2.0                 7.0<br />
30               0.0                    10.3          0.8                 5.0<br />
31               0.0                    8.9           0.4                 5.0<br />
32               0.0                    1.9           0.0                 2.0<br />
33               0.0                    1.4           0.0                 2.0<br />
Note. Coached-W = Coached with a warning about fake detection items in the test.<br />
Coached-NW = Coached without a warning about fake detection items in the test.<br />
Except as noted, samples were aggregated from two experiments.<br />
*Sample was obtained from the current experiment only.<br />
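The cut-score tabulation in Table 3 amounts to counting, per group, the share of scores at or above each candidate cut. A minimal sketch with invented scores:

```python
# Tabulating detection rates at a validity-scale cut score, as in Table 3.
# The score lists are invented for illustration; they are not the study data.
honest = [14, 15, 16, 17, 18, 19, 20, 21, 22, 26]
fakers = [18, 20, 22, 24, 25, 27, 28, 29, 30, 33]

def pct_at_or_above(scores, cut):
    # Percent of scores classified as distorted at this cut ("at or above").
    return 100.0 * sum(s >= cut for s in scores) / len(scores)

cut = 27
false_alarms = pct_at_or_above(honest, cut)  # honest respondents misclassified
hits = pct_at_or_above(fakers, cut)          # deliberate distorters detected
print(false_alarms, hits)  # 0.0 50.0 for these invented scores
```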
DISCUSSION<br />
The results in this paper serve to corroborate earlier findings by Hanson et al. (1989), while<br />
adding new information on the effects of coaching. First, we found that the inclusion of warning<br />
statements about lie detection items seems to suppress response distortion to almost honest condition<br />
levels. The use of warning statements may be helpful to deter intentional distortion in future<br />
administrations of the ASP. Secondly, the Validity scale was found to be reasonably effective in<br />
detecting response distortion. High Validity scale scores were shown to correctly identify a substantial<br />
percentage of fakers, without misclassifying honest respondents.<br />
In addition to these findings, our results suggest that coaching instructions designed to simulate<br />
“real-life” coaching by recruiters may be no more effective in eliciting response distortion than general<br />
instructions to “fake good”. Outside guidance on distorting ASP responses may serve to motivate<br />
applicants to fake. However, it is questionable as to whether such guidance would make a significant<br />
difference in the ASP scores of applicants who would otherwise be motivated to dissemble.<br />
Finally, future research will examine how positive response distortion affects the validity of the ASP<br />
for predicting attrition. We plan to investigate the feasibility of using the Validity scale to adjust ASP<br />
scores for faking. Such an adjustment might enhance the validity of the ASP in predicting attrition, as<br />
well as other important Army criteria.<br />
332
REFERENCES<br />
Hanson, M.A., Hallam, G.L., & Hough, L.M. (1989, November). Detection of response distortion in the<br />
Adaptability Screening Profile (ASP). Paper presented at the 31st Annual Conference of the<br />
<strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>, San Antonio, TX.<br />
Hough, L.M. (1987, August). Overcoming objections to use of temperament variables in selection:<br />
Demonstrating their usefulness. Paper presented at the American Psychological <strong>Association</strong><br />
Convention, New York, NY.<br />
Trent, T., Atwater, D.C., & Abrahams, N.M. (1986, April). Experimental assessment of item response<br />
distortion. In Proceedings of the Tenth Psychology in the DoD Symposium. Colorado Springs,<br />
CO: U.S. Air Force Academy.<br />
Walker, C.B. (1985). The fakability of the Army's Military Applicant Profile (MAP). Paper presented at the<br />
<strong>Association</strong> of Human Resources Management and Organizational Behavior proceedings,<br />
Denver, CO.<br />
333<br />
PSYCHOMETRIC PROPERTIES OF A NUMBER COMPARISON TASK:<br />
MEDIUM AND FORMAT EFFECTS<br />
Banderet, L.E., Shukitt-Hale, B.L., Lieberman, H.R., Simpson,¹<br />
LTC R.L., Perez,¹ CPT P.J., U.S. Army Research Institute of<br />
Environmental Medicine, Natick, MA, and ¹TEXCOM Armor and<br />
Engineer Board, Advanced Technology Research Div., Fort Knox, KY.<br />
ABSTRACT<br />
Researchers adapting or developing performance tasks for<br />
administration on personal computers are confronted with choices<br />
that may affect the task's measurement properties. To evaluate<br />
the effects of test medium, subjects completed Number Comparison<br />
(NC) tasks administered with both paper-and-pencil and portable<br />
computer media. Computerized NC proved superior to<br />
paper-and-pencil NC; the automated version had greater completion<br />
rates, reliabilities, and sensitivity to environmental<br />
stressors (hypoxia and cold).<br />
In a second study investigating task format, subjects were<br />
tested with a computerized NC task which presented either 1 or 33<br />
problems in each display window. Although the results were<br />
similar for these two formats, the response rates for the two<br />
formats were dependent upon the number of administrations. On<br />
some of the later administrations, rates for the multiple-problem<br />
format were 10% greater. Thus, formal evaluation during<br />
adaptation or development of computerized performance tasks helps<br />
ensure evolving tasks will possess reliability, sensitivity, and<br />
other useful psychometric properties.<br />
INTRODUCTION<br />
<strong>Testing</strong> performance capabilities with tasks automated by<br />
computers is more feasible today than ever before since computers<br />
possess better displays, process information faster, execute<br />
larger programs and data bases, store more information, and cost<br />
less. When a performance task is adapted or developed for<br />
administration by computer, the subject’s output responses and<br />
the instrument’s psychometric properties may change (Banderet et<br />
al., 1989; Moreland, 1987).<br />
We evaluated the automation of a performance task in two<br />
studies. In the first, an automated Number Comparison task (C-NC)<br />
was compared to its paper-and-pencil equivalent (P-NC). In the<br />
second study, the format of the display on the automated version<br />
was evaluated. Displays with a single problem were compared with<br />
displays with 33 problems. This report will describe the effects<br />
of task medium and format upon the psychometric properties of an<br />
automated NC task.<br />
334
METHOD<br />
Subjects---Twenty medical research volunteers from Fort Detrick,<br />
MD, and Natick, MA, were subjects for study 1. Thirty-two M1-A1<br />
armor personnel from Ft. Knox, KY, participated in study 2. All<br />
soldiers participated in these studies after they were given<br />
physicals and were fully informed about the conditions and procedures<br />
of the study. Investigators adhered to AR 70-25 and<br />
USAMRDC Regulation 70-25 on Use of Volunteers in Research.<br />
Assessment Instruments---The Number Comparison Task involves<br />
evaluating pairs of numbers to determine if the two numbers in<br />
each problem are the same or different. In the first study,<br />
automated and paper-and-pencil versions of the NC task were<br />
studied. The paper-and-pencil task (P-NC) was generated by<br />
computer and printed on a laser copier. The automated Number<br />
Comparison (C-NC) task was administered on a GRiD Compass<br />
portable computer. A subject's response could not be changed<br />
after it was entered on the keyboard of the automated task.<br />
These assessment measures and experimental data are described<br />
elsewhere (Shukitt et al., 1988).<br />
In the second study, two formats of the automated NC task<br />
were studied. During testing, a display on a subject’s computer<br />
showed either 1 or 33 problems to be evaluated. The latter format<br />
was similar to the format used in study 1 on both versions of NC.<br />
Procedures---Both studies reported in this paper were repeated-<br />
measures designs and were incorporated into larger investigations<br />
with other objectives. The first was to determine if an<br />
amino acid, tyrosine, prevents some of the adverse behavioral<br />
effects induced by environmental stressors. Specifically, 20<br />
subjects were exposed to 4700 m of simulated high altitude and<br />
17°C for 7 h; on two other occasions they were exposed to 550 m and<br />
22°C (baseline). The automated and paper-and-pencil versions of<br />
the NC task were administered 300-320 minutes after ascent with<br />
10 min separating their respective administrations. Initially,<br />
subjects practiced the NC task 15 times and learned to perform<br />
quickly with few errors. To assess<br />
sensitivity to experimental effects, a z score was calculated<br />
since it reflected both the magnitude and variability of measured<br />
effects.<br />
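The z-score sensitivity index described here can be sketched as the mean change score divided by the standard deviation of the change scores; the change scores below are invented, not the experimental data.

```python
import math

def sensitivity_z(changes):
    # Mean change under the stressor divided by the SD of the change
    # scores, so the index reflects both magnitude and variability.
    n = len(changes)
    mean = sum(changes) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in changes) / (n - 1))
    return mean / sd

# Invented change scores (correct/min, altitude minus baseline):
print(round(sensitivity_z([-4, -6, -5, -7, -3]), 2))  # -3.16
```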
RESULTS<br />
At 550 m + 22°C, performance rates were greatest for the<br />
automated NC task; the P-NC rate was 87% of the C-NC rate (see Table I).<br />
Task definition for the C-NC task was also better than its manual<br />
counterpart. The reliability of administrations during the<br />
experimental and control conditions was greatest for the C-NC<br />
task. The C-NC version of the NC task was also more sensitive to<br />
altitude effects than the manual version of this performance task<br />
as inferred from z score magnitudes.<br />
TABLE I: Properties of the paper-and-pencil (P-NC) and<br />
automated (C-NC) versions of the Number Comparison Task<br />
at 550 m + 22°C and 4700 m + 17°C.<br />
CRITERION                    STATISTIC      P-NC     C-NC<br />
BASELINE VALUES (550 m + 22°C)<br />
Baseline Rates               Mean           25.13    28.78<br />
(correct/min)                Sigma          8.02     8.88<br />
Minutes Practice Required    Mean           20       20<br />
(min/admin)<br />
Task Definition              Pearson's r    .89      .94<br />
(admin 5 and 6)<br />
Reliability                  Pearson's r    .81      .91<br />
(550 m vs. 4700 m)<br />
ALTITUDE + COLD EFFECTS (4700 m + 17°C)<br />
Altitude Effect              Mean           -5.04    -5.26<br />
(change in correct/min)      Sigma          5.08     4.07<br />
                             z score        -.99     -1.29<br />
336<br />
.
In the second study, the response rates for the multiple-<br />
problem format interacted with administrations (i.e., days).<br />
Rates for the multiple-problem format were always greater than<br />
rates for the single-problem format; on days 3 and 4 of the field<br />
test they were approximately 10% greater (see Fig. 1). Practice<br />
requirements, task definition, and task sensitivities were<br />
comparable for the two different display formats. The larger<br />
response rates for the multiple-problem format are noteworthy<br />
since faster rates are usually associated with tasks that have<br />
superior psychometric properties.<br />
Fig. 1: Response rates for an automated Number<br />
Comparison Task for a 1-problem or a 33-problem<br />
display. Each task was practiced four times previously<br />
(5 min per administration).<br />
DISCUSSION<br />
Automated NC was superior to its paper-and-pencil<br />
counterpart. The response rates, sensitivity to environmental<br />
stressors, and test-retest reliabilities were greater for the<br />
automated version than for the paper-and-pencil version. This<br />
demonstrates that when performance tasks are automated, modified,<br />
or developed, they may have different psychometric properties than<br />
their traditional counterparts. In this evaluation, the<br />
automated version of the NC task possessed the best psychometric<br />
properties.<br />
Second, it is important that performance tasks be<br />
evaluated during their adaptation or development. The success of<br />
performance tasks is usually dependent upon their psychometric<br />
properties (e.g., sensitivity, requirements for practice, and<br />
test-retest reliabilities). Evaluation will ensure that the<br />
psychometric characteristics of the task can be optimized and<br />
that appropriate measures will be retained and used.<br />
REFERENCES<br />
Banderet, L.E., Shukitt, B.L., Walthers, M.A., Kennedy, R.S.,<br />
Bittner, A.C., Jr., & Kay, G.G. (1989). Psychometric properties<br />
of three addition tasks with different response requirements.<br />
Proceedings of the 30th Annual Meeting of the Military Testing Association (pp.<br />
440-445). Arlington, VA: U.S. Army Research Institute for the<br />
Behavioral and Social Sciences.<br />
Moreland, K.L. (1987). Computerized psychological assessment:<br />
What's available. In J.N. Butcher (Ed.), Computerized<br />
Psychological Assessment (pp. 28-49). New York: Basic Books.<br />
Shukitt, B., Burse, R.L., Banderet, L., Knight, D.R., & Cymerman,<br />
A. (1988). Cognitive performance, mood states, and altitude<br />
symptomatology in 13-21% oxygen environments (Tech. Rep. No.<br />
18/88). Natick, MA: U.S. Army Research Institute of Environmental<br />
Medicine.<br />
SUBJECTIVE STATES QUESTIONNAIRE: PERCEIVED WELL-BEING<br />
AND FUNCTIONAL CAPACITY<br />
Banderet, L.E., O'Mara, M., Pimental, N.A., Riley, SGT R.H.,<br />
Dauphinee, SSG D.T., Witt, SSG C.E., Toyota, SGT R.M., U.S. Army<br />
Research Institute of Environmental Medicine and Navy Clothing<br />
and Textile Research Facility, Natick, MA.<br />
ABSTRACT<br />
Self-rated measures of symptoms and moods are especially<br />
sensitive to stressors and often detect changes in well-being<br />
before more objective indices (Beck, 1979). We developed a<br />
40-item Subjective States Questionnaire (SSQ) to exploit such<br />
measurement properties in our research program for determining<br />
the effects of extreme environments and evaluating treatment<br />
strategies. The SSQ assesses a greater range of reactions than<br />
most symptom or mood scales and seeks estimates of a soldier's<br />
capacity to perform common soldier tasks and other familiar<br />
activities, or the effort required to complete them.<br />
In a laboratory study of heat stress, SSQ data were collected<br />
during six 135-minute test sessions. Nine soldiers gave<br />
verbal ratings of "how they felt at that moment" during selected<br />
exercise, rest, and recovery intervals. Many subjective states<br />
appear sensitive to these manipulations. Ratings of most capabilities<br />
return rapidly to normal after termination of exercise<br />
and heat exposure.<br />
INTRODUCTION<br />
Self-rated measures of symptoms, moods, and behavioral capabilities<br />
are often more sensitive than objective measures of psychological<br />
phenomena (Beck, 1979). The sensitivity of self-rated<br />
measures probably results because many phenomena can be assessed<br />
with self-rated instruments, human subjects can recall and integrate<br />
personal experiences over time, and sensory and perceptual<br />
systems are most responsive to changes in stimulation or<br />
activity.<br />
To exploit the advantages of self-rated measures in our ongoing<br />
research with soldiers exposed to environmental stressors,<br />
we developed the Subjective States Questionnaire (SSQ). This<br />
questionnaire assesses perceived capability, or the effort to<br />
complete a task, by having the military subject relate such constructs<br />
to common soldier tasks or other familiar activities.<br />
This paper describes preliminary findings with the SSQ from an<br />
experiment where military subjects were tested experimentally in<br />
a hot physical environment while wearing various uniform ensembles.<br />
METHOD<br />
Subjects---Nine physically fit males (average age,<br />
23 years; height, 69 in; weight, 165 lbs) volunteered for the<br />
test after they were fully informed about the conditions and<br />
procedures of the study (Pimental, Avellini, & Banderet, in<br />
progress).<br />
Assessment Instruments---The SSQ is a 40-item, self-rated instrument<br />
(see Table I). It assesses perceived cognitive, memory,<br />
affective, sensory-perceptual, psychomotor, verbal, and kinesthetic<br />
capabilities. Many items operationally define estimates of<br />
such capabilities by relating them to selected common soldier<br />
tasks (HQ, Department of Army, 1987). For example, "I would have trouble<br />
running 2 miles in anything near my normal time." Some items are<br />
defined by relating them to familiar activities; e.g., Item 22, "I<br />
could remember spoken directions to a store a few miles from<br />
here." Twenty of the items in the SSQ are positive; e.g., "I<br />
could properly camouflage myself and my equipment." The other<br />
twenty items are negative; e.g., "If I were driving an automobile,<br />
I might commit traffic violations or cause accidents."<br />
Each item is rated on a 6-point scale with discrete anchor<br />
points; i.e., "Not At All," "Slight," "Somewhat," "Moderate,"<br />
"Quite a Bit," and "Extremely." The SSQ can be administered as a<br />
mark-sense questionnaire, as an automated questionnaire on a computer,<br />
or as an oral survey.<br />
To simplify description and display of data from individual<br />
items of the SSQ, all ratings for negative items are recoded and<br />
their verbal descriptions are restated positively. These transformations<br />
change each negative item so it assesses a<br />
"capability," and greater ratings reflect greater capability. For<br />
example, item 29 is "I would probably miss some information in<br />
military radio messages, without some 'say agains'." During data<br />
analysis, this item's ratings are recoded and it is restated as<br />
"I could probably comprehend most information in radio messages,<br />
without some 'say agains'." After such transformations, all 40<br />
items of the SSQ assess "capabilities" and larger ratings imply<br />
greater capability.<br />
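The recoding step can be sketched as follows. This is a hypothetical implementation: the paper does not give the numeric coding, so a 1-6 coding of the six anchors ("Not At All" = 1 through "Extremely" = 6) is assumed, and the set of negative item numbers shown is an illustrative subset of the twenty:

```python
# Illustrative subset of the 20 negative SSQ items (assumed numbering).
NEG_ITEMS = {1, 2, 23, 24, 29}

def recode(item_no, rating, scale_max=6):
    """Reflect negative items so larger values always mean greater capability."""
    if item_no in NEG_ITEMS:
        # On a 1..scale_max scale, reflection maps 1 -> scale_max, etc.
        return scale_max + 1 - rating
    return rating
```

For instance, a rating of "Extremely" (6) on negative item 29 becomes 1 after recoding, while ratings on positive items pass through unchanged.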
Procedures---Subjects exercised for 2 hours per day in a hot<br />
environment for 6 days before manipulation of experimental conditions.<br />
This avoided confounding the effects of physiological<br />
heat acclimation with heat strain induced by the experimental<br />
conditions. Acclimatizing conditions were 95°F dry bulb and 88°F<br />
wet bulb (75% relative humidity) with a 2.0 mph wind.<br />
Then, the men were tested in a repeated-measures design to<br />
evaluate six configurations of a Navy firefighting ensemble with<br />
different heat-retaining properties (Pimental, Avellini, &<br />
Banderet, in progress). Each test day, each man wore a new,<br />
randomly assigned configuration of the firefighting ensemble.<br />
Environmental conditions during each 2-hour experimental session<br />
were 90°F dry bulb, 79°F wet bulb (60% relative humidity) with a<br />
2 mph wind. Subjects alternately sat for 15 minutes (metabolic<br />
rate 105 watts) or walked at 3.5 mph (500 watts) on a level<br />
treadmill. The time-weighted metabolic rate was approximately 300<br />
watts.<br />
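The stated time-weighted rate is easy to verify arithmetically, assuming equal time in the two alternating activities as described:

```python
# Metabolic rates from the text: 105 W seated, 500 W walking,
# alternating in equal 15-min intervals (an assumption consistent
# with the procedure described).
sit_w, walk_w = 105, 500
time_weighted = (15 * sit_w + 15 * walk_w) / (15 + 15)  # 302.5 W, i.e., ~300 W
```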
The SSQ was administered 5, 20, 95, 110, and 125 min after<br />
the start of each experimental session. Each administration began<br />
in the fifth minute of a scheduled resting, walking, resting,<br />
walking, or recovery interval, respectively. During each assessment,<br />
each item on the SSQ was read to the group by a medical NCO. Each<br />
subject's ratings were sensed by a lapel microphone and recorded<br />
on a separate audio channel for subsequent data encoding and<br />
analysis. The last (fifth) administration was immediately after<br />
an experimental session. Subjects removed their uniforms and<br />
monitoring equipment and were tested in a room (at normal ambient<br />
temperature) 5 min after they finished walking on the treadmill.<br />
All data were analyzed with SPSS/PC+, V3.0. Results were<br />
significant if p < 0.05 (1-tailed). Data were frequently missing<br />
during a daily test session and often involved different subjects<br />
from administration to administration of an item. Paired t-tests,<br />
rather than more traditional repeated-measures analyses of<br />
variance, were used in evaluating uniform ensembles, since<br />
this statistic is not affected by missing values occurring<br />
during another administration in a session.<br />
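The analysis choice described above can be sketched as follows. This is an illustrative reimplementation (the original analysis used SPSS/PC+), showing how a paired t-test uses only the subjects observed in both conditions, so a value missing from one administration does not discard the rest of a session:

```python
import math

def paired_t(a, b):
    """Paired t statistic; a, b are per-subject ratings, None = missing.

    Only pairs with both values present enter the test (pairwise deletion).
    Returns t with n - 1 degrees of freedom, where n is the pair count.
    """
    diffs = [x - y for x, y in zip(a, b) if x is not None and y is not None]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

The subject with a missing rating simply drops out of that one comparison rather than forcing listwise deletion across the whole session.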
RESULTS<br />
These data demonstrate responsiveness of the SSQ to heat<br />
strain induced by metabolic heat production from exercise and<br />
high environmental temperatures. These data reflect average<br />
subject responses under conditions of increasing heat storage<br />
induced by walk-rest activities.<br />
Activity-time changes on individual SSQ items during experimental<br />
sessions suggested three trends: decreased capabilities<br />
during the session with rapid recovery afterwards (Fig. 1),<br />
decreased capabilities without rapid recovery (Fig. 2), and no<br />
apparent changes for some capabilities (Fig. 3). Each bar in Figs.<br />
1-3 has a "+" symbol above it; the horizontal bar on each symbol<br />
is the standard error of the mean for that data point. This<br />
report only shows illustrative data because of space limitations.<br />
Analysis of these data for other purposes required use of<br />
multiple comparisons to evaluate various configurations of the<br />
firefighting ensemble for different activities-times during the<br />
session. Table II shows items which appear most frequently<br />
affected by these conditions, since these items were more often<br />
statistically significant for these comparisons.<br />
On some items, perceived capabilities are least during exercise,<br />
e.g., Item 24: "I would have trouble running 2 miles in<br />
anything near my normal time." On other items, perceived capabilities<br />
are least during rest following exercise, e.g., Item 35:<br />
"I feel as good as I usually feel." Most capabilities recover<br />
rapidly following exercise and heat exposure, since values obtained<br />
5-10 min after the end of the experimental challenge are<br />
similar to baseline values. A few capabilities recover more<br />
slowly, since they are still impaired during the last<br />
administration.<br />
Missing data were evident for all conditions but were more<br />
frequent during exercise or when subjects were approaching medical<br />
safety limits or feeling ill. Although some data were lost<br />
because of equipment and procedural shortcomings, most missing<br />
data were caused by failures of the subjects to respond when they<br />
were uncomfortable or preoccupied with other activities.<br />
TABLE I: Actual Items on The Subjective States Questionnaire.<br />
1. I feel "overwhelmed."<br />
2. I feel "vulnerable."<br />
3. Right now, I could answer most promotion board questions.<br />
4. It would be more difficult than usual to understand new concepts that are being taught in a military class.<br />
5. My thinking and other mental processes are at their "max."<br />
6. It would require more effort than usual to tell someone how to "shoot an azimuth."<br />
7. My vision seems especially sharp and clear.<br />
8. My thoughts seem complete.<br />
9. I feel like spit shining my boots and polishing my brass.<br />
10. My body feels clumsy and awkward in this situation.<br />
11. I could complete gas mask confidence training, including unmasking in the "gas chamber," with no difficulty.<br />
12. It would take more effort than usual to complete a land navigation course.<br />
13. I feel "out of touch" with my surroundings.<br />
14. I feel confused.<br />
15. I could easily play a difficult video game for 20-25 minutes.<br />
16. My thinking seems "sluggish."<br />
17. I am having trouble remembering some things now.<br />
18. Staying in this study hardly seems worth it.<br />
19. Sending a grid coordinate by radio would require greater effort than usual.<br />
20. If I were driving a motor vehicle, my actions would seem "jerky" and "unconnected."<br />
21. I could properly camouflage myself and my equipment.<br />
22. I could remember spoken directions to a store a few miles from here.<br />
23. If I were driving an automobile, I might commit traffic violations or cause accidents.<br />
24. I would have trouble running 2 miles in anything near my normal time.<br />
25. I can talk freely without stuttering.<br />
26. A 2-3 hour G.I. party might be difficult to "deal with."<br />
27. Telling even a short joke would require more effort than usual.<br />
28. If a "password" and "challenge" were changed every two hours, it might be difficult for me to remember them.<br />
29. I would probably miss some information in military radio messages, without some "say agains."<br />
30. I feel disoriented.<br />
31. I am as aware of feelings in my arms, legs, and body as I usually am.<br />
32. It would be hard to be up for 24 hours of guard duty now.<br />
33. I could disassemble and reassemble an M-16 correctly within time limits.<br />
34. Detecting a soldier in BDUs in tall brush would take more effort than it usually does.<br />
35. I feel as good as I usually feel.<br />
36. I would confuse some of the azimuths with the directions they represent.<br />
37. I feel "ate up."<br />
38. My memory is working as well as it usually does.<br />
39. I feel good enough to max at least one part of the PT test.<br />
40. I would find it more difficult than usual to find a landmark such as railroad tracks on a map.<br />
FIG. 1: SSQ item 3, "Right Now, I Could Answer Most Promotion Board Questions," shows decreased capability with<br />
increasing heat strain, with partial recovery following termination of heat exposure and exercise.<br />
FIG. 2: Transformed SSQ item 1, "I Feel In Control (vs. Overwhelmed)," shows decreased capability with increasing<br />
heat strain with little, if any, recovery following termination of heat exposure and exercise.<br />
FIG. 3: Transformed SSQ item 36, "I Could Associate Azimuths With The Directions That They Represent," shows little,<br />
if any, effect upon capability with increasing heat strain or termination of heat exposure and exercise.<br />
TABLE II: Actual Items from The Subjective States Questionnaire that yielded frequent statistically significant differences<br />
on comparisons of firefighting ensembles.<br />
1. I feel "overwhelmed."<br />
2. I feel "vulnerable."<br />
3. Right now, I could answer most promotion board questions.<br />
10. My body feels clumsy and awkward in this situation.<br />
12. It would take more effort than usual to complete a land navigation course.<br />
17. I am having trouble remembering some things now.<br />
18. Staying in this study hardly seems worth it.<br />
23. If I were driving an automobile, I might commit traffic violations or cause accidents.<br />
24. I would have trouble running 2 miles in anything near my normal time.<br />
30. I feel disoriented.<br />
37. I feel "ate up."<br />
Acceptability of the questionnaire to our military test subjects<br />
was better than for most symptom, mood, or personality questionnaires<br />
that we have administered before. This observation was<br />
supported by discussions about some items and comments suggesting<br />
the items were relevant to a soldier's training and experiences.<br />
Subjects volunteered that items also made them think about the<br />
implications of performing military tasks in stressful situations.<br />
DISCUSSION<br />
This study explored the perceived capabilities of soldiers<br />
to perform common soldier tasks and other familiar activities<br />
under varying degrees of heat strain. Varied human capabilities<br />
were affected by heat exposure and exercise. Interestingly, most<br />
items showed recovery even 5 min after termination of heat exposure<br />
and exercise. These preliminary results suggest that the<br />
SSQ may be useful in other situations which use military personnel<br />
as test subjects. The content of items fosters interest and<br />
cooperation, useful assets especially in challenging testing<br />
situations.<br />
Surveying a group of subjects orally, as was done in the<br />
present study, is advantageous in some experimental situations,<br />
particularly when subjects are exercising or performing a task.<br />
To minimize missing data when the SSQ is administered orally,<br />
special emphasis must be given to responding, since there are<br />
fewer sanctions to encourage a response to each item than with<br />
other forms of questionnaire administration.<br />
These data demonstrate that subjects can provide systematic<br />
estimates of their perceived capabilities for varied tasks. Although<br />
this study did not validate subject estimates of their<br />
capabilities, the time courses of soldier capabilities appear<br />
plausible. Furthermore, the recovery of some capabilities in 5<br />
min or less emphasizes the limitations of using a "post" session<br />
measure to approximate "status" during an earlier stressful challenge.<br />
This observation also illustrates the importance of<br />
sampling at appropriate times so that the time course of a phenomenon<br />
can be accurately measured.<br />
REFERENCES<br />
Beck, A.T. Cognitive therapy for depression. New York: Guilford<br />
Press, 1979.<br />
Headquarters, Department of Army. Soldier's manual of common<br />
tasks (skill level 1), STP 21-1-SMCT, 1987.<br />
Pimental, N.A., Avellini, B.A., and Banderet, L.E. Comparison of<br />
heat stress when the Navy fire fighter's ensemble is worn in<br />
various configurations. Technical Report *, Navy Clothing and<br />
Textile Research Facility, Natick, MA (in progress).<br />
Validity of Grade Point Average:<br />
Does the College Make a Difference?<br />
Diane L. iiomnglia, ClC<br />
Jacobina Skinner<br />
Manpower and Personnel Division<br />
Air Force Human Resources Laboratory<br />
Throughout the military and private sector, undergraduate grade point<br />
average (GPA) plays an important role in job selection decisions. This<br />
measure of academic achievement and demonstrated ability is widely held to<br />
predict employee performance. Recent literature reviews show significant but<br />
modest relationships between GPA and employee performance, both in training and<br />
on the job (e.g., Dye & Reck, 1988).<br />
An issue raised by the use of GPA as a personnel selection factor<br />
concerns the possible lack of equivalence in the grade scale across colleges.<br />
The implication of these inequivalencies for employers is that expected<br />
performance would vary among job applicants who have the same GPA but who<br />
graduated from different colleges. Research on this issue is sparse, but two<br />
studies suggest that a school factor may moderate the GPA-performance<br />
relationship. Dye and Reck (1988) found correlations for graduates of<br />
the same college to be higher on average than those for graduates of different<br />
colleges. Further evidence that college characteristics may influence the<br />
predictability of GPA has been reported for Air Force officers commissioned<br />
from the Reserve Officer Training Corps (ROTC) program (Barrett & Armstrong,<br />
1989). Performance prediction was improved by considering a quality measure<br />
for the officers' college (Scott, 1984) in addition to their GPA.<br />
The current study extends the investigation of the college and GPA issue<br />
in the Air Force to a second officer commissioning source: the Officer<br />
Training School (OTS) at Lackland AFB, TX. The findings of a two-phase study<br />
of the relationship between GPAs awarded to cadets graduating from different<br />
colleges and their subsequent performance in OTS are reported. The study<br />
design was previously described by Skinner and Armstrong (1990). In the<br />
analytic phase, the initial focus was on the validity of GPA as a cadet<br />
selector. Both simple GPA effects and the joint effect of college and GPA<br />
were investigated. If differential validity for colleges was observed, the<br />
study design provided for an explanatory phase to identify the characteristics<br />
of colleges which may be responsible.<br />
Analytic Phase: GPA and College Relationships with Cadet Performance<br />
The analytic phase was conducted to answer two primary questions: 1) Is<br />
GPA a valid predictor of OTS performance? and 2) Is the relationship<br />
between GPA and performance moderated by college attended?<br />
Method<br />
Procedure<br />
Data were obtained from archival files maintained on Air Force officers.<br />
An initial sample of 11,619 cadets who entered OTS was identified.<br />
Source data for the primary predictor variables were cadets'<br />
4-year undergraduate GPA reported on a 4.0 scale and the college which<br />
conferred their baccalaureate degree. Measures of cadet performance were<br />
obtained from various phases of the 12-week OTS program. Reason for<br />
terminating training was used to generate a Pass/Fail dichotomy reflecting<br />
final training outcome for the total sample. Eight additional measures of<br />
performance were available for graduates (N = 9,858). Final Course Grade was<br />
an overall rating of academic success in the training course.<br />
Alternate models specified relationships of GPA and colleges with cadet<br />
performance that were less complex than the one hypothesized by the starting<br />
model. Possible outcomes were an interaction between GPA and college, but of<br />
a simpler functional form (either linear or curvilinear). Other alternate<br />
models provided for a joint but noninteracting effect due to GPA and college<br />
(with either a linear, quadratic, or cubic form). In these cases expected<br />
performance would differ by college at fixed GPA values, but the difference<br />
per unit change in GPA would be constant. The least complex models specified<br />
an effect due solely to GPA (linear, quadratic, or cubic) or solely to<br />
college.<br />
To isolate the "best" model, pairs of models were compared to identify the<br />
most appropriate model for each criterion.<br />
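A comparison of two nested regression models of this kind can be sketched with the extra-sum-of-squares F test. This is an assumed mechanism (the exact screening procedure is not fully detailed here); a non-significant F favors the simpler model:

```python
import numpy as np

def extra_ss_f(y, X_simple, X_full):
    """F statistic comparing a simpler model nested within a fuller model.

    X_simple's columns must be a subset (or linear subspace) of X_full's.
    """
    def sse(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    n = len(y)
    df_s = n - X_simple.shape[1]   # residual df, simpler model
    df_f = n - X_full.shape[1]     # residual df, fuller model
    # Improvement in fit per extra parameter, scaled by the full model's MSE.
    return ((sse(X_simple) - sse(X_full)) / (df_s - df_f)) / (sse(X_full) / df_f)
```

For example, a GPA-only model would play the role of `X_simple` against a model adding college membership indicators as `X_full`.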
Results of GPA Validation<br />
Simple GPA effects were found for all criteria except the Pass/Fail<br />
dichotomy. As shown in Table 1, the bivariate correlations between the<br />
performance criteria and GPA indicate low to medium-low relationships.<br />
The highest correlation was observed for Final Course Grade (r<br />
= .31, p < .01) and the lowest correlation for the Pass/Fail dichotomy (r = .01,<br />
p > .05). Because the study focused on identifying a college effect for a<br />
specific criterion only if a GPA effect was found, the Pass/Fail measure was<br />
excluded from further analyses.<br />
Joint GPA and college effects were found for all remaining criteria<br />
except the 6th week OTER. Information about both college identity and GPA<br />
made a unique contribution to prediction of the cadets' training performance.<br />
However, no interaction between GPA and college was detected. Expected<br />
training performance differed by a constant amount at all GPA levels for<br />
graduates of different colleges. The functional form of the GPA-performance<br />
relationship for colleges was linear for three performance criteria and<br />
curvilinear for four performance criteria. Figure 1 illustrates a<br />
representative finding. A curvilinear relationship between GPA and<br />
performance is depicted, and between-college differences in expected<br />
performance are shown to be the same across GPA values.<br />
Table 1. Correlations (uncorrected)<br />
of Criteria with GPA<br />
Criterion(a)               r<br />
Pass/Fail                 .01<br />
Final Course Grade        .31**<br />
CWT 1                     .19**<br />
CWT 2                     .22**<br />
CWT 3                     .21**<br />
CWT 4                     .22**<br />
CWT 5                     .18**<br />
OTER 6th Week             .07*<br />
OTER 11th Week            .20**<br />
(a) Pass/Fail N = 11,619. Other<br />
criteria N = 9,858.<br />
* p < .05.<br />
** p < .01.<br />
Figure 1. Relationship Between GPA<br />
and Expected CWT 5 Score for Different<br />
Colleges (Schools 'A', 'B', and 'C')<br />
Explanatory Phase: Characteristics Which Account for College Effects<br />
The explanatory phase was accomplished once the results of the analytic<br />
phase showed that the relationship between GPA and cadet success varied by<br />
college. The objective of this phase was to identify variables reflecting<br />
the characteristics of colleges which might underlie the combined effect of<br />
GPA and college. Of interest was whether performance variance accounted for<br />
by colleges was due primarily to the talent of students (college<br />
selectivity) or to the nature of the academic experience (educational<br />
environment). Astin (1962, 1971) showed that both classes of variables can<br />
be used to distinguish colleges, but suggested (1972) that selectivity is<br />
the more important correlate of graduates' future performance.<br />
Method<br />
Subjects<br />
The unit of analysis was colleges. Eleven of the 102 institutions were<br />
eliminated because data on all of the predictors could not be obtained.<br />
This reduced the number of colleges analyzed to 91.<br />
College Measures<br />
College Selectivity. College selectivity was defined as a measure<br />
which captured the prestige of the university as reflected by the talent of<br />
the students attracted and accepted to the college. To measure college<br />
selectivity, the average scores of the entering freshman class on<br />
standardized tests (the Scholastic Aptitude Test (SAT) and the American<br />
College Test (ACT)) were recorded. In addition, the selection ratio of the<br />
college (i.e., percentage of applicants accepted) was computed.<br />
Educational Environment. Educational environment measures reflected<br />
academic experiences provided by the university. Measures were percentage<br />
of graduate students, ratio of students to faculty, percentage of full-time<br />
faculty with PhDs, number of volumes in the library, and yearly dollar value<br />
of endowments.<br />
Procedure<br />
The sources of data for the college selectivity and educational<br />
environment predictors were various published documents reporting such<br />
statistics (e.g., American Council on Education, 1983, 1987; The College<br />
Blue Book, 1987; Lehman, 1966; National Center for Educational Statistics,<br />
1987). Data used as criteria reflected the unique contribution of the 91<br />
colleges to the prediction of OTS cadet performance. These values were the<br />
regression weights (b-weights) for the college membership binary variables<br />
from the "best" model in the analytic phase.<br />
Analysis<br />
Regression analyses were used to explore the relative contribution of<br />
the two classes of college characteristics in accounting for the college<br />
effect observed for the seven OTS performance measures. Two models, in<br />
which the b-weights for colleges were regressed on both college selectivity<br />
and educational environment measures (Model 1) and on college selectivity<br />
measures alone (Model 2), were analyzed. These models were designed to test<br />
the hypothesis that the variation in expected performance level observed for<br />
graduates of different colleges was due to college selectivity, or the talent<br />
of the student body, not to the educational environment. The predictor sets<br />
included binary and product vectors for the SAT and ACT variables in order to<br />
account for the schools (N = 51) which reported only one test score, either<br />
SAT or ACT. The predictive accuracy of the two models was compared using the<br />
F statistic (p < .01). If the models differed significantly for a criterion,<br />
stepwise regression analyses were also accomplished to identify the most<br />
salient indicators among the available educational environment measures. A<br />
backward elimination method was used to determine which educational<br />
environment measures improved predictability.<br />
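The backward elimination step can be sketched as follows. This is a simplified stand-in for the significance-based criterion the authors would have used (an SSE tolerance replaces the p-value test), and the predictor names are hypothetical:

```python
import numpy as np

def backward_eliminate(y, X, names, tol=1e-6):
    """Drop predictors one at a time while the least-squares fit barely worsens.

    Simplified criterion: stop when removing any remaining predictor would
    increase the residual sum of squares by more than tol.
    """
    def sse(cols):
        M = X[:, cols]
        beta, *_ = np.linalg.lstsq(M, y, rcond=None)
        resid = y - M @ beta
        return float(resid @ resid)

    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        base = sse(keep)
        # Find the predictor whose removal hurts the fit least.
        best_sse, drop = min((sse([c for c in keep if c != j]), j) for j in keep)
        if best_sse - base > tol:   # every removal now degrades the fit
            break
        keep.remove(drop)
    return [names[c] for c in keep]
```

In practice each candidate removal would be judged with an F or t test rather than a fixed tolerance; the loop structure is the same.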
College Selectivity Versus Educational Environment<br />
As shown in Table 2, the multiple correlations (R) for the college<br />
selectivity and educational environment measures in combination (Model 1)<br />
ranged from .45 to .66. The highest relationships (R = .60 or greater) were<br />
obtained for the Final Course Grade, CWT 2, and 11th week OTER performance<br />
criteria, and the lowest relationships for the CWT 1 and CWT 5 criteria.<br />
The two classes of college characteristic predictors accounted for about 20%<br />
to 40% of the variance (R2) in expected performance due to college<br />
attended.<br />
Table 2. Regression Analysis Results for College Characteristics<br />
                         Model 1:             Model 2:<br />
                         College Sel &        College Sel<br />
                         Academic Envir<br />
OTS Performance<br />
Criteria                 R      R2            R      R2        F<br />
Final Course Grade       .64    .42           .62    .38<br />
CWT 1                    .46    .21           .42    .17<br />
CWT 2                    .66    .43           .63    .40<br />
CWT 3                    .51    .25           .46    .21<br />
CWT 4                    .57    .32           .53    .28<br />
CWT 5                    .45    .20           .44    .20<br />
OTER 11th Week           .60    .36           .47    .22      3.19**<br />
** p < .01.<br />
Personnel managers using undergraduate GPA as a job selection factor<br />
should be cognizant that the expected future performance of employees may<br />
vary as a function of the college attended. In agencies with selection<br />
systems relying exclusively on GPA, consideration of the selectivity<br />
characteristics of individual institutions holds promise as the basis for a<br />
methodology to adjust for the college effect. Agencies with selection<br />
procedures which include a measure of each applicant’s cognitive ability<br />
(i.e., standardized test score) in addition to GPA may find that the<br />
aptitude component captures the performance variance due to college.<br />
Flight Psychological Selection System - FPS 80:<br />
A New Approach to the Selection of Aircrew Personnel<br />
H.D. Hansen<br />
Ministry of Defense, Bonn, Germany<br />
Introduction<br />
The selection of Air Force and Navy flight personnel is a progressive process, commencing<br />
before the enlistment of the candidates (Phases 1 and 2) and continuing after the normal military<br />
training (which lasts for approximately one year) into Phase 3.<br />
The first Phase is a general screening of such factors as Intelligence and Leadership qualities,<br />
carried out in the respective Officer or NCO Selection Centres.<br />
The second Phase is a preliminary flight-aptitude screening, using Computer-based psychological<br />
tests, grading candidates as broadly ‘Suitable’ or ‘Unsuitable’.<br />
The third Phase is more precise, making a final decision as to candidate suitability and further<br />
predicting what particular activity each candidate would be best suited for (e.g. Jet, WSO, Prop,<br />
Helicopter or Navigator).<br />
It consists of 3 weeks Navigation/Academic instruction, 1 week FPS 80 Selection and for those<br />
who have survived thus far, 5 weeks Flying instruction on light prop aircraft, including 18 flying<br />
hours.<br />
FPS 80 is the abbreviation for the Flight Psychological Selection System of the Aviation<br />
Psychology Section, Aerospace Medical Institute of the German Airforce.<br />
As the need was identified to improve the effectiveness and reliability of the Selection System,<br />
FPS 80 was conceptualized. It was then designed and a detailed Functional Specification was<br />
prepared, from which the required Hardware and Software was commissioned.<br />
FPS 80 was installed in July of 1987, from which time it was further tested and standardized. It<br />
was introduced as part of the selection process on the 1st April, 1990.<br />
In this paper, we will concern ourselves with a description and statistical evaluation of the FPS<br />
80 Selection system.<br />
An overview of FPS 80<br />
All those skills which are very difficult or impossible to test in the flying part of the screening,<br />
need to be evaluated, and this is the principal function of FPS 80 - to determine the particular<br />
skills of each individual candidate.<br />
These include such particular skills as the multiple tasks required of a WSO, speed of information<br />
processing, estimation abilities in formation flying, spatial orientation and visualization. FPS 80 is<br />
much better capable of categorizing these particular skills than the later Flying screening.<br />
FPS 80 makes use of a complex simulator-like device, which provides a test-environment very<br />
close to actual flying. The advantage of such a device compared with an aeroplane consists of<br />
the ability to make an objective measurement of candidate performance in a standardized test<br />
situation devoid of external distractions. In this way an adequate performance comparison<br />
between different candidates is provided. In addition a qualitative description of candidate<br />
behaviour may be formulated by observations during the test.<br />
Description of the FPS 80 Test Device<br />
Test position<br />
The two identical test positions are built to resemble cockpits. They contain a seat and the usual<br />
flight controls, viz: stick, rudder, flap-lever, gear-switch, and throttle. These are actual parts<br />
from scrapped military aircraft. In the interest of cost reduction, a stationary cockpit is used.<br />
Impressions of movement originate exclusively from visual inputs.<br />
Conventional flight instruments are depicted on an instrument panel. Three colour VDUs appear<br />
above the panel. These represent the view from the cockpit. The view forwards covers a<br />
landscape of approximately 80 kilometres square. The view includes an airfield and the<br />
surrounding landscape. The scale of the depicted landscape represents the cockpit’s current<br />
displacement from it; the speed of change of a display represents the speed of the cockpit and<br />
perspective of the objects displayed represents the current orientation of the cockpit. In this way<br />
a realistic impression of motion is conveyed to the candidate.<br />
In the lower third of the central VDU the following instruments are displayed: power, airspeed,<br />
compass, horizon, altimeter, vertical speed indicator and G-meter.<br />
The cockpits additionally have a control and warning-panel that gives information about such<br />
things as landing gear (up/unsafe/down), flaps (up/down), parking brakes, stall warning. An<br />
input key-pad is found on the right side. The performance characteristics mirror those of a<br />
standard single engine machine. System parameters may be changed to simulate other machine<br />
types. The two cockpits operate independently of one another.<br />
System Configuration<br />
The FPS 80 system comprises 7 computers linked by a network. One of these is a central<br />
computer and each cockpit is driven by three more. Tests are controlled from the central<br />
computer console from where the test supervisor can start the different test programs, communicate<br />
with the candidates and monitor their progress. He can additionally intercept their visual<br />
displays, and speak with the candidates, singly or severally, by radio link. The candidate<br />
performance data is returned to the central computer where it is stored on tape, later to be<br />
processed on an external computer in combination with the results of the other screening<br />
procedures, to produce a composite performance profile for each candidate. The results from<br />
all candidates may then be statistically analysed.<br />
Test procedure<br />
The FPS 80 Test Procedure consists of 5 missions. Each candidate receives a standard briefing<br />
from the instructor before beginning each mission. A mission is built up of various distinct<br />
manoeuvres, usually starting and ending with a take-off and landing. Every mission has three<br />
phases, viz:<br />
1) Demonstration Phase<br />
The control sequences and instruments required for each mission are first explained and<br />
demonstrated. An ideal mission performance is then displayed on the screens and described in<br />
pre-recorded standardized form over the acoustic system.<br />
2) Practice Phase<br />
In the second phase, the candidate attempts the manoeuvre himself with assistance both from<br />
the system (pre-recorded warnings) and from the instructor (optional intervention). The computer<br />
monitors his performance and generates warnings when it strays too far from the optimal<br />
one. (Tolerances are adjustable.) Should his performance diverge unduly, the manoeuvre is<br />
interrupted and starts anew (up to three times).<br />
3) Test Phase<br />
There is no intervention or assistance during the test phase. The only acoustic inputs are normal<br />
Controller communications. The same tolerances apply as during the practice phase, and<br />
automatic interruption and restart will occur in the same way.<br />
The candidate’s behaviour is additionally under observation during this phase by a Flight<br />
Psychologist, who subsequently completes an observation log of his performance.<br />
Description of Missions<br />
Mission FPS 01:<br />
- Introduction to the function of the video system, controls and flight instruments.<br />
- Taxiing, Takeoff with Abort, renewed Taxiing to “Number 1 Position”, Take-off and climb<br />
to Pattern-level (1000 ft AGL), Straight and level flight.<br />
- Turns with 20° of bank and 90° direction change. - Turns with 40° of bank and 180° direction<br />
change. - Turns with 60° of bank and 360° direction change.<br />
- Automatic return flight to the airfield with landing.<br />
Mission FPS 02:<br />
- Consists of pattern flying and landings.<br />
Mission FPS 03:<br />
- Take-off and climb to pattern level. Leaving the pattern over the NZP (Navigational Zero<br />
Point) to commence flight proper.<br />
- Navigation flight (1000 ft AGL) with location of targets and the solution of additional tasks<br />
(calculation of course and flight duration per leg). Finally return to airfield and land.<br />
Mission FPS 04:<br />
- Take-off and climb to pattern level. The plane will then be automatically positioned at 6000<br />
ft AGL.<br />
- Recovery from unusual attitudes (nose-up/nose-down). The manoeuvre must be performed<br />
at 5000 ft on a prescribed course and within a given time interval.
- Pursuit of a leading plane such that a given separation is maintained at all times.<br />
- Homing in on a target, pursuit and attack of another plane.<br />
- Finally return to airfield and land.<br />
Mission FPS 05:<br />
An endless tunnel appears on the screen comprising a series of concentric squares and a white<br />
line approaching the viewer through the center of the bottom edges. The squares appear to<br />
approach the viewer by diverging from the centre. The apparent speed of approach of these<br />
squares (which remains constant) simulates the speed of flying through the tunnel. Rotation of<br />
the squares transmits a sensation of banking in the opposite direction. Similarly changes in the<br />
relative displacement of opposite sides (left and right for the tunnel bending, top and bottom for<br />
the tunnel rising and dipping) create effects of the tunnel changing direction and orientation.<br />
These effects communicate themselves to the candidate not as changes in the tunnel however,<br />
but as changes in the attitude of the plane. The alignment of the squares can be restored by the<br />
appropriate control inputs, which in turn restores the impression of level flight.<br />
These effects are accentuated by examination pressure and the feeling of sensory deprivation<br />
caused by a closed cockpit. This is so realistic to some candidates that they experience a sensation<br />
of air-sickness.<br />
Statistical Evaluation<br />
The evaluation of the missions is performed in 3 steps, viz:<br />
1) Data compression<br />
2) Determination of correlations between FPS missions and flight performance in the Screening.<br />
3) Calculation of transformed test results.<br />
Table 1 gives an overview of the number of variables to be processed from each mission.<br />
Table 1: Number of variables per mission<br />
Mission 01: 7 sections of 11 variables.<br />
Mission 02: 11 sections of 11 variables (times 3 circuits).<br />
Mission 03: 18 sections of 11 variables.<br />
Mission 04: 16 sections of 11 variables.<br />
Mission 05: 9 sections of 9 variables.<br />
This gives a total of 895 processed variables for all 5 missions. This implies ca. 130 kByte raw<br />
data per candidate. Thus a condensation of data is necessary to enable evaluation. (Details of<br />
this condensation procedure are to be found in an exhaustive paper on the subject shortly to be<br />
published in the “Wehrpsychologische Untersuchungen”.). The condensation required 60 hours<br />
of processing time, and the results were stored in 7 data banks for later ease of access.<br />
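Table 1’s bookkeeping can be checked directly; a one-line tally (mission 02’s sections are multiplied by its 3 circuits):<br />

```python
# (sections, variables per section, repetitions) for missions 01-05;
# mission 02 is flown as 3 identical circuits.
missions = [(7, 11, 1), (11, 11, 3), (18, 11, 1), (16, 11, 1), (9, 9, 1)]
total_variables = sum(s * v * reps for s, v, reps in missions)
print(total_variables)  # 895
```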
In the second stage of evaluation, correlations were made between the individual variables and<br />
the results from the later Flying screening. In this way the variables best able to predict the<br />
results of the Flying screening were high-lighted.<br />
In the third stage of evaluation, based on a regression analysis of the most predictive variables<br />
from stage 2, a representative value for each candidate was calculated for the individual sections<br />
of each mission, and also for each complete mission (or in the case of mission 02, for each circuit<br />
of the mission).<br />
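Stages 2 and 3 amount to screening variables by their criterion correlations and then combining the survivors by regression. A minimal sketch of the screening step, with invented variable names and data (not FPS values):<br />

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def screen_variables(variables, criterion, keep):
    """Rank named mission variables by |r| with the screening
    result and keep the most predictive ones (stage 2)."""
    ranked = sorted(variables.items(),
                    key=lambda kv: abs(pearson_r(kv[1], criterion)),
                    reverse=True)
    return [name for name, _ in ranked[:keep]]

# Invented performance variables for six candidates and their
# (hypothetical) Flying screening scores.
flying = [55, 60, 48, 70, 52, 66]
variables = {
    "altitude_deviation": [9, 6, 12, 3, 10, 5],  # tracks criterion inversely
    "heading_error":      [8, 7, 9, 2, 9, 4],
    "keypad_speed":       [5, 5, 6, 5, 5, 6],    # nearly unrelated
}
best = screen_variables(variables, flying, keep=2)
```

In stage 3 a regression would then be fitted on the retained variables to yield one representative value per section and per mission.<br />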
The values thus calculated for each complete mission were then correlated with the<br />
results of the Flying screening. All correlations were highly significant, but differed widely<br />
between missions. The fact that the first mission had a lower correlation could perhaps be<br />
explained by the unfamiliarity of the test environment at this early stage of FPS screening.<br />
In a fourth evaluation stage, the fall-out frequencies during Flying screening within groupings<br />
of candidates with similar FPS performances were computed. Table 2 (next page) shows clearly<br />
that candidates with low FPS performances frequently failed the Flying screening.<br />
The second mission appears to have been particularly predictive. Mission 3 and 4 show<br />
irregularities in the middle stages, which could perhaps be explained by the fact that some of<br />
the skills being tested in these missions, do not play a part in the Flying screening.<br />
Those particularly at risk in the Flying screening are candidates who scored below 52 in the<br />
FPS (most candidates scored in the range 40 to 76). The group of candidates with the best FPS<br />
results (> 69 FPS points), on the other hand, had a 90% success-rate in the Flying screening.<br />
Conclusion<br />
After exhaustive statistical evaluation, it was possible to conclude that the FPS was capable of<br />
predicting success or failure at Flying screening with acceptable accuracy.<br />
A final evaluation of the success of this method of screening (remembering that the full screening<br />
process consists of all five stages in Phases 1 to 3, as at present Flying screening is being retained)<br />
will only be possible after the collection of sufficient statistical evidence of candidates’ subsequent<br />
performance in training and later operational flying. The same applies, of course, to the<br />
other in-flight disciplines for which FPS screening takes place.<br />
To date, second-rate pilot-candidates have been channelled into positions as Weapon System<br />
Officers and Navigators. It is hoped that the specific results of FPS missions 3 and 4 will show<br />
a better correlation with subsequent candidate skills in these specialist activities.<br />
Table 2<br />
FPS 80 test results and attrition rates in Flying screening<br />
                          Test results<br />
               < 52  52-57  58-63  64-69   > 69  total<br />
mission 1<br />
no of cand.      46     50     98    120     43    387<br />
attritions       23     11     17     18      1     70<br />
percentage      50%    22%    17%    15%     2%    23%<br />
mission 2a 1)<br />
no of cand.      43     44     71     60     56    274<br />
attritions       26     14     15      5      3     63<br />
percentage      60%    32%    21%     8%     5%    23%<br />
mission 2b<br />
no of cand.      44     33     73     67     57    274<br />
attritions       24     18      8      9      4     63<br />
percentage      55%    55%    11%    13%     7%    23%<br />
mission 2c<br />
no of cand.      37     37     63    100     37    274<br />
attritions       23     12     17     10      1     63<br />
percentage      62%    32%    27%    10%     3%    23%<br />
mission 3<br />
no of cand.      35     23     31     34     32    155<br />
attritions       24      5      2      6      1     38<br />
percentage      69%    22%     6%    18%     3%    25%<br />
mission 4<br />
no of cand.      21     16     24     23     40    124<br />
attritions       13      2      3      3      2     23<br />
percentage      62%    13%    13%    13%     5%    19%<br />
mission 5<br />
no of cand.      19     13     25     27     30    114<br />
attritions        9      3      4      3      0     19<br />
percentage      47%    23%    16%    11%     0%    17%<br />
1) mission 2 consists of 3 identical patterns<br />
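The percentage rows in Table 2 are plain per-band attrition rates; the arithmetic can be reproduced as follows, using the mission 3 counts from the table:<br />

```python
def attrition_percentages(candidates, attritions):
    """Per-band attrition rate, rounded to whole percent as in Table 2."""
    return [round(100 * a / c) for a, c in zip(attritions, candidates)]

# Mission 3 counts per FPS score band (< 52, 52-57, 58-63, 64-69, > 69)
cands = [35, 23, 31, 34, 32]
fails = [24, 5, 2, 6, 1]
rates = attrition_percentages(cands, fails)     # [69, 22, 6, 18, 3]
overall = round(100 * sum(fails) / sum(cands))  # 25 (total column)
```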
Leadership in Aptitude Tests and in Real-Life Situations<br />
A. H. Melter & W. Mentges<br />
Federal Armed Forces Central Personnel Office, Köln,<br />
Federal Republic of Germany<br />
Introduction<br />
In the aptitude testing of German volunteers for officer and NCO careers small groups<br />
of three or four applicants are given planning tasks to work out sequences for action or<br />
to organize items of information. The applicants have to produce an individual draft of<br />
their task solution. They prepare and give the group a short presentation of some aspects<br />
of the planning tasks, and they have to discuss and to decide their tasks and their<br />
individual solutions at a round table.<br />
Officer applicants must organize<br />
- a leisure activity,<br />
- a floor-plan of a supermarket,<br />
- the land utilization and development of a small town,<br />
- a school prize-giving day, or<br />
- a meeting place for young people.<br />
The rating sheet for group tasks is subdivided into four paragraphs:<br />
- The “written plan” section for making notes on the contents, presentation, accuracy,<br />
and lay-out.<br />
- The “short presentation” section for making notes on comprehensibility, behavior, and<br />
argumentation.<br />
- The “round table discussion” section for making notes on social interactions, plans,<br />
decisions, and behavior in changing situations.<br />
- The “overall rating of the group task” section for making notes on assertiveness, social<br />
competence and cooperation, argumentation and verbal expression, planning and<br />
decisiveness.<br />
The scale is defined as follows:<br />
1 Very good, obviously positive, clearly more positive characteristics than<br />
usually expected;<br />
2 Good, clearly above average, more positive than negative characteristics;<br />
3 Completely satisfactory, somewhat above average, more positive than<br />
negative characteristics;<br />
4 Satisfactory, average, positive and negative characteristics are balanced;<br />
5 Adequate, somewhat below average, more negative than positive<br />
characteristics;<br />
6 Just adequate, clearly below average, more negative than positive<br />
characteristics;<br />
7 Unsatisfactory, obviously negative, clearly more negative characteristics<br />
than usually expected.<br />
The computer-assisted planning tasks and computer-simulated planning games consist<br />
of a comparable matrix of methods and dimensions, too (Melter & Geilhardt, 1989). As<br />
a rule military raters use the aptitude criteria achievement, social competence and<br />
cooperation, argumentation, planning and decisiveness. These concepts describe a<br />
range of characteristics denotable as leadership in small groups.<br />
The problem of behavior prediction<br />
Now, psychological aptitude researchers and military raters are confronted with the<br />
problem of whether real-life behavior in squads, platoons or companies can be predicted<br />
from task-generated behavior in artificial testing conditions.<br />
While the predictor situations are sufficiently described, the criteria referring to careers<br />
and jobs still have to be clarified to solve the prediction problem. Psychological research<br />
normally makes use of analyses of job demands. Such analyses produce the criteria by<br />
which leadership, for example in squads, platoons and companies, can be assessed by<br />
other military personnel (instructors, superiors) and teachers at the officer schools, at<br />
the universities of the German Federal Armed Forces, and in field appointments. If the<br />
predictors and criteria are similar and comparable, the results of such analyses will be<br />
reliable and valid. But if there are great differences between both situations, the<br />
psychological aptitude research unit and the personnel department have to look for the<br />
central personal constructs of the criterion situations. But neither psychologists nor<br />
military users are able to claim to have discovered them with hundred per cent reliability.<br />
Use of real-life situations to establish job demands<br />
One approach to establishing career or job demands translatable into measurements with<br />
psychological methods is to issue questionnaires to officers at the officer schools,<br />
the Bundeswehr agencies and in field appointments. In first analyses we used repertory<br />
grid techniques to question 25 military raters and staff officers from the Central<br />
Personnel Office, 17 officers from the Air Force Officer School, and 15 officers from<br />
the Army Officer School about their personal constructs of apt and inapt young officers.<br />
The aim of those studies was to elicit the implicit aptitude theories of these officers about<br />
the new officer generation as experienced in their own job environment (Mentges, 1989).
A further objective was to produce a diagnostic process model for determining and<br />
evaluating the aptitude criteria for selecting officer applicants (Behling & Neubauer,<br />
1990). We intend to question officers in the field, too.<br />
The personal constructs are defined in behavioral terms. However, the method does not<br />
allow one to work out unambiguously in which situation the behaviors defined have what<br />
kind of results, success or failure, for the man concerned. Such distinctions are only<br />
possible if we ask about so-called “behavior - situation - results - triangles” in real-life<br />
environments. This means asking about typical situations, about behavior in such<br />
situations and about the effects of this behavior, for example on the soldiers in the squad<br />
entrusted to the officer candidate for the first time in the training unit.<br />
When asking about typical situations for leadership we have to differentiate enormously.<br />
Firstly, the size of the military groups (squad, platoon, company) and the responsibilities<br />
increase during someone’s career.<br />
Secondly, typical situations in peace time, in periods of tension, and in war are different.<br />
Thirdly, we have different typical situations indoors and outdoors. Many further<br />
distinctions are imaginable. It is essential for our problem that while leadership in a<br />
small group will result in success, the same behavior might not be successful in a war<br />
situation. In threatening situations where prompt, precise, and right action is necessary,<br />
there is a need for different leadership qualities from those in situations where there is<br />
no stress (Cardoso de Sousa, 1990).<br />
Predictions for normal and dangerous situations<br />
All military experts concerned with such topics assume that they are unable to reliably<br />
predict leadership in war or to predict the character of that type of officer who would<br />
in fact be able to lead successfully in war simply because the speed, variety, and<br />
unforeseeability of events and behaviors in such crucial circumstances are beyond<br />
precise description and simulation (Oetting, 1988).<br />
On the other hand, there is some evidence that people with a certain pattern of basic<br />
abilities will most probably be unable to hold their own in typical situations. For the<br />
moment, we have left out of consideration the fact that a certain pattern of skills and<br />
knowledge can be generated by training and education.<br />
The psychological and medical assessments of such basic patterns conjoined with the<br />
prediction of success in typical situations are difficult enough, but the educational<br />
assessment of the increase achieved by training and education is incomparably more<br />
complicated.<br />
Let us take an example out of the domain of survival. The analyses of reports given by<br />
survivors of accidents have shown that<br />
- their belief in being rescued,<br />
- the fact that they did not panic,<br />
- their good morale, and<br />
- their will to survive,<br />
each demonstrated in behavior, enhanced their chances of survival (Röder & Minich,<br />
1987).<br />
Of these four psychological characteristics, only morale and will-power can perhaps be<br />
detected in a basic assessment of volunteers. How can we assess whether the morale and<br />
will-power of soldiers can be increased through training and education to such an extent<br />
that they could survive dangerous situations? It is extremely difficult to predict such<br />
an “ultimate” criterion. And because of that we are unable to base a selection and<br />
placement model on aptitude criteria for extreme situations.<br />
It is by no means the case that predictions for “normal” situations are considerably less<br />
difficult than the predictions for dangerous situations. You only have to think of the<br />
quite “normal” prediction of the superior’s ratings at the end of any military training<br />
course, and of the many imponderable factors that can influence the aptitude and<br />
performance rating of an officer candidate.<br />
The environmental factors accompanying military operations, space missions, rescue<br />
operations, or sports activities can - as dangers - drastically affect the behavior of<br />
individuals concerned and have consequences for the life and limb of both those in<br />
charge and their teams. Although predictions are very difficult, psychologists remain<br />
under an obligation for ethical reasons to contribute to predictions by researching into<br />
aptitude criteria and the characteristics of poorer performance and performance enhancement<br />
due to training, in order to improve the selection, the training, and mission<br />
accomplishment with psychological methods.<br />
One example from the domain of sports activities serves as clarification: when dangerous<br />
situations in mountaineering have been analyzed retrospectively from a psychological<br />
viewpoint, it has been noticed that some behavioral characteristics of the men at<br />
risk brought about the potential accidents of guided groups:<br />
- careless and technically deficient safety measures;<br />
- failure to give precise orders, if any at all;<br />
- unrealistic over-estimation of one’s technical skills and fitness;<br />
- euphoria or fatigue combined with decreasing attention;<br />
- arguments and annoyance.<br />
Accidents happen with increasing probability if such behavioral characteristics appear<br />
in the group, and if environmental factors interact in a fateful manner: The guide climbs<br />
a rock passage with crumbling grips and steps; the second member of the group fails to<br />
take adequate securing measures and at the same time chatters to the third member of<br />
the group without observing the guide, who for his part fails to give precise and pressing<br />
instructions to the group to do things right.<br />
The behavioral result of the leader may be a fall, if the environmental factor “loose grip”<br />
comes to bear, a fall which could mean the fall of the whole group because of the<br />
incomplete and inattentive securing, with fatal consequences for all the members. The<br />
guide should be advised to pay attention to the reliability of the members when selecting<br />
his group, to insist on a short check of their communication and securing skills, and to<br />
attach importance to precise and prompt instructions during the climb.<br />
Results of previous job analyses<br />
Which criteria resulting on the one hand from surveys and on the other from real-life<br />
situations can be provided by aptitude psychologists for a basic assessment in order to<br />
get concepts and measurements of leadership in small groups of aptitude testing?<br />
Surveys with officers from different divisions of the Central Personnel Office (Mentges,<br />
1990) point in a very definite direction that can be paraphrased with<br />
- personal authority and the attending executive techniques,<br />
- assertiveness, taking consideration of the situation and of the people involved,<br />
- cooperation in the sense of commitment to the success of the team,<br />
- comradeship and care,<br />
- courageous and honest acceptance of responsibility.<br />
In any case, it does not include the ability to cause conflicts and to test the extent to which<br />
such conflicts can be endured and managed. Ideas of this kind should have been<br />
discarded once and for all from modern group psychology.<br />
References<br />
Behling, A. & Neubauer, R. (1990). Eignungsmerkmale Offizierbewerber. (Aptitude criteria for officer<br />
applicants). Abschlußbericht der Industrieanlagen-Betriebsgesellschaft mbH. Ottobrunn.<br />
Cardoso de Sousa, FJ.V. (1990). Leadership under stress: Immediate effects of the aggressive style. Paper<br />
presented at the I.A.M.P.S. conference. Vienna.<br />
Melter, A.H. & Geilhardt, T. (1989). Computer-assisted problem solving as assessment method. Proceedings<br />
of the 31st annual conference of the Military Testing Association (pp. 129-134). San Antonio:<br />
Air Force Human Resources Laboratory.<br />
Mentges, W. (1989). Implizite Eignungstheorien als Bestandteil der Anforderungsanalyse im Assessment<br />
Center. (Implicit aptitude theories as part of job analyses for assessment centers). Diplomarbeit im<br />
Fach Psychologie an der Philosophischen Fakultät der Rheinischen Friedrich-Wilhelms-Universität<br />
Bonn.<br />
Mentges, W. (1990). Die Erhebung von impliziten Eignungstheorien als Beitrag zur Anforderungsanalyse<br />
für den Offizierberuf. (Survey of implicit aptitude theories as a contribution to job analysis). Köln:<br />
Arbeitsbericht des Personalstammamtes der Bundeswehr.<br />
Oetting, D.W. (1988). Motivation und Gefechtswert - Vom Verhalten des Soldaten im Kriege. (Motivation<br />
and combat effectiveness - On the behavior of soldiers in war). Frankfurt und Bonn: Report<br />
Verlag.<br />
Röder, K.-H. & Minich, I. (1987). Psychologie des Überlebens - Survival beginnt im Kopf. (The<br />
psychology of survival - Survival begins in the mind.) Stuttgart: Pietsch.<br />
Computer-based Assessment of Strategies in Dynamic Decision Making<br />
Wiebke Putz-Osterloh<br />
University of Bayreuth<br />
1. Introduction<br />
In psychological testing, computers are used primarily as economically efficient tools to<br />
administer tests and to analyze and store individual data. This type of testing based on classical<br />
tests is not the subject of this paper, however. Instead, I intend to speak about the uses of<br />
computer programmes to simulate complex situations that call for dynamic decision making<br />
(Kleinmuntz, 1985) or for complex problem solving as Dörner defines it (1978). In the<br />
following, I will first discuss some reasons for complementing classical tests of intelligence<br />
by other methods to extend the range of intellectual demands. Secondly, I will mention three<br />
conditions that should be controlled if one intends to assess individual differences in complex<br />
situations. Then I will summarize empirical results concerning individual differences in<br />
problem solving strategies. Finally I will discuss some difficulties encountered in estimating<br />
the external validity of strategies.<br />
2. Reasons for expanding approaches to intelligence testing<br />
Classical tests of intelligence (whether computer-based or conventional paper-and-pencil)<br />
suffer from some common restrictions with respect to the intellectual demands they cover:<br />
- Test items are static: Items have to be answered independent of the answers given previously<br />
or to be given later.<br />
- Test items are transparent and well defined: Individual differences in knowledge used and<br />
strategies applied must be eliminated to make sure that only one single solution to each item<br />
can be evaluated as the correct one.<br />
- Intelligence is measured by the sum of the correct solutions to items that are to be solved as<br />
quickly as possible: Time consuming processes such as the use of heuristic strategies are not<br />
analyzable.<br />
- Answers to test items have to be selected rather than constructed: Although in real-life
situations the rule is that one has to search for decision alternatives first and to select one of<br />
these afterwards, such search processes are excluded from test-intelligence.<br />
- Applicants often do not accept tests of intelligence as valid or fair predictors for personnel<br />
selection. One approach to overcome the restrictions mentioned is to assess individual<br />
behavior in multiple “real-life situations and exercises” as it is conceptualized by assessment<br />
center methods.
3. Conditions for the assessment of individual differences in decision making strategies<br />
The following conditions are not met, or not even controlled, when assessment center methods are used:
- In complex situations there are multiple goals to be reached. Individual differences in decision<br />
making depend on specific defined goals. If individual behavior is to be assessed, the goals<br />
of each subject have to be controlled; otherwise the effectiveness ratings of individual<br />
362
behavior will be invalid. This condition is violated in unstandardized group discussions and
in role-taking games.<br />
- In complex situations different strategies are possible, leading to different outcomes. Therefore,
data on strategies should give more information about individual differences than<br />
performance scores alone would; otherwise the analysis of performance alone would suffice.<br />
In assessment center exercises, data on strategies and on performance are interrelated.
- Analyses of strategies are most informative if the data are not predicted by classical tests or<br />
other performance scores. They are useful if they are generalizable to other situations.<br />
4. Empirical studies using simulated dynamic situations*<br />
4.1 The simulated situations and their demands<br />
In our empirical studies two different simulated situations are used. The first system simulates<br />
a small industrial company which produces and sells textiles. The system consists of 24<br />
variables, of which 11 input-variables can be changed directly by decisions made by the<br />
subjects, including the volume of raw materials to be bought, the selling prices, the amount of<br />
advertising, the number of workers, etc. Subjects are asked to aim at three goals while
controlling the system:<br />
-to make as much profit as possible,<br />
-to increase the company’s capital from beginning to end,<br />
-to pay the workers the highest wages possible.<br />
These three goal variables are used in combination to rate the performance in system control.<br />
The subjects are asked to control the system for 15 simulated months and to decide what<br />
changes should be made in what input variables. As the experimenter operates the computer,<br />
the subjects have to ask questions about the actual state of the variables, and to communicate<br />
their decisions to the experimenter. So, while the subjects are thinking aloud, data can be gained<br />
in quite a natural manner.
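The flavor of such a dynamic-system task can be conveyed by a deliberately tiny caricature in code. The sketch below is only illustrative: all variable names, update rules, and the way the three goal variables are combined into a performance score are invented assumptions, not the actual 24-variable simulation used in the studies.

```python
# A deliberately tiny caricature of the simulated company described above.
# All names, update rules, and the combined performance score are invented
# for illustration; the real system has 24 interconnected variables.
class TinyCompany:
    def __init__(self):
        self.start_capital = 100_000.0
        self.capital = self.start_capital
        self.profit = 0.0
        self.wages = 1_200.0   # input variable: monthly wage per worker
        self.price = 50.0      # input variable: selling price
        self.workers = 8       # input variable: number of workers

    def step(self):
        # The system is dynamic: the state changes every simulated month,
        # and the effect of an input depends on the current state.
        produced = self.workers * 20
        demand = max(0, int(600 - 5 * self.price))
        sold = min(produced, demand)
        self.profit = sold * self.price - self.workers * self.wages
        self.capital += self.profit

    def performance(self):
        # Combine the three goal variables into a single control score.
        return (self.profit
                + (self.capital - self.start_capital)
                + self.wages * self.workers)

company = TinyCompany()
for month in range(15):        # 15 simulated months, as in the study
    if company.profit < 0:     # a minimal "strategy": raise the price
        company.price += 1.0   # whenever the last month made a loss
    company.step()
print(round(company.performance(), 2))
```

Even in this toy version the defining property survives: the same decision (a price increase) has different effects depending on the current system state.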
The second system simulates a forest region that is to be protected against fire. The subjects<br />
are asked to take on the role of a fire chief, giving different commands to 12 fire fighting units<br />
(using a mouse). The forest, the units, and the fires are displayed on a graphics terminal in front<br />
of the subjects. The goal is to minimize the area that is burned down (see also Brehmer, 1987).<br />
Again, the same criterion is used to rate performance. The system is to be controlled for 100<br />
time intervals which maximally last one minute each.<br />
Despite differences in the mode of control the two systems have the following demands in<br />
common:<br />
* This research is supported by grants from the Federal Ministry of Defense in Bonn, Federal<br />
Republic of Germany<br />
363<br />
(a) The systems are complex: This means that they contain many variables that are interconnected<br />
by a relational network rather than by single unidirectional relations. Given one input<br />
change, the network between the variables causes not only a main effect but several side effects<br />
that also have to be taken into consideration.<br />
(b) The systems are nontransparent: This means that the relational network connecting the<br />
variables one to another is not shown to the subjects. Therefore the subjects have to generate<br />
hypotheses about the effects of their decisions, which they should then test against the feedback<br />
data.<br />
(c) The systems are dynamic: This means that the variables change their state over time, even<br />
if there is no input change. As a consequence, the effects of input changes differ depending on<br />
the actual system states.<br />
(d) The systems are meaningful: This means that the variables and their interrelations are<br />
implemented in a system to correspond to a domain of reality. The subjects can use their<br />
domain-related knowledge to generate hypotheses.<br />
4.2 Control strategies and derived measures<br />
Due to the differences between these demands and the demands of test items, it is to be expected,<br />
and is substantiated by empirical data, that performance in system control is not predictable by<br />
intelligence test scores (see Dörner & Kreuzig, 1983; Dörner, 1986; Funke, 1983; Putz-Osterloh,<br />
1981).<br />
As Dörner (1986) argues, strategies in system control are determined by a superordinate type<br />
of intelligence, the so-called “operative intelligence”. This type of intelligence refers to the<br />
construction and adaptive use of, and control over, subordinate processes such as information<br />
gathering, hypothesis testing, planning, and decision making.<br />
Different parameters of individual strategies are combined to evaluate the control over<br />
subordinate processes, e.g. the frequency of correct verbalized hypotheses (correspondent to<br />
system reality) and the rareness of false or irrelevant ones. These parameters are analyzed and<br />
summed up over subsequent time intervals to evaluate control and adaptation over time.<br />
In the following, two examples of complex abilities which can be diagnosed from decision-making
and from thinking-aloud data are defined and operationalized.
Ability to organize<br />
High organizing ability is defined by the frequency of prospective decisions to prevent<br />
undesirable system states, the rareness of false decisions, and the coordination of different<br />
decisions to reach more than one goal.<br />
In the economic system, the following parameters are combined: the rareness of isolated<br />
decisions, the frequency of central decisions which directly influence one goal variable, and<br />
the frequency of coordinated (in relation to the goal variables) decision patterns over time.<br />
364
In the fire fighting system prevention is realized by the number of units distributed over the<br />
area before a fire is seen. False decisions mean forgetting to let the units search for and put out<br />
fires by themselves.<br />
Coordination is measured simply by the number of changing commands in the face of new<br />
fires throughout the game.<br />
Ability to decide<br />
A high degree of decision-making ability means the capability to plan in a goal-directed manner<br />
and to realize decisions quickly and precisely.<br />
The following aspects are combined in the economic system: The time to control the system,<br />
the frequency of postulated correct effects of decisions, and the rareness of decisions that do<br />
not work in the system.<br />
In the fire fighting system, the speed and accuracy of decision-making are rated in combination.<br />
This means that the number of new fires that are dealt with in precise commands are summed<br />
up and weighted by the average time lag between the time of the fire and the time that the<br />
corresponding command is given.<br />
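The speed-and-accuracy measure just described might be sketched as follows. The event encoding and the exact weighting function are assumptions, since the source gives only a verbal description of the measure.

```python
# Sketch of the decision-making score for the fire-fighting task: precise
# responses to new fires are counted and weighted by the mean lag between
# outbreak and command. The event encoding and the weighting function are
# assumptions; the source describes the measure only verbally.
def decision_score(events):
    """events: (fire_time, command_time) pairs; command_time is None
    when a fire was never answered with a precise command."""
    handled = [(f, c) for f, c in events if c is not None]
    if not handled:
        return 0.0
    mean_lag = sum(c - f for f, c in handled) / len(handled)
    # more fires handled and shorter lags both raise the score
    return len(handled) / (1.0 + mean_lag)

fires = [(3, 5), (10, 11), (20, None), (31, 35)]  # times in intervals
print(round(decision_score(fires), 3))
```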
4.3 Empirical results<br />
4.3.1 Estimates of reliability<br />
Concerning the economic system, retests between different trials of system control do not seem<br />
to be appropriate. Here we can expect content related changes in strategies that may influence<br />
performance without being attributable to a lack of reliability. Empirical results from two<br />
studies show stability of medium-level strategies, either accompanied or not accompanied by<br />
stability in performance (see Strohschneider, 1986; Funke, 1983).<br />
In the fire fighting system, content-dependent changes in strategies are not to be expected. In<br />
one experimental study (N = 50 university students) two versions of system parameters were<br />
constructed which differed in the number and timing of new fires. The subjects had to control<br />
each version for three trials in one session each. The correlations are lower between the first<br />
set of trials than between the second set. Between the last two trials in the second version, all<br />
correlations are higher than .80, referring to performance as well as to organizing and<br />
decision-making ability. In a second study (N = 80 university students), one system version<br />
had to be controlled for four trials. Performance data between the last two trials are correlated<br />
.84, whereas organizing and decision-making ability is correlated .79 and .76, respectively.<br />
These data on stability are accompanied by significant gains in performance as well as in<br />
strategies from the first to the last trial. This is equally true for both studies.<br />
4.3.2 Data on internal validity<br />
As has been mentioned above, it should be tested whether the subjects are aiming at comparable<br />
goals. If the goals are only vaguely defined, the subjects will probably define different specific<br />
goals for themselves. Consequently, in our studies the subjects are given specific goal variables<br />
which should be influenced in a specified direction. The objectively defined performance is<br />
correlated with subjectively rated success after system control. In two studies all correlations<br />
are highly significant: For the economic system the correlation is .52 (N = 100) and .48 (N =<br />
48), and for the fire system it is .75 (N = 50) and .79 (N = 80).<br />
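These validity coefficients are ordinary product-moment correlations. As a reminder of what is being computed, here is a minimal Pearson r on invented data; the actual scores and sample sizes are not reproduced here.

```python
# Minimal Pearson product-moment correlation, the statistic behind the
# validity coefficients above. The data below are invented; the actual
# scores and sample sizes are not reproduced here.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

performance = [12, 30, 24, 18, 40]   # objective control scores (invented)
rated_success = [2, 4, 3, 3, 5]      # subjective ratings (invented)
print(round(pearson_r(performance, rated_success), 2))
```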
An important characteristic of performance in system control refers to its partially ambiguous<br />
meaning as performance level is not equivalent to a specific strategic variant. Following these<br />
arguments, the internal validation of identified strategies should be the proof that these<br />
differences are systematically related to performance level, whereas different strategic measures<br />
should not be correlated too highly. In our studies there is clear evidence that differences<br />
in strategies are systematically related to performance. Between studies there are differences<br />
in the amount of common variance. In the economic system, decision-making and organizing<br />
ability is correlated positively with performance, but this is not always significant (decisiveness:<br />
.28 and .20; organizing ability .50 and .13; N = 48, N = 100). For fire fighting the<br />
correlations are higher: decision-making ability with performance .58 (N = 50) and .45 (N =<br />
80); organizing ability .58 (N = 50) and .63 (N = 80).<br />
Further questions aim at the relation between the two strategic measures. In the economic<br />
system, no systematic correlation between decision-making and organizing ability is found (N<br />
= 48, N = 100), whereas in fire fighting there is either no relationship at all or no more than<br />
9% common variance between the two measures (N = 80; N = 48). Finally, the generalizability<br />
of strategies and performance between the two systems was also tested. In two independent<br />
studies, one group of subjects first controlled the fire system and then the economic system,<br />
while the other group worked the systems in the reverse order. A sequence effect is replicated<br />
in the two studies: If the subjects control the fire system first and the economic system<br />
afterwards, differences in performance as well as in organizing ability are correlated systematically<br />
between the two systems (N = 25; N = 30), whereas no systematic correlations are<br />
found if the systems are controlled in the reverse order. Despite some ambiguities in interpreting<br />
the sequence effect, these data give evidence of the generalizability of strategies in system<br />
control.<br />
4.3.3 Data on external validity<br />
If the systems do represent valid dynamic situations, experts in the simulated domain of reality
should do better in controlling a system than novices do. As is to be expected, in two
independent studies (Putz-Osterloh, 1987; Putz-Osterloh & Lemme, 1987), university professors<br />
(N = 7) and selected postgraduate students in management science (N = 22) systematically<br />
used more efficient strategies and achieved better performance scores in controlling the<br />
economic system than unselected students (N = 29) did. For the latter subjects, the intelligence<br />
test scores were controlled; they are not correlated with success in system control.<br />
Following the logic of this expert-novice paradigm, in a further study, field-grade officers<br />
(participants in a command and staff course) (N = 27) were compared with unselected students<br />
(N = 30) in controlling the fire system.<br />
Against expectations, no systematic differences between the two groups were found. Do these
negative results falsify a possible external validity of the system to predict success in higher-level
military careers? There are two arguments that make me inclined to respond to this
question with a negative answer. First, the military subjects are not homogeneous with respect<br />
366
to their decision-making behavior. Instead, in some parameters they show greater variance than<br />
students do. Second, some military subjects reported after the tests that they used the commands<br />
in accordance with their specific military education, and that use of such knowledge hinders a<br />
successful control. In contrast to this, other subjects did learn the specific conditions implemented<br />
in the fire system, and they did well. These data shed light on the different demands<br />
of the fire system, depending on the specific knowledge used while controlling it. Further<br />
investigations are needed to specify system demands and the strategies required to deal with<br />
them successfully.<br />
5. Conclusions<br />
(1) There are individual differences in intellectual abilities that are not covered by the usual<br />
intelligence tests. These differences may be of significance for personnel selection.<br />
(2) There are strategic differences in system control that are related to performance; they are<br />
reliable if the subjects are allowed to control a system in repeated trials.<br />
(3) Simulated systems realize complex demands that are standardized and replicable. Therefore,<br />
systems offer great advantages over standardized group situations.<br />
(4) Besides some evidence of the external validity of strategies and performance in system<br />
control further theoretical and empirical work needs to be done to specify the demands of real<br />
life situations and their correspondences with system demands.<br />
(5) Far from being able to predict precisely what strategies in system control imply for behavior<br />
in real life situations, I consider the reported approach to be worth further pursuit.<br />
References
Brehmer, B. (1987). Development of mental models for decision in technological systems. In J. Rasmussen, K. Duncan, & J. Leplat (Eds.), New Technology and Human Error (pp. 111-142). Chichester: Wiley.
Dörner, D. (1986). Diagnostik der operativen Intelligenz. Diagnostica, 32, 290-308.
Dörner, D. & Kreuzig, H.W. (1983). Problemlösefähigkeit und Intelligenz. Psychologische Rundschau, 34, 185-192.
Dörner, D. & Reither, F. (1978). Über das Problemlösen in sehr komplexen Realitätsbereichen. Zeitschrift für experimentelle und angewandte Psychologie, 25, 527-551.
Funke, J. (1983). Einige Bemerkungen zu Problemen der Problemlöseforschung oder: Ist Testintelligenz doch ein Prädiktor? Diagnostica, 29, 283-302.
Kleinmuntz, D.N. (1985). Cognitive heuristics and feedback in a dynamic decision environment. Management Science, 31, 680-702.
Putz-Osterloh, W. (1981). Über die Beziehung zwischen Testintelligenz und Problemlöseerfolg. Zeitschrift für Psychologie, 189, 79-100.
Putz-Osterloh, W. (1987). Gibt es Experten für komplexe Probleme? Zeitschrift für Psychologie, 195, 63-84.
Putz-Osterloh, W. & Lemme, M. (1987). Knowledge and its intelligent application to problem solving. The German Journal of Psychology, 11, 286-303.
Strohschneider, S. (1986). Zur Stabilität und Validität von Handeln in komplexen Realitätsbereichen. Sprache & Kognition, 5, 42-48.
367
A Special Approach in Assessment-based Personnel Selection<br />
G. Rodel<br />
German Naval Volunteer Recruiting Centre<br />
Wilhelmshaven, Federal Republic of Germany<br />
Introduction:<br />
Due to lower birth rates in the past and to political developments in the present, the FRG Armed
Forces have to deal with shrinking numbers of volunteers. The German Navy's efforts to exploit personnel
resources focus especially on draftees.
To become a temporary-career volunteer in the Federal German Navy there are three different ways of
enlistment:
The first way involves civilian volunteers applying to the Naval Volunteer Recruiting Centre (NVRC),<br />
where their aptitude for a temporary-career enlistment is tested (selection) prior to their placement in<br />
the Navy. 65% of all temporary-career volunteers enter the Navy this way, via the NVRC.<br />
The second way is the recruitment of conscripts serving in field units. About 10% of the temporary-career
volunteers are recruited in this way.
The third possibility of becoming a temporary-career volunteer in the Navy is through the so-called<br />
“Information and counseling campaign” (IBA). I would now like to give a more detailed account of this<br />
model of recruitment.<br />
The IBA completes the quarterly temporary-career volunteer requirements which have not been met
by the NVRC and at troop level. If the NVRC enlists a great number of volunteers for a specific quarter,
the complementary recruitment requirements to be met by the IBA are correspondingly smaller. Thus, the
number of volunteers that have to be recruited by the IBA is subject to fluctuation. Usually, this
complementary share lies between 25 and 35 percent of the total requirement of temporary-career volunteers.
In this context I would like to give you some figures underlying the importance of the IBA for the German<br />
Navy:<br />
The Navy has a strength of approximately 29,000 soldiers (excluding officers), of whom 3,000 are in their
basic military service, 16,000 are temporary-career volunteers and 7,800 are regulars.
Every year, about 1,000 soldiers are recruited by IBA as temporary-career volunteers. This amounts to<br />
a quarter of the annual requirements.<br />
The military training system of the Federal German Navy presents great advantages for the realisation<br />
of such a recruitment campaign. Training is provided centrally at only nine training centres (so-called
schools), the maximum distance between these centres being 500 kilometres. All Navy training
courses are held at these training centres. These training courses permit us a focused approach to all<br />
students for the purpose of recruitment, examination and placement.<br />
Another advantage is the central personnel management in the Navy under the responsibility of the<br />
Navy Enlisted Personnel Office, which keeps us informed about the specific requirements of the Navy<br />
for every quarter. In this way, we can steer the applicants for a placement in specific tasks or jobs.<br />
368
In the Federal Armed Forces, this campaign is unique, and feasible only in the Navy for the reasons I<br />
have explained earlier.<br />
The system of NVRC selecting personnel from the field units for extended military service in the Navy<br />
has already existed for more than 22 years. However, until three years ago, the field units had not been<br />
involved directly in this selection procedure. This task had been performed exclusively by NVRC.<br />
That means that the field units were not sufficiently concerned about recruitment, counseling and<br />
selection of new personnel, leaving this task to other navy institutions such as the Navy Enlisted Personnel<br />
Office, the Naval Office and the NVRC.<br />
This campaign is of particular importance especially now, in a period marked by a drop in personnel
owing to age groups with declining birth rates and to a lack of motivation and of insight into the necessity
of armies in the face of the détente in West-East relations. The new procedure should lead to an active
participation of superiors in field units as multipliers in the process of enlisting, counseling and recruiting.
Recruits are approached about a temporary-career enlistment as early as the second month of their
basic training.
As the readiness to volunteer for a temporary-career enlistment is greatest during the first four
months of basic military service, it is absolutely necessary to conduct the IBA during this
period.
Therefore, testing takes place in situ at the basic training unit.<br />
Method:<br />
During the first phase of the IBA, officers go to the nine basic training garrisons every quarter in order
to recruit (advertise) volunteers for enlistment in the Navy by means of films, lectures and counseling,
and to inform them about military and vocational possibilities.
During the second phase, psychologists go to the different garrisons two weeks later in order to examine<br />
the recruits who, during the first phase, have shown an interest in a temporary-career enlistment.<br />
Under the stipulations of the new procedure governing the recruitment of the suitable personnel for<br />
the forces, the task has to be performed jointly by the NVRC and the forces.<br />
The NVRC psychologists have been entrusted with this task for reasons of ensuring the application of<br />
uniform standards to the evaluation of applicants with or without prior service concerning their aptitude<br />
for a temporary-career enlistment and because of the fact that these psychologists have many<br />
years’ experience in personnel selection testing.<br />
The psychologist as well as the superior in the unit are directly and equally involved in the responsibility<br />
for the recruitment of personnel.<br />
By including the forces, this new methodology also takes into account the fact that the validity, i.e. the
quality of a statement on a person's aptitude, increases considerably if the person concerned is
evaluated separately and independently, as compared to cases in which observation, examination
and decision are made jointly and simultaneously.
For an evaluation of the applicant during the psycho-diagnostic interview, the following documents are<br />
available to the psychologist:<br />
369
Medical certificate (exclusions from certain assignments)<br />
Aptitude test results<br />
General application documents<br />
School reports<br />
Testimonials<br />
Curriculum vitae<br />
First, the superior is initiated into the procedure and trained as a rater by the psychologist. On his own
responsibility, and independently, he then observes, judges and evaluates the applicant's military conduct
and his qualification for a temporary-career enlistment.
The superior’s aptitude statement must have been completed independently before the psychologist<br />
starts the aptitude test based on following documents:<br />
1. Application documents.
2. School reports and reports of professional performance.
3. Declaration of pending proceedings and financial liabilities.
4. Contributions to an efficiency assessment, in which section and platoon leaders record their
observations, judgements and evaluations of the applicant's military conduct in the following
areas of activity:
- in general and specialized instruction
- in practical technical training
- in hand weapon training
- during drills
- in physical training
- in march training
- in field training
Based on the contributions to an efficiency assessment and on his own conclusions from a personal interview<br />
with the applicant, the superior has to evaluate the following aptitude characteristics:<br />
- devotion to duty<br />
- comradeship<br />
- technical abilities<br />
- self assertion<br />
From the documents and the results of the psycho-diagnostic interview, the psychologist evaluates the<br />
aptitude characteristics:<br />
- initiative<br />
- motivation to perform<br />
- articulateness (verbal comprehension and expression)<br />
- judgement<br />
The characteristics “sense of responsibility” and “performance under stress” have to be judged by both<br />
the psychologist and the superior.<br />
Four gradations are at the superior's and the psychologist's disposal for their recommendations.<br />
After the psychologist and the superior have made their evaluations independently, this commission<br />
prepares a joint decision on acceptance or rejection of the applicant for a temporary-career enlistment.<br />
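The two-rater step just described, independent evaluation followed by a joint decision, can be sketched schematically. The four gradations and the combination rule below are hypothetical, since the source names neither.

```python
# Illustrative sketch of the two-rater decision step described above.
# The four gradations and the combination rule are hypothetical; the
# source states only that both raters judge independently and then
# reach a joint decision.
GRADES = {"very suitable": 4, "suitable": 3, "limited": 2, "not suitable": 1}

def joint_decision(superior_grade, psychologist_grade, cutoff=3):
    s = GRADES[superior_grade]
    p = GRADES[psychologist_grade]
    # accept only if the two independent ratings average at or above cutoff
    return "accept" if (s + p) / 2 >= cutoff else "reject"

print(joint_decision("very suitable", "suitable"))
```

The point of the design is that `superior_grade` and `psychologist_grade` are produced without knowledge of each other, which is what the text credits with the gain in validity.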
Then it is the psychologist's task to determine a suitable placement for the applicant and to discuss it in<br />
detail.<br />
370<br />
Evaluation of the counseling procedure<br />
For the time being, a long-term validity study is not yet available, as the new procedure has existed for
only three years and soldiers have not held their posts in the units long enough to show whether the
counseling procedure has proved its worth.
However, a comparison of the different recruitment procedures of the NVRC and the IBA already<br />
permits a statement on the quality of the new procedure. In this case, NCO training course results obtained<br />
by soldiers that have been recruited for the Navy by NVRC and those recruited through IBA<br />
can be compared.<br />
It was to be expected that the results of the training course would not differ significantly as the<br />
psychologists involved are the same in both cases, and they can make use of their many years’ experience<br />
of test methodology.<br />
However, the results also confirm the application of uniform standards to both procedures.<br />
A further confirmation of the new IBA procedure comes from an opinion poll on personal involvement
in, and acceptance of, the new procedure, which yielded the following result: out of 84 superiors, only
3 officers had a negative or indifferent opinion about the way the IBA is practised now.
A comparison of the absolute figures of recent years makes little sense because of the decreasing
number of applicants volunteering for the Navy. The relative frequencies, however, show that the new
procedure has succeeded: the enlistment rate was about 77.8% in 1986 and rose to 90.4% by 1989.
The difference is not statistically significant. For the Navy, however, it means that the absolute number
of enlistments has remained nearly constant in recent years, even though the total number of applicants
has declined.
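The claim that the rise from 77.8% to 90.4% is not statistically significant can be illustrated with a two-proportion z-test. The raw counts below (35/45 and 47/52) are invented to match the reported percentages, since the source gives rates only, so the computation is purely illustrative.

```python
# Two-proportion z-test sketch for the enlistment rates quoted above.
# The raw counts (35/45 and 47/52) are invented to match the reported
# percentages of 77.8% and 90.4%; the source reports rates only.
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # pooled success proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

z = two_proportion_z(35, 45, 47, 52)
print(round(z, 2))   # below the 1.96 cutoff: not significant at the 5% level
```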
Conclusions:<br />
I cannot judge whether this procedure designed to recruit suitable personnel for the Navy can also be<br />
transferred to other Navies. According to the Naval Staff, this procedure has proved its worth in the<br />
German Navy. The forces and their superiors themselves feel that they are more actively involved in<br />
the process of recruiting and therefore they intensify their counseling efforts for individual applicants<br />
thus acting as multipliers.<br />
371<br />
TROUBLESHOOTING ASSESSMENT AND ENHANCEMENT (TAE) PROGRAM:<br />
TEST AND EVALUATION RESULTS *<br />
Paper Presented by Dr. Harry B. Conner,<br />
Navy Personnel Research and Development Center, San Diego, CA 92152-6800
32nd ANNUAL MILITARY TESTING ASSOCIATION CONFERENCE
November 5-9, 1990, Orange Beach, Alabama
Nauta (1984) reported on a number of difficulties associated with the U.S. Navy's ability to<br />
maintain its weapons systems. He reported the costs of poor performance of maintenance personnel and
recommended areas requiring investigation if performance of these personnel was to improve. At about the
same time at the Navy Personnel Research and Development Center (NPRDC), we determined that one of
the difficulties we had encountered in the test and evaluation of an ongoing project (the Enlisted Personnel
Individualized Career System, EPICS) was that we had no way of comparing maintenance personnel in the most
important aspect of their performance: troubleshooting of the hardware system. We realized that we needed
an objective way to evaluate personnel performance in the skill of troubleshooting. A literature search
supported the contention that most research and development efforts in this area start with the premise of a
known expert, journeyman/master, or experienced troubleshooter when in fact these are defined rather than
empirically determined. Therefore, we concluded that efforts to improve maintenance personnel
troubleshooting performance were futile until we could empirically and objectively define how a good
troubleshooter performs.
Approach. We addressed this evaluation issue first with a feasibility study (Conner 1988, 1987) followed
by a more structured investigation, the Troubleshooting Assessment and Enhancement (TAE) program. The
TAE objective was to design, develop, test, and evaluate a low-cost troubleshooting evaluation capability.
The model (Figure 1) we used in our investigation shows that maintenance is just one of a number of
activities associated with a hardware system. Within the area of maintenance, one can perform preventative
or corrective maintenance. Within corrective maintenance, one troubleshoots or repairs. Specifically, we
focused on the skill of troubleshooting, which we considered to be a skill of problem solving requiring abstract
conceptualization capabilities.
[Figure 1. Hardware Activity to Troubleshooting: HARDWARE SYSTEM INTERACTIONS branches into<br />
CONSTRUCT, INSTALL, OPERATE, and MAINTAIN; MAINTAIN branches into PREVENTIVE MAINTENANCE<br />
and CORRECTIVE MAINTENANCE; CORRECTIVE MAINTENANCE branches into TROUBLESHOOTING<br />
and REPAIR.]<br />
With 25 subject matter experts, we developed a list of factors to be used to evaluate the proficiency of a<br />
troubleshooting technician in a high tech environment; that is, systems having state-of-the-art electronics and<br />
computers requiring troubleshooting. Next, we sent our initial factors list with definitions (shown in Table 1)<br />
to 1200 operational hi-tech personnel for ranking. The results were then weighted by a jury of experts (on the<br />
system under investigation). Once the factors were weighted, a scoring methodology was developed. Table<br />
2 provides the results of the factor development, weighting, and TAE scoring scheme. Our literature search<br />
caused us to add a tenth factor: redundant checks.<br />
TABLE 1. Factor Definitions<br />
Rank  Factor                      Definition<br />
1     Solution                    Problem is correctly solved; fault is identified.<br />
2     Cost (Incorrect Solutions)  Number of Lowest Replaceable Units (LRUs) incorrectly identified as faulty.<br />
3     Time                        Total minutes from login to logout taken to find the fault.<br />
4     Proof Points                Test points that positively identify LRUs as faulty.<br />
5     Illogical Approaches        Inappropriate equipment selection.<br />
6     Invalid Checks              Inappropriate test at appropriate test point.<br />
7     Out-of-Bounds               Inappropriate test point was selected.<br />
8     Test Points                 Total number of valid reference designator tests.<br />
9     Checks                      Total number of tests performed at all test points.<br />
10    Redundant Checks            Same test performed at same point during the episode.<br />
* The opinions expressed in this paper are those of the author, are not official, and do not necessarily reflect the views of the Navy Department.<br />
TABLE 2. Ranking, Weighting, and Scoring for Troubleshooting Evaluation Factors<br />
Rank  Factor                Weight  Scoring Scale   Scoring (Per event)<br />
                                    (Max Points)<br />
1     Solution              42.78   --              -100 for fail to find<br />
2     Cost (Incorrect Sol)  13.13   'EZ             -0.5 X ea NFR LRU<br />
3     Time                  11.80   20.62           -0.5 X ea Minute<br />
4     Proof Points           9.88   17.23           -% X ea Proof Pt missed<br />
5     Illogical Approach     6.87   12.01           -6.0 X ea Illogical App<br />
6     Invalid Checks         4.68    8.18           -0.8 X ea Invalid Check<br />
7     Out-of-Bounds          4.00    8.99           -0.6 X ea Out-of-Bounds<br />
8     Test Points            3.21    5.61           -0.5 X # of Tests<br />
9     Checks                 3.08    5.38           -0.5 X # of Checks<br />
10    Redundant Checks       tbd     tbd            to be analyzed<br />
Scoring is designed to discriminate between levels of troubleshooting proficiency: failure to solve the<br />
problem results in a score of 0, while solving the problem results in a score of 100. There is no partial score<br />
for factor 1. Ability to discriminate between levels of troubleshooting proficiency is in scoring of the remaining<br />
factors. Weights for the factors were converted into a scale equaling 100 points. The final score for each<br />
subject equals 100 points minus the sum of points lost for each factor. The minimum score is 0; that is, no<br />
negative scores. The scoring criteria for each factor, also shown in Table 2, are the weights that were used<br />
in the TAE episodes to evaluate and diagnose troubleshooting proficiency levels. The cost factor was changed<br />
to incorrect solutions to more accurately describe the actual behavior.<br />
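The scoring rule described above (start at 100 points, subtract a fixed penalty per event, floor the result at 0, and score 0 outright on failure) can be sketched as follows. This is an illustrative reconstruction, not the study's actual software: the per-event rates follow Table 2 where legible, and the proof-point and test-point rates are assumptions because those entries are garbled in the source.

```python
# Hedged sketch of the TAE scoring scheme: a subject starts at 100 points,
# loses points per event at (approximately) the Table 2 rates, and cannot
# score below 0.  Failing to find the fault scores 0 outright.
PENALTY_PER_EVENT = {
    "incorrect_solutions": 0.5,   # each LRU wrongly replaced (NFR LRU)
    "minutes": 0.5,               # each minute from login to logout
    "proof_points_missed": 1.0,   # ASSUMED rate; entry illegible in Table 2
    "illogical_approaches": 6.0,
    "invalid_checks": 0.8,
    "out_of_bounds": 0.6,
    "tests": 0.5,                 # ASSUMED rate; entry garbled in Table 2
    "checks": 0.5,
}

def tae_score(found_solution, events):
    """Final episode score: 0 on failure, else 100 minus penalties, floored at 0."""
    if not found_solution:
        return 0.0
    lost = sum(rate * events.get(name, 0) for name, rate in PENALTY_PER_EVENT.items())
    return max(0.0, 100.0 - lost)

# Example: fault found in 30 minutes with one wrong LRU and two invalid checks.
score = tae_score(True, {"minutes": 30, "incorrect_solutions": 1, "invalid_checks": 2})
```

Because every factor only subtracts points, the scheme rewards fast, direct paths to the fault while still distinguishing a slow success from an outright failure.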
Once we had determined factors and scoring scheme, we selected and constructed practical<br />
troubleshooting episodes that provided a valid representation of the hardware system being used in the<br />
study. Our hardware system was the U.S. Navy's communications system, the Navy Modular Automated<br />
Communications System/Satellite Communications (NAVMACS/SATCOM). To construct TAE<br />
troubleshooting episodes, we focused on the fault diagnosis/problem solving behaviors (Table 3) that military<br />
schools have identified in their six step troubleshooting process (Conner 1986, 1987).<br />
TABLE 3. Six Step Troubleshooting Process<br />
1. Symptom Recognition<br />
2. Symptom Elaboration<br />
3. Probable Faulty Functions<br />
4. Localizing Faulty Function(s)<br />
5. Isolating Faulty Circuit<br />
6. Failure Analysis.<br />
Although the design and delivery of the troubleshooting episodes did not require a computer, the amount<br />
of data made it obvious that the only efficient and cost effective approach would be utilization of microcomputer<br />
delivery and data gathering. Also, to keep developmental and hardware costs down, we limited ourselves to<br />
using off-the-shelf technology. We also reduced the “troubleshooting universe” of the episodes so that a<br />
standard microcomputer memory could handle data.<br />
The model developed for the troubleshooting activity on a given piece of hardware (shown in Figure 2)<br />
provides a TAE Factors Model for "System Troubleshooting." The model works as follows: Once a system<br />
is determined to be inoperative, the fault symptoms reduce the universe of type and location of tests to be<br />
made to a reasonable spectrum for further investigation; that is, the symptoms bound the problem and<br />
establish what is in or out of bounds. This bounding of the problem reduces the number of tests in the<br />
spectrum to a reasonable number and limits the amount of computer memory necessary. We called the "in<br />
bounds" checks that are not logical for the fault symptoms the "illogical approach." For a given set of symptoms<br />
for a given fault, there is an optimum troubleshooting path to determine the problem. To prove a component,<br />
or unit, is bad, a number of tests must be performed; this requires testing of the "proof points."<br />
[Figure 2. TAE Factors Model: the troubleshooting universe for a fault, showing the fault itself, the optimum<br />
path, proof points, illogical approaches, and the in-bounds and out-of-bounds regions of the test spectrum.]<br />
The goal in the TAE testing is to find and replace the LRU. Subjects begin TAE testing by reviewing a series<br />
of menus of symptoms, panels, and diagnostic information; next they select equipment to be tested and<br />
conduct tests or replace an LRU.<br />
Research Hypotheses. The 20 hypotheses for the TAE Test and Evaluation were organized into seven<br />
categories: experience, electronics knowledge, electronics performance proficiency, difficulty level, time,<br />
complex test equipment, and ranking. The hypotheses in each category, and method of testing each, are<br />
described in the following sections.<br />
METHOD<br />
Test Administration Procedures. Testing was conducted by NPRDC personnel in a classroom at the<br />
Advanced Electronics School Department (AESD), Service Schools Command, San Diego, California.<br />
Testing was on the Zenith 248 microcomputer. Technical documentation for the hardware system was in the<br />
classroom. Subjects were assigned randomized test sequences to protect against test-order effects. Sixteen<br />
episodes were administered to each subject and each episode required about an hour to complete, but<br />
subjects had no specific time limit. Subjects completed all episodes in two to three days. The administrator<br />
was present in the classroom during testing. Subjects listened to an introduction to the TAE study and the<br />
technical documentation available; read and signed a Privacy Act release statement; and completed a<br />
computerized Learn Program, 2 practice episodes, and 14 test troubleshooting episodes. After testing, subjects<br />
received test performance feedback and completed a critique.<br />
Subjects. Subjects for the TAE test and evaluation were students in the “system” phase of the maintenance<br />
course and the system qualified instructors. All subjects were required to have school training on the<br />
subsystems.<br />
Data. Data were collected for 53 students and 13 instructors in two databases, using a standard<br />
statistical package for analysis. The first contained demographic data; the second, performance data. Data<br />
were collected for seven classes of students between April and September 1989. Demographic data for each<br />
student included: SSN, time in service, Armed Services Vocational Aptitude Battery (ASVAB) scores, school<br />
subsystem scores, school comprehensive score, school final score, class ranking, TAE ranking, and instructor<br />
ranking. Demographic and TAE performance data for instructors were collected during September 1989. The<br />
demographic data for instructors included SSN, rate/rating, time in service, time in paygrade, time system<br />
qualified, and time working on the system in the fleet and as a system instructor. The TAE program data for both<br />
students and instructors consisted of scores for 16 episodes encompassing 673 variables. Table A-1 describes<br />
the variables for each episode (Episode 1 is presented).<br />
Data files were refined and evaluated. Data for five students were dropped due to missing data, and for<br />
two instructors due to lack of system qualification. Thus, the data of 59 subjects were used for this study: 48<br />
students and 11 instructors. The resultant database was used to create files for testing the study hypotheses.<br />
The master file was used to create files with variables specifically required to test each hypothesis. The<br />
methods for testing the hypotheses are described in the following subsections.<br />
RESULTS and DISCUSSION<br />
Results of the data analyses are presented in Appendix A, and the specific areas investigated are discussed<br />
in the following:<br />
Demographic Data. For the 48 students, the average time in service was 2.23 years. For the 11 instructors,<br />
9 had a rate of electronics technician first class (ET1) and 2, of ET2; the average paygrade was 5.82. The<br />
average time in service for instructors was 10.41 years and average time in paygrade was 3.64 years.<br />
Instructors were system qualified for an average of 4.67 years and had worked on the system hardware in the<br />
fleet an average of 2.94 years. In addition, they averaged 16.18 months as instructors.<br />
Experience (Table A-2). Hypothesis 1. Instructors (experts) will score significantly higher on the TAE test<br />
than students (novices). A one-way analysis of variance (ANOVA) was performed to test hypothesis 1. The<br />
F ratio value is not significant.<br />
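The one-way ANOVA used for hypothesis 1 compares between-group variance (students vs. instructors) to within-group variance of TAE scores. A minimal sketch of that computation, using only the standard library; the score lists below are made-up illustrative numbers, not the study's data:

```python
from statistics import mean

def one_way_anova(groups):
    """One-way ANOVA over lists of scores.

    Returns (F, df_between, df_within), where
    F = (SS_between / df_between) / (SS_within / df_within).
    """
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Illustrative data only (not the study's 48 students and 11 instructors):
students = [68.0, 72.0, 70.0, 71.0]
instructors = [74.0, 73.0, 75.0]
f_ratio, dfb, dfw = one_way_anova([students, instructors])
```

For the study's actual groups, the appendix reports F = 2.271 with 1 and 57 degrees of freedom, which falls short of significance.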
Hypothesis 2. Subjects with a longer time in the electronics rate (i.e., Time in Service - TIS) will score<br />
significantly higher on the TAE test than subjects with less time in that rate.<br />
Generally, the relationship between experience and TAE performance was not statistically significant. This<br />
apparent anomaly may be explained by the fact that instructors of the course are not required to be system<br />
qualified. Students must prove their system qualification to graduate.<br />
The lack of a significant relationship between experience and troubleshooting performance causes one to<br />
question if the experience measures were appropriate, if an appropriate set of subjects was tested, if the TAE<br />
delivery and evaluation systems are valid, or if there is actually no difference due to experience. Given the face<br />
validity of TAE and the high level of expectation by subject matter experts of the relationship between<br />
experience and performance, further testing is needed to resolve this issue.<br />
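The correlational hypotheses here and in the sections that follow are evaluated by comparing a Pearson correlation against a critical value tabled for the sample size; the .21638 shown for N = 59 in the appendix is consistent with a one-tailed .05 test via r_crit = t / sqrt(t^2 + N - 2), though the paper does not state this explicitly. A sketch under that assumption, with illustrative names and data:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def critical_r(t_quantile, n):
    """Critical correlation for df = n - 2, given the matching t quantile."""
    df = n - 2
    return t_quantile / sqrt(t_quantile ** 2 + df)

# For N = 59 the one-tailed .05 t quantile at df = 57 is about 1.672, so the
# critical r is about .216, matching the appendix tables.
r_crit = critical_r(1.672, 59)
# Illustrative data, not the study's:
significant = abs(pearson_r([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])) > r_crit
```

An observed correlation whose magnitude exceeds the critical value is flagged significant (starred in the appendix tables); directional hypotheses like these make the one-tailed reading natural.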
Electronics Knowledge (Table A-3). Hypothesis 3. Students with higher academic school final scores<br />
will score higher on the TAE test than students with lower scores. The correlation between academic school<br />
final scores (overall course final score) and TAE test scores is significant at the .05 level. However, the correlation<br />
between academic school comprehensive scores (final test) and TAE test scores is positive but not<br />
significant. Therefore, academic school final scores were significantly correlated with TAE test scores, but<br />
school comprehensive scores were not.<br />
Hypothesis 4. Students with higher academic school subsystem test scores will score higher on the TAE<br />
subsystem tests (episodes) than students with lower school subsystem test scores. For Subsystem 1, the<br />
correlation of academic school subsystem test scores with TAE subsystem test scores is significant at the .05<br />
level. Subsystem 2 has a positive correlation, which is not significant. Both Subsystems 3 and 4 have negative<br />
correlations, which are not significant. Therefore, the only significant correlation between academic school<br />
subsystem test scores and TAE subsystem test scores was for Subsystem 1 (the computer).<br />
Hypothesis 5. Students with higher appropriate Armed Services Vocational Aptitude Battery (ASVAB)<br />
scores for Electronics Technician selection in general science, electronics information, mathematics<br />
knowledge, arithmetic reasoning ([GS + EI + MK] + AR), and the armed forces qualification test (AFQT), will<br />
score higher on the TAE test than subjects with lower ASVAB and selection scores. All but one of the<br />
correlations is negative. The only significant correlation between ASVAB scores and TAE score is Arithmetic<br />
Reasoning (AR) with a negative correlation significant at the .05 level. The only positive correlation is between<br />
General Science (GS) and TAE score, which was not significant.<br />
There was no generally consistent relationship between electronics knowledge and TAE performance.<br />
There was a relationship where performance testing was a component of the academic score used. There<br />
was, however, a negative relationship between the scores used to determine selection to the occupational<br />
specialty (electronics technician) and performance scores.<br />
The lack of relationships of electronic theory or academics and troubleshooting performance needs further<br />
investigation. As with a number of other studies of this type, there was no consistent relationship between<br />
knowledge of theory and the ability to perform. This may have been related to the method of determining<br />
knowledge and academic success in the school. Testing in the school does not appear to provide<br />
discriminatory capability and correlational analyses do not show statistically significant results. Schools<br />
should ensure tests discriminate between students' academic and performance abilities and assess student<br />
behaviors in a more structured, formalized, objective way. Otherwise, effects of a change to instructional<br />
methods or techniques cannot be assessed in terms of course outcomes. Further TAE testing might determine<br />
the resulting relationships.<br />
Also, the relationships of selection requirements and troubleshooting performance need further<br />
investigation. Of greatest interest is the failure of performance results to positively relate to the ASVAB scores<br />
used to select personnel for this occupational specialty. The consistent negative trend seems to indicate<br />
that, while the ASVAB tests may relate to academic performance, there may be no relationship between<br />
ASVABs, TAE performance, and/or on-the-job performance.<br />
Electronics Performance Proficiency (Table A-4). Hypothesis 6. Subjects with a higher level of<br />
troubleshooting proficiency will make fewer invalid checks than less proficient subjects. The correlation<br />
between TAE score and the number of invalid checks is not significant.<br />
Hypothesis 7. Subjects with a higher level of troubleshooting proficiency will make fewer illogical<br />
approaches than less proficient subjects. The correlation between TAE score and the number of illogical<br />
approaches is significant at the .01 level.<br />
Hypothesis 8. Subjects with a higher level of troubleshooting proficiency will make fewer incorrect<br />
solutions than less proficient subjects. The correlation between the TAE score and the number of incorrect<br />
solutions is significant at the .001 level.<br />
Hypothesis 9. Subjects with a higher level of troubleshooting proficiency will make fewer redundant checks<br />
than less proficient subjects. The correlation between TAE score and the number of redundant checks is not<br />
significant.<br />
Hypothesis 10. Subjects with a higher level of troubleshooting proficiency will test significantly more proof<br />
points than less proficient subjects. The correlation between the TAE score and the number of proof points<br />
is significant at the .001 level.<br />
Hypothesis 11. Subjects with a higher level of troubleshooting proficiency will make significantly fewer<br />
tests than less proficient subjects. The correlation between the level of troubleshooting proficiency and<br />
number of tests is significant at the .001 level.<br />
The only proficiency factors that failed to show significance were invalid and redundant checks, which<br />
could have been caused by design of the delivery system and/or the method of determining these factors.<br />
This set of hypotheses strongly supports the validity of the TAE technique and approach.<br />
The utility of the TAE as a job performance measure and as an objective measure of readiness in the skill<br />
area addressed (in this case, system troubleshooting) should be investigated further.<br />
Difficulty (Table A-5). Hypothesis 12. The more difficult the episodes, the longer the average time<br />
needed to find the solution. The correlation of TAE difficulty with length of time to find the solution is significant<br />
at the .001 level.<br />
Hypothesis 13. On episodes of equal difficulty, subjects with a higher level of troubleshooting proficiency<br />
will take significantly less time than less proficient subjects in finding the solution. Episode difficulty levels<br />
were determined and episodes were grouped, with level 1 being the easiest and level 5 the most difficult, as<br />
follows: (1) 2 episodes, (2) 4 episodes, (3) 3 episodes, (4) 2 episodes, and (5) 3 episodes. Hypothesis 13 was<br />
significantly supported for each level.<br />
Hypothesis 14. The more difficult the episode, the less time the instructors will take to find the TAE test<br />
solutions when compared to the students (novices). The difficulty level of the episode and the difference in<br />
time between instructors and students to find TAE test solutions is negatively correlated but not significant.<br />
Although no significant difference was found, the more difficult the episode, the less time instructors tended<br />
to take to find the TAE test solutions when compared to the students.<br />
Generally, the results were as expected; that is, the more difficult, the more time; at different levels of<br />
difficulty, better performers took less time. An unexpected result was the lack of significant difference between<br />
students and instructors. The difference was, however, strongly in the direction expected.<br />
The consistently significant relationship in this area clearly calls for further investigation and improvement,<br />
particularly in behavioral and cognitive task analyses.<br />
Time (Table A-6). Hypothesis 15. Subjects with a higher level of troubleshooting proficiency will take<br />
significantly less total time to find TAE episode solutions than less proficient subjects. The correlation between<br />
TAE score and total time to find the episode fault is significant at the .001 level.<br />
Hypothesis 16. Subjects with higher levels of troubleshooting proficiency will take a significantly longer<br />
time than less proficient subjects before making the first test point. The correlation between TAE score and<br />
time to first test point is significant at the .05 level.<br />
Results suggest that analysis of behavior and cognitive protocols could result in a dramatic change in the<br />
way the training community presents troubleshooting training. Here again, behavioral protocol analysis could<br />
provide useful information on training approaches.<br />
Complex Test Equipment (Table A-7). Hypothesis 17. Subjects with a higher level of troubleshooting<br />
proficiency will make significantly more tests using an oscilloscope than less proficient subjects. The<br />
correlation between TAE score and the number of oscilloscope tests is not significant.<br />
Given the nature of the hardware system and the resulting TAE delivery system, subjects did not appear<br />
to have sufficient opportunity to use complex test equipment in the TAE episodes. Therefore, the lack of a<br />
statistically significant result may have no practical meaning.<br />
Ranking (Table A-8). Hypothesis 18. The higher the student's TAE class rank, the higher the student will<br />
be ranked in terms of troubleshooting proficiency by instructors or work center supervisors. Hypothesis 18<br />
was supported for two classes at the .001 level. The correlation between TAE class ranking and instructor/work<br />
center supervisor ranking was not significant for the other classes. Although not significant, two classes had<br />
an inverse relationship.<br />
Hypothesis 19. The higher the student's TAE class rank (final score), the higher will be the student's<br />
ranking in the class. Hypothesis 19 was supported for one class at the .01 level of significance. For the other<br />
classes, the correlation between TAE class ranking and ranking in school class was not significant. Although<br />
not significant, two classes indicated a strong positive correlation. Conversely, one class showed a strong<br />
inverse relationship between TAE class ranking and school class ranking.<br />
Hypothesis 20. The higher the instructor ranking of the student in terms of troubleshooting proficiency,<br />
the higher will be the student’s ranking in the class (final score). Hypothesis 20 was supported for three<br />
classes, one class at the .001 level and two at the .05 level. Although not significant, one class showed a strong<br />
positive correlation between instructor student ranking and class student ranking. One class showed a weaker<br />
positive correlation and two classes indicated an inverse relationship.<br />
There were no consistent results in rankings across instructors, TAE performance, or school performance.<br />
In several classes, inverse relationships were shown. Only one class had a consistent significant relationship<br />
across hypotheses.<br />
The results of this area most clearly attest to the need for an objective evaluation tool for the skill of<br />
troubleshooting. They show that supervisor rankings and school results do not have the ability to evaluate<br />
personnel in this skill.<br />
FUTURE EFFORTS<br />
In addition to the recommendations made for each area of investigation, we also have the following general<br />
recommendations for future efforts in this area.<br />
1. Further investigate TAE validity and reliability. Design and development of the TAE approach and<br />
delivery system strongly support face validity of TAE. Subject matter experts were involved in all phases of<br />
the project. They determined the factors of evaluation, weights of the factors, evaluation scheme, and<br />
troubleshooting episodes to be used; developed the episodes; and participated in the test and evaluation.<br />
Since T&E results are somewhat ambiguous, areas dealing with validity and reliability should be investigated<br />
further.<br />
2. Analyze data to further develop discriminatory/predictive capability. Results of performance of<br />
subjects on TAE episodes should be subjected to behavioral protocol analyses to develop a model of<br />
troubleshooting, further analyses of approaches used by good vs. bad troubleshooters, and ultimately<br />
cognitive protocol analyses to determine selection, training, and evaluation requirements.<br />
3. Further test the TAE approach on a larger and more comprehensive population and on other<br />
equipment. Further investigation should use hardware that allows wider and less restrictive utilization of test<br />
equipment. It may also be possible to select specific troubleshooting episodes that enable wider utilization<br />
of more types of test equipment. This type of investigation should take place to determine if certain episodes<br />
and hardware types require special test equipment use capability. Investigate this approach in other high-tech<br />
hardware systems as well as other occupational areas (i.e., mechanical hardware troubleshooters/repair<br />
personnel). A TAE-type delivery system should be developed for a number of other high- and mid-tech<br />
hardware systems.<br />
4. Develop more troubleshooting episodes to provide directive training, guided training, and tests with<br />
feedback. Then, a complete and comprehensive troubleshooting skill development, maintenance,<br />
assessment, and evaluation program would be available for personnel from novice to expert skill levels. TAE<br />
could be used for active duty personnel in a school or fleet environment and for reserve personnel at the<br />
readiness centers or aboard ship during active duty periods.<br />
For greater detail on the background, design/development and administration, and the test and evaluation<br />
results consult: Conner and Hassebrock (in press); Conner, Hartley, and Mark (in press); and Conner, Poirier,<br />
Ullrich, and Bridges (in press).<br />
REFERENCES<br />
Conner, H. B. (1988, October). Troubleshooting Proficiency Evaluation Project (TPEP). In Proceedings of the<br />
Military Testing Association Conference, Mystic, Connecticut.<br />
Conner, H. B. (1987, April). Troubleshooting Proficiency Evaluation Project (TPEP). In Proceedings of the<br />
National Security Industrial Association National Manpower and Training Conference.<br />
Conner, H. B., & Hassebrock. (In press). Troubleshooting Assessment and Enhancement (TAE)<br />
Program: Theoretical, Methodological, Test and Evaluation Issues. San Diego: Navy Personnel Research<br />
and Development Center.<br />
Conner, H. B., Hartley, S., & Mark, L. J. (In press). Troubleshooting Assessment and Enhancement (TAE)<br />
Program: Test and Evaluation. San Diego: Navy Personnel Research and Development Center.<br />
Conner, H. B., Poirier, C., Ullrich, R., & Bridges, T. (In press). Troubleshooting Assessment and Enhancement<br />
(TAE) Program: Design, Development, and Administration. San Diego: Navy Personnel Research<br />
and Development Center.<br />
Nauta, F. (1984). Alleviating Fleet Maintenance Problems Through Maintenance Training and Aiding<br />
Research (NAVTRAEQUIPCEN MDA903-81-0188-1). Orlando: Naval Training Equipment Center.<br />
APPENDIX A<br />
TAE DATA and ANALYSIS RESULTS<br />
TABLE A-1. Variables for TAE Episode 1<br />
Variable  Contents of Variable<br />
V1   Subject's Social Security Number<br />
V2   Equipment (hardware subsystem) number (1 = USH26)<br />
V3   Episode number (1)<br />
V4   Found Solution (1 = Yes, 0 = No)<br />
V5   Number of Test Points<br />
V6   Number of Out-of-Bounds tests<br />
V7   Number of Valid Checks<br />
V8   Number of Invalid Checks<br />
V9   Number of Redundant Checks<br />
V10  Number of Proof Points subject tested<br />
V11  Total number of Proof Points in the episode<br />
V12  Percent proof points tested: (V10 / V11) * 100, rounded to a whole number<br />
V13  Total Time spent on the episode (in minutes)<br />
V14  To be determined<br />
V15  Number of Equipment Selection events<br />
V16  Number of Front Panel events<br />
V17  Number of Maintenance Panel events<br />
V18  Number of Fallback test events<br />
V19  Number of Reference Designator test events<br />
V20  Number of Replace LRU events<br />
V21  Number of Review Symptoms events<br />
V22  To be determined<br />
V23  Number of Diagnostic Test events<br />
V24  Number of Load Operational Program events<br />
V25  Number of Step Procedure events<br />
V26  Number of Revision events (instructor intervention)<br />
V27  Number of INCORRECT Replace LRU events<br />
V28  Number of GOOD FAULT Replace LRU events<br />
V29  Time to first Reference Designator Test (in minutes)<br />
V30  Time to first Diagnostic Test (in minutes)<br />
V31  Sum of all steps of episode: ALL events, except instructor actions<br />
V32  Number of Waveform tests performed<br />
V33  Number of Voltage tests performed<br />
V34  Number of Read Meter tests performed<br />
V35  Number of Logic tests performed<br />
V36  Number of Current tests performed<br />
V37  Number of Frequency tests performed<br />
V38  Number of Continuity tests performed<br />
V39  Number of Adjustment tests performed<br />
V40  Final Score of the episode<br />
V41  To be determined (for possible future expansion)<br />
V42  To be determined (for possible future expansion)<br />
V43  To be determined (for possible future expansion)<br />
TABLE A-2. Experience<br />
H1: Student TAE Test Score vs. Instructor TAE Test Score (Variable 1: TAESCORE)<br />
Group            Mean     N<br />
1 (Students)     70.396   48<br />
2 (Instructors)  73.422   11<br />
Grand Mean       70.980   59<br />
Source    Sum of Sqs  D.F.  Mean Sq  F Ratio  Prob.<br />
Between   81.973      1     81.973   2.271    .1373<br />
Within    2057.124    57    36.090<br />
Total     2139.098    58<br />
Correlational Hypothesis Statement  N   Correlation  Critical Value<br />
H2   TAE Score vs TIS               59  .13676       .21638<br />
TABLE A-3. Electronic Knowledge<br />
Correlational Hypothesis Statement       N   Correlation  Critical Value<br />
H3   TAE vs School Final                 48  .30181*      .24045<br />
     TAE vs School Comp                  48  .17311       .24045<br />
H4   Avg. TAE Subsystem vs. School Subsys 48<br />
     1 vs. 1                                 .27704*      .24045<br />
     2 vs. 2                                 .17579       .24045<br />
     3 vs. 3                                 -.18146      .24045<br />
     4 vs. 4                                 -.21972      .24045<br />
H5   TAE vs ASVABs                      48<br />
     AFQT                                   -.00398      .24045<br />
     AR                                     -.32510*     .24045<br />
     EI                                     -.96673      .24045<br />
     ASVAB1                                 -.02672      .24045<br />
     ASVABT                                 -.13055      .24045<br />
TABLE A-4. Electronic Performance Proficiency<br />
Correlational Hypothesis Statement       N   Correlation  Critical Value<br />
H6   TAE vs Invalid Checks               59  -.17107      .21638<br />
H7   TAE vs Illogical Approaches         59  -.34057**    .21638<br />
H8   TAE vs Incorrect Solutions          59  -.69676***   .21638<br />
H9   TAE vs Redundant Checks             59  -.98543      .21638<br />
H10  TAE vs Proof Points                 59  .56997***    .21638<br />
H11  TAE vs # of Tests                   59  -.55201***   .21638<br />
* p<.05. ** p<.01. *** p<.001.<br />
TABLE A-5. Difficulty Level<br />
Correlational Hypothesis Statement      N    Correlation    Critical Value<br />
H13  Ep Diff vs. Ep Time                14    .93051***     .45900<br />
H14  Ep Diff Lev vs Time<br />
     Level 1 (Easiest)                  59   -.81265***     .21638<br />
     Level 2                            59   -.3x04**       .21638<br />
     Level 3                            59   -.74653***     .21638<br />
     Level 4                            59   -.73553***     .21638<br />
     Level 5 (Hardest)                  59   -.58708***     .21638<br />
H15  Ep Diff vs. Time Dif               14   -.34658        .45900<br />
TABLE A-6. Time<br />
Correlational Hypothesis Statement      N    Correlation    Critical Value<br />
H16  TAE vs Time                        59   -.49233***     .21638<br />
H18  TAE vs Time to 1st Check           59   -.23814*       .21638<br />
TABLE A-7. Complex Test Equipment<br />
Correlational Hypothesis Statement      N    Correlation    Critical Value<br />
H17  TAE vs Oscope Use                  59   .18771         .21638<br />
TABLE A-8. Ranking<br />
Correlational Hypothesis Statement      N    Correlation    Critical Value<br />
H20  TAE Ranking vs Inst Ranking<br />
     Class 1                            7     .96429***     .87649<br />
     Class 2                            7     .35714        .67649<br />
     Class 3                            8     .46429        .87649<br />
     Class 4                            8     .46571        .73972<br />
     Class 5                            8    -.14286        .73972<br />
     Class 6                            7    -.07143        .82658<br />
     Class 7                            7     .96429***     .67649<br />
H21  TAE Rank vs Class Rank<br />
     Class 1                            7     .89286**      .67649<br />
     Class 2                            7     .57143        .67649<br />
     Class 3                            8    -.14286        .67649<br />
     Class 4                            6     .46571        .73972<br />
     Class 5                            7    -.37143        .82658<br />
     Class 6                            7     .90524        .73972<br />
     Class 7                            7     .60714        .87649<br />
H22  Class Rank vs Inst Ranking<br />
     Class 1                            8     .96429***     .67649<br />
     Class 2                                 -.35714        .67649<br />
     Class 3                            9     .02381        .82658<br />
     Class 4                            7     .75000*       .67649<br />
     Class 5                                                .82658<br />
     Class 6                                  .64286        .68697<br />
     Class 7                                                .67649<br />
379
Incrementing ASVAB Validity with<br />
Spatial and Perceptual-Psychomotor Tests<br />
Henry H. Busciglio<br />
U. S. Army Research Institute<br />
The Army's Project A is a long-term, comprehensive effort to<br />
improve the selection and classification of enlisted personnel.<br />
One objective of this effort was to develop and validate measures<br />
of abilities other than the general cognitive domain covered by<br />
the Armed Services Vocational Aptitude Battery (ASVAB), including<br />
spatial, perceptual, and psychomotor abilities. Previous<br />
analyses of Project A data (Campbell, 1988) showed that the ASVAB<br />
is useful for predicting first tour performance. Therefore, the<br />
ASVAB serves as a baseline against which the marginal utility of<br />
other tests for selection and classification is judged. This<br />
analysis of data collected during the 1985 Project A Concurrent<br />
Validation attempted to answer three questions:<br />
(1) How much of the variance in comprehensive performance<br />
measures can spatial and perceptual-psychomotor tests account<br />
for, over and above that predicted by ASVAB subtests?<br />
(2) Is either type of test, spatial or perceptual-psychomotor,<br />
more useful for incrementing ASVAB validity?<br />
(3) Which specific Project A tests will make the highest<br />
individual contributions to this incremental validity?<br />
Method<br />
Subjects<br />
Subjects were first-term enlisted personnel in the nine MOS<br />
for which hands-on criterion measures were collected as part of<br />
the 1985 Concurrent Validation phase of Project A. The number of<br />
subjects from each MOS, as well as the total sample size, is<br />
shown in Table 1.<br />
Predictors<br />
Predictors were the nine ASVAB subtests, the six Project A<br />
paper-and-pencil tests of spatial ability, and 14 selected scores<br />
from the ten Project A computerized perceptual-psychomotor tests.<br />
Table 2 presents a list of these predictors, along with the<br />
specific perceptual-psychomotor scores used.<br />
Presented at the meeting of the Military Testing<br />
Association, November, 1990. All statements expressed in this<br />
paper are those of the author and do not necessarily reflect the<br />
Official opinions or policies of the U.S. Army Research Institute<br />
or the Department of the Army.<br />
380
Table 1<br />
Subjects<br />
MOS            Enlisted Job                     N<br />
11B            Infantry                         491<br />
13B            Cannon Crew                      464<br />
19E            Armor Crew                       394<br />
31C            Single Channel Radio Operator    289<br />
63B            Light Wheel Vehicle Mechanic     478<br />
64C (now 88M)  Motor Transport Operator         507<br />
71L            Administrative Specialist        427<br />
91A            Medical Specialist               392<br />
95B            Military Police                  597<br />
TOTAL                                           4,039<br />
Note. Actual sample sizes for some analyses were<br />
smaller than those shown.<br />
Table 2<br />
Predictor Measures<br />
ASVAB Subtests:                      Spatial Ability Tests:<br />
Mechanical Comprehension             Assembling Objects<br />
Auto/Shop Information                Map<br />
Electronics Information              Maze<br />
Math Knowledge                       Object Rotation<br />
Arithmetic Reasoning                 Orientation<br />
Verbal (Paragraph Comprehension      Figural Reasoning<br />
  + Word Knowledge)<br />
General Science<br />
Coding Speed<br />
Number Operations<br />
Perceptual-Psychomotor Tests and Scores:<br />
Target Tracking 1 - accuracy<br />
Target Tracking 2 - accuracy<br />
Target Shoot - accuracy and time-to-fire<br />
Cannon Shoot - time discrepancy (from optimal)<br />
Simple Reaction Time - decision time<br />
Choice Reaction Time - decision time<br />
Short-Term Memory - decision time and proportion correct<br />
Perceptual Speed and Accuracy - decision time and proportion correct<br />
Target Identification - decision time and proportion correct<br />
Number Memory - response time<br />
381
Criterion Measures<br />
All criteria were comprehensive, "can-do" measures of job<br />
performance, as listed and described below.<br />
Total Score on Written Tests: measures of soldiers'<br />
technical knowledge pertinent to the various "critical tasks"<br />
performed in each MOS.<br />
Total Score on Hands-On Tests: measures of soldiers' ability<br />
to actually carry out the 14 to 17 major job tasks in each MOS.<br />
General Soldiering Proficiency: a composite score on written<br />
and hands-on tests of tasks common to many MOS (e.g., determining<br />
grid coordinates on maps, recognizing friendly/threat aircraft).<br />
Core (i.e., MOS-specific) Technical Proficiency: a composite<br />
score on written and hands-on tests of tasks that are at the<br />
"core" of each MOS (i.e., those that define the MOS).<br />
Skill Qualification Test Score (SQT): written tests of MOS-specific<br />
technical knowledge developed by the U.S. Army Training<br />
and Doctrine Command for periodic testing of soldiers in each MOS.<br />
The comprehensive measures above are not mutually exclusive.<br />
Written and hands-on test scores were used in the computation of<br />
General Soldiering and Core Technical Proficiency, as well as the<br />
total scores for written and hands-on tests.<br />
Procedure<br />
Collection of Project A predictor and criterion data was<br />
part of the 1985 concurrent validation. Scores on the ASVAB<br />
subtests and the Skill Qualification Test were obtained from<br />
archival data sources.<br />
A series of backward stepwise multiple regression analyses<br />
was performed separately for each MOS. An SPSS Regression<br />
program sequentially entered blocks of ASVAB, spatial, and<br />
perceptual-psychomotor tests, removing nonsignificant tests in<br />
each block before entering the next block. Two orders of entry<br />
were used. In both cases the ASVAB tests were entered first; in<br />
one analysis spatial tests were entered second, followed by the<br />
perceptual-psychomotor; in the other analysis this order was<br />
reversed. Results were corrected for restriction-of-range in the<br />
ASVAB scores (Lawley formula; Lord and Novick, 1968), and<br />
adjusted for shrinkage (Wherry, 1940).<br />
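The blockwise procedure described above can be sketched in miniature. The code below is a simplified stand-in for the SPSS analysis (synthetic data, hypothetical variable names, ordinary least squares, and the Wherry (1940) shrinkage formula; the Lawley range-restriction correction and the backward removal of nonsignificant tests are omitted):

```python
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary-least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def wherry_adjusted(r2: float, n: int, k: int) -> float:
    """Wherry (1940) shrinkage adjustment: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(0)
n = 400
asvab = rng.normal(size=(n, 3))        # stand-in for ASVAB subtest scores
spatial = rng.normal(size=(n, 2))      # stand-in for spatial test scores
y = asvab @ np.array([0.5, 0.3, 0.2]) + 0.4 * spatial[:, 0] + rng.normal(size=n)

r2_base = r_squared(asvab, y)                               # block 1: ASVAB only
r2_full = r_squared(np.column_stack([asvab, spatial]), y)   # block 2: + spatial
print(f"incremental R^2 of the spatial block: {r2_full - r2_base:.3f}")
print(f"shrunken full-model R^2: {wherry_adjusted(r2_full, n, 5):.3f}")
```

The incremental R2 of the second block is simply the difference between the full-model and ASVAB-only R2, which mirrors how the Stage 2 increments reported below were obtained.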
Results<br />
Table 3 shows the proportion of variance explained (R2) by<br />
the significant predictors of the criteria at each stage of<br />
382
Table 3<br />
Proportion of Criterion Variance (R2) Accounted for by<br />
Significant Predictors (Median Values Across MOS)<br />
Stage                (1)   (2a)     (3a)     (2b)   (3b)<br />
Predictors Retained  ASV   ASV+Sp   ASV+Sp   ASV    ASV+P/M<br />
Predictors Entered   ASV   Sp       P/M      P/M    Sp<br />
Written Tests:       .59   .64      .65      .61    .65<br />
Hands-On Tests:      .29   .33      .33      .31    .33<br />
General Soldiering:  .47   .51      .53      .50    .53<br />
Core Technical:      .44   .49      .50      .48    .51<br />
Skill Qualification: .53   .54      .55      .54    .55<br />
analysis. A comparison of column 1 with columns 3a and 3b<br />
indicates that spatial and perceptual-psychomotor test scores<br />
substantially improved the prediction of Written Test scores,<br />
General Soldiering Proficiency, and Core Technical Proficiency.<br />
Increases in R2s for Hands-On Tests and the Skill Qualification<br />
Test Score were more modest.<br />
Regarding the relative usefulness of spatial vs. perceptual-psychomotor tests for incrementing the prediction of the<br />
criteria, columns 2a and 2b of Table 3 show median incremental<br />
R2s (across MOS) of spatial vs perceptual-psychomotor predictors<br />
at Stage 2. Spatial tests were slightly better than perceptual-psychomotor scores for improving the prediction of the criteria.<br />
The third research question concerns the validities and<br />
incremental validities of individual Project A tests. Table 4<br />
lists the three best spatial, perceptual, and psychomotor tests,<br />
in terms of frequency and magnitude of significant effects. For<br />
the tests of spatial ability, Assembling Objects, Figural<br />
Reasoning, and Map were superior incremental predictors. Among<br />
the perceptual scores, Target Identification (% correct), Short<br />
Term Memory (% correct), and Number Memory (response time) were<br />
especially useful as incremental predictors. For the psychomotor<br />
scores, 1- and 2-Hand Tracking (accuracy), and Target Shoot<br />
(time-to-fire) were the best.<br />
Discussion<br />
In these analyses Project A test scores substantially<br />
improved the prediction of the criteria. The results for Total<br />
Score .on Written Tests and General Soldiering Proficiency support<br />
the wide generalizability of Project A incremental validity.<br />
Specifically, the first measure may involve highly different<br />
content across MOS, while the second measures a set of more<br />
383
Table 4<br />
Best Spatial, Perceptual, and Psychomotor Tests<br />
                                     Number of Equations    Range of Median<br />
Project A                            Where Significant      Semi-partial<br />
Tests                                (Maximum=86)           Correlations<br />
Spatial:<br />
Assembling Objects                   48                     .06 - .11<br />
Figural Reasoning                    40                     .07 - .12<br />
Map                                  33                     .07 - .14<br />
Perceptual:<br />
Target Id. - % correct               32                     .07 - .10<br />
Short Term Memory - % correct        25                     .07 - .14<br />
Number Memory - response time        25                     ns - .07<br />
Psychomotor:<br />
2-Hand Tracking - accuracy           20                     .05 - .15<br />
1-Hand Tracking - accuracy           18                     .05 - .10<br />
Target Shoot - time-to-fire           6                     -.09 - -.07<br />
common tasks, but does so using both written and hands-on scores.<br />
Although spatial tests were slightly superior to the<br />
perceptual-psychomotor scores as incremental predictors, the<br />
latter group of measures accounted for criterion variance which<br />
is not redundant with the spatial tests. This is important<br />
because the perceptual-psychomotor tests require expensive<br />
computer hardware and software and must be administered<br />
individually. Thus, their utility should be considered<br />
separately with each selection or classification decision.<br />
These analyses also revealed that some individual Project A<br />
tests were significant incremental predictors across a wide<br />
variety of MOS and criteria (see Table 4). These measures are<br />
therefore strong candidates for addition to ASVAB.<br />
To interpret these results properly, a number of<br />
methodological considerations should be noted. First of all,<br />
ASVAB scores were employed for selection, while the Project A<br />
scores were used "for research purposes only." Individuals may<br />
have responded more carefully, exerted more effort, etc., on the<br />
ASVAB subtests, thus making them more valid measures of abilities<br />
than the Project A tests. Another concern is a statistical one.<br />
Although the samples used were large enough to make the degree of<br />
shrinkage in each individual equation relatively low, the large<br />
number of equations computed increases the probability that<br />
384<br />
some ASVAB and Project A predictors were significant due to Type<br />
I errors. Although most of the Project A tests were significant<br />
far more often than the chance level (cf. the middle column of<br />
Table 4), the lack of opportunities at this point for cross-validation renders the results reported in this paper exploratory<br />
and suggestive only.<br />
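The chance-level argument in this paragraph can be made concrete. With 86 equations tested at alpha = .05, about four significant semi-partials would be expected by chance alone (treating the tests as independent, which they are not, so this is only a rough baseline):

```python
# Rough baseline for the Type I error argument: expected number of
# spuriously significant results among m tests at level alpha, and the
# probability of at least one, assuming independent tests.
m, alpha = 86, 0.05
expected_false_positives = m * alpha           # expected chance hits
p_at_least_one = 1 - (1 - alpha) ** m
print(round(expected_false_positives, 1))      # 4.3
print(round(p_at_least_one, 3))                # 0.988
```

Counts such as 48 of 86 for Assembling Objects (Table 4) are therefore well above this baseline, which is the point made above.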
The Longitudinal Validation of Project A, which began in<br />
1986/87, will provide more definitive answers to the research<br />
questions involved in these analyses. Based upon the preliminary<br />
results reported here, we are optimistic about the findings of<br />
the Longitudinal Validation.<br />
References<br />
Campbell, C.H. (in preparation). Developing basic criterion<br />
scores for hands-on tests, job knowledge tests, and task<br />
rating scales (Draft of ARI Technical Report).<br />
Campbell, J.P. (Ed.). (1988). Improving the selection,<br />
classification, and utilization of Army enlisted personnel:<br />
Annual report, 1986 fiscal year (ARI Technical Report 792).<br />
Alexandria, VA: U.S. Army Research Institute.<br />
Cohen, J., & Cohen, P. (1983). Applied multiple regression/<br />
correlation analysis for the behavioral sciences. Hillsdale,<br />
NJ: Lawrence Erlbaum Associates.<br />
Davis, R.H., Davis, G.A., Joyner, J.N., & de Vera, M.V. (1987).<br />
Development and field test of job relevant knowledge tests<br />
for selected MOS (ARI Technical Report 757). Alexandria, VA:<br />
U.S. Army Research Institute.<br />
Lord, F., & Novick, M. (1968). Statistical theories of mental<br />
test scores. Reading, MA: Addison-Wesley Publishing Co.<br />
Pedhazur, E.J. (1982). Multiple regression in behavioral<br />
research (2nd ed.). New York, NY: Holt, Rinehart and<br />
Winston.<br />
Peterson, N.G. (Ed.). (1987). Development and field test of the<br />
trial battery for Project A (ARI Technical Report 739).<br />
Alexandria, VA: U.S. Army Research Institute.<br />
Wherry, R.J. (1940). Appendix A. In W.H. Stead and C.P. Shartle<br />
(Eds.), Occupational counseling techniques. New York:<br />
American Book Company.<br />
385<br />
Item Content Validity: Its Relationship<br />
With Item Discrimination and Difficulty<br />
Teresa M. Rushano<br />
USAF Occupational Measurement Squadron<br />
At the USAF Occupational Measurement Squadron (USAFOMS), subject-matter experts<br />
(SMEs) rate the questions on promotion tests for content validity.<br />
They also use standard statistical criteria to determine whether test questions<br />
should be reused on subsequent test revisions. The purpose of this<br />
research was to explore the relationship between SME content validity ratings<br />
(CVRs) and item statistics.<br />
The Specialty Knowledge Tests (SKTs) used for enlisted promotions in the Air<br />
Force are written at USAFOMS by senior NCOs acting as SMEs under the guidance<br />
of USAFOMS psychologists. Within each specialty, one SKT is prepared for<br />
promotion to staff sergeant (E-5), and one for promotion to technical and<br />
master sergeant (E-6 and E-7).<br />
The USAFOMS test development process includes a procedure based on the methodology<br />
of Lawshe (1975) for quantifying content validity on the basis of<br />
essentiality to job performance. As part of the process of revising an existing<br />
SKT, each SME independently assigns each test question a rating using<br />
the following scale:<br />
Is the skill (or knowledge) measured by this test question:<br />
Essential (2),<br />
Useful but not essential (1), or<br />
Not necessary (0),<br />
for successful performance on the job?<br />
The SMEs as a team then use these ratings as a point of departure in discussing<br />
whether individual items should be retained on subsequent test revisions.<br />
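The Lawshe (1975) procedure this is based on reduces such ratings to a content validity ratio, CVR = (n_e - N/2)/(N/2), where n_e is the number of SMEs rating the item essential. A sketch (treating CV-Avg as a simple mean of the 0/1/2 ratings is an assumption, not a description of the exact USAFOMS computation):

```python
def lawshe_cvr(ratings: list[int]) -> float:
    """Lawshe (1975) content validity ratio for one item.
    ratings: one value per SME on the scale above
    (2 = essential, 1 = useful but not essential, 0 = not necessary)."""
    n = len(ratings)
    n_essential = sum(1 for r in ratings if r == 2)
    return (n_essential - n / 2) / (n / 2)

def cv_avg(ratings: list[int]) -> float:
    """Mean SME rating for one item (assumed form of CV-Avg)."""
    return sum(ratings) / len(ratings)

ratings = [2, 2, 2, 1, 2, 0, 2, 2]   # eight hypothetical SME ratings
print(lawshe_cvr(ratings))   # 0.5  (6 of 8 SMEs rated the item essential)
print(cv_avg(ratings))       # 1.625
```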
Perry, Williams, and Stanley (1990) found that CVRs influence SME determination of an item's test-worthiness and its subsequent selection for continued<br />
use or deactivation. However, the ratings are not the only factors which may<br />
impact the SME decision whether to reuse an item on an SKT. After completing<br />
the CVRs, SMEs review item statistics.<br />
For each SKT question, item statistics are provided which indicate how well<br />
an item is doing on the test. USAFOMS has an established set of statistical<br />
criteria for test items which must be met. Test questions that do not meet<br />
these criteria must be revised in order to be incorporated on the revised<br />
version of the test. The two statistical elements examined in this research<br />
are the difficulty index and discrimination index. The difficulty (DIFF) of<br />
a test item, sometimes known as its ease index, is defined as the total percentage<br />
of examinees on a test who selected each choice. The DIFF value for<br />
the correct answer is examined to see if the item as a whole is too easy or<br />
too hard. For example, an item answered correctly by 97% of the examinees is<br />
considered too easy for the purposes of the SKT and would not be reused on<br />
subsequent test revisions.<br />
The second statistical element used in this research is the discrimination<br />
index (DISC). This statistic is calculated for each item choice by subtracting<br />
the percentage of low-scoring examinees (i.e., those scoring in the lower<br />
50% of all examinees) who select a choice, from the percentage of high-scoring<br />
examinees making that choice. If a test question is working properly,<br />
the higher-scoring examinees will answer the question correctly, while the<br />
lower-scoring examinees will select incorrect options. When this occurs, the<br />
correct answer’s DISC will be positive and the incorrect answers will have<br />
negative DISC values.<br />
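A minimal sketch of the two indices as defined here (the function and data are hypothetical, and the median-split detail is simplified relative to operational SKT scoring):

```python
def item_statistics(choices, total_scores, n_options=4):
    """Per-choice DIFF (% of all examinees selecting it) and DISC
    (% of high scorers minus % of low scorers selecting it).
    choices[i] is examinee i's selected option; total_scores[i] is
    that examinee's total test score, used for the median split."""
    n = len(choices)
    order = sorted(range(n), key=lambda i: total_scores[i])
    low = set(order[: n // 2])            # lower-scoring half
    diff, disc = {}, {}
    for opt in range(n_options):
        picked = [i for i in range(n) if choices[i] == opt]
        pct_low = 100 * sum(1 for i in picked if i in low) / len(low)
        pct_high = 100 * sum(1 for i in picked if i not in low) / (n - len(low))
        diff[opt] = 100 * len(picked) / n
        disc[opt] = pct_high - pct_low
    return diff, disc

# Four examinees: the two high scorers pick the keyed answer (option 0).
diff, disc = item_statistics(choices=[0, 0, 1, 2], total_scores=[90, 85, 40, 30])
print(diff[0], disc[0])   # 50.0 100.0 -> positive DISC for the correct answer
```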
METHOD<br />
Content validity ratings and item statistics were obtained from both the E-5<br />
and E-6/7 SKTs of 23 Air Force specialties (AFSs). Table 1 lists the AFSs<br />
examined and their Air Force specialty codes (AFSCs). Using USAFOMS standard<br />
forms, SMEs rated the content validity of each item on the tests they were<br />
revising. The AFSs chosen for this study were those found by Perry et al.<br />
(1990) to have significant (p
Table 1<br />
Air Force Specialties and Specialty Codes<br />
SPECIALTY                               AFSC<br />
Pararescue/Recovery                     115X0<br />
Visual Information Production           231X3<br />
Airfield Management                     271X1<br />
Air Traffic Control                     272X0<br />
Elec. Comp. and Switching Systems       305X4<br />
Maint. Data Systems Analysis            391X0<br />
Missile Systems Maintenance             411X0A<br />
F-15 Avionics Test Station              451X4<br />
FB-111 Avionics Test Station            451X6<br />
Photo. and Sen. Maint. Tac/Recon        455X0A<br />
Photo. and Sen. Maint. Recon/E.O.       455X0B<br />
Air Launched Missile Sys. Maint.        466X0<br />
Comm. Computer System                   491X0<br />
Refrigeration and Air Conditioning      545X0<br />
Construction Equipment                  551X1<br />
Production Control                      555X0<br />
Logistics Plans                         661X0<br />
Information Management                  702X0<br />
Manpower Management                     733X1<br />
Radiology                               903X0<br />
Medical Laboratory                      924X0<br />
Systems Repair                          991X4<br />
Scientific Measurement                  991X5<br />
388
Table 2<br />
Correlation Coefficients of Content Validity Ratings and Item Statistics<br />
15156A *.282 .148 90370 .021 *.230<br />
S156B *.265 .182 92450 *.336 *.263<br />
15176 .009 .148 92470 *.315 *.259<br />
15550A *.240 .177 99154 .064 .055<br />
15570A *.278 ,076 99174 .105 .050<br />
i5550B *.196 .116 99155 .017 .083<br />
i5570B .175 *.210 99175 .004 .130<br />
* Indicates significant correlation (p < .05).<br />
389
an item is for a certain level of test and the two SKTs are constructed independently. Typically, both the specialty training standard (STS) and the<br />
occupational survey report (OSR) which are used in the development of SKTs,<br />
show that different levels of knowledge are required for these ranks and that<br />
different types of tasks are associated with E-5 and E-6/7 positions.<br />
Finally, a fourth post hoc analysis was conducted to examine the test populations<br />
of the 48 SKTs studied. It was hypothesized that SKTs with higher test<br />
populations would more likely be the SKTs with significant correlation between CV-Avg and item statistics, since statistics from higher populations are<br />
more reliable.<br />
The first three post hoc analyses were conducted using chi-square tests of<br />
statistical significance. Even though eight of the 19 career fields with<br />
significant correlation between CV-Avg and DIFF were from the electronic area,<br />
no significant difference was found (p
REFERENCES<br />
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563-575.<br />
Perry, C. M., Williams, J. E., and Stanley, P. P. (1990). Implementation of content validity ratings in Air Force promotion test construction. Proceedings of the 32nd Annual Conference of the Military Testing Association, 1990.<br />
391
The Air Force Medical Evaluation Test, Basic<br />
<strong>Military</strong> Training, and Character of Separation<br />
Edna R. Fiedler’<br />
Wilford Hall Medical Center<br />
Lackland Air Force Base, Texas<br />
Selection procedures and rapid early intervention are two strategies used<br />
by the United States Air Force to reduce the human and monetary costs of<br />
attrition in the enlisted force. Cognitive measures such as the Armed Services<br />
Vocational Aptitude Battery and the Armed Forces Qualification Test have long<br />
been used effectively for academically based screening. Self-reported<br />
biographical data (biodata) and personality measures have been used for<br />
screening noncognitive adaptability.<br />
Armed Services biodata techniques have included the Navy’s Recruit<br />
Background Questionnaire (RBQ) and the Army’s <strong>Military</strong> Applicant Profile (MAP)<br />
and Assessment of Background and Life Experiences (ABLE). Currently, the Navy,<br />
as Executive Agent, designed the Armed Service Applicant Profile, a combination<br />
of the best items from MAP and RBQ (Trent, Quenette, & Laabs, 1990; Laabs,<br />
Trent, & Quenette, 1989).<br />
Other studies have used a variety of personality measures to predict basic<br />
military training attrition. While Spielberger and Barker (1979) studied the<br />
relationships of trait and state anxiety on attrition from basic military<br />
training for both Navy and Air Force recruits, Butters, Retzlaff and Gibertini<br />
(1986) used the Millon Clinical Multiaxial Inventory (MCMI) to predict 80% of<br />
mental health clinic recommended discharge versus return-to-duty dispositions.<br />
McCraw and Bearden (1988) related motivational, demographic, and<br />
personality test scores to technical training school students referred to a<br />
mental health clinic.<br />
Since the 1970’s, the Air Force has used The Air Force Medical Evaluation<br />
Test (AFMET) to screen out those basic recruits likely to attrite from Basic<br />
Military Training. Early work on the development and initial validation of the<br />
instrument included the studies by Lachar (1974), and Guinn, Johnson, and Kenton<br />
(1975). Bloom (1977, 1980, 1983) reported on the ongoing operational aspects of<br />
the program. The interested reader is referred to Crawford's (1990) review of<br />
the history of AFMET. This study reports on the efficacy of the instrument used<br />
in the first phase, the History Opinion Inventory (HOI), for predicting BMT<br />
performance and Character of Separation. In addition, the Gordon Personal<br />
Profile (Gordon) and the Minnesota Multiphasic Personality Inventory (MMPI) are discussed<br />
in relationship to BMT performance and character of separation.<br />
METHOD<br />
Subjects.<br />
The total sample consisted of all USAF enlisted personnel whose total<br />
Active <strong>Military</strong> Service Date was calendar year 1985 through 1989 and who were<br />
also identified by Wilford Hall USAF Medical Center for testing on the AFMET,<br />
or 171,707 subjects (males = 138,601, females = 33,106). The number of<br />
1 Disclaimer: The views expressed in this paper are those of the author and do<br />
not necessarily represent those of the United States Air Force or the Department<br />
of Defense. Acknowledgments: The author thanks Melody Darby and Doris Black for<br />
their assistance in statistical analyses, Calvin Fresne for his assistance in<br />
data management, and Malcolm Ree, Ph.D., for his assistance throughout the study.<br />
392
subjects in each analysis may differ as not all subjects had data on all<br />
variables.<br />
Instruments.<br />
Instruments include the HOI, a 50-item, true-false self-reported history<br />
of legal, antisocial, school, family, and alcohol problems with a weighted total<br />
score range of 0 to 30. Higher scores indicate greater endorsement of problems<br />
prior to service. The Gordon is an 18-item questionnaire in which subjects must<br />
choose which is most and least like them. Four scores were obtained: Social<br />
Ascendancy, Responsibility, Emotional Stability, and Gregariousness. Scores were<br />
entered as percentiles, ranging from 1 to 99. The MMPI, a measure of<br />
psychopathology, has nine clinical and three validity scales, with raw scores<br />
ranging from one to 58.<br />
Procedure.<br />
The HOI was given on the second day of training to all United States<br />
Air Force basic trainees to identify high risk recruits. The Gordon was given on<br />
the 6-0th day of training, during Phase II testing of identified high risk<br />
recruits. Any recruit referred to a credentialed provider based on Phase II<br />
results was given the MMPI (currently the MMPI-2) prior to a clinical evaluation<br />
by a psychologist or psychiatrist. Only after an evaluation by a psychologist<br />
was a recruit recommended for discharge or for return to duty.<br />
Analyses include analysis of variance, pooled variance t-test, Pearson<br />
correlation, multiple regression, Cronbach coefficient alpha and the<br />
Wherry-Gaylord estimation of reliability of composites.<br />
RESULTS<br />
History Opinion Inventory.<br />
Reliability as measured by the Wherry-Gaylord procedure for weighted<br />
composites was .84. Internal consistency among all the items was .57, using<br />
Cronbach's coefficient alpha. The substantially lower reliability using<br />
Cronbach's alpha demonstrates that the instrument is multidimensional and that<br />
the Wherry-Gaylord is the more appropriate index of reliability.<br />
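Cronbach's coefficient alpha, the internal-consistency index reported above, can be computed directly from item scores. A sketch on made-up true/false data (the Wherry-Gaylord composite reliability is not reproduced here):

```python
def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score lists (one list per
    item, aligned across examinees):
    alpha = (k/(k-1)) * (1 - sum(item variances)/variance(total scores))."""
    k = len(items)
    n = len(items[0])
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

# Three hypothetical true/false items scored for five examinees.
items = [[1, 1, 0, 1, 0],
         [1, 1, 0, 0, 0],
         [1, 0, 0, 1, 0]]
print(round(cronbach_alpha(items), 2))   # 0.75
```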
Table 1 shows that recruits who graduated from BMT had significantly<br />
lower scores on the HOI than those who were discharged. A correlation analysis,<br />
corrected for unreliability, showed the HOI accounted for 31% (r = .31) of the<br />
predictive efficiency for BMT graduation/discharge.<br />
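"Corrected for unreliability" here refers to the standard correction for attenuation: dividing the observed correlation by the square root of the measures' reliabilities. A sketch (the .28 observed value is illustrative only, chosen to show how an observed correlation corrects to roughly the .31 cited when the HOI reliability of .84 is applied; the study's exact computation is not given):

```python
import math

def disattenuate(r_xy: float, rel_x: float, rel_y: float = 1.0) -> float:
    """Correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Using the Wherry-Gaylord reliability of .84 reported for the HOI,
# an illustrative observed correlation of .28 corrects to about .31.
print(round(disattenuate(0.28, 0.84), 2))   # 0.31
```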
Character of separation was divided into three groups: honorable, less<br />
than honorable, and entry level separation. Significant differences on the HOI<br />
were found among the types of separation, with the entry level separation (ELS)<br />
group accounting for the significant difference, as seen in Table 2.<br />
A correlation analysis, corrected for unreliability, showed the HOI<br />
accounted for 36% of the predictive efficiency for character of separation.<br />
The Gordon Personal Profile Inventory.<br />
Means of the Gordon subscales were significantly different for graduates<br />
vs discharges from BMT. Table 1 depicts these results. As shown in Table 2,<br />
honorable discharges had significantly different average scores on the four<br />
subscales compared to ELS. There was a nonsignificant trend for less than<br />
honorable discharge average scores to be lower than honorable and higher than ELS<br />
for all subscales. Due to the small number of recruits who have so far taken the<br />
Gordon and received less than honorable discharge (N=14), these results were not<br />
reported in the table.<br />
393
Table 1
HOI, Gordon, and BMT Performance

MEASURE               GRADUATED     DISCHARGED    F         T-TEST
HOI                   (N=158,671)   (N=12,011)
  MEAN                  3.1702        5.0417      3.01***   41.18***
  SD                    2.804         4.868
GORDON                (N=2,760)     (N=820)
SOCIAL ASCENDANCY
  MEAN                 57.0000       26.2413      1.18**    26.63***
  SD                   32.318        20.780
RESPONSIBILITY
  MEAN                 57.0000       22.5359      1.47***   32.91***
  SD                   32.430        26.703
EMOTIONAL STABILITY
  MEAN                 46.1513       14.5707      1.85***   31.64***
  SD                   32.300        23.829
SOCIAL GREGARIOUS
  MEAN                 51.1224       25.8902      1.21***   22.39***
  SD                   31.755        28.861

** p < .01   *** p < .001
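The t values in Table 1 come from the pooled-variance t-tests named in the analysis section (the F column appears to be the variance ratio that justifies pooling). The pooled t statistic can be sketched as follows; the sample values in the test are illustrative, not the AFMET data:

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic for two independent groups."""
    # Pooled variance: weighted average of the two group variances.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    # Standard error of the mean difference, then t.
    se = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se
```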
Table 2
HOI, Gordon, and Character of Separation

                      HONORABLE    LTH¹        ELS          F
HOI                   (N=21,641)   (N=608)     (N=15,603)
  MEAN                  3.42*        3.66*       5.56       1462.15***
  SD                    3.04         3.08        4.64
GORDON                (N=333)                  (N=1,045)
SOCIAL ASCENDANCY
  MEAN                 52.04*                   20.63         61.06***
  SD                   33.24                    31.70
RESPONSIBILITY
  MEAN                 54.02*                   25.76        112.69***
  SD                   32.66                    28.98
EMOTIONAL STABILITY
  MEAN                 43.67*                   17.45        112.56***
  SD                   32.54                    26.08
SOCIAL GREGARIOUS
  MEAN                 45.38*                   28.56         37.70***
  SD                   32.81                    30.13

* significantly different from ELS, p < .001   *** p < .0001

¹These results are not reported for the Gordon because only 14 recruits have taken the Gordon and received a Less than Honorable Discharge.
ELS = Entry Level Separation   LTH = Less Than Honorable

394
The Minnesota Multiphasic Personality Inventory (MMPI).

Table 3 shows the means and standard deviations for the validity and clinical scales of the MMPI by gender and BMT performance. For males, average differences across all scales were statistically significant (p < .001) and T profiles were clinically meaningful. For females, there were no significant differences on one of the validity indexes, L, or on scale Ma, mania. All other measured indices were significant at the .01 level.

Table 4 shows the means and standard deviations for the validity and clinical scales of the MMPI by gender and character of separation. For males, only scales Ma, L, and K did not significantly distinguish between ELS and honorable discharge (p < .001). For females, there were no significant differences among
Table 3
MMPI and BMT Performance

                     MALES                                  FEMALES
SCALE     GRADUATED    DISCHARGED    T          GRADUATED    DISCHARGED    T
          (N=734)      (N=688)                  (N=102)      (N=118)
L   MEAN    4.35         3.62         5.98**      4.06         3.56         1.64
    SD      2.31         2.50                     2.29         2.23
F   MEAN   10.20        17.24        15.83**      9.08        15.64         6.30**
    SD      7.00         9.49                     6.06         9.23
K   MEAN   11.63         9.93         7.24**     11.92        10.15         2.76*
    SD      4.07         3.97                     5.01         4.39
Hs  MEAN    9.86        15.62        15.11**     11.21        17.44         6.33**
    SD      6.80         7.52                     6.78         7.82
D   MEAN   24.15        31.20        17.31**     35.46        32.99         7.88**
    SD      7.61         7.74                     6.16         7.99
Hy  MEAN   21.79        27.08        15.83**     24.20        29.42         6.28**
    SD      5.98         6.58                     5.68         6.67
Pd  MEAN   23.43        27.37        12.25**     24.28        27.31         3.66**
    SD      6.11         6.05                     5.88         6.41
Mf  MEAN   24.96        27.18         8.27**     34.94        36.92         3.01*
    SD      5.11         5.02                     5.15         4.53
Pa  MEAN   13.36        17.34        13.69**     13.11        16.34         4.75**
    SD      5.28         5.65                     5.07         4.98
Pt  MEAN   22.52        31.39        15.07**     24.03        31.75         5.16**
    SD     11.76        10.45                    11.45        10.63
Sc  MEAN   23.48        35.15        15.72**     24.33        33.76         5.06**
    SD     13.55        14.38                    12.41        15.20
Ma  MEAN   21.32        22.42         4.09**     21.24        21.46         0.36
    SD      4.72         5.40                     4.65         4.50
Si  MEAN   32.78        41.95        13.33**     32.74        42.61         5.71**
    SD     13.29        12.64                    12.20        13.44

* p < .01   ** p < .001

395
Table 4
MMPI and Character of Separation

                     MALES                                  FEMALES
SCALE     HONORABLE    ELS           F          HONORABLE    ELS           F
          (N=145)      (N=735)                  (N=26)       (N=126)
L   MEAN    4.2897       5.6857       4.43        3.9231       3.5317       0.39
    SD      2.1243       2.2646                   1.9167       2.2043
F   MEAN   10.6138      16.8395      28.7291**   10.0385      15.2063       4.7042
    SD      6.7373       9.4024                   5.0713       9.1933
K   MEAN   11.2690      10.0027       6.5935*    10.4231      10.2063       1.7106
    SD      4.6685       4.0420                   3.4195       4.3567
Hs  MEAN    9.0552      15.3537      45.3225**   11.5385      17.1905       6.3549*
    SD      6.0882       7.5637                   5.7218       7.8798
D   MEAN   23.2759      30.8204      57.9092**   26.5000      32.5873       6.9555*
    SD      6.7757       7.0753                   5.4498       8.0987
Hy  MEAN   21.3241      26.8381      44.9178**   24.1538      29.2540       6.7660*
    SD      5.7275       6.5806                   5.7667       6.7539
Pd  MEAN   23.4897      27.2327      23.6745**   25.6538      27.0079       2.2758
    SD      5.6349       6.0954                   4.0094       6.5976
Mf  MEAN   24.3862      27.0544      17.4523**   35.4231      36.8254       1.0651
    SD      4.9261       5.1062                   4.7428       4.5274
Pa  MEAN   12.8552      17.1537      35.4004**   12.8462      16.2063       5.6570*
    SD      5.4250       5.6698                   4.5316       5.0045
Pt  MEAN   21.1034      30.8762      50.8987**   25.7692      31.4444       4.3895
    SD     10.5795      10.7211                  10.0332      10.7804
Sc  MEAN   22.6207      34.6014      43.2064**   25.5385      33.3571       4.0622
    SD     12.7247      14.5042                  11.0099      15.1385
Ma  MEAN   21.5793      22.4503       1.8347     20.6923      21.4762       1.3661
    SD      4.5760       5.3862                   3.6306       4.5532
Si  MEAN   32.8062      41.3429      28.6511**   36.8846      41.9683       2.0088
    SD     11.4038      12.9246                  10.8271      13.6510

* = p < .01   ** = p < .0001

396
the scales based on character of separation. As only eleven males and one female who had taken the MMPI had received a less than honorable discharge, this category was not included in the analysis.

CONCLUSIONS

It is concluded that the HOI, as the first part of a psychiatric screening inventory to predict BMT performance, is both reliable and valid. It also predicts character of separation, effectively contrasting those who receive entry level separations from those who are honorably discharged or those who are less than honorably discharged.

Current research on the AFMET will determine the predictive validity, reliability, and clinical meaningfulness of all aspects of the AFMET in relation to Basic Military Training, technical school performance, unfavorable information, eligibility for promotion, and character of separation. Based on these findings, the AFMET will be revised and refined to increase predictive and clinical efficacy.
REFERENCES

Bloom, W. (1977). Air Force Medical Evaluation Tests. USAF Medical Service Digest, 28, 17-20.

Bloom, W. (1980). Air Force Medical Evaluation Test (AFMET) Identifies Psychological Problems Early. USAF Medical Service Digest, 31, 8-9.

Bloom, W. (1985). Changes made, lessons learned after mental health screening. Military Medicine, 148, 889-890.

Butters, M., Retzlaff, P., & Gibertini, M. (1986). Non-adaptability to basic training and the Millon Clinical Multiaxial Inventory. Military Medicine, 151, 574-576.

Crawford, L. (1990). Development and Current Status of USAF Mental Health Screening. Manuscript submitted for publication.

Guinn, N., Johnson, A., & Kenton, J. (1975). Screening for Adaptability to Military Service (AFHRL-TR-75-30). Brooks AFB, TX: Training Systems Division, Air Force Human Resources Laboratory.

Laabs, G., Trent, T., & Quenette, M. (1989). The adaptability screening program: an overview. Proceedings of the 31st Annual Conference of the Military Testing Association, 434-439.

McCraw, R., & Bearden, D. (1988). Motivational and demographic factors in failure to adapt to the military. Military Medicine, 6, 325-328.

Spielberger, C., & Barker, L. (1979). The Relationship of Personality Characteristics to Attrition and Performance Problems of Navy and Air Force Recruits (Contract No. MDA 903-77-C-0190). Orlando, FL: US Navy Training Analysis and Evaluation Group.

Trent, T., Quenette, M., & Laabs, G. (1990, August). An Alternative to High School Diploma for Military Enlistment Qualification. Paper presented at the 98th Annual Convention of the American Psychological Association, Boston, MA.
397
Implementation of the Adaptability Screening Profile (ASP)¹

Thomas Trent, Mary A. Quenette, and Gerald J. Laabs
Testing Systems Department
Navy Personnel Research & Development Center²
San Diego, California
At last year's MTA symposium concerning the implementation of a biographical instrument (Adaptability Screening Profile/ASP) into military enlistment screening (Sellman, 1989), we described technical issues (Trent, 1989), data analysis plans (Waters & Dempsey, 1989), a methodology for controlling item response distortion (Hanson, Hallam & Hough, 1989), and plans for accelerated implementation (Laabs, Trent & Quenette, 1989). While we made considerable progress towards these stated goals, the operational start and field test of the ASP has been postponed while the Armed Services review implementation options. This paper summarizes ASP objectives and updates the research results. In addition, unresolved implementation issues and preliminary plans for the development of a new Department of Defense (DOD) enlistment screening algorithm are described.
The Problem Revisited<br />
Since World War II, the Services' manpower and personnel research laboratories have conducted research on a variety of biographical and other noncognitive assessments for personnel screening (Laurence & Means, 1985). Nonetheless, the quota restriction that the Services place on the proportion of non high school graduates has operated as the primary attrition-controlling screen. As an increasing number of high school "dropouts" earn alternative education credentials (e.g., adult school, high school equivalency certificate, certificates of attendance, and occupational programs), the U.S. Congress and advocacy groups, such as the American Council on Education, have requested DOD to augment educational enlistment criteria with a screening instrument that measures attributes of the individual applicant that are related to adaptation to military life and the probability of completing initial obligated service.
Opposition to basing enlistment eligibility on educational group membership has intensified since a 1987/1988 DOD classification of educational credentials into three eligibility tiers. Table 1 shows that attrition during the first year of enlistment varies considerably across and within the tiers by type of education credential. While Tier I applicants are given highest priority for enlistment³, the attrition rates for adult schoolers (23.6%) and recruits with one
¹Paper presented at the 32nd Annual Conference of the Military Testing Association at Orange Beach, Alabama, November 1990.
²The opinions expressed in this paper are those of the authors, are not official, and do not necessarily represent those of the Navy Department.
³The relatively small numbers of Tier II & Tier III non high school graduate applicants who are selected must also score considerably higher on the Armed Services Vocational Aptitude Battery.
398<br />
school diploma graduates (10.6%, 14.3%, and 13.5%, respectively).
Procedures<br />
Table 1
Twelve Month Attrition Rates by Education Level
DOD Fiscal Year 1988 Accessions(a)

Tier/Education Level                 Number of     Percent
                                     Accessions    Attrition
Tier I
  High School Graduate                 235,388       13.5
  College: One Semester                  2,092       21.1
  College: 2 Yrs or more                 6,228        8.0
  Adult Education                          275       23.6
Tier II
  H.S. Equivalence Certificate           9,843       23.8
  Occ. Program Certificate                  98       14.3
  H.S. Certificate of Attendance
    or Completion                        1,018       19.8
  Correspondence                            87       24.1
  Home Study                                47       10.6
Tier III
  No H.S. Diploma                        5,350       26.6

(a)Non-prior service, active duty, N = 260,426.
Two alternate forms of the ASP (Part 1) were developed, each consisting of 50 items in multiple choice format with two to five response options. The items sampled constructs representing delinquency, academic achievement, career and work orientation, athletic involvement, and social adaptation. Item option scoring weights were developed utilizing Guion's (1965) "horizontal percent" method in a randomly assigned scale construction sample (N = 26,857; Army, Navy, Air Force, and Marine Corps combined samples). This resulted in a three-point item scale and a single total summed score. In a national sample of military applicants (N = 120,175), the mean item reliability (item to total score correlation) was .21 and the estimates of internal consistency were .76 and .74 (coefficient alpha for the two forms). The predictive validity of the ASP was compared to the following measures: Armed Forces Qualification Test (AFQT); education credentials (2 years college, high school diploma, high school equivalency certificate/GED, and no secondary credential); employed at time of application; 17 years of age at time of service entry; and eligibility waiver status as a result of preservice misdemeanor or felony arrests.
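Empirical option keying of the kind attributed above to Guion's "horizontal percent" method can be sketched as follows: each response option's weight is derived from the percentage of its choosers who completed service, banded onto a three-point (0-2) scale. The band cut points below are illustrative assumptions, not the values used in the ASP research:

```python
def horizontal_percent_weights(option_outcomes, cuts=(0.70, 0.85)):
    """option_outcomes: {option: list of 0/1 outcomes, 1 = completed service}.
    Returns {option: weight in {0, 1, 2}} by banding each option's
    completion ("stay") rate at the given cut points.
    The cut points are hypothetical, for illustration only."""
    weights = {}
    for opt, outcomes in option_outcomes.items():
        stay_rate = sum(outcomes) / len(outcomes)
        # Count how many cut points the stay rate meets: 0, 1, or 2.
        weights[opt] = sum(stay_rate >= c for c in cuts)
    return weights
```

Summing the keyed weights across all items then yields the single total score described above.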
399<br />
The criterion is a dichotomous measure of attrition. Personnel who were voluntarily or involuntarily discharged from service prior to the completion of their service contracts were coded "1". Those personnel with medical disability, officer school discharges, service breach of contract, and the dead were excluded from analysis. All other personnel were coded as "0".
The biodata instrument (ASP-1) was administered to all active duty military applicants in the United States for a three-month period (N = 120,175). The sample utilized in the following analyses consisted of 55,675 personnel who enlisted after the applicant administration. The applicant and accession samples were generally representative of military populations (Trent, Quenette, Ward & Laabs, 1990).
Results<br />
Figure 1 graphically portrays average attrition rates at each of the biodata raw score points.

[Figure 1. Attrition rates by ASP-1 score. Graphic not recoverable from the source scan.]
Table 2 shows the simple and incremental validities with the biodata score (ASP-1) forced into the regression equation last. This analysis was performed on a random one-half of the sample ("model construction" group; N = 26,991). Aside from ASP-1 and AFQT, the predictor variables were dummy coded. Validities for high school diploma, two or more years of college, AFQT, age 17, no credential, GED, and ASP were corrected for restriction of range using a univariate formula (Thorndike, 1982). Validities for employment status and misdemeanor/felony were not corrected because operational selection procedures resulted in larger accession sample variances as compared to applicant sample variances. The true unrestricted variance of the misdemeanor/felony measure is unknown since most potential applicants in this category are screened out at the recruiter level and do not reach the applicant testing stage.
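The incremental-validity analysis just described is the standard R-squared-change F test for a predictor forced in last; it can be sketched as:

```python
def delta_r2_F(r2_reduced, r2_full, n, k_full, m_added=1):
    """F statistic for the R-squared change when m_added predictors
    enter a model that ends with k_full predictors, fit on n cases."""
    num = (r2_full - r2_reduced) / m_added
    den = (1.0 - r2_full) / (n - k_full - 1)
    return num / den
```

Plugging in the tabled ASP-1 step (R² of .042 rising to .073, N = 26,991, nine predictors) gives an F near 900, in the neighborhood of the reported 912.0 once the three-decimal rounding of the tabled R² values is allowed for.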
400<br />
criterion (-.09).
Table 2
ASP-1 Incremental Validity - DoD Sample(a)

                             Zero-order          Multiple    Incremental change
Step(e)  Variable(b)         r(c)    rc(d)       R     R2      F        p
1. HS Diploma               -.14    -.19        .14   .021    565.0    .000
2. 2 Years College          -.03    -.04        .17   .030    272.9    .000
3. Employed                 -.09                .19   .036    152.5    .000
4. AFQT Percentile          -.06    -.07        .20   .039     92.8    .000
5. Misdemeanor/Felony        .04                .20   .040     25.3    .000
6. No Credential             .13     .17        .20   .041     23.1    .000
7. GED                       .09     .10        .20   .041     13.8    .000
8. Age 17                    .05     .07        .20   .042      9.2    .002
9. ASP-1                    -.25    -.27        .27   .073    912.0    .000

(a)DOD Accessions, Model Construction Group, N = 26,991.
(b)All predictor variables are indicator variables (dummy 0/1 coded) except ASP-1 and AFQT scores.
(c)All correlations are significant at .05 level.
(d)Correlations (validities) corrected for restriction of range (univariate correction; Thorndike, 1982).
(e)Order of entry of variables in steps 1-8 was determined by prior stepwise procedure. ASP-1 was forced into the equation last.
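The univariate range-restriction correction cited from Thorndike (1982) is conventionally the "Case 2" formula for direct selection: scale the restricted correlation by the SD ratio and renormalize. A sketch, with illustrative values in the checks:

```python
import math

def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    """Univariate (Case 2) correction for direct range restriction:
    r_c = r*u / sqrt(1 - r^2 + r^2 * u^2), where u is the ratio of the
    unrestricted to the restricted standard deviation of the predictor."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / math.sqrt(1.0 - r * r + r * r * u * u)
```

When the applicant-pool SD exceeds the accession-sample SD (u > 1), the corrected validity is larger in magnitude than the observed one, which is why the Table 2 corrected values exceed the zero-order values.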
Conclusions and Implementation Issues

In the research mode, the use of the Adaptability Screening Profile for enlistment screening demonstrated incremental validity beyond operational screens and other potential measures to minimize attrition and to improve the match between the demands of military service and the background and temperament of individuals. The utility of employing the ASP will vary as a function of the selection ratio⁴ and the stability of the ASP in operational mode (see Trent et al., 1990, for a more complete discussion of ASP utility).

The research results support the contention of the American Council on Education that alternatives to the existing three-tier educational quota system are technically feasible. On the other hand, educational attainment has a proven track record of good predictive validity and is in fact one of the most reliable of the biographical measures. From a technical perspective, type of education credential should be included in an array of adaptability indicators that samples the "whole person." The approach of the ASP research program has been to operationalize constructs related to individuals' adaptability to institutions in general and the likelihood of persistence in military training and occupations in particular. The biodata score resulting from the ASP is an economical method of capturing personal background data. In addition, a new research effort is underway at the Navy Personnel

⁴The proportion of qualified recruits needed to meet manpower goals relative to the total number of military applicants.
401<br />
Research and Development Center and the Human Resources Research Organization to<br />
construct a DOD attrition prediction model that could be used in a “compensatory” enlistment<br />
eligibility system (Laurence & Gribben, 1990). In such an algorithm the applicant’s qualifying<br />
score would be determined by a combination of measures such as aptitude test scores and<br />
personal background data, including educational achievement, criminal justice history, and<br />
employment history. The validity of this proposed screening model, as well as plans for DoD<br />
implementation, is planned for presentation at next year’s MTA conference.<br />
Two related issues have stalled the field test of the ASP. In that the principal objective
of the operational test was to evaluate the performance of the self-reported biodata in an<br />
operational mode, eligibility cutting scores were established to eliminate the bottom 10 percent<br />
of otherwise qualified applicants. This was a necessary condition to gain a realistic<br />
environment of recruiter coaching and applicant dissimulation to test for operational score<br />
inflation and possible validity degradation. The prospect of rejecting high school diploma<br />
graduates, especially in the upper “mental groups,” proved to be extremely unpopular among<br />
the Services. Secondly, the DOD is considering the feasibility of avoiding the “multiple<br />
hurdle” impact of the ASP field test by implementing the instrument within the new<br />
compensatory screening algorithm that is under development. Thus, the initial efficacy of the<br />
ASP would rely upon validity estimates from the non-operational administration (N = 120,175).
Until score monitoring provides operational data, the uncertainty about the impact of recruiter<br />
coaching and applicant “faking good” on score distributions and predictive validity will remain<br />
unresolved. At present, the ASP relies upon empirical scoring and verification warning<br />
statements to minimize score inflation. Moreover, experimental studies (e.g., Trent, Atwater<br />
& Abrahams, 1986; Trent, 1987; Hough, Eaton, Dunnette, Kamp & McCloy, 1990) indicate<br />
that the problem of item response distortion is minimal. That is, applicants’ responses do not<br />
demonstrate extreme distortion and validities of biodata instruments are not seriously moderated<br />
by distortion.<br />
REFERENCES

Guion, R. M. (1965). Personnel testing. New York: McGraw-Hill.

Hanson, M. A., Hallam, G. L., & Hough, L. M. (1989, November). Detection of response distortion in the Adaptability Screening Profile (ASP). Paper presented to the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.

Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75 (5).

Laabs, G. J., Trent, T., & Quenette, M. A. (1989, November). The Adaptability Screening Program: An Overview. Paper presented at the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.

Laurence, J. H., & Means, B. (1985, July). A description and comparison of biographical inventories for military selection (FR-PRD-85-5). Alexandria, VA: Human Resources Research Organization.

Laurence, J. H., & Gribben, M. A. (1990, July). Military selection strategies (FR-PRD-90-15). Alexandria, VA: Human Resources Research Organization.

Sellman, W. S. (1989, November). Implementation of biodata into military enlistment screening. Symposium presented at the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.

Thorndike, R. L. (1982). Applied psychometrics. Boston, MA: Houghton-Mifflin Company.

Trent, T. (1987, August). Armed forces adaptability screening: The problem of item response distortion. Paper presented at the American Psychological Association Convention, New York, NY.

Trent, T. (1989, November). The Adaptability Screening Profile: Technical Issues. Paper presented at the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.

Trent, T., Atwater, D. C., & Abrahams, N. M. (1986, April). Experimental assessment of item response distortion. In Proceedings of the Tenth Psychology in the DOD Symposium. Colorado Springs, CO: U.S. Air Force Academy.

Trent, T., Quenette, M. A., Ward, D. G., & Laabs, G. J. (1990). Armed Services Applicant Profile (ASAP): Development and validation (in review). San Diego, CA: Navy Personnel Research and Development Center.

Waters, B. K., & Dempsey, J. R. (1989, November). Development of the Adaptability Screening Profile score monitoring system. Paper presented to the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.
403<br />
FOR U.S. NAVY TYPING PERFORMANCE TESTS

MASTER CHIEF YEOMAN STEVE D. MCGEE, USN
NAVAL EDUCATION AND TRAINING PROGRAM MANAGEMENT SUPPORT ACTIVITY
The Department of the Navy is considering the use of word processors/personal computers in typing performance tests. Presently these tests are accomplished utilizing electric typewriters. This report presents results of a study to determine the feasibility of using word processors/personal computers versus the electric typewriter for typing performance tests.

Purpose

The purpose of the study is to determine if typing performance tests could be performed with word processors/personal computers, thereby increasing the speed of word production as well as accuracy.
Methodology

Subjects were 513 enlisted U.S. Navy personnel (E-1 through E-6) within the administrative and supply communities that require typing performance tests. Subjects were randomly selected from throughout the Navy. All of the subjects had prior keyboard experience on the typewriter, and 215 subjects had experience with the word processor/personal computer in their normal day-to-day work.

Two official U.S. Navy typing performance tests (practical) were used from series 87 published by NETPMSA. The standard electric typewriters (IBM Selectric, Selectric II, and Selectric III) were utilized for the typewriter portion of both exams, while the word processor/personal computer portion of the exams was administered using the XEROX 13621, IBM PC, WANG PC, ZENITH 245, and CPT.

On day one, the subjects were administered Test A and timed for five minutes using the typewriter. On the same day, they were administered Test A and timed for five minutes using the word processor/personal computer. On day two, the procedures were reversed; for example, the subjects were administered Test B and timed for five minutes using the word processor/personal computer. They were then tested using the typewriter. Scoring was done by line at a rate of five keystrokes per word, with each error subtracting keystrokes from the total stroke count.
Subjects were allowed to use the automatic wrap-around (automatic return) and backspace features on the word processor/personal computer. All tests were properly monitored by local command supervisors.
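The scoring rule described above (five keystrokes per word, a per-error keystroke deduction, five-minute timing) can be sketched as follows. The size of the per-error deduction is partly illegible in the source scan, so `penalty_strokes` is parameterized and its default of 50 is an assumption:

```python
def words_per_minute(total_keystrokes, errors, minutes=5.0,
                     strokes_per_word=5, penalty_strokes=50):
    """Net words per minute: deduct penalty keystrokes per error,
    convert the remaining keystrokes to words at five strokes per word,
    and divide by the timing period. penalty_strokes is an assumed
    value; the figure in the source text is illegible."""
    net = max(total_keystrokes - errors * penalty_strokes, 0)
    return (net / strokes_per_word) / minutes
```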
Results

The results of the administered tests showed that the average word-per-minute (WPM) production was 42.5 using a word processor/personal computer and 35.5 WPM using an electric typewriter. Table 1 is an illustration of these results by Test A and Test B. A repeated measures analysis of variance was conducted on the word processor/personal computer and typewriter words-per-minute data. Post-hoc tests on the data indicate that (a) for Test A the subjects performed significantly better on the word processor/personal computer than on the electric typewriter, (b) for Test B the subjects performed significantly better on the word processor/personal computer than on the electric typewriter, (c) for either test subjects performed equally well using the word processor/personal computer, and (d) using the typewriter subjects performed significantly better on Test B than on Test A. This is illustrated at Table 2, showing a breakdown by paygrade.

The test results demonstrate that productivity is increased by an average of 7 WPM using a word processor/personal computer. Therefore, it would prove advantageous for the U.S. Navy to allow the word processor/personal computer to be used for typing performance tests. Additionally, it is recommended that the word-per-minute requirement be increased by 20% for the Yeoman rating, since these are the individuals who accomplish the majority of the Navy's text typing as opposed to form typing. Furthermore, if the word production requirement for the Yeoman rating is increased by 20%, then the wrap-around and backspace features should be authorized, since this is a feature that is utilized on a day-to-day basis by these typists.
405
[Tables 1 and 2: words per minute by test, device, and paygrade. Graphics not recoverable from the source scan.]
406
--<br />
I
Acute High Altitude Exposure and Exercise<br />
Decrease Marksmanship Accuracy<br />
W.J. Tharion, B.E. Marlowe, R. Kittredge,<br />
R. Hoyt and A. Cymerman<br />
United States Army Research Institute of Environmental Medicine<br />
Natick, Massachusetts 01760<br />
ABSTRACT<br />
Many moderate to high altitude areas occupy militarily<br />
strategic parts of the world. This study quantified the<br />
effects of endurance exercise, acute altitude exposure (AAE)<br />
and extended altitude exposure (EAE) (16 days at 4300 m), on<br />
marksmanship performance. Sixteen experienced male marksmen<br />
fired a de-militarized M-16 rifle equipped with a Noptel ST-<br />
1000 laser system from a standing unsupported position at a<br />
2.3 cm diameter circular target from a distance of 5 m.<br />
Subjects were tested at rest and after a maximal 20.4 km<br />
run/walk ascent from 1800 m to 4300 m, following AAE and EAE.<br />
Sighting time (the interval between a signal light to fire and<br />
trigger pull) and accuracy (distance of shot impact from<br />
target center) were measured. Exercise and time at altitude<br />
had independent effects on marksmanship. Sighting time was<br />
unaffected by exercise, but was 8% longer following EAE (5.61<br />
± 1.25 sec AAE vs 6.06 ± 1.06 sec EAE (mean ± SD), p < .05).<br />
Accuracy was reduced 11% by exercise (3.63 ± 0.69 cm at rest<br />
vs 4.01 ± 0.89 cm post exercise, p < .05).<br />
Subjects<br />
Sixteen soldiers, 18-39 years of age, volunteered for the study.<br />
Subjects were not from altitudes greater than 1500 m, nor had they lived<br />
at such altitudes during the three months prior to the study. All subjects<br />
were experienced marksmen prior to study participation.<br />
Equipment<br />
Marksmanship performance was quantified with a Noptel ST-1000 (Oulu,<br />
Finland) laser marksmanship system. The system consists of a laser<br />
transmitter attached to a de-militarized M-16 rifle, a laser switch, an<br />
optical target, a personal computer, printer, and software provided by<br />
Noptel.<br />
TABLE 1. TESTING SCHEDULE FOR MARKSMANSHIP MEASURES.<br />
DAYS 1-5 SEA LEVEL<br />
Days 1-4 Marksmanship Training<br />
Day 5 Marksmanship Assessment<br />
DAYS 6-23 4300 M ALTITUDE<br />
Day 6 Marksmanship Assessment, Acute Altitude Exposure, Fatigued State<br />
Days 7-9 Marksmanship Assessment, Acute Altitude Exposure, Rested State<br />
Days 10-19 No <strong>Testing</strong><br />
Days 20-22 Marksmanship Assessment, Extended Altitude Exposure, Rested State<br />
Day 23 Marksmanship Assessment, Extended Altitude Exposure, Fatigued State<br />
Procedure<br />
The schedule of testing is shown in Table 1. On Day 6, subjects<br />
ascended (2500 m vertical ascent) 21 km to the summit of Pikes Peak (4300<br />
m) as quickly as possible. Within 5 minutes of completing the<br />
ascent, marksmanship was assessed. Subjects then resided for 16 days at<br />
the summit. On Day 23, subjects were returned to the base of Pikes Peak<br />
for a second ascent and subsequent marksmanship assessment. Each<br />
marksmanship test consisted of a total of 20 shots. Subjects were<br />
instructed to shoot at will for the first ten shots to obtain the best<br />
accuracy score possible. For the second ten shots, subjects were<br />
instructed to shoot as fast as possible without sacrificing accuracy<br />
(speed and accuracy). During the latter assessment, subjects were<br />
required to hold the barrel of the rifle below their waist. Following<br />
a verbal ready signal and a 1-10 sec randomly varied preparatory<br />
interval, subjects were signalled to shoot upon illumination of a red<br />
stimulus light. Subjects shot in the free standing unsupported position<br />
from a distance of 5 m at a 2.3 cm diameter circular target.<br />
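The two accuracy measures reported below, distance from center of mass (DCM) and shot group tightness (SGT), can be computed directly from shot impact coordinates. The sketch below is illustrative only: the function names and sample data are not from the Noptel software, and SGT is taken here as the mean distance of each shot from the group centroid, one common definition.

```python
import math

def dcm(shots):
    """Distance from center of mass: mean radial distance of each
    shot's impact point from the target center at (0, 0), in cm."""
    return sum(math.hypot(x, y) for x, y in shots) / len(shots)

def sgt(shots):
    """Shot group tightness: mean distance of each shot from the
    centroid of the group (smaller = tighter group)."""
    cx = sum(x for x, _ in shots) / len(shots)
    cy = sum(y for _, y in shots) / len(shots)
    return sum(math.hypot(x - cx, y - cy) for x, y in shots) / len(shots)

# Illustrative 5-shot group (cm offsets from the target center)
group = [(1.2, -0.8), (2.5, 0.4), (-1.0, 1.9), (0.3, -2.2), (1.8, 1.1)]
print(round(dcm(group), 2))   # mean miss distance
print(round(sgt(group), 2))   # dispersion about the group centroid
```

Note that DCM mixes bias and dispersion, while SGT isolates dispersion, which is why the two measures can move independently across conditions.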
RESULTS<br />
A significant effect of altitude condition was observed for distance<br />
from center of mass (DCM) (p < .03). Post-hoc t-test analysis revealed<br />
that DCM for the accuracy-only test was greater (p < .05).<br />
The effects of both altitude exposure and fatigue on the various<br />
marksmanship parameters are summarized in Table 2. When shooting<br />
exclusively for accuracy, significant differences assessed via ANOVA<br />
existed for DCM (p < .01) and shot group tightness (SGT) (p < .02). Acute<br />
altitude exposure elicited a greater DCM and a more dispersed shot group<br />
than at sea level or after extended altitude exposure. When shooting for<br />
both speed and accuracy, DCM also differed significantly.<br />
at altitude, shooters also fired more quickly but less accurately.<br />
He suggests that feelings of sickness and increased physical<br />
symptomatology (acute mountain sickness) experienced in the first few<br />
days of altitude exposure lead to lowered motivation to perform well,<br />
presumably because of one's preoccupation with bodily discomfort. It is<br />
also possible that subjects become impatient trying to maintain a good<br />
aiming point with increased body sway encountered at altitude (Fraser,<br />
Eastman, Paul and Porlier, 1987). They may then shoot prematurely,<br />
resulting in the decrease in sighting time. It is speculated that<br />
subjects may feel that taking additional sighting time would not improve<br />
their accuracy. Another possibility may be that subjects' time<br />
estimation is affected. Time may seem to pass more quickly than it<br />
actually does.<br />
Upon acclimatization to altitude, individuals took 8% longer (Acute<br />
Altitude Exposure 5.61 sec vs Extended Altitude Exposure 6.06 sec [means<br />
of rested and fatigue conditions combined]) to sight the target. The<br />
extra time apparently enables increased accuracy of shooting. Increased<br />
respiratory rate is among the physiological adaptations that occur with<br />
acute exposure to altitude; the faster the respiratory rate, the more<br />
breaths that are missed during the breath-holding phase of aiming and<br />
pulling the trigger. This may increase discomfort associated with<br />
breath-holding and thereby decrease sighting time.<br />
While shooting at altitude, DCM, a measure of accuracy, was 11%<br />
greater after exercise (4.01 cm) than for the rested condition (3.63 cm)<br />
[means of acute and extended altitude exposures combined]. Sighting time<br />
was not affected by fatiguing exercise. In contrast to the present<br />
results, Evans (1966) found accuracy was not affected by fatigue but<br />
firing latency was. Other previous findings proposed increased body sway<br />
after exercise as an explanation for reduced shooting accuracy of<br />
soldiers after a forced march (Knapik, Bahrke, Staab, Reynolds, Vogel and<br />
O'Connor, 1990), and biathletes after cross country skiing (Niinimaa and<br />
McAvoy, 1983). Increases in heart rate resulting from intense aerobic<br />
exercise also may impair shooting proficiency. Heart rate control by<br />
beta-blockers (Kruse, Ladefoged, Nielsen, Paulev, and Sorenson, 1986;<br />
Siitonen, Sonck and Janne, 1977) or biofeedback techniques (Daniels and<br />
Hatfield, 1981) are possible remedies.<br />
If military forces are to be prepared for deployment in a high<br />
terrestrial environment, it may be advantageous to have them train<br />
routinely at high altitude. These results showed marksmanship accuracy<br />
returned to normal after two weeks residence at altitude. For events<br />
such as the biathlon and shooting competitions, athletes may benefit from<br />
both acclimation to altitude prior to competition and routine training<br />
at altitude.<br />
Daniels, F.S. & Hatfield, B. (1981). Biofeedback. Motor Skills: Theory<br />
Into Practice, 2, 69-72.<br />
Dusek, E.R. & Hansen, J.E. (1969). Biomedical study of military<br />
performance at high terrestrial elevation. Military Medicine, 134, 1497-<br />
1507.<br />
Evans, W.O. (1966). Performance on a skilled task after physical work<br />
or in a high altitude environment. Perceptual and Motor Skills, 2, 371-<br />
380.
Fraser, W.D., Eastman, D.E., Paul, M.A., & Porlier, J.A.G. (1987).<br />
Decrement in postural control during mild hypobaric hypoxia. Aviation,<br />
Space and Environmental Medicine, 58, 768-772.<br />
Fulco, C.S. & Cymerman, A. (1988). Human performance and acute hypoxia.<br />
In Human Performance Physiology and Environmental Medicine at Terrestrial<br />
Extremes. K.B. Pandolf, M.N. Sawka, and R.R. Gonzalez (editors). Benchmark<br />
Press, Inc., Indianapolis, IN, pp. 467-495.<br />
Knapik, J., Bahrke, M., Staab, J., Reynolds, K., Vogel, J., & O'Connor,<br />
J. (1990). Frequency of loaded road march training and performance on a<br />
loaded road march. United States Army Research Institute of<br />
Environmental Medicine Technical Report. T13-90, pp. 18-25.<br />
Kruse, P., Ladefoged, J., Nielsen, U., Paulev, P.E., & Sorenson, J.P.<br />
(1986). Beta-blockade used in precision sports: effect on pistol shooting<br />
performance. Journal of Applied Physiology, 61, 417-420.<br />
Marlowe, B., Tharion, W., Harman, E., & Rauch, T. (1989). New<br />
computerized method for evaluating marksmanship from Weaponeer<br />
printouts. United States Army Research Institute of Environmental<br />
Medicine Technical Report. T30-90.<br />
Niinimaa, V. & McAvoy, T. (1983). Influence of exercise on body sway<br />
in the standing rifle position. Canadian Journal of Applied Sport<br />
Science, 8, 30-33.<br />
Siitonen, L., Sonck, T., & Janne, J. (1977). Effect of beta-blockade on<br />
performance: use of beta-blockade in bowling and shooting competitions.<br />
Journal of <strong>International</strong> Medical Research, 2, 359-366.<br />
HUMAN PERFORMANCE DATA FOR COMBAT MODELS<br />
COLLINS, Dennis D., Department of the Army, The Pentagon,<br />
Washington, D. C.<br />
The conceptualization of any modern system requires early<br />
integration with its operational environment. The requirement<br />
for early systems integration is particularly important for<br />
military systems which are unique in that they must function<br />
against an enemy intent on their destruction. Survival in this<br />
environment is frequently the principal mission of the system<br />
and also its principal measure of effectiveness. It is the<br />
analytical merger of the conceptual system with its operational<br />
environment which defines both the objective and importance of<br />
military combat modeling.<br />
Current versions of systems development models are virtually<br />
all computer resident. Because of the complexity of systems<br />
development, plus the requirement for many repetitions, modern<br />
combat models are best suited for an automated environment.<br />
Combat models differ from Computer Aided Design/ Computer Aided<br />
Manufacturing (CAD/CAM). CAD/CAM is used to conceptualize and<br />
manufacture a specific system. A systems development combat<br />
model, on the other hand, is used to demonstrate a system's<br />
performance in its anticipated wartime environment performing<br />
against its probable enemy. Combat Models are also unique in<br />
that both the system and the wartime environment are required to<br />
be speculative in order to estimate the probable reality at the<br />
time the system will actually perform its battlefield mission.<br />
Modern data systems provide the capability to view systems<br />
operational performance early in design, allowing elimination of<br />
candidate concepts even before they leave the drawing board.<br />
This relatively new capability to observe "draft" or "notional"<br />
systems inside a model of an operational environment presents<br />
not only new powers of design, but new problems as well. The<br />
process of systems development from design through testing now<br />
takes place inside a computer. Entire technology options and<br />
systems design concepts can be eliminated long before even<br />
drawings are completed. Traditional human factors engineering<br />
begins when the concept of a system is sufficiently firm to<br />
permit the design of at least a mock up of the man-machine<br />
interface such as a cockpit simulator. The combat model, however,<br />
has allowed the selection of first order military technologies<br />
and systems candidates completely inside the notional<br />
reality of a computer.<br />
Because the systems development combat model grew from<br />
analytical communities which were oriented to tactics and<br />
engineering, the representation of human performance parameters<br />
in the evolution of combat models was rarely considered. The<br />
impact of this evolution has been subtle. By omitting human<br />
factors from both enemy and friendly forces, the engineering<br />
modeler intended to deal with the amorphous area of human<br />
factors through a balanced omission: Since neither side showed<br />
human factors, the effect was balanced and should have had no<br />
effect on the tactical or engineering conclusions drawn from the<br />
model's output. In early tactical wargames and engineering<br />
models, this approach was reasonable because the computers of<br />
the day were functional only in aggregated, "low resolution"<br />
modeling. Low resolution models provided valuable tactical<br />
insights, but little information about specific systems.<br />
Engineering models were also simple: one tank fired at another<br />
in a straightforward duel format.<br />
As wargames became automated, the ability to conduct<br />
high-resolution simulation allowed the tactical and engineering<br />
modeling of actual systems in dynamic combat. Automation of<br />
wargames also made omission of human factors both unnecessary<br />
and problematic. "Balanced omission" of human factors in<br />
systems development combat models is more accurately described<br />
as the actual modeling of the human as 100% effective. By<br />
failing to properly consider the human component of systems<br />
performance, the human has an assumed value of 100% effectiveness.<br />
It is generally accepted, even among combat modelers, that<br />
this assumption has the effect of exaggerating systems performance,<br />
and accelerating the tactical pace of a battle.<br />
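The distortion can be made concrete with a toy stochastic duel. In the sketch below, the hit probabilities, the alternating-fire format, and the multiplicative treatment of human effectiveness are all illustrative assumptions, not drawn from any fielded combat model; scaling one side's single-shot kill probability by a human-effectiveness factor below 1.0 visibly shifts an exchange that a "balanced omission" model would score as even.

```python
import random

def duel(p_hit_a, p_hit_b, trials=10_000, seed=1):
    """Alternating-fire duel; returns the fraction of trials side A wins."""
    rng = random.Random(seed)
    a_wins = 0
    for _ in range(trials):
        while True:
            if rng.random() < p_hit_a:   # A fires first each exchange
                a_wins += 1
                break
            if rng.random() < p_hit_b:   # B returns fire
                break
    return a_wins / trials

base = 0.30    # nominal single-shot kill probability for both systems
human = 0.75   # assumed human-effectiveness factor applied to side B
print(duel(base, base))          # humans omitted: near-even (A's first shot helps)
print(duel(base, base * human))  # B's crews modeled as imperfect: A wins more often
```

Even this crude model shows the direction of the bias: leaving the human at an implicit 1.0 on both sides changes not just absolute kill rates but the relative outcome whenever the real human decrements would differ between sides.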
The two original clients of the combat model, wargamers and<br />
hardware engineers, have had an understandable lack of interest<br />
in representing the human factor component of systems performance<br />
as anything other than 1.0. Human performance parameters<br />
are still much less defined than hardware performance parameters,<br />
and no clear consensus has emerged as to how human factors<br />
should be modeled. The case for improving the representation of<br />
human factors in systems development combat models focuses on<br />
the impact of modeling humans as 100% effective. Notional<br />
systems over-perform and technologies and systems candidates are<br />
eliminated in an occult process long before their interaction<br />
with the human dimension can be measured.<br />
There are additional dimensions to the dilemma of human<br />
factors in combat models. Combat model proponents have a<br />
somewhat justified view of their critics as romantics who wax<br />
philosophical about the value of such human traits as leadership,<br />
morale and courage on the battlefield, but cannot<br />
quantify these dimensions in order that they be shown as "independent<br />
variables" in the outcome of analytical combat.<br />
A proposed approach for change is outlined in figure 1. A<br />
first step would be to identify the combat models most often<br />
used in the design and selection of systems. While this step<br />
may appear obvious, there could be a drift toward models which<br />
have little impact on systems development, but are easily<br />
modified for human dimensions. Systems development models are<br />
usually sophisticated engineering development models which do<br />
not lend themselves to human dimension integration. Subsequent<br />
steps, in turn, would be:<br />
-Select those systems for study which require "man-in-the-loop"<br />
for optimal functioning. Good candidates for study are those<br />
systems which depend upon humans for the performance of critical<br />
functions. The intent, early in a human dimensions integration<br />
program, is to pick those systems for study which are likely to<br />
show the importance of human dimensions, even when only limited<br />
human performance is modeled.<br />
-Select human systems tasks which are currently modeled by<br />
implication (i.e. man as 1.0) and for which data can be obtained,<br />
such as "acquire target", "identify target", or "lock-on<br />
target and fire". When systems are conceived, their designers<br />
allocate some tasks to man, some to the machine and some to both<br />
man and machine. A combat aircraft, for example, might acquire<br />
a target automatically through the system itself, depend on its<br />
operator for correct identification and attack decision, then<br />
return control to the system for attack launch and execution. In<br />
some highly sophisticated design processes using elaborate task<br />
analysis this process is formal. More often it is informal.<br />
Selection of human-critical tasks will, like the first step,<br />
increase the likelihood that human variance will have an<br />
independent-variable impact on model outcome.<br />
-Modify the selected model to allow replication of the discrete<br />
human functions selected. Actual model algorithms need not be<br />
complex. The initial modifications need only demonstrate that<br />
the human tasks selected do, in fact, influence the outcome of<br />
the analysis as shown by the measures of effectiveness. Modifying<br />
complex models to show the more discrete human functions<br />
such as suppressed action due to fear or diminished target<br />
acquisition due to cognitive overload is within our current<br />
capability. Some models already represent these functions to<br />
a degree.<br />
-Run the model with the human factors modifications using the<br />
best available data.<br />
-Compare the model output (systems exchange ratios, force<br />
exchange ratios, etc.) between the basic combat model and the<br />
human factors modification. At this point human performance<br />
can be observed in a quantified fashion which is both understandable<br />
and acceptable to the senior engineering design<br />
community.<br />
-Demonstrate the value of human factors algorithms in combat<br />
modeling through the (hopefully) significant differences between<br />
the basic and human factors modified model.<br />
Using those systems tasks which are frequently assigned to<br />
humans in systems design (identify friend-or-foe, for example),<br />
develop a plan for the collection of human performance task<br />
data:<br />
First, search for existing data with high human factors and<br />
engineering community acceptance. In other words, use what we<br />
have first. This approach is particularly important early in<br />
the effort when the needs for combat modeling data are ill<br />
defined. Data collected without a good understanding of how it<br />
will be used is likely to go unused. As the process matures,<br />
the personnel data development and modeling communities will<br />
develop an understanding of one another's needs and a protocol<br />
for data communication will evolve.<br />
Second, develop data through the use of cost effective means<br />
such as developmental tests in training simulators. Since<br />
personnel data formats for combat models are likely to evolve,<br />
the costly process of test or field developed data is likely to<br />
be wasted due to inevitable changes. The new family of flight<br />
and vehicle simulators offers an excellent opportunity to<br />
collect human performance data for combat model input.<br />
Third, develop data through field operations research. An<br />
excellent example of this concept was the Fire Fighting Task<br />
Force study sponsored by the U.S. Army's Concepts Analysis<br />
Agency in Bethesda, Maryland. The Fire Fighting Task Force<br />
studied the psychological impact of stress caused by U. S. Army<br />
Infantry units fighting the Yellowstone National Forest fire in<br />
1988. This type of effort not only generates data for use in<br />
modeling, but contributes to our understanding of combat theory.<br />
Finally, loop early data development back to the human factors<br />
modified model in order to demonstrate human factors as an<br />
independent variable in the outcome of combat and document those<br />
human variables which warrant further developmental research.<br />
This loop-back function will automatically develop personnel<br />
combat modeling data protocol as a by-product.<br />
Figure 1<br />
A Paradigm for the Integration of Human<br />
Factors in Combat Models<br />
(Flowchart: identify models used in systems design/systems development;<br />
refine the selection for "man-in-the-loop"; select tasks currently modeled;<br />
modify the selected models; run the selected models as modified; develop<br />
data through simulation and through field collection.)<br />
TRADING OFF PERFORMANCE, TRAINING, AND EQUIPMENT<br />
FACTORS TO ACHIEVE SIMILAR PERFORMANCE<br />
Janet J. Turnage, University of Central Florida<br />
Robert S. Kennedy, Essex Corporation<br />
Marshall B. Jones, Pennsylvania State University<br />
INTRODUCTION<br />
<strong>Military</strong> systems performance is generally the joint outcome of the human<br />
interacting with the machine. It is convenient to think of this outcome in<br />
terms of causal models, where the elements that determine or drive systems<br />
performance may be relegated to equipment characteristics, training variables,<br />
and individual capabilities. We suggest that an appropriate starting place<br />
for such causal analyses is to select a specific level of desired operational<br />
or systems performance beforehand. We call this desired level Isoperformance.<br />
Then one can employ the different potential predictors of this outcome in an<br />
Isoperformance model. The way the model works is to select each variable as a<br />
potential predictor and then submit it to a trade-off methodology whereby each<br />
variable is compared in light of total operational proficiency desired.<br />
Examples of operational proficiency might be: (1) escape from an aircraft<br />
water crash within 60 seconds, (2) completion of a forced march carrying a<br />
36-pound pack for 20 miles within 6 hours, (3) an 80% carrier landing<br />
boarding rate, or (4) control of 20 aircraft in the same airspace simultaneously.<br />
There are three meanings of the term “Isoperformance.” The first is a<br />
conceptual approach to human factoring. Second, the term may describe a<br />
curve, plotted against training time on the abscissa and aptitude on the<br />
ordinate. Third, Isoperformance is a specific interactive computer program.<br />
In this paper, we shall describe each of these features, in turn, and will<br />
present an illustration of one type of application.<br />
But first, let us more firmly specify the premise of Isoperformance. The<br />
premise is that the same (Iso) total systems efficiency (performance) is a<br />
function of trade-offs between personnel, training, and equipment. To achieve<br />
this state of affairs, the Isoperformance model is intended to:<br />
(1) Make estimates of training outcomes for different categories of<br />
personnel;<br />
(2) Check internal consistency of estimates;<br />
(3) Compare estimates to known relations from human engineering,<br />
personnel, and training;<br />
(4) Counsel how to change "wrong" estimates;<br />
(5) Output Isoperformance curves; and<br />
(6) Leave a hard-copy audit trail.<br />
Isoperformance as a Conceptual Approach to Human Factoring<br />
Isoperformance was inspired by a long history of involvement in human<br />
engineering research, development, test, and evaluation. For example,<br />
military specifications and standards form the basis for numerous systems<br />
requirements, but their number and complexity often make trade-off decisions<br />
difficult because there is no context for their cost. The literature has not<br />
helped either. A U.S. Air Force review of 114 human factors studies from<br />
1958-72 found that the physical characteristics of the stimulus were most<br />
often the significant factors in performance outcomes and there were few<br />
interactions.<br />
But these studies tended to ignore the contribution of practice<br />
or individual differences. This general finding suggested the multivariate<br />
(holistic) approach that was subsequently employed by the Navy in over a<br />
decade of simulator research, ranging from carrier landing, to air-to-ground<br />
combat, to Vertical Take-Off and Landing (VTOL) studies. The strong inference<br />
suggested by the results of these later studies was that people accounted for<br />
the most variance in performance, followed by training manipulations, and then<br />
by equipment variations.<br />
In this work, what was surprising was the modest amount of performance<br />
variance that could be accounted for by equipment features. In partitioning<br />
the performance variances over numerous experiments, equipment accounted for<br />
15-20%, trials of practice accounted for 10-25%, and people accounted for<br />
50-80%, with error usually in the 25-50% range. Again, interactions were few<br />
and far between. This implied that these main effects could be traded off if<br />
you started with them in the first place! Thus was born Isoperformance. That<br />
is, because some pilots are simply better than others, because repeated hops<br />
are costly, and because costly changes in equipment features may produce<br />
minimal changes in performance, we should concentrate on trade-offs among<br />
these known relations to bring about desired goals rather than only one or<br />
another mechanism. After the relative contributions are determined, a price<br />
tag can be placed on all dimensions and the cheapest solution sought.<br />
Isoperformance is therefore designed to accomplish Personnel, Training and<br />
Equipment trade-offs. In the Isoperformance complete program, the term<br />
Personnel can represent such features as sensory capabilities, cognitive and<br />
information processing abilities, anthropometry, or test scores, such as the<br />
Armed Services Vocational Aptitude Battery (ASVAB) scores (presently the<br />
default condition). The term Training can represent such features as<br />
practice, sequence and series effects, learning, training regimens and<br />
schedules, or number of sessions. The default condition is trial of practice.<br />
Similarly, the term Equipment can represent new vs. old, smart vs. dumb,<br />
hi-fidelity vs. low-fidelity, or any general A vs. B configuration, which is<br />
the default condition. The data sources for estimates of the scale values for<br />
these variables can come from various origins, including lay opinion, the<br />
scientific literature, explicit experiments, or technical data bases.<br />
Isoperformance Curves<br />
Figure 1 presents an illustration of Isoperformance using two categories<br />
and one equipment feature (the current technology).<br />
The Personnel category is divided into high- and low- to medium-aptitude<br />
groups, the Training time allotted is 9 weeks, and the proportion of people<br />
desired to complete the training successfully is set at 50%. One can<br />
see that it takes the high-aptitude group only 4 weeks to achieve the same<br />
proficiency level that it takes the low- and medium-aptitude group to achieve<br />
in 8 weeks.<br />
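The trade-off in this example can be mimicked with a simple learning-curve model. In the sketch below, the proportion of a category reaching proficiency after t weeks is assumed to follow 1 - exp(-rt), with the rate r standing in for aptitude; the rates and the 50% criterion are illustrative assumptions, not the curves behind the actual figure.

```python
import math

def proportion_proficient(rate, weeks):
    """Assumed learning curve: fraction of the category proficient."""
    return 1.0 - math.exp(-rate * weeks)

def weeks_to_reach(rate, target):
    """Invert the curve: training time needed for a target proportion."""
    return -math.log(1.0 - target) / rate

high_apt, low_med_apt = 0.17, 0.087   # illustrative weekly learning rates
target = 0.50                          # criterion proportion from the example
print(round(weeks_to_reach(high_apt, target), 1))     # ~4 weeks
print(round(weeks_to_reach(low_med_apt, target), 1))  # ~8 weeks
```

Any point where the two groups' curves cross the same target proportion is, by construction, an iso-performance pair: different aptitude, different training time, identical outcome.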
Figure 1. Illustration of Isoperformance Using Two<br />
Categories and One Equipment Feature (the Current Technology)<br />
(Plot: proportion of people on the ordinate versus training time in weeks<br />
on the abscissa, for the high- and low-to-medium-aptitude groups.)<br />
Figure 2 shows that, if a second new equipment or technology feature is<br />
introduced (which reduces the training time to reach proficiency), then the<br />
time it takes for the groups to achieve the 50% criterion of proficiency<br />
reduces to 3 weeks and 6 weeks, respectively.<br />
Figure 2. If a Second Equipment Feature Is Introduced<br />
(Which Reduces the Training Time to Reach Proficiency)<br />
(Same axes as Figure 1.)<br />
Figure 3 illustrates the relation between category and equipment<br />
differences in terms of Isoperformance curves, where any point on the curve<br />
identifies the same (Iso) performance. Note that the Equipment difference is<br />
smaller than the “Personnel” difference.<br />
Figure 3. Isoperformance Curves Relating Equipment<br />
Differences to Aptitude Differences<br />
From these types of Isoperformance curves, one can determine feasible<br />
combinations of personnel, training, and equipment features for any specified<br />
level of desired performance. In addition, one can rule out various<br />
combinations if there are constraints, for example, on personnel availability<br />
or training time.<br />
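That screening step amounts to a small constrained search. In the sketch below, every cost, time, and category label is invented for illustration: enumerate the (personnel category, equipment) pairs, keep those whose iso-performance training time fits the constraint, and take the cheapest.

```python
# Hypothetical weeks-to-criterion for each (personnel, equipment) pair,
# as would be read off a family of isoperformance curves.
weeks_needed = {
    ("high", "current"): 4, ("low-med", "current"): 8,
    ("high", "new"): 3,     ("low-med", "new"): 6,
}
# Hypothetical unit costs (arbitrary units).
personnel_cost = {"high": 9, "low-med": 4}   # recruiting/retention premium
equipment_cost = {"current": 2, "new": 7}
weekly_training_cost = 1
max_weeks = 7                                # constraint: training pipeline

feasible = []
for (person, equip), weeks in weeks_needed.items():
    if weeks <= max_weeks:                   # rule out over-long pipelines
        cost = (personnel_cost[person] + equipment_cost[equip]
                + weekly_training_cost * weeks)
        feasible.append((cost, person, equip, weeks))

for cost, person, equip, weeks in sorted(feasible):
    print(f"{person:8s} + {equip:8s}: {weeks} wk, cost {cost}")
```

Because every surviving combination delivers the same performance by definition, the final choice reduces to a pure cost comparison, which is the point of the approach.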
Isoperformance as a Computerized Program*<br />
An interactive, expert decision aid has been developed to quantify the<br />
trade-off methodology implicit in the Isoperformance approach. The<br />
computer-based "smart" system is intended to aid in decisionmaking by<br />
mechanizing the trade-offs between human (aptitude, training) and equipment<br />
variations in order to achieve the same (Iso) system performance outcome. The<br />
Isoperformance core subprogram is composed of four phases: Specification,<br />
Input, Verification, and Output.<br />
Specification, the first phase of the Isoperformance program, requires the<br />
user to state the problem, in effect, by specifying:<br />
(1) the system under study,<br />
(2) what is meant by “proficient performance”,<br />
(3) the aptitude dimension to be used,<br />
(4) how that dimension is to be divided into ranges or “personnel”<br />
categories, and<br />
(5) the maximum amount of training to be considered.<br />
These specifications are purely descriptive and no relationships have to be<br />
estimated.<br />
Input, the second phase, asks that, for each personnel category, the user<br />
estimate:<br />
(1) the minimum training time, in number of weeks, necessary for people in<br />
that category to become proficient and<br />
(2) the proportion of persons in the category who are expected to become<br />
proficient given the maximum training time.<br />
The program thus takes as input from the user two to three estimates for<br />
each aptitude category. The estimates can come from any reliable source<br />
(e.g., simulators, extrapolations from related tasks, etc.). Estimations are<br />
planned for because it is expected that the data required are not readily<br />
available at the present time. However, the input can also be data from<br />
technical data banks if available.<br />
In the third stage, Verification, the Isoperformance program checks to<br />
make sure that input estimates are "reasonable." These checks are conducted<br />
whether the input data are "estimates" or actual data from a data base or<br />
experiment. There are three types of checks on user estimates: (a) formal,<br />
*Copies of a demonstration disc are available from R. S. Kennedy, Essex<br />
Corporation, 1040 Woodcock Road, #227, Orlando, FL 32803<br />
which is a check of logical necessity, (b) general, which compares estimates<br />
with known regularities, and (c) specific, which compares user input with<br />
library validities. In general, an implicit correlation between aptitude and<br />
the performance dimension on which "proficiency" is defined can be calculated<br />
at every level of training. Also, the implicit correlations should decrease<br />
with training, and the Isoperformance curves should be decreasing and<br />
negatively accelerated. The results of these checks are reported and<br />
explained to the user, together with suggestions as to how the estimate might<br />
be modified to coincide with known regularities and ranges. The fourth phase,<br />
Output, is simply the computer output from the preceding phases.<br />
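As a sketch, the formal (logical-necessity) and general (known-regularity) checks of the Verification phase might look like the following. The example inputs, the ordering convention (categories listed low to high aptitude), and the check wording are assumptions for illustration:

```python
def formal_checks(min_weeks, prop_proficient, max_weeks):
    """Logical-necessity checks on the user's two estimates per category."""
    problems = []
    for cat, (w, p) in enumerate(zip(min_weeks, prop_proficient)):
        if not (0 < w <= max_weeks):
            problems.append(f"category {cat}: minimum time {w} outside 0..{max_weeks}")
        if not (0.0 <= p <= 1.0):
            problems.append(f"category {cat}: proportion {p} is not a probability")
    return problems

def general_checks(min_weeks, prop_proficient):
    """Known-regularity checks: higher-aptitude categories should need no
    more training and should reach proficiency at least as often."""
    problems = []
    for cat in range(1, len(min_weeks)):
        if min_weeks[cat] > min_weeks[cat - 1]:
            problems.append(f"category {cat}: needs more training than category {cat - 1}")
        if prop_proficient[cat] < prop_proficient[cat - 1]:
            problems.append(f"category {cat}: lower proportion proficient than {cat - 1}")
    return problems

# Invented example inputs, categories ordered low -> high aptitude.
min_weeks = [6, 4, 3]
prop_proficient = [0.55, 0.70, 0.90]
report = (formal_checks(min_weeks, prop_proficient, max_weeks=9)
          + general_checks(min_weeks, prop_proficient))
print(report or "all checks passed")
```

The specific check (comparison with library validities) would work the same way but requires the stored validity data described in the text.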
AN APPLICATION<br />
The Isoperformance methodology can be applied to numerous human factors<br />
areas. Here, we will use as an exemplar freedom from simulator sickness in<br />
ground-based flight trainers. Motion sickness is a common problem in the<br />
military, particularly in testing and simulation devices. Virtually everyone<br />
with intact organs of equilibrium is susceptible to one form or another, but<br />
some people get sick all the time while others are virtually immune. However,<br />
we know that practice usually results in adaptation to motion sickness, and<br />
some specific equipment configurations are more conducive to adaptation than<br />
others (e.g., 0.2 Hz).<br />
An example of the approach for applying Isoperformance to simulator<br />
sickness is as follows:<br />
(1) Obtain a large data base with simulator sickness incidence,<br />
(2) Determine the relationship for each variable,<br />
(3) Isolate variables which are causal,<br />
(4) Select acceptable Isoperformance levels,<br />
(5) Calculate Isoperformance curves using two continuous causal variables<br />
as X/Y and one dichotomous causal variable as comparison.<br />
Afterwards, it is possible to put cost values on the outcomes and determine<br />
trade-offs from which decisions can be made.<br />
Therefore, we took from our large data base (N > 1000) of simulator<br />
sickness a series of correlational relationships. We cast them into a<br />
multiple regression equation and obtained the beta weights for such continuous<br />
variables as length of hop, whether visuals are on/off, field of view, usual<br />
state of fitness, etc. Using the continuous variables, plus the dichotomous<br />
fit/unfit dimension, we created Figure 4. Note that a four and one-half hour<br />
hop using a 305-degree field of view for a pilot who was fit would have the<br />
same simulator sickness score (110) as a pilot who had been ill and flew a<br />
two-hour hop with a 195-degree field of view.<br />
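The trade-off behind that kind of equivalence can be sketched with a toy linear model. Every coefficient below is invented for illustration only; the paper's actual beta weights came from the regression on its N > 1000 data base, so the numbers here do not reproduce the 110-point example above:

```python
# Invented coefficients: intercept, hop length (hours), field of view
# (degrees), and fitness (1 = usual fitness, 0 = recent illness).
B0, B_HOP, B_FOV, B_FIT = 20.0, 10.0, 0.2, -30.0

def sickness_score(hop_hours, fov_degrees, fit):
    """Toy linear model: score rises with hop length and field of view and
    is lower for pilots in their usual state of fitness."""
    return B0 + B_HOP * hop_hours + B_FOV * fov_degrees + B_FIT * fit

def fov_for_score(target, hop_hours, fit):
    """Solve the linear model for the field of view giving a fixed (Iso) score."""
    return (target - B0 - B_HOP * hop_hours - B_FIT * fit) / B_FOV

# One point on an isoperformance curve: a fit pilot on a long hop ...
target = sickness_score(hop_hours=4.5, fov_degrees=305, fit=1)
# ... traded against a shorter hop for a pilot who had been ill.
fov = fov_for_score(target, hop_hours=2.0, fit=0)
assert abs(sickness_score(2.0, fov, 0) - target) < 1e-9  # same (Iso) score
print(f"equivalent configuration: 2.0 h hop, {fov:.0f} degree field of view")
```

Sweeping `hop_hours` and re-solving for `fov` traces out one curve of Figure 4; repeating with `fit` flipped gives the comparison curve.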
[Figure: isoperformance curves (visuals on) of estimated hop length in hours (0.5-5.5) against field of view in degrees, comparing pilots of usual fitness with pilots reporting recent illness.]<br />
Figure 4. Isoperformance Curves Comparing Simulator Field<br />
of View, Hop Length, and Pilots' Report of Recent Illness<br />
CONCLUSIONS<br />
In Navy flight simulator studies, half of the variance appears to be<br />
attributable to Personnel differences, with Training and Equipment dividing<br />
the rest. Therefore, it is more informative to know who is flying than what<br />
trial of practice or on what equipment they are flying. The case for<br />
simulator sickness is similar. It appears that a considerable amount of<br />
variance in motion sickness research is attributable to Personnel differences,<br />
with smaller proportions attributable to Equipment and Practice.<br />
In general, we believe that Isoperformance goals have merit because they<br />
estimate training outcomes by:<br />
(1) Forcing the user to make estimates of training outcomes for different<br />
personnel categories.<br />
(2) Providing checks on the internal consistency and logical coherence of<br />
these estimates.<br />
(3) Providing checks on how well the estimates conform to known<br />
regularities from human engineering, personnel, and training research.<br />
(4) Informing the user as to the results of these checks, together with<br />
information about what can be done to make estimates consistent or<br />
bring them into closer conformity with known regularities and facts.<br />
(5) Leaving a hard-copy audit trail of all estimates, feedback, and<br />
outputted Isoperformance curves.<br />
Implementation of this model can help human engineering practitioners,<br />
training systems designers, or human resource managers compare the relative<br />
costs of differing combinations that lead to the same performance level. This<br />
trade-off technology is especially relevant today given projected constraints<br />
in military budgets.<br />
FINAL REPORT, COMPUTER ASSISTED GUIDANCE INFORMATION SYSTEMS<br />
BAYES, Andrew H. Defense Activity for Non-Traditional Education<br />
Support, Pensacola, FL 32509-7400<br />
INTRODUCTION<br />
In June 1989 DANTES released a final report covering the pilot study of four<br />
computer-based guidance information delivery systems. In this report, a major<br />
recommendation was that the pilot study be expanded and additional data be<br />
gathered.<br />
The pilot study was expanded to a total of 102 sites in all active duty<br />
Services and to two Air Force Reserve sites. Regional training was conducted<br />
and only those sites that attended training were given the software.<br />
Each site was given User Surveys (Tab A) to be completed by each participant.<br />
The data from these surveys have been summarized in this report.<br />
SOFTWARE<br />
Based upon the results of the pilot study, DISCOVER by American College<br />
Testing and GIS by Houghton-Mifflin/Riverside were the software systems used<br />
in this expanded pilot study. They were chosen because they were the two<br />
highest rated by both the counselors and the clients. While they represent<br />
two different styles of counseling, they both contained the same basic<br />
modules and information. GIS provides more specific information on<br />
occupations and education, while DISCOVER uses the more traditional counseling<br />
approach.<br />
DATA COLLECTION<br />
The User Surveys requested demographic data as well as data reflecting the<br />
reactions of the clients to the software. It is interesting to note the data<br />
trends when pay grade or education is compared to reactions.<br />
STATEMENT OF PURPOSE<br />
* To determine if these systems were meeting expressed needs of the client<br />
population.<br />
* To determine if these systems were a valuable addition to the resources<br />
available from the education centers.<br />
* To determine what if any additional data bases would be valuable.<br />
* To determine if the systems were cost effective.<br />
* To determine if counselor time was better utilized as a result of clients<br />
having used CAGIS.<br />
DATA ANALYSIS AND INTERPRETATION<br />
While over 800 User Surveys were analyzed, the numbers do not remain constant<br />
because in some cases, the pay grade was not available, not all questions were<br />
answered by all respondents, or directions for completing the forms were not<br />
followed. It is felt, however, that enough data were collected that the results<br />
are valid and do represent the cross section of clients visiting the education<br />
centers. It would be risky, however, to extrapolate from this data to the<br />
entire Military population. In a sense, the data here represent only the<br />
reactions of individuals visiting the education centers and this may be a<br />
special sub-population. No attempt has been made to compare this population<br />
with the Military in general.<br />
TABLE I<br />
EDUCATIONAL LEVEL BY PAY GRADE<br />
                   EDUCATIONAL LEVEL<br />
PAY<br />
GRADE     1     2     3     4     5     6     7     8     9<br />
E1      .01   .60   .30   .01     0     0     0     0     0<br />
E2        0   .77   .63     0     0     0     0     0     0<br />
E3      .01   .51   .29   .04   .04   .10     0     0     0<br />
E4        0   .47   .38   .06   .04   .02  <.01  <.01     0<br />
E5        0   .32   .30   .10   .09   .08   .02   .01     0<br />
E6        0   .33   .33   .16   .09   .03   .05     0     0<br />
E7      .04   .13   .33   .15   .11   .15     0   .06   .02<br />
E8        0   .17   .22   .17     0   .33     0   .11     0<br />
E9        0   .19     0   .31   .25   .06     0   .19     0<br />
O1                        .25     0   .50   .12   .12     0<br />
O2                                    .78   .11   .11     0<br />
O3                                    .62   .15   .23     0<br />
O4                                    .09   .18   .72     0<br />
O5                              .11   .22   .22   .44     0<br />
EDUCATIONAL LEVEL 1 = No diploma<br />
EDUCATIONAL LEVEL 2 = High School/GED<br />
EDUCATIONAL LEVEL 3 = l-2 Years of College<br />
EDUCATIONAL LEVEL 4 = AA/AS Degree<br />
EDUCATIONAL LEVEL 5 = 3-4 Years of college<br />
EDUCATIONAL LEVEL 6 = BA/BS Degree<br />
EDUCATIONAL LEVEL 7 = Some graduate study<br />
EDUCATIONAL LEVEL 8 = Masters degree<br />
EDUCATIONAL LEVEL 9 = Doctorate<br />
CAREER PLANS<br />
When asked to describe their career plans, the enlisted population at the<br />
E1-E4 level indicated that they planned to leave after their current<br />
enlistment. E5s were almost evenly divided between remaining on active duty<br />
until retirement and being uncertain about leaving after their present<br />
enlistment. E6-E8 indicated that they planned to stay until retirement. The<br />
following table indicates career plans by branch of Service and pay grade.<br />
TABLE II<br />
CAREER PLANS<br />
PAY           1     2     3     4     5<br />
GRADE<br />
E1          .08   .08   .25   .50   .08<br />
E2          .11   .04   .21   .43   .21<br />
E3          .11   .05   .16   .43   .24<br />
E4          .12   .05   .10   .42   .31<br />
E5          .33   .05   .18   .30   .13<br />
E6          .70   .02   .08   .16   .03<br />
E8          .72   .11   .08   .05   .03<br />
E9          .96     0     0     0   .03<br />
ARMY        .20   .05   .09   .38   .27<br />
AIR FORCE   .41   .04   .12   .31   .11<br />
NAVY        .29   .04   .18   .29   .20<br />
MARINES     .40   .03   .08   .38   .11<br />
CAREER PLANS:<br />
l=Probably stay until retirement<br />
2=Stay beyond present obligation but not to retirement<br />
3=Probably stay beyond present obligation but not until retirement<br />
4=Probably leave after present obligation<br />
5=Definitely leave after present obligation<br />
While the majority of clients learned about CAGIS when they visited the<br />
Education Center or attended a briefing by the Education Center staff, about<br />
1/3 learned about CAGIS from a co-worker. This would indicate that the<br />
program was felt to be valuable enough to recommend it to a friend.<br />
Because part of the data collection was designed to determine the relative<br />
effectiveness of the two systems, several comparisons were made. Eighty-one<br />
percent of the sites received GIS, but only 45% of the User Surveys were<br />
returned from GIS sites. Clients spent more time using DISCOVER (29% spent 46<br />
to 60 minutes) than GIS users, who spent less time with that system (34% spent<br />
16 to 30 minutes). Seventy-one percent of the clients spent between sixteen and<br />
sixty minutes using the software. It is interesting to note that E3s and E6s<br />
are spending the most time using the computer. This appears to be a critical<br />
time for them in their careers. Eighty-one percent of the GIS users felt they<br />
understood the system they used, while 91 percent of the Discover users felt<br />
they understood that system.<br />
EFFECTIVENESS OF SYSTEMS<br />
In an attempt to determine the effectiveness of the two systems, clients were<br />
asked to compare the computer systems with other types of reference materials.<br />
The first question asked the client to rate the CAGIS information in relation<br />
to any other reference source. Eighty-two percent of the users rated CAGIS<br />
either superior or better than any other sources. Clients were then asked<br />
about the currency of the information, and again 77% rated the information<br />
either superior or decidedly more current.<br />
As a further measure of the effectiveness of the systems, the clients were<br />
asked if they talked with a counselor following their session on the computer,<br />
and if they did talk with a counselor did they feel better prepared. Forty<br />
percent of the users did not talk with a counselor after interacting with the<br />
software. Ninety-one percent of those that talked with a counselor stated<br />
that they were better prepared to talk with a counselor. Seventy-five percent<br />
of those that did not talk with a counselor felt that they did not need to<br />
talk with a counselor because the system had answered all their questions.<br />
These statements would indicate that the systems are maximizing the counselor<br />
resources by screening out those clients that were basically seeking only<br />
information. This frees the counselors to do counseling and relieves them of<br />
simple information giving.<br />
RANKING THE DATA BASES<br />
When asked to rank which data base they felt was most useful, clients<br />
ranked the following data bases as number one in these proportions:<br />
Civilian Careers              .35<br />
Undergraduate Degrees         .25<br />
Graduate Degrees              .13<br />
Military/Civilian Crosswalk   .11<br />
Financial Aid                 .07<br />
Military Careers              .06<br />
Resume                        .02<br />
OVERALL RATING OF CAGIS<br />
As a feature of the services provided by the education offices, users were<br />
asked to rate the CAGIS they used. For the two systems, the following ratings<br />
were assigned:<br />
TABLE III<br />
RATING<br />
              1     2     3     4     5<br />
GIS         .56   .37   .06  <.01  <.01<br />
DISCOVER    .62   .34   .04  <.01  <.01<br />
1=Essential  2=Important  3=Neutral  4=Not important  5=Not required<br />
CONCLUSIONS<br />
In reviewing the Statement of Purpose, the data support each statement.<br />
The systems are meeting the needs of the clients as evidenced by comparing the<br />
responses of the clients as to which data bases they used and how they<br />
ultimately ranked those data bases. This is also shown by the responses to<br />
the completeness and currency of the information provided.<br />
Clearly these systems are felt to be valuable resources. Over 90% of the<br />
users felt the systems were either essential or very important additions to<br />
the education centers.<br />
Cost effectiveness is difficult to determine, but when one notes the amount of<br />
time spent using the software and compares this to the hourly cost of a GS<br />
9/11 guidance counselor, it is apparent that money is being saved. With<br />
increased quantities, the price of the leases becomes even less. Site<br />
licenses also provide more software at an even greater saving. The fact that<br />
family members and DOD civilians can also access the systems at no additional<br />
cost further enhances the cost effectiveness.<br />
Better utilization of counselor time is apparent from the data indicating<br />
the number of users that did not need to meet with a counselor upon completion<br />
of their use of the software. The number of clients indicating that they were<br />
better prepared to meet with a counselor also allows the counselor to provide<br />
assistance with things other than simple information giving (e.g., information<br />
integration).<br />
RECOMMENDATIONS<br />
1. DANTES should seriously investigate the possibility of adding some of the<br />
requested additional data bases. Working with the vendors to incorporate<br />
these data bases into the existing systems should be relatively easy. The<br />
vendors have been asked to identify SOC schools in their next editions.<br />
2. As the personnel resources of the education centers are being drawn down<br />
and more Service members are being released from the Service, these systems<br />
should be expanded to reduce the quantity of personal counseling. Education<br />
centers should consider increasing their investment in computer hardware in<br />
order to expand their counseling efforts. When on-site scoring becomes<br />
possible, sites will want the capability to take advantage of this enhancement.<br />
DANTES plans to expand the program to approximately 250 sites, but only<br />
to those sites that are willing to participate in training and have the<br />
hardware available. Several Educational Services Officers stated that the<br />
addition of CAGIS was extremely valuable in augmenting their resources for<br />
Project Transition.<br />
3. Increasing the "user friendliness" should be a major objective of the<br />
vendors. While the information in the systems is valuable, difficulty in<br />
accessing the information diminishes the usefulness of the systems. The<br />
vendors need to be aware of this shortcoming and either provide additional<br />
training or provide more technical support to the education centers.<br />
4. Counselors need to overcome their reluctance to use the computers. Their<br />
resistance to becoming involved with computers is denying wider use of the<br />
systems. The counselors do not really know how much information each system<br />
has and in consequence do not take full advantage of the breadth of information<br />
available. An effort should be made to work with the counselors during workshops<br />
and national conventions.<br />
5. Training is essential to the success of the program. It is recommended<br />
that someone from DANTES attend each of the training sessions. This was not<br />
done for this portion of the pilot study and, in consequence, data collection<br />
was slow and many follow-up letters had to be written.<br />
6. The need for extensive and current data is apparent. Many outdated and/or<br />
hardbound references can be replaced by the CAGIS software. DANTES should<br />
consider distributing reference materials less frequently and rely more on the<br />
information available in the CAGIS software.<br />
SUMMARY<br />
The data from this expanded pilot study clearly substantiate the data from<br />
the initial pilot study. The systems have considerable value not only to the<br />
clients but also to the education center personnel. The systems are cost<br />
effective, up-to-date, thorough, and, most importantly, readily available.<br />
The users, ranging<br />
from active duty personnel to DOD civilians and family members indicated very<br />
strongly that this is an essential service.<br />
As the Military enters into a period of austere funding and personnel<br />
reductions, programs such as CAGIS will become increasingly important to help<br />
personnel make the transition back to the civilian workplace and higher<br />
education. Comments from program administrators clearly demonstrate the<br />
feeling that these programs are going to fill a very large gap in their<br />
services.<br />
Education centers need to move more rapidly into the world of automation and<br />
take advantage of the information explosion. These systems are going to make<br />
information retrieval instantaneous and eliminate hours of tedious research<br />
using hard cover reference materials.<br />
It would appear from the current data that these systems have value for all<br />
pay grades and all branches of Service. The program should be expanded to<br />
allow all sites that have a need to be able to access one of the systems.<br />
VERTICAL COHESION PATTERNS IN LIGHT INFANTRY UNITS'<br />
Cathie E. Alderks<br />
U.S. Army Research Institute for the<br />
Behavioral and Social Sciences<br />
Alexandria, VA<br />
Researchers have shown that strong cohesion among soldiers<br />
as well as cohesion within platoon level leadership teams has a<br />
consistent association with platoon performance and the ability<br />
to withstand stress (Siebold and Kelly, 1988a, 1988b). However,<br />
research pertaining to the impact of vertical cohesion up and<br />
down the chain of command on small unit performance is limited.<br />
In this paper the pattern of vertical cohesion from squad through<br />
company and its impact on performance at Army Combat Training<br />
Centers are examined.<br />
METHOD AND SAMPLE<br />
Data were collected by questionnaire from soldiers and<br />
leaders within five light infantry battalions (N = 60 platoons)<br />
at three points in time. The first point in time (Base) occurred<br />
4-6 months before the battalion was scheduled to go through a<br />
training rotation at either the U.S. Army National Training<br />
Center (NTC), Fort Irwin, CA, or the U.S. Army Joint Readiness<br />
Training Center (JRTC), Fort Chaffee, AR. The second point in<br />
time (Pre-rotation) was 2-4 weeks prior to the rotation; the<br />
third point (Post-rotation) occurred 2-4 weeks following the<br />
training rotation.<br />
Base and pre-rotation questionnaires were administered by<br />
researchers from the U.S. Army Research Institute to platoon<br />
level soldiers (squad members (SM), squad leaders (SL), platoon<br />
sergeants (PS), and platoon leaders (PL)) one company at a time<br />
in either a classroom or dayroom setting. Soldiers took<br />
approximately 30 minutes to complete the 160-item questionnaire<br />
after instructions. Soldiers responded on a machine readable<br />
answer sheet. Post-rotation questionnaires were given at the<br />
start of interviews in an office or dayroom setting to the<br />
following groups of soldiers within a company: 1) all PLs, 2) all<br />
PSs, 3) two-thirds of the SLs, and 4) all SMs from one intact<br />
squad in the company. Post-rotation questionnaires were short<br />
(21 items plus some unit and position identification questions)<br />
and took soldiers less than 10 minutes to complete; responses<br />
were made on the questionnaire itself.<br />
'The views expressed in this paper are those of the author<br />
and do not necessarily reflect the views of the U.S. Army<br />
Research Institute or the Department of the Army.<br />
Post Selection Board Analysis<br />
Post-selection board review of the 1986/87 NROTC scholarship<br />
year pointed to the need to build more structure into the<br />
evaluation system in order to (1) provide more consistency in<br />
the evaluation of records and (2) permit the selection of those<br />
who were truly best qualified in both an academic and potential<br />
officer sense.<br />
Assessment of the criteria used by board members to assign<br />
points to an application suggested that there was wide variance<br />
among board members in the value placed on the level of a<br />
student's academic or extracurricular performance and the type<br />
of student extracurricular activity. For example, some board<br />
members felt that athletic participation was essential for<br />
success as an officer; others did not. Applications were<br />
scored accordingly, with the resulting selection scores<br />
dependent upon the values of the particular selection board<br />
members assigned to review an application. This created the<br />
potential for wide variance in the scoring of similar<br />
applications by different selection boards.<br />
Analysis of the scores assigned by the weekly boards revealed<br />
that the average score awarded was over 80 points (out of<br />
100). This meant that weekly selection board members had very<br />
little ability to "reach down" to select an applicant who came<br />
to the selection process with a less competitive Quality Index,<br />
regardless of the merit of the applicant.<br />
Solution<br />
To address the problems of evaluation consistency and the<br />
extremely high average selection board score, a more formal<br />
method of application evaluation was instituted. Applicant<br />
evaluation categories were developed from observation of board<br />
member discussion during the initial weekly selection board<br />
sessions. Those areas that selection board members appeared to<br />
value consistently as most important when discriminating<br />
between competitive scholarship applicants were incorporated<br />
into a revised applicant evaluation system. Each evaluation<br />
category was also assigned a scoring level maximum. Optical<br />
Mark Reading (OMR) equipment was purchased and the NROTC<br />
Scholarship application was redesigned to be read by an optical<br />
scanner. Additionally, a formal selection board training<br />
program was developed to ensure that each weekly selection<br />
board began the selection board process with the same<br />
application evaluation guidance.<br />
This revised selection system was finalized during the summer<br />
of 1987 and used by the first weekly selection board of the<br />
1987/1988 NROTC program year. Each year, data based on<br />
selection board actions are reviewed and the system modified as<br />
The base and pre-rotation questionnaires contained items<br />
which formed scales measuring interpersonal, organizational, and<br />
leadership constructs (e.g., SM horizontal cohesion, job<br />
satisfaction, command climate, training effectiveness), as well<br />
as various demographic items. The post-rotation questionnaires<br />
focused mainly on soldier perceptions of performance during their<br />
recent rotation. In addition, for the two battalions which<br />
rotated through the JRTC, ratings on leader and platoon<br />
performance were provided just after the rotation by the<br />
observer/controllers (OCs) who observed each platoon during the<br />
rotation. In other words, the base and pre-rotation<br />
questionnaires contained the home station determinants<br />
(predictors) of performance; the post-questionnaires and the<br />
ratings from the OCs provided criterion measures for that<br />
performance.<br />
For the present paper, only Pre-rotation interpersonal<br />
scales and Post-rotation performance scores were considered.<br />
Platoon scores were obtained for each of the scales using a mean<br />
aggregate procedure. Standard scores were obtained to compare<br />
scores on the same scale.<br />
Vertical cohesion scales were used to examine the strength<br />
of each segment in each SM to Company Commander (CC) chain of<br />
command. These scales included 1) SM rating SL, 2) SM rating PS,<br />
3) SM rating PL, 4) SL rating PS, 5) SL rating PL, 6) PS rating<br />
PL, 7) PS rating CC, and 8) PL rating CC. It must be emphasized<br />
that in each case, a subordinate was rating a superior. These<br />
scales were composed of items such as "the leader treats us<br />
fairly", "the leader looks out for the welfare of his people",<br />
"the leader is friendly and approachable", "the leader pulls his<br />
share of the load in the field", and "the leader would have my<br />
confidence if we were in combat together". Scale item factor<br />
loadings (where N was sufficiently large to justify a factor<br />
analysis, i.e., sets of scale ratings by SMs and SLs) were<br />
.80-.87, .79-.86, and .80-.86 for SMs rating SLs, PSs, and PLs,<br />
respectively, and .65-.88 and .72-.90 for SLs rating PSs and PLs,<br />
respectively, with each scale forming independent factors.<br />
Performance scales were obtained from ratings of missions<br />
performed at JRTC/NTC. They were determined four ways: 1) OC<br />
ratings, 2) CC ratings, 3) Platoon ratings composed of the mean<br />
ratings of the PL, PS, SLs, and SMs with each level receiving a<br />
weight of one, and 4) Overall ratings composed of the mean<br />
ratings of the OCs, CC, PL, PS, SLs, and SMs with each level<br />
receiving a weight of one.<br />
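The equal-weight aggregation can be sketched as follows: each echelon's mean rating counts once, regardless of how many raters that echelon contains. The ratings below are invented for one hypothetical platoon:

```python
# Invented ratings for one platoon, grouped by rater position (echelon).
ratings_by_level = {
    "PL": [4.0],
    "PS": [3.5],
    "SL": [3.0, 4.0, 3.5],
    "SM": [2.5, 3.0, 3.5, 3.0],
}

# Mean within each echelon first, so large echelons do not dominate ...
level_means = {lvl: sum(r) / len(r) for lvl, r in ratings_by_level.items()}
# ... then each echelon mean receives a weight of one.
platoon_rating = sum(level_means.values()) / len(level_means)
print(f"platoon rating: {platoon_rating:.2f}")
```

The Overall rating works the same way with the OC and CC means added as two more equally weighted entries.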
Two approaches were chosen to examine vertical cohesion in<br />
the chain of command. The first approach was to identify the<br />
lowest break. The rationale was that since the lower leaders<br />
oversee the squad members who accomplish the direct fighting<br />
tasks, lower breaks in the chain of command might have a greater<br />
impact on direct platoon performance than breaks that occurred<br />
higher. The second approach was to count the total number of<br />
breaks that occurred anywhere in the chain of command. In both<br />
approaches, z-scores were computed to determine if and where<br />
breaks occurred. The decision rule for a break to occur required<br />
a z-score ≤ -.5 on the scale measuring cohesion between one<br />
position in the chain of command and a higher position. Where<br />
two or more scores for rating a particular leader were available<br />
(e.g., SMs, SLs, and PSs each rating PL), only one of the scores<br />
was required to meet the decision rule of z ≤ -.5. By example, a<br />
platoon could have a z ≤ -.5 at the SM-SL level and also at the<br />
PL-CC level. It would be included in the SL lowest break group<br />
and not considered for further lowest break groups. However, in<br />
counting the number of breaks, this platoon would be<br />
counted as having two breaks.<br />
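The decision rule and the two counting approaches can be sketched as follows. The z-scores and the single-platoon layout are invented; the example reproduces the case described above, a break at the SM-SL level and another at the PL-CC level:

```python
BREAK_Z = -0.5  # a break exists when any subordinate group's z-score is <= -.5

# Invented standardized cohesion ratings for one platoon, keyed by the
# leader being rated; each entry is a subordinate group's rating of that
# leader, with levels listed lowest in the chain of command first.
ratings = {
    "SL": {"SM-SL": -0.7},
    "PS": {"SM-PS": 0.1, "SL-PS": 0.3},
    "PL": {"SM-PL": 0.2, "SL-PL": 0.4, "PS-PL": 0.0},
    "CC": {"PS-CC": 0.2, "PL-CC": -0.6},
}

# A level is broken if ANY of the available scores meets the decision rule.
broken = [leader for leader, zs in ratings.items()
          if any(z <= BREAK_Z for z in zs.values())]

lowest_break = broken[0] if broken else "NONE"  # levels were listed lowest first
number_of_breaks = len(broken)
print(lowest_break, number_of_breaks)
```

This platoon falls in the SL lowest-break group and in the two-breaks group, matching the worked example in the text.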
RESULTS AND DISCUSSION<br />
The lowest break in the chain of command could occur at any<br />
point. Table 1 shows where the lowest level break occurs and<br />
lists the number of platoons per battalion in each of the<br />
categories.<br />
Table 1.<br />
Frequency at Which the     :  Frequency Distribution of Lowest<br />
Lowest Break Occurred      :  Break by Battalion<br />
                           :<br />
Level of      Platoon      :  Battalion   Level of Lowest Break<br />
Lowest Break  Freq.   %    :              SL  PS  PL  CC  NONE<br />
                           :<br />
SL             20    33    :      V        3   0   3   3   3<br />
PS             16    27    :      W        4   5   2   0   1<br />
PL             10    17    :      X        3   2   1   1   5<br />
CC              4     7    :      Y        6   4   1   0   1<br />
NONE           10    17    :      Z        4   5   3   0   0<br />
Table 2 gives similar information for the analysis approach<br />
considering the total number of breaks within each platoon<br />
focused chain of command. As there were four levels within each<br />
chain, a range of zero to four breaks was possible.<br />
Correlations indicating the relationship between the lowest<br />
break and the number of breaks in the vertical cohesion chain of<br />
command with the performance scales are listed in Table 3.
Table 2.<br />
<br />
Frequency of Total Number      :  Frequency Distribution for the<br />
of Breaks per Platoon          :  Number of Breaks by Battalion<br />
<br />
Total Number  Platoon          :  Battalion     Number of Breaks<br />
of Breaks     Freq.    %       :                0   1   2   3   4<br />
<br />
0              10     17       :      V         3   3   6   0   0<br />
1              17     29       :      W         1   3   6   2   0<br />
2              21     34       :      X         5   3   2   2   0<br />
3               9     15       :      Y         1   6   2   2   1<br />
4               3      5       :      Z         0   2   5   3   2<br />
Table 3. Lowest Break and Number of Breaks Correlated with<br />
Performance Measures<br />
<br />
Type of              Platoon Performance Rated By:<br />
Measure                  OC        CC        PLT<br />
------------------------------------------------------------<br />
OVERALL<br />
LOWEST BREAK            .37       .03       .33**   [illegible]<br />
NUMBER<br />
OF BREAKS         [illegible]   -.34*     -.37**     -.44***<br />
------------------------------------------------------------<br />
* p < .05    ** p < .01    *** p < .001<br />
Figures 1 and 2 illustrate the relationship between the mean<br />
overall performance scores and the lowest break and number of<br />
breaks conditions, respectively. Analysis of variance yields an<br />
F of 3.74, p < .01 for the data in Figure 1 and an F of 4.47,<br />
p < .004 for the data in Figure 2. Similar results were obtained<br />
using any of the other methods of computing the performance<br />
measures.<br />
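As an illustration of the one-way analysis of variance used here, the sketch below compares invented platoon performance z-scores grouped by position of lowest break. The group data are hypothetical and do not reproduce the reported F values; only the form of the test is shown.

```python
# One-way ANOVA sketch: mean overall performance (z-scored) compared
# across lowest-break groups.  All numbers below are invented.
from scipy.stats import f_oneway

perf_by_group = {                 # lowest-break group -> platoon z-scores
    "SL":   [-0.2, 0.1, -0.4],
    "PS":   [-0.9, -0.5, -0.7],
    "PL":   [-0.8, -0.6, -1.0],
    "CC":   [0.3, 0.5],
    "NONE": [0.6, 0.8, 0.7],
}
F, p = f_oneway(*perf_by_group.values())
print(f"F = {F:.2f}, p = {p:.4f}")
```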
Examination of Figure 1 reveals that platoon performance is<br />
most degraded when either the PS or the PL is at the position of<br />
the lowest break. Performance is better than average when the<br />
lowest break in vertical cohesion occurs at the CC level and is<br />
best when the vertical chain has no breaks at all. Since<br />
performance measurement was at the platoon level, some clouding<br />
Figure 1. Lowest Break in Vertical<br />
Cohesion by Mean Platoon Performance<br />
[Bar chart: mean platoon performance (z-score, about -0.5 to 0.75)<br />
by duty position of lowest break: SL, PS, PL, CC, NONE]<br />
Figure 2. Number of Breaks in Vertical<br />
Cohesion by Mean Platoon Performance<br />
[Bar chart: mean platoon performance (z-score, about -1 to 0.75)<br />
by number of breaks: 4, 3, 2, 1, 0]<br />
of the results occurred at the SL level of breaks. Seldom would<br />
one find all SM-SL links within a platoon equally rated.<br />
Therefore, taking an average SM-SL cohesion rating for the three<br />
squads within a platoon moderated particularly strong or weak<br />
links. A break at the SM-SL level would meet the z ≤ -.5<br />
criterion only if one or more of the SM-SL links were extremely<br />
weak. Nevertheless, good links in other squads could compensate<br />
and result in the platoon having acceptable performance. This<br />
and other explanations are being studied.<br />
Examination of Figure 2 reveals additional findings.<br />
Generally, the fewer cohesion breaks there are, the better the<br />
performance, with performance being best when there are no<br />
cohesion breaks at all. Performance is maintained at an average<br />
level with one or two breaks. Additional breaks in cohesion<br />
correspond to less than average performance.<br />
In summary, while a causal relationship cannot be inferred,<br />
it appears that the strength of vertical cohesion as measured<br />
prior to engagement is a good predictor of platoon performance at<br />
a Combat Training Center. Vertical cohesion appears most<br />
important to platoon performance at the top platoon leadership<br />
levels, those of PS and PL. Where cohesion breaks at this level,<br />
performance tends to be less effective. However, when vertical<br />
cohesion is strong (that is, when subordinates see their<br />
superiors as taking care of them and being skilled), performance<br />
is strong. These findings are important because they<br />
quantitatively confirm "common lore"; they suggest the cohesive<br />
strength of a chain can be measured; and they indicate that the<br />
success of any efforts to increase or maintain the strength of<br />
vertical cohesion in a platoon-focused chain of command can be<br />
assessed against a clear criterion measure.<br />
REFERENCES<br />
Siebold, G.L., and Kelly, D.R. (1988a). The impact of cohesion on<br />
platoon performance at the Joint Readiness Training Center.<br />
Technical Report 812. Alexandria, VA: U.S. Army Research<br />
Institute for the Behavioral and Social Sciences. ADA 202926.<br />
Siebold, G.L., and Kelly, D.R. (1988b). A measure of cohesion which<br />
predicts unit performance and ability to withstand stress.<br />
Proceedings: Sixth Users' Workshop on Combat Stress, San<br />
Antonio, TX, 30 Nov-4 Dec 1987. Consultation Report 88-003.<br />
Fort Sam Houston, TX: Health Care Studies and Clinical<br />
Investigation Activity, Health Services Command.<br />
THE USE OF INCENTIVES IN LIGHT INFANTRY UNITS'<br />
Twila J. Lindsay and Guy L. Siebold<br />
U.S. Army Research Institute for the<br />
Behavioral and Social Sciences<br />
The research described in this paper is part of a larger<br />
project to examine the home station determinants of subsequent<br />
small unit performance at U.S. Army Combat Training Centers.<br />
This paper focuses on describing the patterns of utilization of<br />
standard incentives in units and the extent to which these<br />
patterns were associated with other organizational variables and<br />
small unit performance. The incentives examined were "Public<br />
recognition for a job well done", "Passes", "Awards",<br />
"Specialized training courses", "Letters of appreciation or<br />
commendation", and "Promotions."<br />
METHOD AND SAMPLE<br />
Data were collected by questionnaire from soldiers within<br />
five light infantry battalions (N = 60 platoons) at three points<br />
in time. The first point in time (base) was 4-6 months before<br />
each battalion was scheduled to go through a training rotation at<br />
either the U.S. Army National Training Center (NTC), Fort Irwin,<br />
CA or the U.S. Army Joint Readiness Training Center (JRTC), Fort<br />
Chaffee, AR. The second point in time (pre-rotation) was 2-4<br />
weeks before the rotation; the third point (post-rotation) was<br />
about 2-4 weeks after the training rotation. There were two<br />
other sources of data: a) platoon mission performance ratings at<br />
JRTC by the platoon level observer/controllers (O/Cs) on 23 of<br />
the platoons, and b) company commanders' ratings of the mission<br />
performance of their subordinate combat platoons at NTC/JRTC.<br />
Base and pre-rotation questionnaires were given typically to<br />
all soldiers (squad members through platoon leader) in one<br />
company at one time in a classroom or dayroom setting. The<br />
soldiers responded on machine-readable answer sheets. The<br />
questionnaires consisted of about 160 items and took the average<br />
soldier about 30 minutes to complete after instructions.<br />
Post-rotation questionnaires were short (21 items plus some unit and<br />
position identification questions) and took soldiers less than 10<br />
minutes to complete; responses were made on the questionnaire<br />
itself. Post-rotation questionnaires were given at the start of<br />
group interviews to four separate groups of soldiers in a company<br />
(platoon leaders, platoon sergeants, squad leaders, and members<br />
of one intact squad). Post-rotation questionnaires, along with<br />
the subsequent group interviews, were usually given in an office<br />
or dayroom setting.<br />
The base and pre-rotation questionnaires contained items on<br />
'The views expressed in this paper are those of the authors and<br />
do not necessarily reflect the views of the U.S. Army Research<br />
Institute or the Department of the Army.<br />
incentive utilization, scales measuring important interpersonal<br />
and organizational constructs, and various demographic items.<br />
The post-rotation questionnaires focused on soldier perceptions<br />
(self ratings) of mission performance during their recent<br />
rotation. In other words, the base and pre-rotation<br />
questionnaires contained the home station determinants<br />
(predictors) of performance, including utilization of incentives;<br />
the post-rotation questionnaires (platoon self ratings) and<br />
ratings by the O/Cs and company commanders functioned as<br />
criterion measures of that performance.<br />
The analyses prepared for this paper focused on the<br />
responses from the squad members to the pre-rotation<br />
questionnaire which included a measure of incentive use. The<br />
soldiers assessed the utilization of each incentive; an<br />
aggregation of responses to the items was used to assess the<br />
total level of incentive utilization. The use of each incentive<br />
was assessed by a five point scale: 1 = seldom used, 2 = used<br />
occasionally, sometimes for the wrong people, 3 = used<br />
occasionally, given to the right people, 4 = used often,<br />
.sometimes given to the wrong people, 5 = used often, given to the<br />
right people. A two dimensional response scale was used due to<br />
shortage of questionnaire space.<br />
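A minimal sketch of the aggregation described above, with hypothetical ratings. The paper does not spell out the aggregation weights, so a simple mean of each member's item means, averaged across members, is assumed.

```python
# Platoon-level aggregate incentive utilization (hypothetical data).
# Each squad member rates six incentives on the 1-5 two-dimensional
# scale described in the text.
INCENTIVES = ["recognition", "passes", "awards",
              "training", "letters", "promotions"]


def platoon_incentive_score(member_ratings):
    """member_ratings: list of dicts mapping incentive name -> 1..5.
    Returns the platoon mean of each member's mean across the items."""
    member_means = [
        sum(r[i] for i in INCENTIVES) / len(INCENTIVES)
        for r in member_ratings
    ]
    return sum(member_means) / len(member_means)


# Two hypothetical squad members:
ratings = [
    {"recognition": 3, "passes": 4, "awards": 2,
     "training": 2, "letters": 1, "promotions": 3},
    {"recognition": 2, "passes": 5, "awards": 3,
     "training": 3, "letters": 2, "promotions": 2},
]
print(round(platoon_incentive_score(ratings), 2))  # 2.67
```

Platoon means computed this way are the values summarized in Table 1 below.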
RESULTS<br />
The distribution of overall individual squad member<br />
responses assessing the utilization of each incentive is<br />
illustrated in Figure 1. The figure indicates that giving<br />
"Passes" was the incentive most frequently utilized and the<br />
incentive most often given to the right soldier. The least<br />
utilized incentive was "Letters of appreciation or commendation."<br />
The incentive seen as most often given to the wrong person was<br />
"Promotions."<br />
This incentive utilization pattern was similar across the<br />
five battalions and for most companies. Most variation in the<br />
utilization patterns was across platoons. This finding may<br />
indicate that there was an attitudinal component to the ratings<br />
which may have biased their accuracy. Nonetheless, the overall<br />
responses of the soldiers, as well as the platoon mean<br />
utilization levels shown in Table 1, suggest that, on the whole,<br />
incentives are not as frequently or effectively utilized as they<br />
might be.<br />
A key focus of analysis in this research was to estimate the<br />
relationships between use of incentives, standard organizational<br />
variables, and platoon performance. The estimates of these<br />
relationships were needed to develop a working model of the<br />
interactions among the variables. Such a working model, in turn,<br />
was needed to develop a more thorough model for use in designing<br />
programs, tools, or interventions to enhance unit performance.<br />
In the analysis for this paper, the authors examined a set<br />
of standard organizational variables to find their relation to<br />
the use of incentives: 1) company learning climate, 2) job<br />
satisfaction, 3) platoon pride, 4) expectations that the NTC/JRTC<br />
rotation would be valuable training, 5) motivation for the<br />
FIGURE 1. UTILIZATION OF INCENTIVES BY SQUAD MEMBERS<br />
[Histograms of the percentage of squad members choosing each scale<br />
point (1-5) for each incentive: Public Recognition, Passes, Awards,<br />
Training Course, Letter of Appreciation, Promotion]<br />
KEY<br />
1 = seldom used<br />
2 = occasionally used, wrong person<br />
3 = occasionally used, right person<br />
4 = used often, wrong person<br />
5 = used often, right person<br />
Table 1. Overall Platoon Means and Standard Deviations for<br />
Utilization of Incentives (N = 60 platoons)<br />
<br />
Incentive                               Mean     SD<br />
<br />
Public recognition for a job<br />
  well done (1)                          2.6    .51<br />
Passes (2)                               2.8    .54<br />
Awards (3)                               2.5    .48<br />
Specialized training courses (4)         2.5    .52<br />
Letters of appreciation or<br />
  commendation (5)                       2.4    .52<br />
Promotions (6)                           2.6    .50<br />
Incentives - aggregated (7)              2.6    .42<br />
Table 2. Correlations Between Incentive Items and Organizational<br />
Variables and Performance Criteria<br />
<br />
Organizational Variables             Incentives (see Table 1)<br />
& Performance Criteria        (1)   (2)   (3)   (4)   (5)   (6)   (7)<br />
<br />
Learning Climate              .60   .53   .56   .52   .38   .56   .74<br />
Job Satisfaction              .61   .43   .62   .53   .49   .68   .71<br />
Platoon Pride                 .48   .54   .62   .40   .44   .46   .67<br />
NTC/JRTC Expectations         .61   .43   .46   .40   .54   .55   .61<br />
O/C Criterion Ratings         .19   .32   .45   .27   .18   .42   .39<br />
Company Commander Ratings    -.05   .06   .29   .22  -.08  -.03   .09<br />
Platoon Self Ratings          .09   .23   .25   .13   .005  .23   .19<br />
<br />
Note: N = 60 platoons for correlations in first four rows; all<br />
correlations = p<.01. For O/C Ratings, N = 23 platoons; r values<br />
of .32 or higher = p<.05.<br />
rotation, and 6) general job motivation. These variables were<br />
selected because it was felt that incentive utilization would<br />
affect or be affected by these variables and that the latter<br />
should directly impact upon unit performance. In particular, it<br />
was anticipated that the use of incentives would relate to both<br />
event (NTC/JRTC) motivation and general motivation.<br />
Table 2 presents platoon level correlations between the<br />
utilization of incentives (specifically and in the aggregate) and<br />
four key organizational variable scales. The reader will note<br />
that the use of incentives in the aggregate was more strongly<br />
correlated with the organizational variables than were the six<br />
specific incentives. Of the specific incentives, "Public<br />
recognition", "Awards", and "Promotions" were the more strongly<br />
correlated. Table 2 also presents the platoon level correlations<br />
between the utilization of incentives and the three types of<br />
performance criteria (O/C ratings, company commander ratings, and<br />
platoon ratings). While a few of the correlations reached<br />
statistical significance, the correlations are not that strong,<br />
particularly in comparison with those between NTC/JRTC motivation<br />
and unit performance or between platoon pride and performance<br />
(presented later). Thus, as suspected, the utilization of<br />
incentives seems not to be strongly associated with good unit<br />
performance but is strongly associated with other factors which<br />
more directly affect performance.<br />
Based on the pattern of highest inter-correlations and a<br />
little logic, the authors developed a tentative model describing<br />
how incentives might interact with other key organizational<br />
variables to impact upon platoon performance. The model, at this<br />
stage, must be considered only hypothetical; nevertheless, it<br />
provides a good starting point for subsequent inquiry. The model<br />
is portrayed in Figure 2.<br />
DISCUSSION<br />
While incentive utilization seems to play an important part<br />
in supporting variables directly impacting on unit performance,<br />
incentive utilization in the units examined was nonetheless low.<br />
This indicates both that leaders can more effectively use<br />
incentives and that, with more effective utilization, the<br />
numerical relationships found in this research might change.<br />
Since the aggregate use of incentives was more strongly<br />
correlated with important organizational variables than the<br />
individual incentives, leaders may be able to shift from the use<br />
of constrained or slow-to-process incentives, or ones that take<br />
the soldier away from the unit (passes), to the use of incentives<br />
which are more efficient or effective (e.g., public recognition<br />
and awards). In the post-rotation interviews, it was found that a<br />
major limitation on perceived incentive effectiveness was the<br />
length of time that elapsed between the act or basis for the<br />
incentive and actual receipt of the incentive. Simply put,<br />
incentives should be used more and processed more quickly. If<br />
this is done, unit performance should be significantly enhanced.<br />
[Path diagram: boxes labeled "Leadership" and "Learning Climate"<br />
with arrows to the other model variables; the remainder of the<br />
diagram is illegible in the original.]<br />
FIGURE 2. (TENTATIVE) INCENTIVE UTILIZATION IMPACT MODEL<br />
With Direct Inter-Scale Correlations<br />
<br />
VARIABLES                  b.   c.   d.   e.   f.   g.  h.O/C  h.CO  h.PLT<br />
a. Learning climate       .79  .74  .81  .60  .60  .71  .52**  .17   .30*<br />
b. Platoon pride               .67  .82  .53  .62  .74  .57**  .25   .26*<br />
c. Incentive utilization            .71  .61  .55  .55  .39*   .09   .19<br />
d. Job satisfaction                      .67  .74  .77  .65**  .23   .22*<br />
e. NTC/JRTC expectations                      .82  .55  .55** -.17   .08<br />
f. NTC/JRTC motivation                             .75  .65**  .16   .07<br />
g. Job motivation                                       .63**  .37*  .31**<br />
Number of platoons         60   60   60   60   60   60   23    42    58<br />
<br />
* r = p<.05    ** r = p<.01<br />
COHESION IN CONTEXT<br />
Guy L. Siebold<br />
U.S. Army Research Institute for the<br />
Behavioral and Social Sciences<br />
In the last few years, there has been a substantial amount<br />
of research on military unit cohesion. The research, by this<br />
author and others, has addressed some key questions: what is<br />
cohesion, how does it differ from similar constructs (e.g.,<br />
bonding and morale), how can it be measured, what impact does it<br />
have, and how does it change over time. However, left relatively<br />
unaddressed are the questions of how cohesion is associated with<br />
other major job related and organizational constructs and which<br />
of these constructs, relative to each other, really make a<br />
difference in organizational performance. The research presented<br />
in this paper was designed to start to answer these latter two,<br />
unaddressed questions. Specifically, the research examined the<br />
association between unit cohesion and unit performance directly<br />
and in the context of the platoon average degree of job<br />
satisfaction and platoon level of training proficiency.<br />
Method and Sample<br />
Data were collected by questionnaire from soldiers (squad<br />
members, squad leaders, platoon sergeants, and platoon leaders)<br />
within five light infantry battalions at three points in time.<br />
The first point in time (Base) was 4-6 months before the<br />
battalion was scheduled to go through a training rotation at<br />
either the U.S. Army National Training Center (NTC), Fort Irwin,<br />
CA or the U.S. Army Joint Readiness Training Center (JRTC), Fort<br />
Chaffee, AR. The second point in time (Pre-rotation) was 2-4<br />
weeks before the rotation; the third point (Post-rotation) was<br />
2-4 weeks after the training rotation. Questionnaires were<br />
administered by researchers from the U.S. Army Research<br />
Institute.<br />
Base and pre-rotation questionnaires were given typically to<br />
one company of soldiers at a time in a classroom or dayroom<br />
setting and, being up to 160 items long, took the average soldier<br />
about 30 minutes to complete after instructions. Soldiers<br />
responded on a machine readable answer sheet. Post-rotation<br />
questionnaires were short (21 items plus some unit and position<br />
identification questions) and took soldiers less than 10 minutes<br />
to complete; responses were made on the questionnaire itself.<br />
Post-rotation questionnaires were given at the start of<br />
The views expressed in this paper are those of the author<br />
and do not necessarily reflect the views of the U.S. Army<br />
Research Institute or the Department of the Army.<br />
interviews to groups of soldiers in a company. All the platoon<br />
leaders in a company were one group; all the platoon sergeants<br />
were a second group; two thirds of the squad leaders were a third<br />
group; and all squad members from one squad in the company formed<br />
a fourth group. Post-rotation questionnaires, along with the<br />
subsequent group interviews, were usually conducted in an office<br />
or dayroom setting.<br />
The base and pre-rotation questionnaires contained scales<br />
measuring cohesion and other job related and organizational<br />
constructs along with various demographic items. The<br />
post-rotation questionnaires focused on soldier perceptions (self<br />
ratings) of mission performance during their recent rotation. In<br />
addition, for the two battalions which rotated through the JRTC,<br />
ratings on leader and platoon performance were provided just<br />
after the rotation by the observer/controllers who observed each<br />
platoon during the rotation. In other words, the base and<br />
pre-rotation questionnaires contained the home station determinants<br />
(predictors) of performance; the post-rotation questionnaires and<br />
ratings from the observer/controllers functioned as criterion<br />
measures of that performance. The total sample from the 5 light<br />
infantry battalions was 60 platoons: 45 line platoons, 5 scout<br />
platoons, 5 mortar platoons, and 5 anti-tank platoons.<br />
Questionnaire items were structured to form scales measuring<br />
the constructs investigated. Scales addressed the following<br />
aspects of cohesion: squad member horizontal bonding, platoon<br />
leadership team (platoon leader, platoon sergeant, and squad<br />
leaders) horizontal bonding, vertical bonding between the squad<br />
members and the platoon leaders, platoon pride, and Army<br />
identification. Squad member horizontal bonding (SMHB) items<br />
measured whether squad members felt they cared about one another<br />
and worked together well as a team. Platoon leadership team<br />
horizontal bonding (LHB) items measured the extent to which the<br />
platoon leaders cared about one another and worked well together<br />
as a team. Vertical bonding (VB) items measured the extent to<br />
which subordinates felt their leaders were skilled and looked out<br />
for the needs of their subordinates. Platoon pride (PRIDE) items<br />
measured the extent to which members were proud of being in their<br />
platoon and played an important part in it. Army identification<br />
(AI) items measured the extent to which soldiers felt a part of<br />
the Army and that its successes were their successes. Soldiers<br />
responded to the cohesion questionnaire items using a five point,<br />
strongly agree to strongly disagree response scale.<br />
Scales also addressed constructs such as job motivation (job<br />
involvement), JRTC/NTC motivation, expectations of the value of<br />
JRTC/NTC training, job satisfaction, company learning climate,<br />
and level of task and mission training. As examples, the company<br />
learning climate items measured whether soldiers were given a lot<br />
of responsibility, got feedback on how they were doing, and were<br />
helped to learn from their mistakes; the job satisfaction items<br />
measured whether soldiers felt their work was interesting and<br />
useful. Most of the scales, or earlier versions of them, had<br />
been used in prior research and thus had known or expected<br />
characteristics.<br />
For criterion measures of platoon performance, the ratings<br />
of three groups were used: observer/controllers (OCs) at the<br />
JRTC, company commanders (COs) rating their three platoons, and<br />
the platoon members (PLT) themselves. The OC ratings were done<br />
at the JRTC after the rotation was completed; the CO and PLT<br />
ratings were made during post-rotation data collection. Each<br />
rater rated each platoon, about whose performance he was<br />
knowledgeable, on its performance during each mission conducted<br />
(e.g., movement to contact, deliberate attack, and defense). A<br />
rater's average rating of the platoon across observed missions<br />
became the criterion score. Raters used a 4 point scale:<br />
Trained, Needs a little training, Needs a lot of training,<br />
Untrained. PLT ratings were computed by averaging criterion<br />
scores across the four positions (squad member, squad leader,<br />
platoon sergeant, platoon leader), i.e., equally weighted by<br />
position. Readers can contact the author for additional<br />
information on any of the scales. The predictor data used for<br />
the analyses in this paper are from squad member responses only.<br />
Leader perspectives will be addressed in future analyses.<br />
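The criterion scoring just described can be sketched as follows. The numeric coding (4 = Trained down to 1 = Untrained) and the example ratings are assumptions for illustration; the paper does not state the coding direction.

```python
# Criterion-score computation sketch (hypothetical data).
# A rater's criterion score for a platoon is the mean of his mission
# ratings; PLT self-ratings average the four position means equally.


def rater_criterion(mission_ratings):
    """Average one rater's mission ratings (1-4) into a criterion score."""
    return sum(mission_ratings) / len(mission_ratings)


def plt_criterion(scores_by_position):
    """PLT self-rating: mean of the four position means, equally weighted."""
    position_means = [sum(v) / len(v) for v in scores_by_position.values()]
    return sum(position_means) / len(position_means)


oc = rater_criterion([3, 2, 4])        # e.g., movement to contact, attack, defense
plt = plt_criterion({
    "squad_member":     [3.0, 3.5],    # ratings from each position group
    "squad_leader":     [3.0],
    "platoon_sergeant": [3.5],
    "platoon_leader":   [4.0],
})
print(round(oc, 2), round(plt, 2))     # prints: 3.0 3.44
```

The equal weighting by position keeps the many squad-member responses from dominating the platoon self-rating.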
Results<br />
Scales. The questionnaire predictor scales used in this research<br />
typically had means of about 3.1 - 3.6 on the five point response<br />
scale, with standard deviations around 1.0 at the individual<br />
respondent level and around .5 as averaged at the platoon level.<br />
Scale reliability estimates (alpha values) were typically around<br />
the .8 level. The platoon performance criterion scales had the<br />
following characteristics: OC ratings - Mean = 2.1, SD = .41;<br />
CO ratings - Mean = 3.2, SD = .43; PLT ratings - Mean = 3.2, SD =<br />
.32. Numbers of platoons rated were: OC = 23; CO = 42; PLT = 59.<br />
Direct impact of cohesion. As noted in Table 1.a., all the<br />
aspects of cohesion correlated significantly with platoon<br />
performance as rated by the OCs at JRTC and as rated by the<br />
platoon members. The cohesion - performance relationship based<br />
on CO criteria was in the same direction but at a lower,<br />
non-significant level. Also, as noted in Table 1.b., the different<br />
aspects of cohesion all correlated significantly with each other,<br />
although at a notably lower correlation coefficient level with<br />
the Army identification aspect. An initial factor analysis of<br />
squad member responses indicated that Army identification was a<br />
separate construct from the others, that squad member bonding was<br />
a separate construct, and that the other scales were linked to<br />
perceptions about the platoon leaders. Platoon pride loadings<br />
were split between the squad member and leader factors.<br />
Relation of cohesion to other constructs. As noted in Table<br />
1.c., other standard organizational constructs and level of<br />
training were related to the cohesion scales. In short, there<br />
Table 1.a. Correlations Between Cohesion and Platoon Average<br />
Mission Performance at JRTC or NTC by Rater of Performance<br />
<br />
                      Performance Raters<br />
Cohesion Scale        OC       CO       PLT<br />
<br />
SMHB                 .52**    .31*     .30*<br />
LHB                  .52**    .15      .38**<br />
VB                   .47**    .18      .31**<br />
PRIDE                .57**    .25      .26*<br />
AI                   .43**    .20      .24*<br />
Table 1.b. Intercorrelations Among Cohesion Scales<br />
<br />
            LHB      VB     PRIDE     AI<br />
SMHB        .74      .63     .84      .54<br />
LHB                  .77     .81      .47<br />
VB                           .73      .45<br />
Table 1.c. Correlations Between Cohesion and Standard<br />
Organizational Constructs<br />
<br />
                                Cohesion Scale<br />
Construct                 SMHB    LHB    VB    PRIDE    AI<br />
<br />
Job Motivation             .65    .66   .51    .74     .71<br />
JRTC/NTC Motivation        .47    .49   .31    .62     .65<br />
Job Satisfaction           .67    .72   .59    .82     .69<br />
Learning Climate           .67    .81   .75    .79     .64<br />
Task/Mission Training      .30    .31   .48    .31     .29*<br />
<br />
Note: * = p<.05.<br />
was a great deal of inter-dependence among the predictor<br />
constructs. This, of course, led to some analytic concerns, in<br />
particular about whether the cohesion construct correlations with<br />
the criteria were independent or due to the influence of some<br />
underlying factor or other construct. An examination of all the<br />
construct inter-correlations and exploratory factor analyses<br />
suggested that there might be a soldier general perception of job<br />
conditions accounting for the level of inter-correlation. To<br />
investigate this possibility, the correlations were re-computed<br />
controlling for the mean platoon job satisfaction. The results<br />
are shown in Tables 2.a. and 2.b.<br />
Table 2.a. Partial Correlations Between Cohesion and Platoon<br />
Average Mission Performance at JRTC or NTC by Rater of<br />
Performance, Controlling for Job Satisfaction<br />
<br />
                      Performance Raters<br />
Cohesion Scale        OC       CO       PLT<br />
<br />
SMHB                 .16      .21      .21<br />
LHB                  .10     -.02      .32**<br />
VB                   .15      .05      .23*<br />
PRIDE                .11      .11      .13<br />
AI                  -.04      .06      .13<br />
<br />
* = p<.05; ** = p<.01<br />
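The partialling used here is the standard first-order partial correlation. The sketch below applies it to the SMHB-OC relationship; the job satisfaction-OC correlation of .65 is an assumed value (not reported in this section) chosen to show how a zero-order r of .52 can shrink to roughly the partial value in Table 2.a.

```python
# First-order partial correlation: r(x, y) controlling for z.
# SMHB-OC r = .52 and SMHB-job satisfaction r = .67 come from
# Tables 1.a and 1.c; the job satisfaction-OC r = .65 is assumed.
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z partialled out."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


print(round(partial_r(0.52, 0.67, 0.65), 2))  # 0.15, near the .16 in Table 2.a
```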
with) strong platoon performance at the JRTC or NTC. Among all<br />
the predictor constructs, JRTC/NTC motivation and job motivation<br />
were the strongest correlates with the OC criterion ratings, .65<br />
and .63 respectively. Their correlations were also reduced when<br />
job satisfaction was controlled, to .34 and .27.<br />
Regardless of the large common variance among the predictor<br />
construct scales, a critical concern was whether the common<br />
variance among the predictor scales was due in part to the level<br />
of task and mission training. If this were the case, then the<br />
correlations with the criteria ratings could be simply an<br />
instance of high or low training at pre-rotation resulting in<br />
high or low performance at JRTC or NTC. To examine this, partial<br />
correlations were again computed controlling for the squad<br />
members' pre-rotation estimates of their platoon's level of task<br />
and mission training (measured using the same response scale as<br />
the criterion raters). The results are given in Table 3.<br />
Table 3. Partial Correlations Between Cohesion and Platoon<br />
Average Mission Performance at JRTC or NTC by Rater of<br />
Performance, Controlling for Pre-rotation Training Level<br />
<br />
                      Performance Raters<br />
Cohesion Scale        OC       CO       PLT<br />
<br />
SMHB                 .49**    .32*     .22<br />
LHB                  .49*     .16      .30*<br />
VB                   .42*     .20      .18<br />
PRIDE                .55**    .27*     .17<br />
AI                   .39*     .21      .16<br />
<br />
* = p<.05; ** = p<.01          N =    20       39       55<br />
As Table 3 shows, the partial correlations (with perceived<br />
training level controlled) are not much different from the direct<br />
correlations given in Table 1.a. Also, there was little or no<br />
change from the direct correlations for the other major predictor<br />
constructs, with training controlled. A fair interpretation of<br />
Table 3 would be that cohesion adds significantly to the mission<br />
performance of platoons at training centers such as the JRTC and<br />
the NTC beyond that portion of performance due to level of<br />
training. In other words, cohesion and other job related and<br />
organizational constructs provide a separate, important<br />
contribution to performance. Speculating from the data in this<br />
research, one can estimate that separate contribution to be in<br />
the range of 10 - 40% of the performance variance. Obviously,<br />
further research remains to be done in sorting out the nature and<br />
inter-relationships of the predictor constructs and in<br />
determining the constructs' relationship with performance across<br />
the range of construct values (e.g., low, medium, and high levels<br />
of the constructs).
EVALUATION OF THE ARMY'S FINANCE SUPPORT COMMAND<br />
ORGANIZATIONAL CONCEPT<br />
Raymond O. Waldkoetter, William R. White, Sr., and<br />
Phillip L. Vandivier<br />
U.S. Army Soldier Support Center<br />
Fort Benjamin Harrison, IN 46216-5700<br />
A new modular concept of organization was developed for the Finance Support<br />
Command (FSC) missions/functions to provide direct financial support to commanders,<br />
units, and activities on an area basis. An Army restructuring initiative resulted in the reorganization<br />
of the Finance Corps’ force structure, with the planned FSC being a modular<br />
TOE that is sized, depending on the population supported, with two to six assigned finance<br />
detachments. Before implementing the new organizational concept, a decision was later<br />
made at Headquarters, Department of the Army to conduct an evaluation to determine if<br />
the modular concept FSC would have the capability to perform the minimum essential<br />
wartime tasks. Those wartime tasks place the FSC as a focal point for providing commercial<br />
vendor and contractual payments, various pay and disbursing services, and limited accounting<br />
on an area basis. Finance units must also be prepared to protect and defend themselves<br />
to continue sustainment of the force and maintain battle freedom for combat units to engage<br />
the enemy.<br />
A study team identified missions and functions to be performed by the FSCs in wartime.<br />
The relationships of the FSC with organizations above, below, and parallel were<br />
outlined, along with the interactions between these organizations. The study team established<br />
criteria to be met with current doctrine and “principles of support” and “standards of<br />
service” as the foundation for battlefield finance support functions. A notional concept was<br />
developed, staffed, and evaluated to determine the preferred FSC unit force structure. With<br />
the assistance of subject-matter experts (SMEs), the capability of the preferred FSC design<br />
and related functions was analyzed to address military finance support requirements for<br />
various theater scenarios, across the spectrum of conflict and in different geographical<br />
locations. Major Army commands (MACOMs) then concurred with the recommendation<br />
to adopt the modular concept FSC organization, with the proviso that the concept be duly<br />
evaluated prior to implementing actions.<br />
The Soldier Support Center (SSC) hosted a MACOM-level Finance Study Advisory<br />
Group (SAG), 30 June - 1 July 1988, to further assess the proposed modular organizational<br />
design. The SAG recommended fielding the modular design and conducting an on-site field<br />
evaluation of the design prior to world-wide implementation. The Department of the Army,<br />
Deputy Chief of Staff for Operations (DCSOPS), concurred with both recommendations.<br />
The views expressed in this paper are those of the authors and do not necessarily reflect the<br />
views of the Soldier Support Center or the Department of the Army.<br />
This field validation complied with mandatory guidance directing field validation for doctrinal,<br />
training, organizational, leadership, and materiel products before operational use<br />
(SSC, 1989). The field validation was to determine, then, if the modular organizational<br />
design was capable of supporting battlefield requirements (SSC, 1990a). Validation methodology<br />
was based on approved operational and training evaluation procedures and coordination<br />
of critical issues and criteria (TRADOC, 1987) with MACOMs and G3/J3 staffs, an<br />
integral part of the training, force structuring, and TOE approval process for finance units.<br />
METHOD<br />
The field validation was designed to be a “self-evaluation.” Evaluation materials<br />
were provided to the participating MACOMs who selected finance SMEs to observe the<br />
FSC unit structure, while the FSC was conducting an operational exercise or training and<br />
performing wartime tasks under simulated conditions (Thornton III & Cleveland, 1990).<br />
The SMEs were instructed to identify whether wartime missions/functions were in a category<br />
of “go”/“no go” or “unobserved,” according to the critical issues and related criteria<br />
they entered on the field validation data collection sheets. All “no go” situations were to be<br />
explained as to which factor caused failure, such as doctrine, leadership, materiel, training,<br />
or organization. Required guidance and advisory assistance were furnished by the SSC<br />
throughout the evaluation.<br />
Major characteristics of the modular concept were to be operationally exercised<br />
during the field validation. It was to be determined whether an acceptable level of wartime task-forcing<br />
and continuous operation was facilitated. Wartime and peacetime decentralized FSC<br />
detachment operations were to be effectively exercised with suitable support provided by<br />
the host unit. The designated issues and criteria were to be evaluated based on systematic<br />
SME observations during exercises or training. The SME evaluators were required to<br />
observe one FSC within their MACOM. The FSC and detachments were to be configured<br />
as described in the validation plan. It was requested every effort be made to control the<br />
wartime scenario so that realistic combat situations were experienced by the designated<br />
command and detachment personnel. The MACOM planning for a selected exercise/<br />
training sequence ensured that the SMEs were aware of the purpose of validation requirements<br />
and fully knowledgeable in finance wartime operations.<br />
The SMEs entered the FSC issues (three) and criteria (35, 21, and 2, respectively,<br />
per issue) on the data collection forms and were instructed to keep the issues and criteria in<br />
numerical sequence, providing then 58 possible rating observations having “go”/“no go” or<br />
“unobserved” alternatives. Eleven SME evaluators collected validation data with five participating<br />
in Korea (26-28 Jun 89) and the other six at Fort Hood, TX (20-22 Sep 89). The<br />
five SMEs in Korea were from the 175th Theater Finance Command, 176th Finance Support<br />
Unit, and the six at Fort Hood, TX were composed of two evaluators from Forces<br />
Command (FORSCOM) Headquarters, Finance and Accounting Division (Fort McPherson,<br />
GA) and four from Fort Hood, 3rd Finance Group, 502d Finance Support Unit. Three<br />
SSC referee-observers participated with the provisional FSC units in Korea and at Fort<br />
Hood to furnish whatever expertise might seem useful without causing any disruptive reactions.<br />
The three critical issues were formulated to cover all major concerns regarding the<br />
expected operational capability of the FSC organization:<br />
1. Can the modular concept FSC organization perform the technical wartime missions/functions?<br />
2. Can the modular concept FSC organization perform the tactical wartime missions/functions?<br />
3. Can the modular concept FSC organization transition from peace to wartime operations?<br />
With the criteria requirements subsumed for each issue indicating responses of “go”/<br />
“no go” or “unobserved~” and space for comments from the SME evaluators, the collected<br />
responses were tested for significance using chi-squared. Although simple majority judgments<br />
are often employed to make decisions when deliberating on courses of action to be<br />
selected, it was decided that due to the operational consequences, the decision to adopt the<br />
FSC organization should be based on significant data comparisons to avoid any random or<br />
possibly biased observations.<br />
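The goodness-of-fit test used throughout the results can be reproduced in a few lines. A minimal sketch, assuming (consistent with the reported statistics) a uniform expected distribution over the three response categories:

```python
import math

def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against a uniform expected
    distribution, with the exact p-value for df = 2 (three categories),
    where the chi-square survival function reduces to exp(-x/2)."""
    expected = sum(counts) / len(counts)
    stat = sum((obs - expected) ** 2 / expected for obs in counts)
    return stat, math.exp(-stat / 2)

# Issue 1 SME responses: 150 GO, 12 NO GO, 124 UNOBSERVED (N = 286)
stat, p = chi_square_uniform([150, 12, 124])
print(round(stat, 2), p < .001)   # 112.81 True
```

Running the same function on the Issue 2 counts (146, 17, 36) and Issue 3 counts (5, 1, 7) recovers statistics of about 146.2 and 4.31, matching the values reported below.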
RESULTS AND DISCUSSION<br />
The SME evaluators responded to Issue 1 and its criteria with 150 (GO), 12 (NO<br />
GO), and 124 (UNOBS) observations. Compared to the expected distribution of responses.<br />
it was found these responses were significantly different by chi-squared: X2 (2, N = 286) =<br />
112.81, p < .001. While there is definitely a significant difference among the three categories<br />
of responses, only the difference between the “go” and “no go” would be significant. Even<br />
though the difference between the “no go” and “unobserved” would be significant, the<br />
meaning could not be clear since many comments related to the “unobserved” responses<br />
implied that the technical wartime missions/functions were feasible (“go”). There was an<br />
overall impression that the technical wartime missions/functions can be performed, though<br />
equipment and certain procedures may act as constraints. For Issue 1 there were more<br />
“unobserved” responses than for the other two issues. Some “no go” responses resulted<br />
from observations of deficient transportation assets and of lack of sufficient staffing. “Unobserved”<br />
responses were further attributed to evaluators judging some tasks were feasible.<br />
but resources were not available to operate during the training and field exercises. Technology<br />
and staffing shortages were repeatedly cited as cause for non-evaluation (omitted) and<br />
“unobserved” responses, with some missions/functions tending to become evaluated as “no<br />
go” without specific available equipment/materiel. Again, many “unobserved” responses<br />
acknowledged the potential validity of the “go’s.”<br />
Issue 2 and its criteria showed evaluator responses of 146 (GO), 17 (NO GO), and 36<br />
(UNOBS). Compared to the expected distribution of responses, it was found these responses<br />
were significantly different by chi-squared: X2 (2, N = 199) = 146.25, p < .001.<br />
There is definitely a significant difference also among the three categories of responses<br />
here, and the other differences between the “go” and “no go” and “unobserved” are highly<br />
significant as well. There was a high degree of confidence that the tactical wartime missions/<br />
functions can be performed as a result of the observed field validation training/exercises.<br />
The “go’s” showed confidence related to maintaining unit strength and adequate logistical<br />
and communication support. Responses of “no go’s” and “unobserved” pointed up training<br />
and equipment concerns as did some non-evaluated (omitted) tasks. Company battlefield<br />
tasks were considered feasible but some evaluators mistakenly omitted replies. Medical<br />
care in the field, NBC, and Security problems were anticipated with the level of transportation<br />
serving as a crucial balance between “go” or “no go” decisions.<br />
Issue 3 and its criteria showed evaluator responses of 5 (GO), 1 (NO GO), and 7<br />
(UNOBS). Compared to the expected distribution of responses, it was found these responses<br />
were significantly different by chi-squared: X2 (2, N = 13) = 4.31, p < .15. There is<br />
a significant difference among the three categories of responses but not in favor of “go’s.”<br />
However, there was sufficient reason to conclude the FSC organization will be able to<br />
transition from peace to wartime operations. Some dissonance existed concerning how to<br />
best prepare for the transition process, but “go” responses from the Korean evaluators in<br />
the “most like” wartime setting, indicated no difficulties in preparing for the transition. The<br />
only “no go” reply disagreed with status of the TOE unit garrison structure as a basis from<br />
which to initiate an effective transition process. Some omitted responses resulted from the<br />
units not being able to clarify the intent of this issue. Enough observations did result to help<br />
modify guidelines for transitioning.<br />
The three issues and related criteria when summed showed evaluator responses of<br />
301 (GO), 30 (NO GO), and 167 (UNOBS). With 140 SME responses omitted due to<br />
equipment availability, lack of criteria clarity, or redundancy of meaning, 78% (498) of the<br />
possible 638 responses were recorded for the field validation. Compared to the expected<br />
distribution of responses, it was found these responses were significantly different by chi-squared:<br />
X2 (2, N = 498) = 221.22, p < .001. Comparisons showed the SME evaluator “go”<br />
responses were highly significant exceeding “no go” and “unobserved”, separately and<br />
combined. These results indicate that responses in favor of “go” judgments could hardly<br />
occur by chance, or by chance only once per thousand measures in similar data sets.<br />
CONCLUSIONS<br />
By aggregating the interrelated SME evaluator responses for the three issues and<br />
criteria, findings were derived describing useful observations to show the potential operational<br />
capability of the FSC modular organization.<br />
Comments from the 175th Theater Finance Command (Korea) forwarding their<br />
validation data supported the operational capability of the FSC Modular Concept. Suggestions<br />
were given to improve operations by planning to solve equipment, personnel, and<br />
transportation constraints.<br />
The 3rd Finance Group (Fort Hood, TX) evaluation comments indicated the FSC
operational capability was validated to perform most of the essential technical and tactical<br />
battlefield functions. It was noted that some deficiencies in communication equipment,<br />
transportation, and staffing could limit the FSC in performing its battlefield mission as<br />
described in the Finance Operations manual (FM 14-7, 1989). Also further noted by the 3rd<br />
Finance Group were possible problems in the FSC transition from a peacetime to wartime<br />
configuration, if it organizes and trains differently during peacetime than it is expected to<br />
operate in wartime.<br />
Comments submitted with the FORSCOM validation data elements pointed out<br />
“that the FSC is a sound structure to provide finance support to commanders, units, and<br />
soldiers.” It was noted, however, “the proposed TOE is not designed with sufficient assets<br />
(personnel, communication equipment, vehicles) to operate tactically in a dispersed mode.”<br />
From the visit by three SSC referee-observers in June 1989 to Korea, a trip report<br />
officially described from that early preview most of the results experienced by units in<br />
conducting later operational and training exercises to validate the FSC Modular Concept.<br />
Their findings generally anticipated from their critique and review of training and field<br />
exercises what other SME evaluators would experience. Based on the review and on-site<br />
Korea and Fort Hood visits, SSC observers agreed that the FSC can be expected to accomplish<br />
the minimum essential wartime tasks under the modular concept with minor modifications<br />
in staffing and equipment (SSC, 1990b). With suggested planning a smoother transition<br />
can be facilitated by the modular concept from peacetime to wartime operations.<br />
REFERENCES<br />
1. Finance Operations (FM 14-7). (1989). Washington, DC: Headquarters, Department of the Army.<br />
2. Thornton III, G. C., & Cleveland, J. N. (1990). Developing managerial talent through simulation.<br />
American Psychologist, 45, 190-199.<br />
3. U.S. Army Soldier Support Center (SSC). (1990a). Field Validation of the Finance Support Command<br />
(FSC) Modular Concept. Unpublished manuscript, Directorate of Combat Developments, Fort Harrison, IN.<br />
4. U.S. Army Soldier Support Center (SSC). (1990b). Finance Materiel Requirements Study. Fort Harrison,<br />
IN: Directorate of Combat Developments.<br />
5. U.S. Army Soldier Support Center (SSC). (1989). Personnel Service Command (PSC) and Finance<br />
Support Command (FSC) Field Validation Plan. Fort Harrison, IN: Directorate of Combat Developments.<br />
6. U.S. Army Training and Doctrine Command (TRADOC). (1987). Handbook for Operational Issues and<br />
Criteria. Fort Monroe, VA: Advanced Technology (Reston, VA).<br />
LEADER INITIATIVE: FROM DOCTRINE TO PRACTICE’<br />
Alma G. Steinberg and Julia A. Leaman<br />
U.S. Army Research Institute<br />
for the Behavioral and Social Sciences<br />
Introduction<br />
Initiative has been considered to be an important component of good leadership, especially military<br />
leadership (e.g., Headquarters Department of the Army, 1983; Rogers et al., 1982; Borman et al., 1987).<br />
However, there has been very little research on the actual practice of initiative by military leaders. This<br />
paper looks at leader initiative in Army combat units in terms of the relationship between leader initiative<br />
and unit performance, inhibitors of initiative, and approaches for developing leader initiative.<br />
Army doctrine is “what is written, approved by an appropriate authority and published concerning the<br />
conduct of military affairs” (Starry, 1984, p. 88). Two doctrinal publications define and describe leader<br />
initiative. One focuses on the Army’s doctrine for combat on the modern battlefield and is articulated in<br />
FM 100-5 (Headquarters Department of the Army, 1982). It reflects “the views of the major commands,<br />
selected Corps and Divisions and the German and Israeli Armies as well as TRADOC” (DePuy, 1984,<br />
p. 86). According to FM 100-5, initiative is something that large unit commanders must encourage in<br />
their subordinates. Initiative means to “act independently within the context of an overall plan,” “exploit<br />
successes boldly and take advantage of unforeseen opportunities,” “deviate from the expected course of<br />
battle without hesitation when opportunities arise to expedite the overall mission of the higher force,” and<br />
“take risks” (p. 2-2).<br />
The second doctrinal publication addressing the importance of leader initiative focuses on military<br />
leadership doctrine (Headquarters Department of the Army, 1983). Here initiative is defined as “the<br />
ability to take actions that you believe will accomplish unit goals without waiting for orders or<br />
supervision. It includes boldness” (p. 123). Emphasis is placed on the importance of communicating<br />
values, goals, and accurate information about the enemy and other factors that affect the mission to<br />
subordinates so that the subordinates, in turn, can use initiative to accomplish the mission when they<br />
are out of contact with the leader or higher headquarters.<br />
The data reported in this paper are from Army combat units. They were collected as part of a larger<br />
project conducted in support of the Center for Army Leadership and the Combined Arms Training<br />
Activity; the project focuses on determinants of small unit performance. Thus far, data have been<br />
collected from five light infantry battalions that went through rotations at the Army’s Combat Training<br />
Centers (CTCs). The goals of this project are to identify leadership and other factors important to unit<br />
effectiveness and readiness, and to develop interventions for improving these factors.<br />
Method<br />
The data presented here come from several sources. They include data collected from units just<br />
prior to their participation in a CTC rotation, data collected from units just after their participation in a<br />
CTC rotation, ratings of observer-controllers (OCs) at a CTC, and written take-home packages that<br />
provide feedback on unit performance at a CTC, as follows:<br />
(a) Pre-CTC questionnaire responses by squad members, squad leaders, platoon sergeants, and<br />
platoon leaders in battalions shortly before their CTC rotations.<br />
(b) OC Ratings of CTC performance for two battalions.<br />
‘The views expressed in this paper are those of the authors and do not necessarily reflect the<br />
views of the U.S. Army Research Institute or the Department of the Army.<br />
(c) Take-home package observations on CTC performance, by OCs, for 12 CTC rotations which<br />
took place during 1988, 1989, and 1990.<br />
(d) Individual and small group interview responses of squad members, squad leaders, platoon<br />
sergeants, platoon leaders, company commanders, battalion executive officers, and battalion<br />
commanders in five battalions shortly after they completed their CTC rotation.<br />
(e) Post-CTC questionnaire responses by squad members, squad leaders, platoon sergeants,<br />
platoon leaders, and company commanders in five battalions shortly after they completed their<br />
CTC rotation.<br />
1. Soldier views of initiative.<br />
Results<br />
Pre-CTC Questionnaires. Squad members, squad leaders, platoon sergeants, and platoon leaders<br />
from two battalions (n=600) were asked, “When the leaders in your unit talk about initiative, what do they<br />
typically mean?” About 85% of the respondents to this open-ended question indicated that initiative was<br />
seen as involving the performance of routine or SOP behavior, without being told and/or without being<br />
supervised. It involved the accomplishment of their own job or the job of the leader in his absence. The<br />
remaining 15% of the respondents indicated that when leaders encourage them to use initiative, they use<br />
initiative to mean: do what we tell you to do, do objectionable tasks (e.g., extra work, unpleasant tasks,<br />
low-level work), and/or make the leaders look good.<br />
Post-CTC Interviews. Most of the responses to the post-CTC interviews (from five battalions)<br />
indicated that the respondents felt that initiative involved the activity of carrying out their jobs or taking<br />
over for an absent leader. Initiative was seen as the initiation or continuation of behavior without being<br />
told or without the supervisor’s presence. Several of the examples of initiative that were given involved<br />
the recognition of a problem and the request to a higher level leader to be permitted to follow a different<br />
course of action (that was still within the scope of their job). In addition, there were a few incidents of<br />
initiative reported which involved the recognition that something beyond one’s own immediate<br />
responsibilities needed to be done, and personally performing the necessary tasks to get it done. When<br />
asked directly about the importance of encouraging subordinates to carry out the commander’s intent by<br />
exploiting successes boldly, taking advantage of unforeseen opportunities, deviating from the expected<br />
course of battle, and taking risks, respondents indicated that these were not really high priorities. From<br />
battalion commander on down, they said, in essence, the main thing is to have subordinates at each<br />
level who are well disciplined, technically and tactically competent, and are motivated to do their jobs<br />
well.<br />
2. The relationship between leader initiative and unit combat performance.<br />
Interview respondents indicated that they felt initiative (i.e., accomplishing the job without being told<br />
and/or without being supervised) is very important for success in combat. Pre-CTC questionnaires were<br />
examined to determine whether leader initiative was, in fact, a predictor of unit performance at a CTC.<br />
Table 1 shows that pre-CTC squad member ratings of the level of squad leader initiative are significantly<br />
related to OC ratings of platoon performance at a CTC. Similarly, pre-CTC squad member and squad<br />
leader ratings of platoon sergeant initiative are significantly related to OC ratings of platoon performance<br />
at CTC. However, pre-CTC ratings of platoon leader initiative by squad leaders, platoon sergeants, and<br />
company commanders are not significantly related to OC ratings of platoon performance at a CTC.<br />
Initiative, in the context of doing one’s job in the absence of being told or supervised, is not<br />
perceived as the same as motivation. For example, OC ratings of platoon motivation and platoon<br />
initiative at a CTC were not significantly correlated (r = .12). Neither were OC ratings of how hard the<br />
platoon worked and tried hard to do as good a job as possible significantly correlated with their ratings<br />
of platoon initiative (r = .19). Furthermore, OC ratings of platoon motivation and how hard the platoon<br />
worked were not significantly correlated with their ratings of platoon performance, whereas OC ratings of<br />
platoon initiative were related to OC ratings of performance (r = .45, p < .05). Even more support of the<br />
relationship between initiative and performance comes from the significant correlation (r = .44, p < .05)<br />
between OC ratings of platoon initiative at a CTC and post-CTC ratings of platoon performance by<br />
company commanders.<br />
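As a check on which correlations of this size reach significance, the critical r for n = 23 platoons can be derived from the standard t test for a Pearson correlation. A short sketch (the 2.08 critical t for 21 df is taken from standard tables, not from the paper):

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# With n = 23 (df = 21), the two-tailed .05 critical t is about 2.08, so the
# critical r is roughly .41: correlations of .41-.44 clear the threshold,
# while values of .31 and below do not.
r_crit = 2.08 / math.sqrt(21 + 2.08 ** 2)
print(round(r_crit, 2))   # 0.41
```

This is why, in Table 1, the squad-leader and platoon-sergeant ratings (.41-.44) are starred while the platoon-leader ratings (.31 and below) are not.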
Table 1. Correlations Between Leader Initiative Rated Pre-CTC and Platoon Performance at CTC<br />
Pre-CTC Initiative Ratings                       Correlations with Platoon<br />
                                                 Performance Rated by OCs<br />
Of squad leader by squad members                 r = .41*<br />
Of platoon sergeant by squad members             r = .44*<br />
Of platoon sergeant by squad leaders             r = .43*<br />
Of platoon leader by squad leaders               r = .31<br />
Of platoon leader by platoon sergeant            r = .25<br />
Of platoon leader by company commander           r = -.13<br />
* p < .05; n = 23 platoons<br />
3. Inhibitors of initiative.<br />
In doctrine (Headquarters Department of the Army, 1983), identified inhibitors of initiative are: lack of<br />
understanding the mission, lack of accurate information, and lack of understanding the frame of<br />
reference (i.e., values, goals, and way of thinking) of the higher level leader and the subordinates. Figure<br />
1 provides a summary of the inhibitors of initiative mentioned in the CTC take-home packages, the post-<br />
CTC interviews, and the post-CTC questionnaires. As can be seen from Figure 1, the reported inhibitors<br />
of initiative cover a broad range of areas and provide additional inhibitors to those identified in doctrine.<br />
These include micromanagement, unit climate, concern about the reaction of others, fatigue, and lack of<br />
motivation.<br />
4. Approaches for developing initiative in subordinates.<br />
In post-CTC interviews, leaders (squad leaders, platoon sergeants, platoon leaders, company<br />
commanders, battalion commanders) indicated that they do try to develop initiative in subordinates.<br />
They focus primarily at the squad member and squad leader levels and use the following approaches to<br />
develop initiative:<br />
(a) They develop the prerequisites. Leaders frequently mentioned three areas that they felt were<br />
prerequisites for showing good initiative: good discipline, proficiency in performing the job, and<br />
self-confidence. They try to develop the first two with training and the third, confidence, through<br />
physical training (PT).<br />
(b) They tell subordinates to show initiative.<br />
(c) They provide opportunities for subordinates to perform the role of their leader. Typically<br />
squad members are told to take over for their squad leaders, either as temporary fill-ins or for<br />
developmental purposes.<br />
(d) They reward initiative. Those showing exceptional initiative during training exercises are<br />
nominated for awards.<br />
Figure 1. INHIBITORS OF INITIATIVE<br />
By Source of Information<br />
CTC Take-home Packages<br />
- lack of information<br />
- poor operations orders<br />
- micromanagement<br />
Post-CTC Interviews<br />
- micromanagement and lack of trust<br />
- lack of information<br />
- not permitted<br />
- climate (lack of support for initiative, don’t make mistakes)<br />
- missions involving the larger unit (e.g., Bn as opposed to Plt)<br />
- no opportunity (e.g., in dead tent, OC restrictions)<br />
- it’s safer for your career and shows more loyalty if you don’t<br />
let the higher leader know his ideas aren’t good<br />
Post-CTC Questionnaire (n = 322)<br />
34% Lack of relevant information<br />
31% Fatigue<br />
29% Concern about superior’s reaction<br />
24% Lack of understanding the mission<br />
20% Lack of a clear solution to the problem<br />
17% Lack of motivation<br />
16% Fear of making a mistake<br />
6% Desire to avoid being noticed<br />
5% Concern about subordinate’s reaction<br />
12% Other (e.g., too much changing of missions, inexperienced<br />
leaders, lack of time due to changes in plans or late<br />
operations orders, micromanagement)<br />
(NOTE: Percents do not add up to 100% because respondents were<br />
instructed to indicate all reasons that applied.)<br />
Conclusion<br />
This paper focused on leader initiative and looked at the relationship of leader initiative to unit<br />
performance. In this context, both doctrinal and field views of initiative were presented. Initiative, in the<br />
sense of doing one’s job without being told and/or being supervised, is seen as very important by<br />
soldiers and leaders. It was shown that leader initiative is significantly correlated with unit performance,<br />
and yet is clearly distinguishable from motivation. The fact that field views of the inhibitors of initiative<br />
are broader than those presented in doctrinal sources suggests that the information gained in this<br />
research might be of benefit to doctrinal proponents as well as those developing leader training courses<br />
or conducting field training.<br />
References<br />
Borman, W. C., Motowidlo, S. J., Rose, S. R., & Hanser, L. M. (1987). Development of a model of soldier<br />
effectiveness (ARI Technical Report 741). Alexandria, VA: U.S. Army Research Institute.<br />
DePuy, W. E. (1984). Letter, General W. E. DePuy to General Fred C. Weyand, Chief of Staff, Army, 18<br />
February 1976. In John L. Romjue, From active defense to airland battle: The development<br />
of Army Doctrine 1973-1982. Fort Monroe, VA: U.S. Army Training and Doctrine Command.<br />
Headquarters Department of the Army. (1982). Operations (Field Manual 100-5). Washington, DC:<br />
Department of the Army.<br />
Headquarters Department of the Army. (1983). Military Leadership (Field Manual 22-100).<br />
Rogers, R. W., Lilley, L. W., Wellins, R. S., Fischl, M. A., & Burke, W. P. (1982). Development of the<br />
(ARI Technical Report 560). Alexandria, VA: U.S. Army Research Institute.<br />
Starry, D. A. (1984). Commanders Notes No. 3, Operational concepts and doctrine, 20 February 1979. In<br />
John L. Romjue, From active defense to airland battle: The development of Army Doctrine 1973-1982.<br />
Fort Monroe, VA: U.S. Army Training and Doctrine Command.<br />
STARTING A TQM PROGRAM IN AN R&D ORGANIZATION<br />
Herbert J. Clark<br />
Brooks Air Force Base, Texas<br />
This paper reports the results of implementing a Total Quality<br />
Management (TQM) Program in an Air Force research and development<br />
laboratory. It outlines how the Methodology for Generating<br />
Efficiency and Effectiveness Measures (MGEEM) was used to<br />
implement TQM, and describes the lessons learned in the process.<br />
The paper also gives guidelines for starting a TQM program and<br />
recommends using Organizational Development (OD) intervention<br />
techniques to gain acceptance of the program. Lessons learned<br />
stress the importance of choosing a skilled TQM facilitator,<br />
adequately training process action teams, and fostering open<br />
communications and teamwork to reduce resistance to change.<br />
GETTING STARTED<br />
People report they read the popular literature or hear a TQM<br />
briefing and come away with a general understanding of TQM<br />
philosophy, but no specific directions on how to get started.<br />
This condition is so common that, according to Kanji (1990), it<br />
even has a name: 'Total Quality Paralysis!'<br />
Kanji's solution for overcoming this problem is to follow a four-stage<br />
TQM implementation procedure. It consists of collecting<br />
organizational information, getting top management support for<br />
TQM, developing an improvement plan, and starting new initiatives.<br />
Following these four steps leads to commitment from the top, a<br />
united and coordinated middle management, and the data to make<br />
informed decisions -- essential conditions for TQM success.<br />
Behavioral scientists writing in the OD literature recommend<br />
similar procedures and have developed intervention techniques for<br />
gaining management support for new initiatives. They consist of<br />
educational activities, questionnaires, team-building exercises,<br />
and prescriptions of 'things to do' and 'things not to do.'<br />
French and Bell (1984) describe five types of interventions which<br />
range from working with whole organizations to working with teams<br />
and individuals. These interventions can be used in TQM programs<br />
to increase participative management and intergroup cooperation.<br />
Coupled with TQM tools such as statistical process control, they<br />
can lead to increased productivity, better product quality, and<br />
enhanced customer satisfaction. Trying to introduce TQM without<br />
considering the behavioral dynamics of the organization<br />
significantly reduces the chances for success, as illustrated<br />
below.<br />
AN ILLUSTRATION<br />
In 1988, the Air Force Human Resources Laboratory (AFHRL) started<br />
a Total Quality Management (TQM) program. The<br />
technique used to implement TQM was the Methodology for Generating<br />
Efficiency and Effectiveness Measures (MGEEM) described by Tuttle<br />
and Weaver (1986). MGEEM uses a group decision making technique<br />
to clarify an organization's mission, identify its customers,<br />
specify Key Result Areas (KRAs), and measure progress in the KRAs<br />
using mission effectiveness indicators. Air Force Regulation 25-5<br />
recommends using MGEEM to do TQM.<br />
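The MGEEM artifacts described above can be pictured as a small data structure: a mission is broken into Key Result Areas, each serving a customer and tracked by effectiveness indicators. The following is a hypothetical sketch; the class and field names, the example KRA, and the reading value are all illustrative, not taken from Tuttle and Weaver (1986).<br />

```python
from dataclasses import dataclass, field

# Hypothetical sketch of MGEEM outputs: KRAs with customers and
# effectiveness indicators. All names and values are illustrative.

@dataclass
class Indicator:
    name: str
    readings: list = field(default_factory=list)  # periodic measurements

    def latest(self):
        """Most recent reading, or None if nothing measured yet."""
        return self.readings[-1] if self.readings else None

@dataclass
class KeyResultArea:
    title: str
    customer: str                                  # who the KRA serves
    indicators: list = field(default_factory=list)

kra = KeyResultArea(
    title="Timely delivery of research reports",
    customer="Air Staff sponsors",
    indicators=[Indicator("percent of reports delivered on schedule")],
)
kra.indicators[0].readings.append(82.0)
print(kra.indicators[0].latest())  # most recent reading for the indicator
```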
Despite top management support, the reaction to starting MGEEM at<br />
AFHRL was negative. Had there been a vote at the TQM start-up<br />
meeting, it is unlikely that a majority of the laboratory staff<br />
would have endorsed implementing TQM or MGEEM. The commander saw<br />
no reasonable alternative to MGEEM, however, so he directed its<br />
implementation.<br />
Twenty months after the program began, support for MGEEM was still<br />
weak. Of the 94 (out of 380) people answering a laboratory TQM<br />
newsletter survey, 80% said TQM/MGEEM was of 'No Value' or 'Some Value.'<br />
Only 20% said it was of 'Moderate Value' or 'Significant Value.'<br />
Several written replies said to stop MGEEM. The attitude toward<br />
the TQM philosophy was more positive.<br />
MGEEM was rejected because people in the laboratory did not have a<br />
sense of ownership in the program. Although division-level<br />
management participated in selecting the MGEEM KRAs and<br />
indicators, they did not support using MGEEM in an R&D laboratory.<br />
This attitude was passed on to lower levels of management, so few<br />
people supported the program. This attitude prevailed, even though<br />
several of the scientists in the laboratory helped develop MGEEM.<br />
The finding that MGEEM was not widely accepted at AFHRL does not<br />
mean that it is an ineffective technique for implementing TQM.<br />
Some observers felt that MGEEM was rejected prematurely and did<br />
not receive a fair test. Others felt that its rejection may have<br />
been more a consequence of how management introduced MGEEM than<br />
its methodology.<br />
Had AFHRL used OD intervention techniques while implementing TQM,<br />
it is possible that they would have chosen a more acceptable TQM<br />
approach. Three OD techniques which could have been applied are<br />
survey feedback, the confrontation meeting (Beckhard, 1967), and<br />
work teams. Advantages to this approach are that problem<br />
identification is based on survey data; top management and work<br />
teams define the problems and propose solutions; middle<br />
management and workers develop the specific TQM procedures; and<br />
the survey data provide a reference point for surveys administered<br />
after changes have been made.<br />
When employed by a skilled facilitator, these techniques increase<br />
the chance of everyone developing a sense of ownership in the<br />
procedures adopted. TQM tools, such as cause and effect diagrams,<br />
are used to examine the processes associated with product quality<br />
after the group has accepted the need for change.<br />
LESSONS LEARNED<br />
In December 1989, Clark (1989) reported several lessons learned<br />
during the TQM program at AFHRL. The following is a summary of<br />
additional lessons learned.<br />
Facilitators. Facilitators must be familiar with TQM quality<br />
improvement procedures and with OD techniques for gaining program<br />
acceptance. Facilitators should also be able to train people in<br />
TQM and OD. It is best to use facilitators who are not a part of<br />
the management group that is initiating TQM. Facilitators need<br />
the independence and authority to run the program as approved.<br />
Organizations sometimes appoint their own facilitator and conduct<br />
a do-it-yourself TQM program. An alternative is to hire a full-time,<br />
thoroughly trained facilitator from outside the<br />
organization who can offer TQM alternatives. Facilitators should<br />
not impose their own philosophy on an organization or direct a<br />
specific TQM approach. The organization should develop its own TQM<br />
approach based on its unique requirements.<br />
Process Action Teams (PATs). The role of PATs in a TQM program is<br />
to examine manufacturing and administrative processes and improve<br />
the quality of service to the customer. Twenty months into the<br />
TQM program at AFHRL, 380 people were asked: How<br />
valuable are the process action teams at AFHRL? Twenty-one<br />
percent of the 94 people answering said, 'No Value.' Thirty-five<br />
percent said, 'Some Value'; 31% said, 'Moderate Value'; and<br />
13% said, 'Significant Value.' These results were surprising<br />
because, throughout the TQM program, people said the PATs were the<br />
most effective and worthwhile part of the program. We expected<br />
more people to say the PATs were of significant value.<br />
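The figures above can be checked with a little arithmetic: 94 of 380 people responded, and the four value ratings should sum to 100% of respondents. A minimal sketch:<br />

```python
# Back-of-the-envelope check on the survey figures reported above:
# 94 of 380 people responded, and the ratings cover all respondents.
responses, population = 94, 380
ratings = {"No Value": 21, "Some Value": 35,
           "Moderate Value": 31, "Significant Value": 13}  # percent of the 94

response_rate = 100 * responses / population
print(round(response_rate, 1))   # roughly a one-in-four response rate
print(sum(ratings.values()))     # ratings sum to 100 percent

low = ratings["No Value"] + ratings["Some Value"]
print(low)                       # share rating the PATs at 'Some Value' or below
```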
Written comments from the survey showed that people who said<br />
PATs were of no value were either not aware of what the PATs were<br />
doing, felt the PATs created too much bureaucratic busy work, or<br />
thought the PATs were not addressing the right problems.<br />
People who rated them highly said the PATs increased<br />
communications, involved people from lower levels, and proposed<br />
effective solutions to problems.<br />
Most PATs at AFHRL worked on improving administrative procedures.<br />
There was less progress in improving the quality of the laboratory<br />
R&D product and customer satisfaction. PATs should spend a major<br />
portion of their time working on product improvement and customer<br />
satisfaction. Excessive attention to administrative procedures<br />
can be a symptom of undue concern about management and too little<br />
concern about customer satisfaction and product quality.<br />
PATs are not the solution to all problems. It is easy to defer<br />
decisions to a committee without exercising leadership. Some<br />
problems sent to PATs could be easily solved by management in half<br />
the time.<br />
Training. Typical TQM training programs consist of lectures on<br />
the philosophies of Deming, Juran, Crosby, and other well-known<br />
quality advocates. There should be additional training on such<br />
subjects as participative management, customer interface, process<br />
control applications to non-manufacturing activities, and<br />
statistical analysis.<br />
Process Action Teams need training on group participation skills,<br />
brainstorming, cause and effect diagrams, and other TQM tools.<br />
Most experts orient this training towards the task at hand, rather<br />
than towards people's feelings and personalities. Training in OD<br />
intervention techniques comes after management has decided which<br />
OD techniques to apply. Unless people receive specialized<br />
training in OD and TQM, they do not know how to get underway.<br />
Communications. Good communication is fundamental to the success<br />
of any TQM program. Yet, many organizations have poor<br />
communications. Upward communication is poor because managers<br />
fail to listen. Downward communication is poor because managers<br />
want to protect their workers from what they consider to be<br />
irrelevant information. The result is a communication gap between<br />
managers and workers. Changes to this pattern can come about by<br />
recognizing the problem and training new behaviors through<br />
classroom discussion and leadership example.<br />
Some organizations increase communications, openness, and teamwork<br />
by using newsletters. AFHRL started a newsletter halfway through<br />
its TQM program. The newsletter was distributed each month and<br />
invited everyone's participation. Informal conversations<br />
indicated there may have been more TQM discussions in the<br />
laboratory because of the newsletter. A newsletter keeps the<br />
importance of quality and productivity gains visible to management<br />
and employees.<br />
Measuring Quality and Productivity. One way to measure quality and<br />
productivity in an R&D organization is to establish customer<br />
requirements, set goals, and measure progress towards reaching<br />
those goals in cooperation with the customer. Although this method<br />
is more appropriate for applied R&D projects than for basic<br />
research, it can be used for both. In an R&D organization, there<br />
is usually less emphasis on measuring scientific progress through<br />
use of the traditional TQM statistical process control techniques.<br />
Surveys can be used to measure customer satisfaction.<br />
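The goal-tracking approach described above (agree on a requirement, set a numeric goal, measure progress each review period) can be sketched as a small function. This is an illustrative sketch; the metric, baseline, and goal values are hypothetical, not from the paper.<br />

```python
# Minimal sketch of measuring progress toward a customer-agreed goal,
# as described in the text. All numbers below are hypothetical.

def progress(current: float, baseline: float, goal: float) -> float:
    """Fraction of the distance from baseline to goal achieved so far."""
    if goal == baseline:
        return 1.0  # goal already met at baseline; nothing to close
    return (current - baseline) / (goal - baseline)

# e.g. a customer-satisfaction survey score: baseline 3.1, goal 4.0,
# latest measurement 3.7 -> about two-thirds of the way to the goal
print(round(progress(3.7, 3.1, 4.0), 2))
```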
Resistance to TQM. Some people resist any type of organizational<br />
change. They do not want to start a TQM program or any other<br />
program. They just want to be left alone to do their work.<br />
Others fear a loss of responsibility, while still others fear they<br />
may get some. Reactions range from outright argument against TQM<br />
to stonewalling and simply waiting out current management.<br />
Management must listen, but also lead. If data show that<br />
organizational problems exist, open discussions should take place;<br />
but it is up to top management to lead the organization. This<br />
does not prevent the use of OD techniques. In fact, the greater<br />
the problem and resistance, the greater is the need for OD. People<br />
become believers based on the enthusiasm, examples, ideas, and<br />
data presented by management.<br />
Whatever TQM strategy and tactics are adopted, they must be<br />
reviewed and updated at least once each year. This action<br />
accommodates criticism and conveys a sense of continually striving<br />
for improvement and acceptance of the procedures adopted.<br />
Labels. Labels, such as MGEEM, TQM, MBO, and Zero Defects, can<br />
easily become scapegoats for people dissatisfied with a new<br />
management initiative. One way around this is to avoid using<br />
labels. The NASA Lewis Research Center, for example, calls its<br />
quality improvement program just that, a quality improvement<br />
program (Office of Management and Budget, 1990). Although NASA<br />
uses Deming principles and the ideas of other TQM experts, they<br />
intentionally avoid referring to their program as a Deming program<br />
or a TQM program. Their program is a combination of quality<br />
initiatives uniquely patterned for their organization. This may<br />
be a good policy to adopt, since it can be more difficult to argue<br />
against a quality improvement program than a specific TQM program<br />
with a label.<br />
FADS<br />
Many who read this paper will be familiar with the long list of<br />
publications which tell how to improve organizational<br />
productivity, quality, and morale. A particularly good summary of<br />
fads has been published by John Byrne (1986). He tells in a very<br />
entertaining way how fads come and go, and what are the latest<br />
fads. He says that too many modern managers are like compulsive<br />
dieters: trying the latest craze for a few days, then moving on<br />
(p. 58).<br />
The theme of this paper is that things do not have to be that<br />
way. An initiative to increase productivity and quality can<br />
succeed and endure if people in the organization buy into it.<br />
First, they have to believe they need a change; then they have to<br />
agree to participate in the program. Because people are different<br />
and organizations are different, the approach must be tailored to<br />
the organization.<br />
Success requires a qualified facilitator or change agent who can<br />
teach people how to work as teams. Additionally, all levels of<br />
management must endorse and actively sponsor the management<br />
change. Workers must have goals which are consistent with the<br />
overall goals of management. OD techniques can help gain the<br />
trust and cooperation needed to sustain a TQM program.<br />
All this takes time, patience, and considerable skill.<br />
If TQM does not work as promised, we may have to admit that<br />
programs which rely on people's good will just won't work. As Ring<br />
Lardner, Jr. (1990) said about Communism in Eastern Europe:<br />
Communism, like Christianity, is good in theory but, given human<br />
nature, hard to put into practice. Perhaps the same can be said<br />
about TQM.<br />
REFERENCES<br />
Beckhard, R. (1967, March-April). The confrontation meeting.<br />
Harvard Business Review, 45, 149-155.<br />
Byrne, J.A. (1986, January). Business fads: What's in - and out?<br />
Business Week, pp. 52-58.<br />
Clark, H.J. (1989, December). Total quality management: An<br />
application in a research and development laboratory (AFHRL-TP-85-<br />
58, AD-A215 808). Brooks AFB, TX: Special Projects Office, Air<br />
Force Human Resources Laboratory.<br />
French, W., & Bell, C.H., Jr. (1984). Organization development:<br />
Behavioral science interventions for organization improvement.<br />
Englewood Cliffs, NJ: Prentice-Hall.<br />
Kanji, G.K. (1990). Total quality management: The second<br />
industrial revolution. Total Quality Management, 1(1), 3-12.<br />
Lardner, R., Jr. (1990, March 19). U.S. News & World Report,<br />
Washington, DC, p. 27.<br />
Office of Management and Budget. (1989). Quality improvement<br />
prototype. Unpublished document available through the NASA Lewis<br />
Research Center, Cleveland, OH.<br />
Tuttle, T.C., & Weaver, C.N. (1986, November). Methodology for<br />
generating efficiency and effectiveness measures (MGEEM): A guide<br />
for Air Force measurement facilitators (AFHRL-TP-86-36, AD-A174<br />
574). Brooks AFB, TX: Manpower and Personnel Division, Air Force<br />
Human Resources Laboratory.<br />
32nd Conference of the Military Testing Association (MTA)<br />
An officer, a social scientist (and possibly a gentleman) in the Royal<br />
Netherlands Army (RNLA).<br />
Presentation by Co1 Dr. G.J.C. Roozendaal<br />
Head, Behavioural Science Division<br />
Directorate of Personnel RNLA<br />
The Royal Netherlands Army<br />
In peacetime the Royal Netherlands Army has 78,000 employees, consisting of<br />
23,000 regular servicemen, 43,000 conscript personnel and almost 12,000<br />
civilian employees. The RNLA can rapidly reach its wartime strength of<br />
200,000 men and women by calling up reserve personnel.<br />
The Royal Netherlands Army is a volunteer-conscript army, with national<br />
service (for men only) lasting 14 to 16 months. [!]<br />
Women have the right to serve, but only as volunteers. In principle all<br />
posts are open to women.<br />
The Royal Netherlands Army has its own military psychological and social<br />
service, comprising around 20 regular officers in the ranks from major up to<br />
and including brigadier-general.<br />
All these officers have graduated from a Dutch University in either<br />
psychology or sociology.<br />
Virtually all of these officers were given their basic training at the Royal<br />
<strong>Military</strong> Academy, after which they spent several years in active service as<br />
a platoon commander, company commander and/or a staff officer with an active<br />
unit.<br />
Only then have most officers completed their training as social scientists.<br />
Military behavioural scientists occupy various posts in different fields of<br />
work.<br />
Allow me to give you some examples:<br />
1. The personnel manager of the Directorate of Personnel RNLA is a<br />
brigadier-general psychologist.<br />
2. There are four colonels who act as, amongst others:<br />
- Head of the Behavioural Sciences Division;<br />
- Commander of the Didactics and Military Leadership Training Centre;<br />
- Instructor at the Royal Military Academy;<br />
- Head of the Individual Assistance Section.<br />
3. One lieutenant-colonel is Commander of the Selection Centre of the<br />
Royal Netherlands Army.<br />
In addition there are another fourteen officers in the ranks of major<br />
and lieutenant-colonel who occupy a wide range of research, policy and<br />
assistance posts.<br />
I shall now endeavour to use some examples to make it clear to you what<br />
exactly these officers do.
Research:<br />
In cooperation with a number of civil institutes, research is conducted<br />
into:<br />
a. Job satisfaction<br />
Every year a sample survey is carried out amongst 5% of regular<br />
personnel with regard to their well-being,<br />
motivation, their opinion on personnel policy and current matters.<br />
Finally, they are also asked if they contemplate leaving the service.<br />
This year the questions concerned the military personnel’s opinion on<br />
the change in East-West relations, and on the planned reductions in<br />
personnel.<br />
I am unfamiliar with your research experiences, but we at the RNLA have<br />
noticed that personnel have not lost their motivation for their military<br />
task, although they are uncertain as to whether their jobs will continue<br />
to exist. However, they are mainly of the opinion that the reductions<br />
will not affect them personally, but rather a colleague elsewhere.<br />
This affects the advisory policy to be pursued with regard to reductions<br />
in personnel.<br />
b. Exit interviews<br />
Exit interviews are held with all personnel leaving the Royal Netherlands<br />
Army prematurely. This yields information for the organisation as<br />
to how it is valued as an employer and how personnel policy can best be<br />
altered.<br />
In this context, extra measures have been taken in order to increase<br />
the ability to retain technical personnel, doctors and other<br />
highly trained personnel.<br />
The exit interviews also provide information enabling the policy to<br />
integrate women and ethnic groups to be adapted accordingly.<br />
Remarkably enough, the exit interview is now needed in order to<br />
determine which measures can be taken to achieve increased voluntary<br />
outflow in the light of the reductions in personnel - an interesting<br />
change in the use of exit interviews.<br />
c. Violence in the armed forces<br />
Research was conducted recently into the occurrence of violent incidents<br />
within the armed forces, covering all types of physical and/or mental<br />
violence ranging from coarse language, swearing, harassment and physical<br />
violence, right up to forms of sexual violence.<br />
The research showed that, fortunately, forms of serious sexual abuse<br />
are fairly rare.<br />
However, the research did lead to a series of recommendations<br />
as to how leadership qualities can be improved.<br />
These recommendations are now being implemented.<br />
d. Homosexuality research<br />
Following on from the above, research is currently being conducted into<br />
the extent to which homosexual soldiers experience forms of discrimination.<br />
A sample survey is also being conducted amongst all soldiers with<br />
regard to their attitude towards homosexual colleagues. This survey is<br />
still in its initial stages, but it will certainly enjoy particularly<br />
strong political interest.<br />
e. Deployability research<br />
On several occasions my division has conducted research into the effect<br />
of lengthy exercises on the deployability of regular and conscript<br />
military personnel.<br />
This research has resulted in the adjustment of the operational plans<br />
formulated by the Netherlands Army Staff.<br />
f. Miscellaneous<br />
Without entering into any detail, I shall just mention research which<br />
my division has conducted regarding: the use of alcohol, drugs and<br />
gambling addiction, the integration of women in the RNLA, unit consultations,<br />
conscript NCOs, and so on.<br />
Selection of personnel<br />
a. My division recently developed and implemented a personality test with<br />
a view to selecting prospective conscript personnel on their suitability<br />
for compulsory national service. This can limit the number of<br />
conscripts who dysfunction for psychological reasons.<br />
b. A procedure has been developed for prospective regular officers and<br />
NCOs to determine more accurately their suitability for the Royal<br />
Netherlands Army. This procedure uses personality studies and biographical<br />
data, compiled through standard interviews.<br />
All interviewers have been trained at length (approximately 2 years).<br />
Validation studies have shown that a well-trained<br />
interviewer is far more capable of selecting the right man or woman for<br />
the army than is a study of his/her abilities.<br />
c. A computerised 10-task test is currently being developed in cooperation<br />
with the Institute for Perception (TNO).<br />
The test is designed to gauge both stress tolerance and capacity for<br />
multiple information processing. For validation purposes, these computer<br />
tests are now included in the existing test procedures.<br />
d. In a while I shall discuss the research we have conducted in order to<br />
b. Our research into battlefield conduct has led to techniques now being<br />
introduced, which can reduce the effects of battle stress.<br />
The techniques are applied at individual level in activities which are<br />
by their very nature stressful, such as parachuting, rock climbing or<br />
diving.<br />
Furthermore, commanders are thoroughly prepared for the effects of<br />
stress and battle stress, and they are taught how to recognise stress<br />
symptoms and how to act when faced with them.<br />
c. We have conducted intensive research into the effects of lack of sleep<br />
over a long period.<br />
Just one of the things this has revealed is how lack of sleep<br />
influences leadership.<br />
After 48 hours' sleep deprivation the effectiveness of decisions taken<br />
declines dramatically. The factor causing the most concern is that<br />
commanders often do not realise, or realise only vaguely, that they are<br />
no longer capable of making responsible decisions.<br />
Symptoms of this kind have given rise to a great deal of attention<br />
being paid to the aspect of sleep management. A remarkable fact is that<br />
many commanders reject the implementation of sleep management, as they<br />
deem it un-military.<br />
However, we shall persevere.<br />
d. The RNLA has paid too little attention to Psychological Operations<br />
(Psycops) for too long. In fact, until recently the subject was not<br />
open to discussion on a political level, even in today’s free society.<br />
Psychological defence (preparing oneself for the adversary’s psychological<br />
operations) was all that was politically acceptable at that time.<br />
Recently however, the subject of Psycops has been attracting more<br />
attention, something which has been partly influenced by the attention<br />
we have paid to the effects of battle stress.<br />
For us this constitutes an interesting topic for research; one about<br />
which we think we can learn a lot from others.<br />
e. The attention given to battlefield behaviour, which I have already<br />
mentioned, has led to an entirely different structure of the treatment<br />
of combat stress victims in actual wartime conditions. Based on the<br />
well-known principles for treating combat stress victims,<br />
- proximity (treatment at the front)<br />
- immediacy (treatment as soon as the symptoms occur and as quickly as<br />
possible)<br />
- expectancy (treatment to return the victims to active service), we<br />
formed a system of combat stress recovery units for our 1 (NL) Army<br />
Corps.<br />
The combat stress recovery units are located in the rear areas of the<br />
brigades and must be able to operate as mobile units.<br />
The battalion aid post could serve as a collection point for battle<br />
stress victims; this is where triage takes place.<br />
After triage the battle stress victims are treated in the battle stress<br />
recovery unit. The head of the battle stress recovery unit will be an<br />
officer from our psychological and social service, one of our trained<br />
psychotherapists in fact.<br />
We estimate that 25% of all victims will be battle stress victims. Our<br />
objective is to return 50% of them to active duty within two days, and<br />
ultimately 80% within seven days.<br />
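The planning figures above can be worked through directly: of every 100 casualties, an estimated 25 would be battle stress victims, with targets of 50% returned within two days and 80% within seven. A sketch, using an illustrative casualty count of 100:<br />

```python
# Working through the battle stress recovery planning figures above.
# The casualty total of 100 is illustrative; the rates come from the text.
casualties = 100
stress_victims = casualties * 0.25       # estimated 25% are battle stress
within_two_days = stress_victims * 0.50  # target: 50% back within 2 days
within_seven_days = stress_victims * 0.80  # target: 80% back within 7 days
print(stress_victims, within_two_days, within_seven_days)
```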
The heads of the battle stress recovery units will also be able to act<br />
as staff officers to the brigade commander.<br />
Their responsibilities will include the prevention of stress-related<br />
problems.<br />
Our battle stress recovery organisation is not yet ready, but we aim to<br />
have it operational by 1991.<br />
In order to prevent symptoms of PTSD (post-traumatic stress disorder),<br />
analysis is now taking place in order to determine whether, in the<br />
event of the RNLA being deployed for peace-keeping operations, the<br />
assignment of military psychologists at a lower level would be worthwhile.<br />
Assessments<br />
f. Earlier this year we introduced a new procedure for assessing regular<br />
personnel. In this system, more emphasis is laid on the influence of<br />
assessments on the management development system. In order to exclude<br />
undesired effects such as the unequal distribution of power, stereotyping<br />
and so on, every battalion now has a so-called assessment advisor.<br />
We have thoroughly trained these advisors, who are intended to support<br />
both the commanders and the individual soldiers to be assessed.<br />
This system was implemented at the beginning of this year.<br />
Evaluations to see whether the objective has been met will take place<br />
in 1991.<br />
Education and training<br />
a. In 1985 the Didactics and <strong>Military</strong> Leadership Training Centre was<br />
established for the RNLA. One of my colleagues, a colonel and a<br />
sociologist, is the commander of this centre.<br />
All future military instructors are trained at the centre, as are the<br />
so-called didactics specialists who are appointed to every training<br />
centre.<br />
b. The centre also offers possibilities for leadership training to military<br />
commanders of all levels.<br />
Team-building procedures are developed and distributed from this<br />
training centre.<br />
c. Finally, the training centre plays a leading part in the use of new<br />
teaching methods such as computer-based instruction, simulators, wargames,<br />
the training of social skills, and so forth.<br />
Care of dysfunctioning personnel<br />
a. One of my colleagues, a colonel and a clinical psychologist/<br />
psychotherapist, is head of our Psychological Support Section.<br />
His section comprises three offices; the head of each of these offices<br />
is a military psychologist.<br />
b. Every year these offices give psychological (psychotherapeutic)<br />
aid to some 5,000 soldiers. Sometimes the problems are<br />
simple, and can be solved simply by advising a transfer. Sometimes,<br />
however, the problems are related to psychologically more complex<br />
matters involving alcohol, drug and gambling addiction, family problems<br />
and individual dysfunctioning. A number of these problems stems from<br />
traumatic service experiences, accidents, shooting incidents and the<br />
delayed effects of PTSD or battle stress.<br />
c. By virtue of this very experience, the psychologists from these offices<br />
are exceptionally well deployable in the battle stress recovery units I<br />
described earlier.<br />
The integration of women in the RNLA<br />
a. As most of you are probably aware, all posts in the Netherlands armed<br />
forces have in principle been accessible to women since 1978 (including<br />
combat duties).<br />
I need hardly remind you that this does not mean there are no problems<br />
involved with the integration - on the contrary, there are.<br />
b. Some of these problems are obviously caused by the<br />
differences in physical strength and [powers of?] endurance between men<br />
and women.<br />
These facts are generally accepted, and are therefore open to discussion.<br />
Moreover, as is the case in the Royal Netherlands Army, effective<br />
policy measures can be<br />
implemented to cope with these differences.<br />
Allow me to give you an illustration. All military posts are classified<br />
according to physical demand.<br />
A method has been developed which measures physical strength in men and<br />
women. Because the job requirements for men and women are the same in<br />
principle, this results in there being relatively few women in physically<br />
demanding posts.<br />
c. One particular problem is that few women are prepared to enter into<br />
long-term contracts. This was one of the reasons for our making all<br />
conscript posts - in principle - accessible to women.<br />
For many this was a way of getting to know the RNLA. For some this has<br />
already led to a job with the regular personnel.<br />
d. Measures have been implemented which until recently were highly<br />
controversial: parental leave has been introduced (for both men and<br />
women), day-care centres have been set up, part-time work has been<br />
introduced for soldiers, and women now have the opportunity (after a<br />
certain time) to return to the RNLA in order to resume a military<br />
career interrupted by parental duties.<br />
Officers from the psychological and social service have played a<br />
significant role in the implementation of all these measures.<br />
I myself have been very closely involved, as I am chairman of the<br />
working group responsible for the preparations for the integration of<br />
women (and also that of ethnic groups).<br />
e. Furthermore, I have been commander of the RNLA's selection centre for<br />
three years - I shall tell you more about this and the relevance it<br />
bears to this conference.<br />
As you are aware, psychological tests show, on average, differences in<br />
scores between men and women.<br />
In 1985 the tests gave the following differences in percentages of men<br />
and women passing final selection:<br />
men: 40%<br />
women: 20%<br />
Further research revealed that these differences in percentages were<br />
brought about mainly by the original version of our practical technical<br />
ability test.<br />
A mere 25% of all women came above the cut-off score, as opposed to 75%<br />
of all men.<br />
f. This was one of the reasons behind our subjecting this test to a<br />
thorough item-bias study, which led to the test being drastically<br />
adapted.<br />
Women still score lower results than men in this test, but the differences<br />
are now much smaller: 50% of women and 75% of men now meet the<br />
demands set in this test.<br />
We have also modified the procedures.<br />
Compensation and more differentiation according to position have now<br />
been introduced, and we have adapted our<br />
personality tests and the interview.<br />
Without going into too much detail, I can also tell you that we<br />
currently set the same demands and the same tests for men and women,<br />
with the same cut-off scores, based on the same number of correct<br />
answers.<br />
Moreover, the numbers of men and women in percentages passing final<br />
selection are now almost equal, as the following illustrates:<br />
men: 45%<br />
women: 40%<br />
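The before-and-after figures can be expressed as a selection-ratio comparison. The four-fifths (80%) guideline used below is an illustrative adverse-impact heuristic added here, not part of the RNLA procedure; the percentages are those quoted above.<br />

```python
# Sketch: compare female/male pass rates before and after the test revision.
# The 0.80 benchmark is an illustrative adverse-impact rule of thumb; the
# pass rates are the figures quoted in the text.

def impact_ratio(rate_focal: float, rate_reference: float) -> float:
    """Ratio of the focal group's pass rate to the reference group's."""
    return rate_focal / rate_reference

before = impact_ratio(0.20, 0.40)   # 1985: women 20%, men 40%
after = impact_ratio(0.40, 0.45)    # after revision: women 40%, men 45%

print(f"before revision: {before:.2f} (below 0.80: {before < 0.80})")
print(f"after revision:  {after:.2f} (below 0.80: {after < 0.80})")
```

Before the revision the ratio is 0.50; after it, roughly 0.89, above the 0.80 benchmark.<br />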
g. This is the only way in which we can achieve our objective of women<br />
comprising 10% of the army by 1993. You will<br />
understand that this is no simple task for an organisation in which the<br />
same demands are set for both men and women, nor for one which is to be<br />
reduced by 30% over the next few years.<br />
Reductions in personnel<br />
This brings me to the final point I wish to bring to your attention. Many<br />
armed forces will have to make considerable reduct.i.ons over the next few<br />
years; as I have already mentioned, this will amount to 30% for the RNLA.<br />
472
The question now is how to approach this, and how to help in one’s capacity<br />
as a military psychologist.<br />
Allow me to give you some examples of our contribution in this matter:<br />
Exit interviews: determining the ways of promoting the voluntary outflow<br />
of personnel.<br />
Out-placement: assisting in the search for a job outside the army.<br />
Information: advising on the policy to be pursued and its psychological<br />
consequences for personnel.<br />
Individual assistance: in cases where enforced discharge is<br />
unavoidable (etc.).<br />
This is perhaps a rather gloomy note on which to finish, but it nevertheless<br />
illustrates how valuable such a widely-deployable military psychological and<br />
social service can be.<br />
473
Acceptance of Change<br />
An Empirical Test of a Causal Model<br />
Edith Lynne Goldberg<br />
State University of New York, Albany<br />
John P. Sheposh<br />
Joyce Shettel-Neuber<br />
Navy Personnel Research and Development Center, San Diego, CA<br />
Abstract<br />
This study examined the effect of climate in combination with other factors on<br />
perceived value and acceptance of changes in three public sector (Department of<br />
Defense) organizations that had adopted new approaches to managing human<br />
resources. A conceptual model was proposed and tested to convey the interactive<br />
nature of the set of factors selected as important to acceptance of the changes.<br />
In general the hypothesized interrelationships were supported by the data. The<br />
assessment of the specific changes during the period of implementation was<br />
influenced by organizational contextual factors (CLIMATE). The assessment of the<br />
specific changes, in turn, affected perceived consequences of the changes which<br />
influenced the desire to retain the changes. This last factor, which could be<br />
construed as intentionality, is considered an important underpinning or precursor<br />
to the final stage of institutionalization. The combination of predictors in the<br />
model accounted for 56% of the variance. Theoretical and applied issues were<br />
discussed and future research suggested.<br />
In response to need or opportunity, organizations put into place planned changes that alter<br />
or replace existing procedures, products, processes, and/or policies. The implementation phase--<br />
what happens after a decision has been made to adopt the change--is a critical period in the<br />
success of the change. The research that has been accumulated on implementation has reported<br />
numerous failures (Bardach, 1977; Schultz & Slevin, 1976); therefore, research which could lead<br />
to the identification, examination, and better understanding of factors important to the<br />
successful implementation of organizational change is needed.<br />
A wide range of factors could plausibly influence the implementation of a change in an<br />
organization. Certain factors have been identified by most experts on the subject as playing a<br />
significant role in the adoption and implementation of change (cf. Sheposh, Hulton, & Knudsen,<br />
1983). One of the major factors that has been cited as influencing implementation and<br />
institutionalization of change is the organizational climate of the adopting unit (Glaser, 1973).<br />
In general, climate has been regarded as a perception of the organization by its employees which<br />
is shaped by experiences within the organization. Climate is viewed as influencing the behavior<br />
of organizational members, distinguishing one organization from another, and enduring over<br />
time (Gordon & Cummins, 1979; James & Jones, 1979; Schneider & Snyder, 1975).<br />
This paper reports on research which examined the effect of climate in combination with<br />
other factors on perceived value and acceptance of changes in three public sector (Department<br />
of Defense) organizations that had adopted new approaches to managing human resources. A<br />
conceptual model was proposed and tested to convey the interactive nature of the set of factors<br />
that were selected as important to acceptance of the changes. The model, the variables<br />
comprising the model, and the proposed causal linkages are presented in Figure 1.<br />
The opinions expressed in this paper are those of the authors, are not official, and do not<br />
necessarily reflect the views of the Navy Department.<br />
Figure 1. Proposed model of acceptance of institutionalization of change. [Diagram: LEVEL and CLIMATE lead to the specific changes, which lead to CONSEQUENCES OF CHANGES, which lead to ACCEPTANCE OF INSTITUTIONALIZATION OF CHANGE.]<br />
According to the model, LEVEL in the organization (i.e., first-line supervisors, managers)<br />
and CLIMATE represent exogenous variables. LEVEL was included because descriptions of<br />
organizational climate differ among hierarchical levels within an organization. Payne and<br />
Mansfield (1973), for example, reported that those individuals who were higher on the<br />
organizational hierarchy tended to perceive their organization as more democratic, friendly, and<br />
ready to innovate than those who were lower. As conveyed in the model, CLIMATE has a direct<br />
influence on specific aspects of the three changes that were being implemented. Organizational<br />
climate was expected to affect the extent to which specific changes produce benefits, because a<br />
change is more likely to succeed in an organization where the climate is open, accommodating<br />
to change, and in general positive. The combined effects of the specific changes in turn should<br />
significantly affect (increase or decrease) managers’ and supervisors’ ability to manage<br />
personnel-related matters in their work (CONSEQUENCES OF CHANGES). These perceived effects<br />
were expected to have a direct bearing on their willingness to institutionalize the particular set<br />
of changes that were being implemented (ACCEPTANCE OF INSTITUTIONALIZATION OF CHANGE).<br />
This index was included because previous research (Berman & McLaughlin, 1978) suggested that<br />
the question of institutionalization of a change is distinctly separate from that of<br />
implementation. Berman and McLaughlin concluded that initial adoption of a change does not<br />
ensure implementation nor does successful implementation necessarily ensure continuation of the<br />
change. It was hypothesized that in this study the perceived success of the changes during the<br />
implementation phase, as gauged by assessment of specific aspects of the changes (SPECIFIC<br />
CHANGES), and the perceived consequences (CONSEQUENCES OF CHANGES), which are to some<br />
extent determined by the climate of the organization will tend to produce broad based support,<br />
which would be instrumental in promoting the continuation of the changes (ACCEPTANCE OF<br />
INSTITUTIONALIZATION OF CHANGE).<br />
Method and Procedures<br />
Organizations<br />
Three Department of Defense (DOD) organizations, which provide logistical support for the<br />
armed services, served as research sites. Their functions include storing, shipping, and issuing<br />
materials and monitoring contracts with private sector businesses. They are staffed by civil<br />
service employees and a few military officers in top management positions.<br />
Subjects<br />
The data in this study were based on the questionnaire responses of a random sample of<br />
211 supervisors and managers from first-line level and above.<br />
Innovations<br />
As part of a 3-year experiment designed to improve human resource management, a<br />
package of three changes was proposed and implemented at each of the three sites. One change<br />
involved the Delegation of Classification Authority to line management, allowing those most<br />
familiar with positions under them to assign series and grades to jobs rather than having<br />
personnelists do so. The second change, Nonpunitive Discipline, was established to substitute<br />
letters of warning for reprimands and short suspensions. The initiative was intended to improve<br />
475
supervisor-subordinate relations, make employees take responsibility for correcting problem<br />
behavior, and save money and productivity lost to suspensions. The third initiative, the<br />
Elimination of Mandatory Interviews, removed an agency requirement that all candidates for a<br />
job be interviewed and allowed appointing officials to interview some, all, or none of the<br />
candidates for a position after reviewing their written applications.<br />
Materials<br />
A questionnaire, developed to measure respondents' perceptions of climate and the specific<br />
changes, was administered one year after program implementation began. The first part of the<br />
instrument included questions regarding demographic characteristics of the respondents and<br />
perceptions of organizational climate. Organizational climate was adapted from several<br />
questionnaires (Gordon & Cummins, 1979; Siegel & Kaemmerer, 1978; Mowday & Steers, 1979;<br />
and Young, Riedel, & Sheposh, 1979). It consisted of 47 items which represented nine<br />
organizational dimensions (e.g., organizational climate, management style, organizational<br />
effectiveness). Seven-point scales were used for all dimensions except organizational<br />
effectiveness, which was measured on a nine-point scale.<br />
The second half of the survey assessed the specific changes and related issues. Three<br />
aspects of the changes were addressed. First, a set of items using 7-point scales was developed<br />
to assess the specific initiatives. For example, the ease, efficiency, and fairness of the<br />
Elimination of Mandatory Interviews initiative were measured by three items employing 7-point<br />
response scales. Second, perceived consequences resulting from the specific changes (e.g.,<br />
augmented authority and increased ease in carrying out personnel actions) were measured with<br />
7-point scales. Third, the general acceptance of the initiatives, preference for these changes over<br />
the old system, and the extent to which respondents wanted the changes to continue were<br />
assessed with three items employing 5-point scales.<br />
Results<br />
The mean responses for the components comprising the model for first-line supervisors and<br />
managers and for the overall sample are presented in Table 1. In general managers gave a<br />
slightly more positive assessment of the organization’s climate, the individual changes, the<br />
perceived consequences, and the acceptance of the institutionalization of the changes.<br />
Significant differences between supervisors’ and managers’ ratings were obtained for<br />
ELIMINATION OF MANDATORY INTERVIEWS (F(1, 199) = 7.12, p < .01) and for ACCEPTANCE OF THE<br />
INSTITUTIONALIZATION OF THE CHANGES (F(1, 199) = 13.11, p < .01).<br />
As shown in Table 1 the supervisors and managers assessed each of the specific changes<br />
favorably. For example, they agreed that the elimination of mandatory interviews is easy to<br />
carry out, results in fair selection of candidates, and results in positions being rapidly filled.<br />
Similarly, the supervisors and managers perceived benefits resulting from the combined changes<br />
(e.g., perceived increases in their authority to influence classification decisions, the overall<br />
productivity of the work teams). They did not perceive differences with respect to meeting job<br />
responsibilities or filling positions as a result of the inception of these changes. Finally,<br />
concerning the institutionalization of the changes, managers and supervisors are positive about<br />
these changes, prefer the new system over the old, and would like to see the changes continued<br />
in their work setting.<br />
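The supervisor/manager comparisons reported above are two-group one-way analyses of variance. As a minimal sketch (the ratings below are invented for illustration, not the study's data; with two groups, F equals the squared t statistic):<br />

```python
# Sketch of a two-group one-way ANOVA F statistic: the ratio of the
# between-groups mean square to the within-groups mean square.
from statistics import mean

def f_oneway_two_groups(g1: list[float], g2: list[float]) -> float:
    grand = mean(g1 + g2)
    # between-groups sum of squares (df = 1 for two groups)
    ss_between = len(g1) * (mean(g1) - grand) ** 2 + len(g2) * (mean(g2) - grand) ** 2
    # within-groups sum of squares (df = n1 + n2 - 2)
    ss_within = sum((x - mean(g1)) ** 2 for x in g1) + sum((x - mean(g2)) ** 2 for x in g2)
    df_within = len(g1) + len(g2) - 2
    return ss_between / (ss_within / df_within)

supervisors = [4.0, 3.5, 4.5, 3.0, 4.0]   # invented 5-point ratings
managers = [4.5, 5.0, 4.0, 4.5, 5.0]
print(f"F(1, 8) = {f_oneway_two_groups(supervisors, managers):.2f}")
```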
Figure 2. Model of acceptance based on path analyses.<br />
A path analysis was applied to determine the correspondence between the data and the<br />
proposed model as described in Figure 1. The results of the path analysis are presented in<br />
Figure 2, and the correlation matrix underlying the analysis is presented in Table 2. The<br />
ordering of the variables and their interrelationships as presented in Figure 2 generally<br />
correspond to the structure of the proposed model. As hypothesized the LEVEL variable is most<br />
strongly related to CLIMATE which in turn directly influences the three changes. Assessment of<br />
the three changes has a significant relationship with CONSEQUENCES BENEFITS but not with<br />
CONSEQUENCES EFFORT. The differentiation of consequences into two types was made on the<br />
basis of a factor analysis which generated two independent factors. Both sets of consequences<br />
are significantly related to ACCEPTANCE OF THE INSTITUTIONALIZATION OF THE CHANGES with the<br />
CONSEQUENCES BENEFITS clearly the strongest predictor. The model accounted for a good<br />
amount of the total variance (R2 =.56). In addition to the absence of a significant relationship<br />
between the specific changes and the CONSEQUENCES EFFORT variable, the ordering of effects<br />
that were obtained for ELIMINATION OF MANDATORY INTERVIEWS departed from the proposed<br />
model. The direct relationship between this change and acceptance was stronger than the<br />
relationship of this variable to consequences.<br />
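The path-analytic logic can be illustrated with a small sketch: with standardized variables, the coefficient in a bivariate regression equals the Pearson correlation, and an indirect effect is the product of the coefficients along the chain (here CLIMATE through a specific change to acceptance). All data below are invented for illustration; the study's own coefficients appear in Figure 2 and Table 2.<br />

```python
# Sketch: bivariate standardized path coefficients and an indirect effect.
# Data are invented for illustration only.
from math import sqrt
from statistics import mean

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation, equal to the standardized bivariate slope."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

climate = [3.0, 4.5, 2.5, 5.0, 3.5, 4.0, 2.0, 4.8]  # perceived climate
change = [2.8, 4.0, 2.6, 4.9, 3.2, 4.1, 2.2, 4.5]   # assessment of a change
accept = [3.1, 4.2, 2.4, 5.1, 3.0, 4.3, 2.1, 4.7]   # acceptance rating

a = pearson(climate, change)   # CLIMATE -> SPECIFIC CHANGE path
b = pearson(change, accept)    # SPECIFIC CHANGE -> ACCEPTANCE path
print(f"a = {a:.2f}, b = {b:.2f}, indirect effect = {a * b:.2f}")
```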
477<br />
Table 2<br />
Zero-Order Correlations for Model Components<br />
Variable                                     1     2     3     4     5     6     7<br />
1. Level                                     -<br />
2. Climate                                 .20*    -<br />
3. Delegation of Classification Authority  .00   .28*    -<br />
4. Letters of Warning                      .08   .32*  .31*    -<br />
5. Elimination of Mandatory Interviews     .14*  .15*  .29*  .27*    -<br />
6. Consequences (Effort)                   .00  -.01   .00   .01   .08     -<br />
7. Consequences (Benefits)                 .09   .30*  .50*  .37*  .32*  .11     -<br />
8. Acceptance for Innovation               .20*  .29*  .44*  .44*  .55*  .20*   --<br />
Regression analyses were replicated by employing LISREL (Joreskog & Sorbom, 1984).<br />
This approach uses equations with more explicit specifications and simultaneous estimates of<br />
hypothesized underlying relationships and unexplained variance. LISREL provides a more<br />
holistic approach in comparison to separate regression analyses (Bagozzi & Phillips, 1982) and<br />
served to test the goodness-of-fit of the model in this study. The variable LEVEL did not meet<br />
the specifications of the model and could not be entered as a component. The model yielded a<br />
goodness-of-fit (GFI) measure of .94 with an adjusted GFI (AGFI) of .93, and a root mean<br />
square residual (RMSR) equal to .15. This model appears to be a very reasonable explanation of<br />
the relationships between these variables and their ability to predict acceptance for innovation.<br />
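The RMSR reported for the LISREL solution summarizes how far the model-implied correlations fall from the observed ones. A minimal sketch of the computation over the unique off-diagonal elements (the matrices below are invented for illustration):<br />

```python
# Sketch: root mean square residual between an observed and a model-implied
# correlation matrix, averaged over the lower-triangle elements.
from math import sqrt

def rmsr(observed: list[list[float]], implied: list[list[float]]) -> float:
    resid, n = 0.0, 0
    for i in range(len(observed)):
        for j in range(i):  # unique off-diagonal (lower-triangle) cells
            resid += (observed[i][j] - implied[i][j]) ** 2
            n += 1
    return sqrt(resid / n)

# Invented 3-variable example (lower triangles hold the correlations).
obs = [[1.0, 0.0, 0.0], [0.30, 1.0, 0.0], [0.40, 0.50, 1.0]]
imp = [[1.0, 0.0, 0.0], [0.25, 1.0, 0.0], [0.40, 0.55, 1.0]]
print(f"RMSR = {rmsr(obs, imp):.3f}")
```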
While the data were generally consistent with the model there were some discrepancies.<br />
Contrary to expectations, all these changes were found to exert a direct effect on ACCEPTANCE<br />
as well as having a direct effect on CONSEQUENCES. It appears that the changes and<br />
CONSEQUENCES are neither empirically distinct nor do they seem to function in an exactly<br />
similar fashion. The other not readily explainable departure from the proposed model is the<br />
lack of significant relationships between CONSEQUENCES (EFFORT) and the factors hypothesized<br />
as the determinants of this factor.<br />
Conclusions<br />
The present research proposed and tested a model that incorporated components<br />
hypothesized as relevant to the assessment of the status of a set of changes being implemented.<br />
In general the hypothesized interrelationships were supported by the data. The assessment of<br />
the specific changes during the period of implementation was influenced by organizational<br />
contextual factors (CLIMATE). The assessment of the specific changes, in turn, affected<br />
perceived consequences of the changes which influenced the desire to retain the changes. This<br />
last factor could be construed as intentionality, an important underpinning of or precursor to<br />
the final stage of institutionalization. The combination of predictors in the model accounted for<br />
56% of the variance.<br />
Several conclusions are evident from results based on using regression and structural<br />
equations. Consistent with past research (Glaser, 1973), hierarchical level and organizational<br />
climate were found to be important factors for predicting acceptance of change, but, as the<br />
present results suggest, they operate as indirect rather than direct predictors. The pattern of<br />
results thus suggests that simple bivariate correlations cannot adequately capture the CLIMATE -<br />
ACCEPTANCE OF CHANGE relationship. In addition the present model suggests that a combination<br />
of general information about the organization (e.g., LEVEL, CLIMATE) and more specific<br />
information about outcomes brought about by the changes are necessary to better understand<br />
and assess the status of the changes under study and to better predict their future acceptance.<br />
* p < .05, N = 211<br />
The results have several implications from an applied perspective. First, the use of<br />
measures assessing aspects of the organizational context and its relationship to the perceived<br />
value of the changes underscores the necessity to consider not only the specific features of the<br />
changes but also how the organization operates and functions in the cultivation and promotion<br />
of the changes. Second, the measurement of the changes in terms of their ability to produce<br />
certain expected outcomes is useful in determining the extent to which the changes are<br />
operating as intended. This information can be helpful particularly in the formative stage of an<br />
evaluation when providing feedback to those implementing the changes. Third, as the<br />
implementation process continues and evolves over time the predictive ability of the model can<br />
be determined. To the extent this model successfully predicts the status of the changes it can<br />
then be used during implementation of other changes that are introduced into an organization.<br />
In summary the proposed model, comprised of variables selected on the basis of theoretical<br />
considerations as well as the nature of the changes that were introduced and implemented, has<br />
shown promise as a framework for predicting and understanding the implementation and<br />
acceptance of change in an organizational setting. There are recognized limitations in this<br />
study. It is clear additional research is required. Continued assessment of the changes over<br />
time is needed to ascertain the predictive effectiveness of the model. Additional testing of the<br />
model on other types of changes and in other organizations is called for in order to determine<br />
its effectiveness.<br />
REFERENCES<br />
Bagozzi, R.P., & Phillips, L.W. (1982). Representing and testing organizational theories: A<br />
holistic construal. Administrative Science Quarterly, 27, 459-489.<br />
Bardach, E. (1977). The implementation game. Cambridge, MA: MIT Press.<br />
Berman, P., & McLaughlin, M.W. (1978, May). Federal programs supporting educational<br />
change, Vol. VIII: Implementing and sustaining innovations. Santa Monica, CA: Rand.<br />
Glaser, E.M. (1973). Knowledge transfer and institutional change. Professional Psychology, 4,<br />
434-444.<br />
Gordon, G.G., & Cummins, W. (1979). Managing management climate. Lexington, MA:<br />
Lexington Books.<br />
James, L.R., & Jones, A.P. (1979, April). Perceived job characteristics and job satisfaction: An<br />
examination of reciprocal causation (Report No. 79-5). Fort Worth, TX: Texas<br />
Christian University, Institute of Behavioral Research.<br />
Joreskog, K.G., & Sorbom, D. (1984). LISREL VI: Analysis of linear structural relationships by<br />
the method of maximum likelihood. Chicago: National Educational Resources.<br />
Mowday, R.T., Steers, R.M., & Porter, L.W. (1979). The measurement of organizational<br />
commitment. Journal of Vocational Behavior, 14, 224-247.<br />
Payne, R.L., & Mansfield, R. (1973). Relationship of perceptions of organizational climate to<br />
organizational structure, context, and hierarchical position. Administrative Science<br />
Quarterly, 18, 515-526.<br />
Schneider, B., & Snyder, R.A. (1975). Some relationships between job satisfaction and<br />
organizational climate. Journal of Applied Psychology, 60(3), 318-328.<br />
Schultz, R.L., & Slevin, D.P. (1976). Implementation and organizational validity: An empirical<br />
investigation. In R.H. Kilman, L.R. Pondy, & D.P. Slevin (Eds.), The management of<br />
organization design. New York: Elsevier North-Holland.<br />
Siegel, S.M., & Kaemmerer, W.F. (1978). Measuring the perceived support for innovation in<br />
organizations. Journal of Applied Psychology, 63(5), 553-562.<br />
Sheposh, J.P., Hulton, V.N., & Knudsen, G.A. (1983, February). Implementation of planned<br />
change: A review of major issues (NPRDC TR 83-7). San Diego, CA: Navy Personnel<br />
Research and Development Center.<br />
Young, L.E., Riedel, J.A., & Sheposh, J.P. (1979). Relationship between perceptions of role<br />
stress and individual, organizational, and environmental variables (NPRDC TR 80-8).<br />
San Diego, CA: Navy Personnel Research and Development Center.<br />
479
TWEEDDALE, J. W. (Chair), Chief of Naval Education and Training,<br />
Pensacola, FL.<br />
Annually, approximately 40,000 prospective college students request<br />
information on the NROTC scholarship program. About 12,000<br />
individuals apply and become finalists for NROTC scholarships.<br />
Four-year scholarships are ultimately awarded to approximately 1,500<br />
of the applicants. The scholarship pays for tuition, textbooks,<br />
instructional fees, and summer training periods, as well as provides<br />
the selectee with $100 per month (for a maximum of 40 months).<br />
Selectees may become a member of any of the 66 NROTC units that<br />
service over 120 colleges and universities located nationwide.<br />
The presentations in this symposium describe the procedures used to<br />
select NROTC scholarship recipients. CDR Bob Hawkins of the Naval<br />
Education and Training Program Management Support Activity will<br />
present an overview of the NROTC selection process. Jack Edwards of<br />
Navy Personnel Research and Development Center will present a paper<br />
that was coauthored with Regina Burch (Colorado State University) and<br />
Norman Abrahams (Personnel Decisions Research Institutes, Inc.). He<br />
will review the steps used to revise the NROTC selection composite.<br />
Third, Wally Borman from the University of South Florida will discuss<br />
a recently developed, behaviorally anchored selection interview and a<br />
newly constructed biodata instrument. Finally, I will highlight the<br />
current and future research objectives for the NROTC scholarship<br />
selection system.<br />
TWEEDDALE, J. W., Chief of Naval Education and Training,<br />
Pensacola, FL<br />
Improving procedures for the selection of future officers is<br />
complicated by the longitudinal nature of the research. For<br />
example, if the criterion is whether an individual will remain<br />
following completion of obligated duty, it may take 8 to 10<br />
years for the criterion data to become mature. Also, the<br />
divergent criteria (college grade point average, grade point<br />
average in naval science courses, and military performance<br />
while in NROTC and later in the Navy) used to assess the<br />
accuracy of the NROTC scholarship selection system may present<br />
problems.<br />
The need to monitor the validity of the current predictors and<br />
develop new predictor and criterion measures are but two of the<br />
research needs currently confronting NROTC researchers. In<br />
addition to capturing readily quantifiable information, efforts<br />
have been put forth to capture various other characteristics of<br />
"the whole person." Now, researchers, CNET staff, and<br />
Professors of Naval Science are examining ways to operationalize,<br />
measure, and validate those characteristics. The<br />
whole-person model will continue to guide NROTC scholarship<br />
selection research in this time of change for the Navy.<br />
480
GATHERING AND USING NAVAL RESERVE OFFICERS TRAINING CORPS<br />
SCHOLARSHIP INFORMATION<br />
Robert B. Hawkins, Commander, U.S. Navy<br />
Naval Education and Training<br />
Program Management Support Activity<br />
Naval Air Station, Pensacola, FL<br />
Introduction<br />
The responsibility for identifying potential Naval Reserve<br />
Officers Training Corps (NROTC) Scholarship applicants,<br />
processing applications, identifying scholarship winners, and<br />
then placing those selected at one of the 66 host or the more<br />
than 100 associated crosstown affiliated NROTC universities is<br />
divided between two separate Navy commands: the Commander,<br />
Navy Recruiting Command (CNRC) and the Chief of Naval Education<br />
and Training (CNET).<br />
Until the 1986/87 NROTC scholarship year, CNRC identified<br />
applicants, processed applications, and selected NROTC<br />
Scholarship winners. Scholarship winners were identified<br />
during two week-long selection board sessions, an early<br />
selection board held in November, and a second board in<br />
February. CNET then took the administrative action required to<br />
determine final program eligibility (physical qualification)<br />
and provided the authorization for a selectee to attend an<br />
NROTC university under scholarship.<br />
Responsibility for selecting NROTC Scholarship recipients was<br />
transferred from CNRC to CNET after the November 1986 early<br />
selection board. CNET then instituted weekly selection boards<br />
to replace the standard two-selection-board process. Selection<br />
board membership remained essentially the same, with selection<br />
board members drawn from NROTC units (commanding officers) and<br />
the NROTC staff. However, unlike the two-selection-board<br />
system where the same board members evaluated all applicants<br />
during a single session, a weekly selection board process<br />
required the use of different selection board members for each<br />
selection board. Thus, concerns about scoring consistency and<br />
the equity of evaluation had to be addressed.<br />
Application Solicitation<br />
CNRC begins the applicant identification process in March each<br />
year. The primary target market is the high school junior<br />
(rising senior) class. Potential applicants are identified<br />
through a variety of means, but primarily by screening the<br />
Preliminary Scholastic Aptitude Test (PSAT) and Armed Services<br />
Vocational Aptitude Battery (ASVAB) high scorers lists.<br />
Additionally, numerous high school and college fair presentations<br />
are made to generate interest among the college bound<br />
high school student population.<br />
481
____ _.... ___----- .._._ - ___.I.-.__ _.. ~_<br />
A student applying for an NROTC Scholarship completes an<br />
initial applicant questionnaire which establishes his or her<br />
interest. The data supplied on the applicant questionnaire is<br />
used to create a file for the student in the NROTC data base.<br />
The student must then take either the Scholastic Aptitude Test<br />
(SAT) or American College Test (ACT) and request that his or<br />
her scores be released to the NROTC Scholarship Program.<br />
ACT and SAT test data for those students who authorize score<br />
release to the NROTC Program are periodically received by CNRC<br />
and matched with the NROTC interested student file. Those<br />
meeting minimum eligibility scores (presently 450 verbal and<br />
500 math for SAT; 21 English and 23 math for ACT) are invited<br />
to complete a scholarship application. Completed applications<br />
are then compiled by CNRC, forwarded to CNET, and presented to<br />
the selection board for evaluation.<br />
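The minimum-eligibility screen described above amounts to a simple threshold filter. A sketch using the cut scores quoted in the text (SAT 450 verbal / 500 math; ACT 21 English / 23 math); the record fields are illustrative, not the actual CNRC file layout:<br />

```python
# Sketch: invite applicants meeting the minimum SAT or ACT scores quoted
# in the text. Field names are hypothetical, for illustration only.

def eligible(record: dict) -> bool:
    if record["test"] == "SAT":
        return record["verbal"] >= 450 and record["math"] >= 500
    if record["test"] == "ACT":
        return record["english"] >= 21 and record["math"] >= 23
    return False

applicants = [
    {"name": "A", "test": "SAT", "verbal": 470, "math": 510},
    {"name": "B", "test": "SAT", "verbal": 440, "math": 560},  # verbal too low
    {"name": "C", "test": "ACT", "english": 22, "math": 25},
]
invited = [a["name"] for a in applicants if eligible(a)]
print(invited)  # prints ['A', 'C']
```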
Application Evaluation<br />
__-...--------<br />
In the two-selection-board system of evaluation used by CNRC,<br />
applications were grouped by state, and three- or four-member<br />
selection committees were established to evaluate applicants in<br />
each state group. The number to be selected from a particular<br />
state was provided to the selection committee, and applicants<br />
were selected to meet that target. The early selection board<br />
(November) considered all applications received prior to the<br />
board convening date. The second selection board (February)<br />
evaluated all applications received by the scholarship<br />
application submission deadline, including those of individuals<br />
who were evaluated but not selected by the early selection<br />
board.<br />
In the 1986/87 scholarship year, 50 percent of those selected<br />
to receive an NROTC scholarship were identified by the early<br />
(November) selection board. The balance of those selected were<br />
identified through a series of weekly selection boards which<br />
met from January through March of 1987.<br />
To ensure consistency of scoring, application evaluation<br />
procedures used by each of the weekly selection boards were<br />
similar to those used during the two-selection-board process.<br />
Under these procedures, selection board members were given very<br />
broad guidance and complete discretion in the awarding of<br />
evaluation points. Selection boards each had up to 100 points<br />
available to award to each applicant. The points awarded by<br />
the selection board were then added to a previously calculated<br />
base score called the applicant Quality Index (QI). The<br />
applicant QI is an optimally weighted selection composite<br />
developed by the Navy Personnel Research and Development Center<br />
to predict student academic and military performance criteria.<br />
The sum of the selection board score and the Quality Index<br />
determined the applicant's rank ordered position in the group<br />
of all applicants evaluated during the weekly selection board<br />
process.<br />
necessary to respond to the results of that review. The<br />
following system, with minor modifications, has been used in the<br />
NROTC selection process ever since.<br />
Current Selection Evaluation System<br />
The current NROTC Scholarship selection board evaluation system<br />
uses the Quality Index as a Base Score for each applicant. The<br />
Quality Index accounts for approximately 66 percent of the<br />
total applicant selection score. The selection board provides<br />
the remaining 34 percent.<br />
The selection board is provided with an evaluation score sheet<br />
that defines specific areas for selection board scoring<br />
consideration. The score sheet is divided into three broad<br />
areas of evaluation: scholarship, military bearing, and<br />
personal attributes. Contained within each area are specific<br />
scoring categories with each category assigned a maximum point<br />
value. Approximately 40 percent of the scoring categories<br />
include a recommended selection board score previously<br />
calculated by computer using established algorithms and raw<br />
data derived through the optical scan process (dot counting).<br />
Selection board members evaluate the application and assign<br />
points for each category. They may adjust the computer-recommended<br />
scores, if desired. The number of points assigned<br />
by the selection board member, including those recommended by<br />
computer, is then added to the Quality Index to determine the<br />
applicant's final standing in the rank ordered list of all<br />
applicants.<br />
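The arithmetic of the final standing can be sketched as follows; the applicant names, point values, and Quality Index scale shown here are hypothetical:

```python
# Illustrative sketch: each applicant's final score is the Quality Index
# (base score) plus the points assigned by the selection board; applicants
# are then rank-ordered on that sum.  All values below are hypothetical.
applicants = [
    {"name": "A", "quality_index": 190.0, "board_points": 62.0},
    {"name": "B", "quality_index": 205.0, "board_points": 41.0},
    {"name": "C", "quality_index": 198.0, "board_points": 55.0},
]

for a in applicants:
    a["final_score"] = a["quality_index"] + a["board_points"]

# Highest final score first determines the applicant's standing.
rank_ordered = sorted(applicants, key=lambda a: a["final_score"], reverse=True)
print([a["name"] for a in rank_ordered])  # ['C', 'A', 'B']
```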
The categories used by the selection board to evaluate a<br />
scholarship applicant are:<br />
Military Potential:<br />
Is applicant a military dependent? (Score recommended)<br />
Athletic participation (Score recommended)<br />
JROTC/CAP participation (Score recommended)<br />
Applicant physical fitness<br />
Motivation for the NROTC Program<br />
Interviewing officer evaluation<br />
Personal Factors:<br />
Leadership positions held (Score recommended)<br />
Involvement in non-school activity<br />
Did applicant experience adversity?<br />
Strength of character<br />
Exceptional achievement<br />
Potential for graduating with a tech degree (Score recommended)<br />
Scholarship:<br />
Quality of learning environment<br />
Transcript evaluation for math/science performance<br />
Intellectual motivation<br />
Teacher evaluations<br />
Evaluation of applicant's statement<br />
Course difficulty<br />
(Three of the six Scholarship categories also carry computer-recommended scores.)<br />
Scholarships are offered based upon the rank order of all<br />
applicants. Adjustments may be necessary to meet specifically<br />
assigned state scholarship allocation targets, active duty,<br />
female, and minority targets, or Navy physical qualifications.<br />
Summary<br />
The current selection board process has worked extremely well.<br />
The structure built into the evaluation system provides the<br />
consistency of applicant evaluation desired in a 6-month<br />
selection board cycle with varied selection board membership.<br />
The cost of that consistency, a limitation in selection board<br />
flexibility, appears to have had a positive effect as well.<br />
Selection board members feel comfortable working within the<br />
more structured system and selection or non-selection decisions<br />
are much more defensible. More importantly, several measures<br />
of incoming freshman class performance indicate that the<br />
process improved the selection board's ability to identify<br />
those most likely to perform well once enrolled in the NROTC<br />
Program. The performance of the scholarship students entering<br />
the program since the revised selection procedures were fully<br />
implemented has improved, with the average freshman year grade<br />
point average increasing from 2.89 in 1988 to 3.0 this past<br />
year. Freshman attrition has also decreased dramatically.<br />
Twenty-two percent of the freshman class attrited from the<br />
program during the 1988 academic year. Freshman attrition for<br />
the 1990 academic year dropped to 14 percent. The selection<br />
board average applicant score dropped to less than 50 percent<br />
of the total points available for awarding. This ensures that<br />
truly exceptional applicants can be awarded enough points for<br />
scholarship selection.<br />
References<br />
Mattson, J.D., Neumann, I., & Abrahams, N.M. (1986).<br />
Development of a revised composite for NROTC selection<br />
(NPRDC TN 87-7). San Diego: Navy Personnel Research and<br />
Development Center.<br />
Owens-Kurtz, C.K., Borman, W.C., Gialluca, K.A., Abrahams,<br />
N.M., & Mattson, J.D. (1989). Refinement of the Navy<br />
Reserve Officer Training Corps (NROTC) scholarship<br />
selection composite (NPRDC Tech. Note TN 90-1). San<br />
Diego: Navy Personnel Research and Development Center.<br />
Validation of the Naval Reserve Officers Training Corps Quality Index¹<br />
Jack E. Edwards Regina L. Burch Norman M. Abrahams<br />
Navy Personnel Research Colorado State University Personnel Decisions<br />
and Development Center Ft. Collins, CO Research Institute, Inc.<br />
San Diego, CA Minneapolis, MN<br />
Using data from Naval Reserve Officers Training Corps (NROTC) entering classes of 1979 and 1980,<br />
Mattson, Neumann, and Abrahams (1986) optimally weighted six academic and personal factors: Scholastic<br />
Aptitude Test-Verbal (SATV), Scholastic Aptitude Test-Math (SATM), high school rating (HSR), an interviewer’s<br />
rating (INTER), the Strong-Campbell Interest Inventory career-tenure scale (SCII), and the Background<br />
Questionnaire career-tenure scale (BQ), to develop a selection composite for predicting three criteria. Recent<br />
Navy policy directed toward increasing the proportion of college graduates with technical degrees has made it<br />
necessary to develop and validate a new selection system that adds a new criterion, choice of technical major<br />
(TECH) to the three previously investigated criteria: college grade point average (GPA), naval aptitude grades<br />
(APT), and naval science grades (NSG).<br />
Objective<br />
The objective of this paper is to review the development and validation of the new NROTC scholarship<br />
selection composite, the 1989 Quality Index (QI-89). Three steps were included: (a) developing the optimally<br />
weighted QI-89; (b) predicting a new criterion (TECH); and (c) constructing an expectancy table/chart to predict<br />
TECH using a new predictor, engineering-and-science-interest score (ES).<br />
Approach<br />
Population<br />
The population contained 6,609 individuals who had entered NROTC from 1983 to 1987 and completed at<br />
least one semester/quarter of the program. Men comprised 96.5% of the population, and 92.6% of the candidates<br />
were nonminorities. Each person had received a four-year national competition scholarship; had complete data<br />
on all seven predictors; had valid scores for GPA, APT, and NSG; were Navy (versus Marine) option; and had<br />
a selection code of principal selectee, early select, alternate best, or alternate middle.<br />
Predictors<br />
Six predictors were used to develop the selection composite. A seventh predictor (ES) was used to develop<br />
an expectancy chart for predicting TECH.<br />
SATV and SATM or American College Test (ACT) equivalents. These scores represent the verbal and<br />
quantitative aptitudes of an individual as measured by a national competitive testing program designed for college<br />
admissions and scholarship awards. If an individual took the standardized test(s) on multiple occasions, the<br />
highest score was used in the analyses. ACT scores were translated to equivalent SATV and SATM scores using<br />
a recently developed conversion table (Owens-Kurtz, Borman, Gialluca, Abrahams, & Mattson, 1989).<br />
HSR. This measure is based on high school rank in class. It was computed with a two-step procedure.<br />
First, a percentile rank was determined by multiplying high school rank by 2, subtracting 1 from that product,<br />
and then dividing the difference by the product of class size times 2. Second, each resulting percentile rank was<br />
converted to an equivalent HSR via tabled values. This second step lessened the effect of the negatively skewed<br />
distribution of percentile ranks. HSR values can range from 0 to 100 in increments of 10.<br />
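The first step of the HSR computation reduces to a one-line formula. The second step, a table lookup to an HSR value of 0 to 100 in increments of 10, is not reproduced here because the tabled values are not given in the text:

```python
# Step one of the two-step HSR procedure described in the text:
# percentile rank = (2 * rank - 1) / (2 * class size).
# The table lookup to an equivalent HSR (step two) is not shown.
def percentile_rank(rank_in_class, class_size):
    """Percentile rank from high school rank-in-class and class size."""
    return (2 * rank_in_class - 1) / (2 * class_size)

# A student ranked 5th in a class of 200:
print(percentile_rank(5, 200))  # 0.0225
```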
INTER. During a 15-minute interview, an officer rated an applicant on factors important to a career as a<br />
1 This research was supported by the Office of Naval Technology, Program Element 0602233N. The<br />
opinions expressed in this paper are those of the authors, are not official, and do not necessarily reflect the views<br />
of the Navy Department. This paper was presented in November 1990 at the annual meeting of the Military<br />
Testing Association at Orange Beach, AL as part of J. W. Tweeddale's (Chair) symposium, The Naval Reserve<br />
Officers Training Corps (NROTC) Scholarship Selection System.<br />
naval officer (e.g., poise and the officer's willingness to have the individual serve under his/her command). Each<br />
applicant was assigned an overall rating of very high (1) to very poor (5). For consistency, this scale was reverse<br />
scored.<br />
SCII. This scale consists of 76 item-responses from the Strong-Campbell Interest Inventory that predict<br />
officer retention for at least one year beyond the minimum obligated service (Neumann & Abrahams, 1978a).<br />
The authors reported a biserial correlation of .19 between the SCII and extended service. Scores can range from<br />
62 to 138.<br />
BQ. The career tenure scale, developed in 1981, is based on 14 biodata and personality items from<br />
Rimland's (1957) Background Questionnaire. Neumann (personal communication, 1989) reported a biserial<br />
correlation of .12 between the BQ and NROTC attrition. Scores can range from 93 to 107.<br />
ES. Engineering and science interests are identified through 132 item-responses from the Strong-Campbell<br />
Interest Inventory (Neumann & Abrahams, 1978b). The authors reported biserial correlations of .56 and .58<br />
between ES and choice of final major for two cross-validation samples. Scores on this scale can range from 31<br />
to 163.<br />
Criteria<br />
Four performance criteria were used individually or in composites. When the four single-criterion regression<br />
equations were combined into a composite, the following weights were assigned to the criteria: 40% for GPA,<br />
30% for APT, 20% for NSG, and 10% for TECH. Scores on GPA, APT, and NSG were standardized (using<br />
T-scores) within each host or cross-enrollment school. For individuals who attrited prior to the end of the first<br />
academic year, scores were cumulated to the time the individual left the NROTC program.<br />
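The standardization and criterion weighting just described can be sketched as follows. This is a simplification: in the actual procedure the weights were applied when combining the single-criterion regression equations, not the raw criterion scores, and the data here are hypothetical:

```python
# Sketch of the criterion handling described in the text: GPA, APT, and
# NSG are standardized to T-scores (mean 50, SD 10) within each school,
# and the four criteria carry weights of 40/30/20/10 percent.
from statistics import mean, pstdev

def t_scores(values):
    """Convert raw scores to T-scores (mean 50, SD 10), population SD."""
    m, sd = mean(values), pstdev(values)
    return [50 + 10 * (v - m) / sd for v in values]

WEIGHTS = {"GPA": 0.40, "APT": 0.30, "NSG": 0.20, "TECH": 0.10}

def criterion_composite(gpa_t, apt_t, nsg_t, tech_t):
    """Weighted combination of the four (standardized) criteria."""
    return (WEIGHTS["GPA"] * gpa_t + WEIGHTS["APT"] * apt_t
            + WEIGHTS["NSG"] * nsg_t + WEIGHTS["TECH"] * tech_t)

school_gpas = [2.1, 2.8, 3.0, 3.4, 3.9]  # hypothetical one-school data
print(round(t_scores(school_gpas)[2], 2))  # ≈ 49.34
```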
First-Year GPA. This measure is the grade-point average obtained from all college courses that were taken<br />
during the first academic year.<br />
First-Year APT. APT is the first-academic-year grade-point average in nonacademic military aspects of the<br />
NROTC program. An individual is assigned a grade of 0 to 4.00 by NROTC instructors on each of<br />
approximately 20 performance aspects and personal traits (e.g., goal setting and military bearing). APT is<br />
primarily used to determine how well an individual is adapting to the Navy and NROTC.<br />
First-Year NSG. This measure is the grade-point average for naval science courses taken during the first<br />
academic year. These courses are Navy-relevant academic classes that include subjects such as navigation and<br />
seamanship. Students must take eight such courses; most students take one course each semester.<br />
Final TECH. Majors were categorized as either non-technical (1) or technical (2) using categories that were<br />
obtained from the Chief of Naval Education and Training (CNET). Individuals with valid scores for TECH<br />
represented a subset of the larger sample. TECH was considered valid if the candidate had entered college in<br />
(a) 1983, 1984, or 1985 or (b) 1986 and had completed at least one semester/quarter of his/her junior year.<br />
TECH was included as an additional criterion in an attempt to maximize the number of scholarships awarded to<br />
applicants who would eventually choose a technical college major.<br />
Procedure<br />
Development and cross-validation samples. The 5,957 people entering NROTC between 1983 and 1986 were<br />
randomly assigned to either a development or cross-validation sample (N = 3,652 and N = 2,305, respectively).<br />
A third sample, 652 individuals who entered NROTC during 1987, was used as a second cross-validation sample<br />
to ensure that weights remained stable for the most recent year for which criterion data were available.<br />
Developing and cross-validating optimally weighted composites. Validity coefficients corrected for range<br />
restriction were used in multiple regression analyses to develop optimally weighted selection composites for<br />
predicting each of the four individual criteria. Although this procedure results in four separate composite scores,<br />
applicants must ultimately be rank-ordered on a single metric in order to make selection decisions. To obtain<br />
such an overall composite, the single-criterion composites were combined into the QI-89 in order to predict the<br />
four single criteria simultaneously. Weights were derived for these overall composites by combining predictor<br />
weights obtained for the single-criterion regression equations.<br />
Each of the composites was then cross-validated on both hold-out samples. The composites were evaluated<br />
for their ability to predict GPA, APT, NSG, and TECH in the 1983-1986 sample, and their ability to predict<br />
GPA, APT, and NSG in the 1987 sample.<br />
Determining effective weights. To assess the percentage of weight that each predictor received in the<br />
selection composites, effective weights scaled to 100% were computed. To compute the effective weights, the<br />
unstandardized b weights for each predictor within a composite were first multiplied by the corresponding standard<br />
deviation for that predictor. The products of b times SD were then summed across all the predictors included<br />
in the composite. The product for each predictor was then divided by that sum and multiplied by 100.<br />
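The effective-weight computation can be expressed directly; the b weights and standard deviations below are hypothetical:

```python
# Effective weights as described in the text: multiply each predictor's
# unstandardized b weight by that predictor's SD, divide each product by
# the sum of all such products, and scale to 100%.
def effective_weights(b_weights, sds):
    products = [b * sd for b, sd in zip(b_weights, sds)]
    total = sum(products)
    return [100 * p / total for p in products]

# Hypothetical b weights and SDs for, e.g., SATV, SATM, HSR, INTER:
b  = [0.010, 0.012, 0.150, 2.00]
sd = [76.8, 64.4, 16.8, 0.48]
print([round(w, 1) for w in effective_weights(b, sd)])  # [15.3, 15.4, 50.2, 19.1]
```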
Constructing the expectancy table. The expectancy table for using the ES scale was developed by first rank-<br />
ordering the scores of midshipmen. Then, the distribution was divided as equally as possible into five groups.<br />
For each fifth, ranging from high to low, the percentage of technical majors was computed.<br />
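A sketch of the expectancy-table construction just described, using hypothetical ES scores and major choices:

```python
# Expectancy-table construction as described: rank-order the ES scores,
# split the distribution into five (near-)equal groups, and compute the
# percentage of technical majors within each fifth.  Data are hypothetical.
def expectancy_fifths(records):
    """records: list of (es_score, is_technical) pairs; returns the
    percentage of technical majors in each fifth, highest fifth first."""
    ordered = sorted(records, key=lambda r: r[0], reverse=True)
    n = len(ordered)
    fifths = [ordered[round(i * n / 5):round((i + 1) * n / 5)]
              for i in range(5)]
    return [100 * sum(tech for _, tech in g) / len(g) for g in fifths]

data = [(130, 1), (125, 1), (118, 1), (115, 0), (110, 1),
        (108, 0), (104, 0), (100, 1), (96, 0), (90, 0)]
print(expectancy_fifths(data))  # [100.0, 50.0, 50.0, 50.0, 0.0]
```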
Results and Discussion<br />
Development Sample<br />
Table 1 shows the means and standard deviations for the predictors and criteria, and the correlations for<br />
those two sets of variables. This information is provided for both the entire development sample and the<br />
development subsample that had valid scores on the TECH criterion. The validities were corrected for restriction<br />
in range prior to performing the regressions. Validities increased approximately .02 to .03 after corrections.<br />
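The paper does not state which range-restriction correction was applied; one common univariate choice (Thorndike's Case 2 formula, for direct selection on the predictor) is sketched here as an assumption:

```python
# Thorndike Case 2 correction for direct range restriction on the
# predictor: inflate the restricted validity r toward its estimated
# unrestricted value.  Whether this exact formula was used in the
# study is an assumption; it is shown only as a common choice.
from math import sqrt

def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    k = sd_unrestricted / sd_restricted
    return (r * k) / sqrt(1 - r**2 + (r**2) * k**2)

# A restricted validity of .25 when the applicant SD is 1.5x the selected SD:
print(round(correct_range_restriction(0.25, 1.5, 1.0), 3))  # ≈ 0.361
```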
The SATM means indicate that the average NROTC scholarship student scored at approximately the 89th<br />
percentile in mathematics aptitude. The SATV and HSR means are also above average. For SATV, the average<br />
NROTC scholarship student outperforms approximately 87 percent of the college-bound seniors taking the test.<br />
Finally, the average NROTC scholarship student had an HSR of 73.55. That HSR value indicates that the average<br />
NROTC scholarship recipient graduated in the top 10% of his/her high school class.<br />
There were negligible differences between the predictor and criterion means and standard deviations for the<br />
full development sample and its subsample. The intercorrelations among GPA, APT, and NSG were slightly<br />
lower for the development subsample than for the full development sample. The three criteria were moderately<br />
correlated, with NSG and GPA being the most highly related. This result would be expected because these two<br />
criteria measure academic factors; furthermore, NSG is computed from a subset of the courses included in GPA.<br />
These relationships are also consistent with Mattson et al.'s (1986) findings. In that study, intercorrelations varied<br />
from .40 to .54, and GPA and NSG were the most highly intercorrelated of the three criteria. GPA showed the<br />
highest relationship with TECH, a criterion not included in earlier composites. The correlations between the three<br />
criteria and TECH were, however, much smaller than the intercorrelations among GPA, APT, and NSG.<br />
The predictor-criterion correlations varied little in magnitude between the full development sample and its<br />
subsample. For both groups, HSR was the variable most highly correlated with GPA and APT. SATV and HSR<br />
showed the strongest relationships with NSG. Although SATM and SATV showed strong relationships with GPA<br />
and NSG, respectively, they showed virtually no relationship with APT. The interview rating, however, was<br />
related to APT. These latter two sets of findings are consistent with the observation that GPA and NSG measure<br />
the academic performance of NROTC participants while APT measures military characteristics of the future<br />
officers. ES was the predictor most highly correlated with TECH. This outcome was expected because the ES<br />
scale was specifically developed to predict final major. Finally, SATM also had a strong association with TECH.<br />
Cross-Validation<br />
Predictor scores were computed for each of the five composites (i.e., the four single-criterion composites<br />
and the QI-89) using data from the hold-out samples. These scores were then correlated with each criterion.<br />
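The cross-validation step amounts to scoring hold-out cases with development-sample weights and correlating the resulting composite with each criterion. A sketch with hypothetical weights and data:

```python
# Cross-validation sketch: apply development-sample b weights to hold-out
# predictor data to get composite scores, then correlate those scores
# with a criterion.  Weights and data below are hypothetical.
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation using population moments."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))

weights = [0.4, 0.6]                                  # development-sample weights
predictors = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 5.0]]
criterion  = [1.2, 1.0, 2.4, 3.1]                     # hold-out criterion scores

composite = [sum(w * p for w, p in zip(weights, row)) for row in predictors]
print(round(pearson_r(composite, criterion), 3))      # ≈ 0.986
```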
Single-criterion composites. Table 2 shows the cross-validity coefficients that were obtained for the four<br />
single-criterion composites. The cross-validities for the single-criterion composites were provided principally to<br />
show the upper limit of prediction for a given criterion since each composite should predict its own criterion<br />
better than any of the other composites. To use the table, the criterion of interest is located in a right-hand<br />
column, and the predictor composite is located in the left-hand column. The cross-validity for that predictor-<br />
criterion combination is found at the intersection of the corresponding column and row. For example, the .122<br />
shown in the first row, second column indicates the cross-validity estimate that was obtained when weights that<br />
were derived to optimally predict GPA (for the development sample) were used to predict APT in the 1983-<br />
1986 hold-out sample.<br />
Table 1<br />
Descriptive Statistics for the Full Development Sample and the Development Subsample<br />
Variable        Mean      SD      r(GPA)    r(APT)<br />
Predictor<br />
SATV(f)        558.51    76.78     .124      .027<br />
SATV(r)        560.08    76.98     .101      .013<br />
SATM(f)        642.91    64.35     .187      .036<br />
SATM(r)        642.85    63.99     .183      .047<br />
HSR(f)          73.55    16.81     .272      .132<br />
HSR(r)          74.45    16.34     .280      .105<br />
INTER(f)         4.82      .48     .035      .093<br />
INTER(r)         4.83      .46     .030      .061<br />
SCII(f)        105.36     5.96    -.076     -.010<br />
SCII(r)        105.49     5.88    -.069     -.008<br />
BQ(f)          100.97     2.32     .007      .047<br />
BQ(r)          101.09     2.30    -.006      .010<br />
ES(f)          110.19    13.55     .013      .022<br />
ES(r)          110.62    13.28     .023      .033<br />
Criterion<br />
GPA(f)          49.88     9.66    1.000       ---<br />
GPA(r)          51.72     8.16    1.000       ---<br />
APT(f)          49.71     9.77     .425     1.000<br />
APT(r)          52.15     8.43     .363     1.000<br />
NSG(f)          49.80     9.77     .562      .419<br />
NSG(r)          51.66     8.57     .546      .363<br />
TECH(r)          1.59      .49     .243      .104<br />
f as a subscript denotes the full development sample (N = 3,652).<br />
r as a subscript denotes the reduced development subsample (N = 2,077).<br />
Of primary interest are the bold-faced values shown on the diagonal. These values reflect the predictive<br />
ability for each composite's target criterion; for example, the GPA composite reveals a cross-validity of .289 with<br />
the GPA criterion. These diagonal values may be compared with the four corresponding validities observed for<br />
these composites in the developmental sample: .327 for GPA; .175 for APT; .297 for NSG; and .455 for TECH.<br />
As expected, development-sample validities were slightly higher than the corresponding cross-validities. The GPA,<br />
NSG, APT, and TECH composites each predicted its target criterion better than any of the other composites.<br />
Surprisingly, the APT composite was a better predictor of GPA and NSG than of APT. Overall, three of the<br />
four composites (GPA, APT, and NSG) were better predictors of GPA and NSG than of APT and TECH.<br />
Table 2<br />
Cross-Validity Coefficients for Single-Criterion and QI-89 Composites<br />
Single-Criterion                      Criteria<br />
Predictor Composite     GPA      APT      NSG      TECH<br />
GPA(a)                 .289     .122     .237     .138(c)<br />
GPA(b)                 .304    -.001     .228      ---<br />
APT(a)                 .219     .137     .230     .108(c)<br />
APT(b)                 .220     .073     .193      ---<br />
NSG(a)                 .220     .118     .291     .121(c)<br />
NSG(b)                 .227     .009     .286      ---<br />
TECH(a)                .102     .036     .137     .430(c)<br />
TECH(b)                .123     .056     .063      ---<br />
QI-89(a)               .282     .126     .246     .131(c)<br />
QI-89(b)               .296     .011     .238      ---<br />
a as a subscript denotes the 1983-1986 hold-out sample (N = 2,305).<br />
b as a subscript denotes the 1987 hold-out sample (N = 652).<br />
c as a subscript denotes the 1983-1986 reduced hold-out sample (N = 1,313) with a valid TECH score.<br />
The coefficients obtained on the second cross-validation sample are shown directly under the bold-faced<br />
cross-validities. Across all four single-criterion composites, the cross-validities obtained on the 1987 sample varied<br />
little from those obtained on the 1983-1986 sample when GPA and NSG were predicted. All of the composites<br />
predicted GPA slightly better in the 1987 sample and NSG slightly better in the 1983-1986 sample. Somewhat<br />
larger differences were found when predicting APT. The GPA, APT, and NSG composites predicted APT better<br />
in the 1983-1986 sample than in the 1987 sample. The cross-validities for the two samples were both near .00<br />
when the TECH-derived composite was used to predict APT. Inspection of the correlations between APT and<br />
several highly-weighted predictors (i.e., HSR, SATM, and SATV) revealed that differences in zero-order validities<br />
for the two samples appeared to account for the subsequent differences in predictive ability for these composites.<br />
QI-89. The bottom portion of Table 2 contains cross-validity coefficients for QI-89. In general, the cross-<br />
validity coefficients obtained with the QI-89 showed little shrinkage from those obtained when each single-<br />
criterion predictor composite was used to predict itself. The one exception occurred when TECH was the<br />
criterion. This finding is logical because CNET assigned a relatively small importance rating to TECH when it<br />
was combined with the other three criteria. Although the QI-89 was only marginally useful for predicting APT<br />
and TECH, it retained a moderate level of predictive ability when used to predict GPA and NSG.<br />
Predicting Technical Majors<br />
As shown in Figure 1, those midshipmen in the upper 20% on the ES scale were more than twice as likely<br />
to choose technical majors as those in the lower 20%. To use the table, an individual's ES score is located<br />
in the table, and the likelihood of that individual selecting a technical final major can be determined. An adjunct<br />
table for estimating ES was used (rather than incorporating ES into the optimally weighted selection composite)<br />
so as to avoid eliminating applicants with outstanding credentials who might not receive NROTC scholarships if<br />
their interests tended toward non-technical fields of study.<br />
Conclusions<br />
1. Although ES is derived from an instrument (i.e., Strong-Campbell Interest Inventory) that is susceptible<br />
to distortion, results showed that ES can significantly increase the proportion of technical majors.<br />
2. While the academically oriented criteria (i.e., GPA and NSG) are predicted reasonably well, there is<br />
room for improvement for the military-performance criterion (APT).<br />
Figure 1<br />
Expected Percentages of Midshipmen Selecting Technical Majors<br />
(Bar chart grouping midshipmen into fifths by ES score, from the top 20% to the lowest 20%: 121 and above; 114 thru 120; 107 thru 113; 98 thru 106; 97 and below.)<br />
References<br />
Mattson, J.D., Neumann, I., & Abrahams, N.M. (1986). Development of a revised composite for NROTC<br />
selection (NPRDC TN 87-7). San Diego: Navy Personnel Research and Development Center.<br />
Neumann, I., & Abrahams, N.M. (1978a). Construction and validation of a Strong-Campbell Interest Inventory<br />
career tenure scale for use in selecting NROTC midshipmen (NPRDC Letter Rep.). San Diego: Navy<br />
Personnel Research and Development Center.<br />
Neumann, I., & Abrahams, N.M. (1978b). Identification of NROTC applicants with engineering and science<br />
interests (NPRDC Tech. Rep. 78-31). San Diego: Navy Personnel Research and Development Center.<br />
Owens-Kurtz, C.K., Borman, W.C., Gialluca, K.A., Abrahams, N.M., & Mattson, J.D. (1989). Refinement of<br />
the Navy Reserve Officer Training Corps (NROTC) scholarship selection composite (NPRDC Tech. Note<br />
TN 90-1). San Diego: Navy Personnel Research and Development Center.<br />
DEVELOPMENT AND IMPLEMENTATION OF A STRUCTURED<br />
INTERVIEW PROGRAM FOR NROTC SELECTION<br />
Walter C. Borman<br />
University of South Florida<br />
and Personnel Decisions Research Institutes, Inc.<br />
and<br />
Cynthia K. Owens-Kurtz<br />
and Teresa L. Russell<br />
Personnel Decisions Research Institutes, Inc.<br />
The Navy Reserve Officer Training Corps (NROTC) is one of the major<br />
sources of Navy and Marine Corps officers. Presently, 40,000 young men and<br />
women apply for a 4-year NROTC scholarship each year. Approximately 40% of<br />
this total pass an initial screen based on college board scores (minimum<br />
430 verbal and 520 math on SAT or equivalent ACT scores), proper age<br />
(between 17 and 21 when school starts, and no more than 25 at estimated<br />
time of college graduation), and acceptable progress through high school.<br />
Those passing the screen (called Board Eligibles) are required to complete<br />
an application blank and to interview with a Naval officer, typically at<br />
one of the 43 recruiting district headquarters. The focus of this paper is<br />
on this officer interview.<br />
As conducted previously, an officer interviewed each Board Eligible<br />
applicant, usually for 15-40 minutes depending upon the personal style of<br />
the interviewer and on the interview load (i.e., the number of NROTC Board<br />
Eligibles that must be interviewed that day). Interviews were unstructured<br />
in that interviewers were free to ask any questions they believed were<br />
relevant. After completion of the interview, the interviewer completed a<br />
brief rating form.<br />
Experience with the previous NROTC interview showed that ratings were<br />
often at or near the top (most effective) end. For example, the mean<br />
rating on the Overall Potential scale for Board Eligibles in the most<br />
recent class for which data were available (class entering NROTC 1985) was<br />
4.68 on the 5-point scale. Further, when interview ratings (on the<br />
Potential scale) were correlated with the NROTC performance criteria, grade<br />
point average (GPA), Naval science grades (NSG), and an aptitude rating<br />
(APT), results were near zero (Owens-Kurtz, Borman, Gialluca, Abrahams, &<br />
Mattson, 1988). Finally, the effective weights for the interview when used<br />
along with SAT scores, high school rank, and SCII/BQ scores in regression<br />
analyses against these criteria were very low for GPA and NSG and only the<br />
third highest contributor to prediction of APT (Owens-Kurtz et al., 1988).<br />
This research was supported by funds from the Office of Naval Technology,<br />
Program Element 0602233N. The opinions expressed are those of the authors<br />
and do not necessarily reflect those of the U.S. Navy.<br />
Accordingly, the NROTC selection interview program appeared to need<br />
improvement. The ratings on the interview form showed little<br />
differentiation between applicants, and the validity of the interview<br />
ratings was low.<br />
One plausible reason for problems with this interview is its<br />
unstructured format. Reviews of the employment interview (Arvey & Campion,<br />
1982; Schmitt, 1976) indicate that structured interviews generally provide<br />
more valid prediction of performance than do unstructured interviews. A<br />
recent meta-analysis found a .35 mean uncorrected validity coefficient for<br />
structured interviews, whereas unstructured interviews had a mean<br />
uncorrected validity of .11 for the studies included in the analysis<br />
(Cronshaw & Wiesner, 1989). It is possible that a structured interview for<br />
NROTC selection might improve the interview's validity for identifying<br />
applicants likely to succeed in the NROTC program.<br />
This paper first describes development of the structured interview<br />
materials and then an evaluation of interview ratings made during pilot<br />
tests of these materials.<br />
METHOD<br />
Identifying Target Predictor Constructs<br />
The first step in developing a structured interview protocol was to<br />
identify the predictor constructs the interview should target.<br />
Accordingly, meetings with officer staff members in five NROTC units<br />
were conducted to generate ideas for these predictor constructs. A<br />
preliminary list of constructs emerged from sessions with primarily COs and<br />
Class Advisors in these units. This list was briefed to the Chief of Naval<br />
Education and Training (CNET) staff and to Selection Board members and was<br />
then revised based on their feedback. The constructs are: NROTC Interest<br />
and Motivation; Leadership Potential; Responsibilities; Organization of<br />
Tasks and Activities; and Communication.<br />
Preparing Behavioral Statements for the Rating Scales<br />
At this point, we prepared preliminary behavioral statements to<br />
reflect effective, average, and ineffective interviewee responses in each<br />
one of the five construct areas. The behavioral statements were based on<br />
what recruiters with considerable experience in NROTC selection interviews<br />
had observed in actual interviews. We also received feedback from CNET and<br />
Selection Board members, and made final revisions. One of the resulting<br />
rating scales is shown below, with its behavioral standards.<br />
[Rating scale for NROTC Interest and Motivation. The behavioral standards<br />
describe, at the effective level, an applicant who expresses a strong desire<br />
to be a Naval officer, would probably accept the college program if rejected<br />
for a scholarship, and shows strong interest in the Navy/Marine Corps through<br />
impressive knowledge about the Naval Service/NROTC and thoughtful questions<br />
about the program; the average and ineffective standards describe<br />
correspondingly weaker commitment and interest.]<br />
Preparing Interview Questions<br />
After the interview rating scales were developed, we began preparing<br />
questions designed to probe for reports of past behavior relevant to<br />
effectiveness in each area. Several questions for each rating category<br />
were developed and tried out with recruiters. The recruiters used<br />
different questions with different applicants, and noted those that seemed<br />
to be most and least effective at eliciting responses useful for making<br />
ratings on each scale. The three to four questions that appeared most<br />
effective for each category were then presented to CNET, and final<br />
revisions to the questions were made.<br />
Preparing Interview Instructions, a Training Videotape, and the Interview<br />
Worksheet<br />
In addition to development of the interview protocol rating scales and<br />
the interview questions, it was necessary to prepare instructions and an<br />
interviewer training videotape to ensure the structured interview is<br />
conducted properly. Thus, instructions and the videotape were prepared,<br />
along with an interview worksheet, with the interview questions presented<br />
and space provided for the interviewer to take organized notes of<br />
interviewee responses. The instructions and accompanying videotape provide a<br />
brief training program on structured interviewing, explain proper use of<br />
the behavioral statements for guiding interview ratings, and orient the<br />
interviewer to use the questions, the worksheet, and the interview rating<br />
form.<br />
Pilot Testing the Structured Interview<br />
The interview protocol and rating form were pilot tested in two waves<br />
with a total of 31 officer interviewers and 93 applicants in seven<br />
different locations. Means and standard deviations of the ratings provided<br />
data on their spread and overall distribution.<br />
As part of the pilot testing, an interrater reliability study was<br />
conducted. One way to assess the quality of data emerging from the new<br />
structured interview is to determine how closely two interviewers agree in<br />
their independent ratings of the same interviewees. Thus, we initiated an<br />
interrater reliability study with 10 officers interviewing a total of 24<br />
applicants. All interviewers were trained to do the structured interview<br />
and to use the rating form. Each applicant was interviewed by two officer<br />
recruiters.<br />
After each interview session, the interviewer completed the rating<br />
form and provided a copy to the researcher. Officers interviewing the same<br />
applicant never discussed that applicant before making their ratings, so<br />
the interview judgments were generated totally independently. Intraclass<br />
correlations were computed for each dimension separately and for the sum of<br />
the dimension ratings. This provides an estimate of the across-interviewer<br />
consistency of ratings made using the new interview protocol and rating<br />
form.<br />
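For readers who want to reproduce the reliability analysis on their own data, a minimal present-day sketch of a two-rater intraclass correlation is given below. The one-way random-effects variant, ICC(1,1), is an assumption (the paper does not specify which ICC formula was used), and the ratings are invented:

```python
import numpy as np

def icc_oneway(scores):
    """One-way random-effects intraclass correlation, ICC(1,1),
    for an n-targets x k-raters matrix of ratings."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)              # between-applicant MS
    msw = ((scores - row_means[:, None]) ** 2).sum() / (n * (k - 1))  # within-applicant MS
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical data: 6 applicants, each rated on one dimension by 2 interviewers
ratings = np.array([[4, 4], [5, 4], [2, 3], [3, 3], [5, 5], [1, 2]], float)
print(f"ICC(1,1) = {icc_oneway(ratings):.2f}")
```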
RESULTS<br />
Means and standard deviations for Wave 2 interview ratings, gathered<br />
after major revisions of the interview protocol (after Wave 1 pilot<br />
testing), appear in Table 1. For these 59 applicants, means are close to<br />
4.0 (on a 5-point scale) and standard deviations are approximately 1.0.<br />
Further, these means compare favorably with data for the previous interview<br />
(M=4.68 in 1985). Of course, this is not a very fair comparison because<br />
ratings on the new format were gathered for research, whereas the 4.68 rating<br />
is based on operational ratings. Nonetheless, applicant ratings using the<br />
new protocol appear to provide reasonable spread for the interview ratings<br />
of these typically high quality NROTC applicants.<br />
Table 1 also contains the interrater reliability coefficients for<br />
ratings made of the 24 applicants evaluated by two independent<br />
interviewers. These are very high reliabilities, with considerable<br />
agreement shown on the part of the interviewers.<br />
TABLE 1<br />
Means, Standard Deviations, and Interrater<br />
Reliability Coefficients for New Interview<br />
Protocol Ratings<br />
(N=59)<br />
Dimension                               Mean    SD   Reliability(a)<br />
NROTC Interest and Motivation           3.95   1.07      .81<br />
Leadership Potential                    3.78   1.26      .87<br />
Responsibilities                        4.00    .95      .81<br />
Organization of Tasks & Activities      4.20    .87      .83<br />
Communication                           4.24    .99      .93<br />
Overall Evaluation                      3.98    .97      .93<br />
Sum of First Five Dimensions           20.17   4.35      .95<br />
a. These are 2-rater intraclass correlations; N=24<br />
In addition, officer interviewers who used the new interview<br />
procedures were asked their opinions about this protocol compared to the<br />
previous one. Their comments are summarized in Table 2.<br />
TABLE 2<br />
Summary of Comments on the New<br />
Interview Protocol Rating Form<br />
• Big improvement over old form<br />
• Not too long or burdensome (interviews timed at 12-30 minutes<br />
including answering candidate questions, not filling out form; averaging<br />
about 15-18 minutes)<br />
• Concept of behavioral standards well understood and accepted<br />
• Worksheet especially helpful when doing several interviews back-to-back<br />
without completing the ratings<br />
• Videotape seen as very clear and useful<br />
• Interview program takes pressure off interviewer by providing good<br />
questions to ask<br />
• Interview program gives diverse interviewer types (e.g., NROTC staff,<br />
officer recruiters, Reservists, etc.) more common frame of reference<br />
DISCUSSION AND CONCLUSIONS<br />
Evaluations of the new interview materials and procedures by NROTC<br />
Selection Board members, NROTC officers, and officer recruiters responsible<br />
for interviewing NROTC applicants (as well as data from field tests of the<br />
interview), suggest that these materials and procedures are ready for<br />
implementation. What is most urgently needed to evaluate the usefulness of<br />
the new interview is criterion-related validity information. Future<br />
validation efforts will be important in evaluating the value of interview<br />
ratings by themselves and in combination with other measures (e.g., college<br />
board scores), in predicting important NROTC criteria such as GPA, NSG, and<br />
APT, and perhaps attrition from the scholarship program. The interrater<br />
reliability study on the new interview (Borman & Owens-Kurtz, 1989) and<br />
data from Table 1 suggest that the interview has good potential for<br />
improving the prediction of NROTC student performance. However, validity<br />
data are needed to assess its usefulness in actual practice.<br />
REFERENCES<br />
Borman, W. C., & Owens-Kurtz, C. K. (1989). Development and field test of<br />
a structured interview protocol for NROTC selection (Institute Report<br />
178). Minneapolis, MN: Personnel Decisions Research Institutes, Inc.<br />
Cronshaw, S. F., & Wiesner, W. H. (1989). The validity of the employment<br />
interview: Models for research and practice. In G. R. Ferris and R.<br />
W. Eder (Eds.), The employment interview: Theory, research and<br />
practice. Beverly Hills, CA: Sage.<br />
Owens-Kurtz, C. K., Borman, W. C., Gialluca, K. A., Abrahams, M. M., &<br />
Mattson, J. D. (1988). Refinement of the Navy Reserve Officer<br />
Training Corps (NROTC) scholarship selection composite (Institute<br />
Report 144). Minneapolis, MN: Personnel Decisions Research<br />
Institutes, Inc.<br />
Schmitt, N. (1976). Social and situational determinants of interview<br />
decisions: Implications for the employment interview. Personnel<br />
Psychology, 29, 79-101.<br />
Development of an Experimental Biodata/Temperament Inventory for NROTC Selection1<br />
Mary Ann Hanson and Cheryl Paullin<br />
Personnel Decisions Research Institutes, Inc.<br />
Walter C. Borman<br />
University of South Florida and Personnel Decisions Research Institutes, Inc.<br />
One component of the Naval Reserve Officer Training Corps (NROTC) Scholarship program selection<br />
process in need of revision or replacement is the Biographical Questionnaire (BQ). The BQ key<br />
(Neumann, Githens, & Abrahams, 1967), which was developed to predict officer retention beyond initial<br />
obligated service, is somewhat dated and does not correlate well with NROTC performance criteria. In<br />
addition, the BQ itself was developed over thirty years ago (Rimland, 1957). Much has been learned in<br />
the meantime about the development of biodata items, and many of the BQ items appear dated. Thus, the<br />
development of a new biodata instrument seemed in order. This paper will describe the development,<br />
preliminary evaluation, and refinement of an experimental biographical data and temperament inventory<br />
designed to predict NROTC performance and attrition.<br />
Method<br />
Developing the Pilot Profile of Experiences and Characteristics (PEC)<br />
A rational, construct-based approach was taken, both to develop and to refine this new experimental<br />
inventory. The first step in developing the inventory was to more clearly specify the criterion constructs<br />
it is designed to predict. Performance measures currently used by the NROTC were identified (e.g.,<br />
Naval Science Grades), and the constructs that underlie these performance measures were specified. The<br />
underlying performance constructs identified were academic achievement, leadership, military bearing,<br />
and goal setting. Attrition from the NROTC program occurs for a variety of reasons, and the underlying<br />
causes of attrition include academic failure, inaptitude, and dislike for the military (see Owens-Kurtz,<br />
Gialluca, & Borman, 1989). The present research focused on identifying predictors of the performance<br />
and attrition constructs for which prediction is presently poor. Because academic achievement and academic<br />
failure are predicted at least moderately well by existing predictors, less emphasis was placed on<br />
identifying predictors of these criteria.<br />
A literature review was conducted to identify individual differences constructs, especially biographical<br />
and temperament constructs, that have shown empirical links with criteria similar to the NROTC performance<br />
and attrition constructs in past research. Item-level validities for several other inventories were<br />
also reviewed. Eight individual differences constructs were identified that have been found, in past research,<br />
to be valid predictors of criteria similar to the NROTC performance/attrition constructs. These<br />
eight constructs were labeled: (1) Achievement Motivation; (2) Team Orientation; (3) Dominance; (4)<br />
Sociability; (5) Leadership Orientation; (6) NROTC/Military Interest and Motivation; (7) Organization<br />
and Planning; and (8) Responsibility.<br />
Items were written to tap each of these eight constructs. Past research (e.g., Doll, 1971) has shown<br />
that responses to verifiable items (i.e., items for which the truthfulness of responses can be checked using<br />
an external source) are less often distorted. Because biodata items typically deal with observable behav-<br />
1 This research was supported by funds from the Office of Naval Technology, Program Element 0602233N.<br />
The opinions expressed are those of the authors, and do not necessarily reflect those of the U.S. Navy.<br />
iors, these items are more likely to be verifiable. Thus, an effort was made to include as many biodata<br />
items as possible in the pilot version of the PEC. However, when sufficient numbers of biodata items<br />
could not be written to adequately cover a construct, temperament items were also included. Between 13<br />
and 21 items were written to tap each of the eight predictor constructs. In order to detect response distortion<br />
by applicants if it occurs, a ten item response validity scale (called the Unlikely Virtues scale) was<br />
also developed and included in the inventory. Thus, the pilot version of the inventory, called the Profile<br />
of Experiences and Characteristics (PEC), contained 151 items.<br />
Evaluating the PEC<br />
Both rational and empirical approaches were taken in evaluating and refining the PEC. The rational<br />
approach was a retranslation exercise in which researchers independently categorized the PEC items into<br />
the eight biodata/temperament constructs. The empirical approach involved administering the PEC to a<br />
large sample of NROTC applicants. The inventory was also administered to a comparison sample of<br />
NROTC scholarship students.<br />
Retranslation Exercise<br />
The retranslation exercise had two purposes: (1) to determine whether researchers would agree concerning<br />
the placement of items on constructs; and (2) to obtain information that could be used to further<br />
revise and refine the composition of the constructs and their definitions. Seven researchers who were<br />
knowledgeable about biodata and/or personality research were asked to independently sort each of the<br />
PEC items into one of the construct categories according to the perceived match between item and category<br />
content. The degree of agreement among these researchers concerning the placement of items was<br />
then evaluated.<br />
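The agreement tally itself is straightforward: an item "retranslates" cleanly when at least five of the seven judges place it in the same category. A sketch with hypothetical sorts (the item identifiers and judge choices below are invented for illustration):

```python
from collections import Counter

# Hypothetical sorts: the construct category each of 7 judges chose per item.
sorts = {
    "item_01": ["Dominance"] * 7,
    "item_02": ["Leadership Orientation"] * 5 + ["Dominance"] * 2,
    "item_03": ["Sociability"] * 4 + ["Team Orientation"] * 3,
}

def retranslated(judge_choices, threshold=5):
    """True when the modal category reaches the agreement threshold."""
    _, count = Counter(judge_choices).most_common(1)[0]
    return count >= threshold

agree = [item for item, choices in sorts.items() if retranslated(choices)]
print(f"{len(agree)} of {len(sorts)} items met the 5-of-7 criterion")
```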
Pilot Test<br />
The PEC was administered to all Board Eligible NROTC applicants who were processed for the<br />
1990 NROTC scholarship program between 18 December 1989 and 30 January 1990 as part of their application<br />
process. Completed PEC inventories were obtained for 972 NROTC applicants from nearly all<br />
of the 41 Navy Recruiting Districts. About 90 percent of the respondents in this pilot test sample were<br />
either 18 or 19 years old, and 91 percent were male.<br />
Frequency counts were conducted to identify and eliminate items where the vast majority of respondents<br />
marked the same response alternative. Next, a rational scoring scheme was developed so that a<br />
preliminary set of item- and scale-level scores could be computed. When criterion data become available,<br />
this scoring system may need to be modified. The item-level scores that were computed were intercorrelated<br />
and factor analyzed.<br />
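The frequency-count screen described above can be sketched as follows; the 90 percent cutoff matches the example given later in the Results, but the response data here are simulated rather than the actual PEC responses:

```python
import numpy as np

# Simulated response matrix: 972 respondents x 3 items, alternatives coded 0-4.
rng = np.random.default_rng(0)
responses = rng.integers(0, 5, size=(972, 3))
# Make the last item badly skewed: ~95% of respondents pick alternative 4.
responses[:, 2] = np.where(rng.random(972) < 0.95, 4, responses[:, 2])

def low_spread_items(resp, cutoff=0.90):
    """Flag items where a single alternative captures more than `cutoff`
    of all responses (too little spread to be useful)."""
    flagged = []
    for j in range(resp.shape[1]):
        counts = np.bincount(resp[:, j])
        if counts.max() / resp.shape[0] > cutoff:
            flagged.append(j)
    return flagged

print("items with unacceptable spread:", low_spread_items(responses))
```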
Comparison with “Honest” Sample<br />
In order to obtain some base rate information regarding how “honest” respondents (i.e., respondents<br />
who have little to gain by distorting their responses) score on the PEC, a comparison sample of students<br />
already enrolled in the NROTC scholarship program was administered the PEC under instructions to respond<br />
as honestly as possible. A total of 175 first-year NROTC scholarship students from the University<br />
of Minnesota, Notre Dame University, and Carnegie-Mellon University completed the PEC in January<br />
1990. This sample was 93 percent male.
Data from this comparison sample were scored using the same procedures that were used in the applicant<br />
sample. Mean item-level scores from the NROTC student sample were compared with those from<br />
the pilot-test sample in order to identify items with substantially different base rates. If an item’s mean<br />
score in the applicant group is slanted considerably more in the socially desirable direction than that of<br />
the student sample, it suggests that the item is relatively easily distorted by applicants.<br />
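The base-rate comparison reduces to a per-item difference in mean scores between the two samples. A sketch with invented item means and an arbitrary flagging cutoff (the paper does not state what difference was treated as "large"):

```python
import numpy as np

# Invented mean item-level scores (higher = more socially desirable).
applicant_means = np.array([0.81, 0.64, 0.92, 0.55])  # applicant sample
student_means   = np.array([0.78, 0.61, 0.71, 0.54])  # "honest" NROTC students

# Items slanted well above the honest baseline are candidates for distortion.
diffs = applicant_means - student_means
suspect = np.where(diffs > 0.15)[0]   # the 0.15 cutoff is an assumption
print("possibly distortable items:", suspect.tolist())
```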
Refining the PEC<br />
Results from the pilot test data analyses, along with the information from the retranslation exercise,<br />
were used to revise the composition of the PEC constructs and their definitions. The inventory was then<br />
refined and shortened for future administrations. Descriptive statistics, internal consistency reliabilities,<br />
and scale score intercorrelations were computed for the final shortened scales.<br />
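The internal consistency figures reported below in Table 1 are coefficient alphas (see the table note). A minimal sketch of the computation, on invented complete-data scores:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a respondents x items score matrix
    (complete data only)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented scores for one short scale: 5 respondents x 4 items
scores = np.array([[1, 1, 1, 0],
                   [2, 2, 1, 2],
                   [0, 1, 0, 0],
                   [2, 1, 2, 2],
                   [1, 0, 1, 1]], float)
print(f"alpha = {cronbach_alpha(scores):.2f}")
```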
Results and Discussion<br />
Evaluating the PEC<br />
Retranslation Results<br />
Seventy-seven percent of the PEC items were sorted into the same predictor construct scale by five<br />
or more of the seven researchers who participated in the retranslation exercise. Seven of the remaining<br />
items were from the Unlikely Virtues (response validity) scale. It is not particularly surprising that some<br />
researchers sorted the Unlikely Virtues items into the construct categories. The Unlikely Virtues items<br />
were specifically written to resemble the eight original construct categories (so they would be subtle).<br />
The fact that some of the judges mistakenly sorted the Unlikely Virtues items into the construct categories<br />
suggests that the items are indeed subtle. In general, however, there was good agreement among the<br />
researchers concerning the placement of PEC items on constructs.<br />
Pilot-Test Results<br />
Frequency counts revealed that the vast majority of the PEC items had an adequate spread of responses<br />
across the response alternatives. Only a few items had response distributions that were considered<br />
unacceptable (e.g., over 90 percent of the respondents chose the most desirable response alternative).<br />
However, for some items the response distributions were much better than for others. This information<br />
was taken into account in refining the PEC, particularly in making decisions concerning which items to<br />
drop.<br />
The item-level intercorrelations were factor analyzed, and rotated principal factor solutions containing<br />
from 2 to 12 factors were examined. Based on a parallel analysis (Montanelli & Humphreys, 1976),<br />
the amount of variance accounted for by each factor, and the interpretability of the solutions, the eight<br />
factor solution was selected for further consideration.<br />
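Parallel analysis compares the eigenvalues of the observed correlation matrix with those expected from random data of the same dimensions, retaining factors only where the observed value exceeds the random benchmark. The sketch below is a simplified version of this idea (it omits the squared-multiple-correlation diagonal of the Montanelli & Humphreys procedure) and runs on simulated two-cluster data rather than the PEC items:

```python
import numpy as np

def parallel_analysis(data, n_sims=50, seed=0):
    """Count factors whose observed eigenvalue exceeds the mean eigenvalue
    obtained from random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    real_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rand_eigs = np.mean(
        [np.sort(np.linalg.eigvalsh(
            np.corrcoef(rng.standard_normal((n, p)), rowvar=False)))[::-1]
         for _ in range(n_sims)], axis=0)
    return int(np.sum(real_eigs > rand_eigs))

# Simulated responses: two correlated item clusters of three items each.
rng = np.random.default_rng(1)
f = rng.standard_normal((500, 2))
data = np.hstack([f[:, [0]] + 0.5 * rng.standard_normal((500, 3)),
                  f[:, [1]] + 0.5 * rng.standard_normal((500, 3))])
print("factors suggested:", parallel_analysis(data))
```

With two well-separated clusters, the first two observed eigenvalues far exceed the random benchmark while the rest fall well below it, so the procedure suggests retaining two factors.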
The amount of overlap between the results of the retranslation exercise and the factor analysis was<br />
encouraging. Items from the Leadership Orientation scale defined a factor, and nearly all of the items<br />
that were retranslated into this scale had their highest loading on that factor (8 of 11). Similarly, Organization<br />
and Planning and NROTC/Military Interest and Motivation also defined their own factors.<br />
Achievement Motivation defined a factor, but most of the Responsibility (7 of 12) items also loaded on<br />
this factor. Dominance and Team Orientation each defined a factor, and the Sociability items were split<br />
between these two factors, with the Sociability items involving friendliness loading on the Team Orientation<br />
factor and those involving talkativeness and assertiveness loading on the Dominance factor. Clearly the<br />
retranslation and the factor analysis results converged on very similar sets of constructs.<br />
Comparison with “Honest” Sample<br />
The applicant sample generally chose more desirable response options (i.e., response options that led<br />
to higher scores) than the NROTC sample. For most of the PEC items, the difference between the mean<br />
item-level scores for the two groups was quite small. However, for a few items the difference was large,<br />
especially when the “correct” response was fairly obvious. These latter items are probably the most<br />
susceptible to distortion, and this information was considered in deciding which items to drop.<br />
Refining the PEC<br />
The results of the factor analysis and the retranslation exercise were both taken into account in defining<br />
the final set of PEC constructs. Where the retranslation and the factor analysis suggested a slightly<br />
different set of constructs, rational considerations guided formation of the final constructs. For example,<br />
although the Responsibility items were grouped with the Achievement Motivation items in the factor<br />
analysis, the literature review suggested that these two predictor constructs would be related to somewhat<br />
different criterion constructs. Therefore, Achievement Motivation and Responsibility were kept separate.<br />
Revisions were made to many of the PEC constructs based on the retranslation and the pilot test analyses,<br />
resulting in a final set of seven “revised” biodata/temperament constructs. These revised constructs are<br />
listed on the left side of Table 1.<br />
Table 1<br />
Descriptive Statistics for the Final (Shortened) PEC Scales<br />
Scale                                        Mean    SD   Reliability(1)<br />
Achievement Motivation                        .80    .49      .82<br />
Dependability                                1.29    .43      .59<br />
Social Comfort                               1.02    .47      .73<br />
Dominance                                     .87    .44      .82<br />
Leadership Orientation                        .51    .68      .78<br />
NROTC/Military Interest and Motivation        .68    .56      .73<br />
Organization and Planning                     .58    .56      .80<br />
Unlikely Virtues(2)                            --     --      n/a<br />
Miscellaneous(3)                               --     --      n/a<br />
Note. Ns range from 962 to 964 for means and standard deviations; from 898 to 953 for the reliabilities. (Computation of coefficient alpha<br />
required complete data.) [The number-of-items column of the original table is not legible in this copy.]<br />
1 Coefficient alpha.<br />
2 Descriptive statistics are not presented for the final Unlikely Virtues scale because it contains new items.<br />
3 The Miscellaneous category is not a scale, so descriptive statistics are not appropriate.<br />
After the new construct structure was delineated, each PEC item was assigned to a construct/scale<br />
according to its factor loadings and item content. Items that did not fit well into any construct were<br />
placed in a “Miscellaneous” category. Item-total correlations were then computed for these revised construct<br />
scales, and these were used to help guide decisions concerning which items to retain in the final<br />
(shortened) PEC.<br />
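Corrected item-total correlations (each item correlated with the sum of the remaining items in its scale, so the item's own variance does not inflate the value) are a standard way to run this kind of retention screen. The version below is a sketch on simulated data, since the PEC item responses are not reproduced here:

```python
import numpy as np

def corrected_item_totals(scale):
    """Correlation of each item with the sum of the *other* items in the scale."""
    totals = scale.sum(axis=1)
    return [float(np.corrcoef(scale[:, j], totals - scale[:, j])[0, 1])
            for j in range(scale.shape[1])]

# Simulated scale: three coherent items plus one nearly unrelated item.
rng = np.random.default_rng(2)
core = rng.standard_normal((200, 1))
scale = np.hstack([core + 0.6 * rng.standard_normal((200, 3)),
                   rng.standard_normal((200, 1))])
r_it = corrected_item_totals(scale)
print(["%.2f" % r for r in r_it])   # last value near zero -> drop candidate
```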
The final step in the present research was to shorten and refine the PEC. Decisions concerning<br />
which items to drop took into account the pilot test results, the comparison sample results, the retranslation<br />
results, and the item content. A few promising items were retained that did not fit welI into any of<br />
the predictor constructs and placed in a “Miscellaneous” category. A total of thirty-two content scale<br />
items were dropped. In addition, several of the Unlikely Virtues scale items were revised or replaced<br />
based on results from the comparison and pilot sample data analyses.<br />
The final (shortened) version of the PEC contains 116 items distributed across the final construct<br />
scales as shown in Table 1. Table 1 also presents descriptive statistics for these scales. All of the scales<br />
except Dependability have very good internal consistency reliability. The internal consistency of the<br />
Dependability scale is comparatively low, and the mean score on this scale is also quite high in the applicant<br />
sample. This scale was retained for further study in spite of these problems because, based on the<br />
literature review, it is expected to predict attrition. Descriptive statistics are not reported on Table 1 for<br />
the final Unlikely Virtues scale, because this scale contains new and revised items. Table 2 presents the<br />
intercorrelations among these final construct scale scores.<br />
Table 2<br />
Scale Intercorrelations for the Final (Shortened) PEC Scales<br />
                                              AM   DP   SC   DM   LO   NR   OP<br />
Achievement Motivation (AM)<br />
Dependability (DP)                           .56<br />
Social Comfort (SC)                          .32  .18<br />
Dominance (DM)                               .42  .29  .47<br />
Leadership Orientation (LO)                  .44  .28  .42  .56<br />
NROTC/Military Interest and Motivation (NR)  .42  .35  .25  .33  .31<br />
Organization and Planning (OP)               .58  .42  .19  .27  .31  .34<br />
Unlikely Virtues (UV)*                       .45  .30  .25  .33  .25  .35  .33<br />
Note. Ns range from 962 to 964.<br />
* Sum of 7 retained items.<br />
Conclusions<br />
The final experimental PEC measures seven biodata/temperament constructs. Each of the construct<br />
scales seems to be reasonably homogeneous and focused on the intended personal characteristics, experiences,<br />
and motivation constructs. The inventory has good potential for enhancing the prediction of<br />
NROTC performance and attrition. Further research is needed to evaluate the validity of the PEC for predicting<br />
performance and attrition.<br />
REFERENCES<br />
Doll, R. E. (1971). Item susceptibility to attempted faking as related to item characteristics and adopted<br />
fake set. Journal of Psychology, 77, 9-16.<br />
Montanelli, R. G., Jr., & Humphreys, L. G. (1976). Latent roots of random data correlation matrices<br />
with squared multiple correlations on the diagonal: A Monte Carlo study. Psychometrika, 41, 341-348.<br />
Neumann, I., Githens, W. H., & Abrahams, N. M. (1967). Development and evaluation of an officer<br />
potential composite (NPRDC TR 98-18). San Diego: Navy Personnel Research and Development<br />
Center.<br />
Owens-Kurtz, C. K., Gialluca, K. A., & Borman, W. C. (1989). Examination of the attrition coding system<br />
and development of potential attrition predictors for the Navy Reserve Officer Training Corps<br />
(NROTC) program (Institute Report No. 179). Minneapolis: Personnel Decisions Research Institutes, Inc.<br />
Rimland, B. (1957). The development of a fake-resistant test for selecting career-motivated NROTC<br />
scholarship recipients (PRFASD Report No. 112). San Diego: U.S. Naval Personnel Research<br />
Field Activity.<br />
PSYCHOLOGICAL APPLICATIONS TO ENSURING PERSONNEL SECURITY:<br />
A SYMPOSIUM<br />
BORMAN, W., University of South Florida and Personnel Decisions Research<br />
Institutes, Inc.;<br />
BOSSHARDT, M., DUBOIS, D., and HOUSTON, J., Personnel Decisions Research<br />
Institutes, Inc., Mpls., MN;<br />
CRAWFORD, K., Defense Personnel Security Research and Education Center,<br />
Monterey, CA;<br />
WISKOFF, M., and ZIMMERMAN, R., BDM International, Inc., Monterey, CA;<br />
SHERMAN, F., Marine Security Guard Battalion, Quantico, VA.<br />
The national security and financial consequences of unsuitable conduct<br />
and compromise of classified information by persons in sensitive or<br />
high security risk jobs are enormous. To protect against these types<br />
of unreliable behavior, a personnel security program is utilized by<br />
the Department of Defense. This program has two major emphases. The<br />
first involves screening individuals who are being considered for<br />
initial clearances. The second emphasis is the ongoing or continuing<br />
assessment of cleared personnel. With respect to initial screening,<br />
investigative interview procedures and background questionnaires to<br />
screen applicants are discussed. Regarding continuing assessment,<br />
current military service programs and an approach to assessing Marine<br />
Security Guard behavior are described. The symposium concludes with a<br />
discussion of some of the difficulties encountered by practitioners<br />
and new approaches to improve personnel security practices.<br />
THE INVESTIGATIVE INTERVIEW:<br />
A REVIEW OF PRACTICE AND RESEARCH<br />
David A. DuBois and Michael J. Bosshardt<br />
Personnel Decisions Research Institutes, Inc.<br />
Martin F. Wiskoff<br />
BDM International, Inc.<br />
Introduction<br />
An important element in safeguarding national security is maintaining personnel security.<br />
Each year thousands of individuals are assigned to jobs that provide them with access to<br />
extremely sensitive or classified information that could adversely affect national security. The<br />
Background Investigation (BI) is the primary method for screening personnel for positions<br />
requiring a Top Secret clearance. This method relies principally on obtaining information from<br />
an interview with the subject. Supplementary information is gathered through self-report<br />
background questionnaires, local agency checks, national agency checks, credit checks, and<br />
interviews with character and employment references. This information is evaluated against<br />
various administrative criteria by adjudicators.<br />
Objectives<br />
The objectives of this paper are to (1) describe the investigative interview, (2) review<br />
research related to the investigative interview, and (3) identify directions for future research in<br />
this area.<br />
Research Approach<br />
A literature review was conducted to identify empirical and descriptive studies related to<br />
the investigative interview. Specifically, computerized and manual literature searches were<br />
performed, as well as a telephone survey of experts from academia, industry, professional<br />
associations, and integrity test publishing companies. In addition, detailed information on<br />
investigative interview practices within the federal government was obtained through site visit<br />
interviews with 10 senior officials at five federal agencies.<br />
What is the investigative interview?<br />
The investigative interview is a method used for gathering information to determine the<br />
reliability of individuals for working in positions of trust or positions that provide access to<br />
extremely sensitive or classified information. Most investigative interviews are conducted by<br />
organizations within the military services, federal government, and defense industry. These<br />
interviews can involve either the subject or persons who know the subject (e.g., references, past<br />
employers). They are typically conducted by interviewers who are trained in interviewing<br />
methods and nonverbal communication techniques (e.g., kinesics, proxemics). The interview,<br />
which may be conducted using a variety of formats (e.g., structured, semi-structured,<br />
unstructured), typically covers topics such as honesty, substance abuse, emotional stability, and<br />
financial irresponsibility. This interview information is then summarized in narrative or rating<br />
form, and combined with other information about the subject (e.g., from self-report background<br />
questionnaires, local and national agency checks, credit checks). Senior adjudicators then make<br />
final screening decisions.<br />
Current Investigative Interview Practice<br />
Ten senior officials from five government organizations [Defense Investigative Service<br />
(DIS), the Office of Personnel Management (OPM), the Federal Bureau of Investigation (FBI),<br />
the Central Intelligence Agency (CIA), and the Defense Intelligence Agency (DIA)] were<br />
interviewed to obtain detailed information regarding current investigative interviewing practice<br />
for individuals being considered for Top Secret personnel security clearances. Each interview<br />
lasted 1 to 3 hours. A composite description of the major features of both subject and non-subject<br />
investigative interviews is presented below.<br />
Preparation<br />
Overall, the interview procedures followed by these agencies are remarkably similar in<br />
many respects. The interviewer generally prepares for the interview by reviewing available<br />
background information about the subject for missing, discrepant, and issue-oriented<br />
information. From this background information, specific interview questions are developed.<br />
Setting<br />
Subject and non-subject interviews are often conducted in different settings. Subject<br />
interviews are usually conducted in a government office setting, whereas non-subject interviews<br />
are less likely to be held in an office. In both types of interviews, privacy and freedom from<br />
distractions are the principal requirements for the interview setting.<br />
Conduct<br />
Guidelines for interviewer conduct are similar across agencies. These guidelines include<br />
acting in a professional manner, dressing in a businesslike manner, and being courteous,<br />
respectful, and non-judgmental.<br />
Format<br />
The investigative interview is conducted in four phases: introduction, background form<br />
review, issue development, and conclusion. Each phase is different in content and tone. The<br />
entire subject interview typically lasts from one-half hour to one hour.<br />
Introduction. During the introduction, the interviewer usually (although not always)<br />
shows credentials and positively identifies the subject. In this phase, the interviewer develops<br />
rapport with the subject, explains the interview purpose and format, and secures a verbal<br />
commitment from the subject to provide truthful and complete information.<br />
At some point during the subject interview, the interviewer informs the subject of the<br />
privacy act. This may be done at the beginning of the interview (e.g., DIS, OPM) or near the<br />
end of the interview (e.g., FBI, CIA). OPM subject interviews are conducted under oath. None<br />
of the other agency officials mentioned use of an oath, although DIS interviewers seek written<br />
signed statements when the subject provides significant derogatory information.<br />
Background Review. Following the introduction to the subject interview, the interviewer<br />
generally reviews the subject’s background history form. During this phase of the interview, the<br />
interviewer questions the subject about specific items on the form, emphasizing items that have<br />
been identified as omitted or discrepant during the preparation phase. A review of each item on<br />
the form is generally not undertaken.<br />
Issue Development. In the issue development phase, the interviewer systematically<br />
questions the subject on a range of topics. In most agencies, a standard list of topics is covered.<br />
These topics, which are similar across agencies, include education, employment, residence,<br />
alcohol, drugs, mental treatment, moral behavior, family and associates, foreign connections,<br />
foreign travel, financial responsibility, organizations, loyalty, criminal history, handling<br />
information, and trust. Coverage of interview topics generally begins with questions on the<br />
subject’s background (e.g., education, employment) and later proceeds into the more sensitive<br />
areas.<br />
Conclusion. The concluding phase of the interview is focused on answering any<br />
concerns of the interviewee. The next steps of the security clearance process are also explained<br />
at this time.<br />
Interview Procedures<br />
A variety of techniques are used to facilitate the investigative interview process.<br />
Interviewers are typically trained in four general categories of interviewing skills: motivation,<br />
questioning, observation, and listening.<br />
Motivation. Subjects are motivated to disclose sensitive information to the interviewer<br />
in several ways. The interviewer ensures that the interviewee understands the purpose, format,<br />
and content of the interview. The “whole person” concept of adjudication is explained so that<br />
the interviewee understands that negative information is judged in terms of the circumstances of<br />
the situation, how long ago it happened, etc., and in terms of the positive qualities of the person.<br />
The interviewee is informed of the consequences of omitting or providing misleading<br />
information. The interviewer typically secures a verbal commitment to provide complete and<br />
truthful information.<br />
Rapport is maintained by displaying a non-judgmental attitude, fairness, and respect.<br />
Objections are managed by clearly identifying the nature of the objection or hesitation, re-stating<br />
it to the interviewee, and addressing concerns directly.<br />
Questioning. Several questioning approaches are used in conducting subject interviews.<br />
Although the topic areas are generally structured, only DIS emphasizes use of a structured set of<br />
questions for each topic. DIS interviewers typically ask four to seven short, direct questions<br />
regarding a subject area, followed by summarizing questions. Other agencies use more open-ended<br />
questions, followed with summary or verification questions. Interrogative questioning<br />
methods are not generally used by DIS, but are occasionally used by FBI interviewers.<br />
Observation. All of the agencies visited train their interviewers to look for possible<br />
verbal and nonverbal cues to deception on the part of the interviewee. Most of these indicators<br />
are based on noticing patterns of various verbal, paralinguistic, and nonverbal (body gestures,<br />
facial expression) indicators. When possible deception is detected, the interviewer may remind<br />
the subject of the importance of honesty, and that confidentiality is maintained.<br />
Listening. Interviewers are trained to listen to the whole response, to use active listening<br />
procedures, and to follow up vague responses with questions that draw out details. Techniques<br />
such as re-statement and paraphrasing are used to encourage elaboration.<br />
Documentation<br />
Investigators normally take only limited (or no) notes during the interview. OPM<br />
interviewers tend to take the most extensive notes, while FBI interviewers generally take fewer<br />
notes. Upon completion of the interview, interviewers write or dictate a short report<br />
summarizing the results of the interview.<br />
Decision-Making<br />
In all agencies, interviewers obtain the interview information but adjudicators make the<br />
clearance decisions. OPM is unique in that it conducts interviews on a contract basis for over 90<br />
Federal agencies.<br />
Empirical Research<br />
No published empirical studies were found regarding the use of investigative or integrity<br />
interviews. The literature search did identify five unpublished studies, most of which were pilot<br />
studies.<br />
The most relevant of these compared the relative effectiveness of two types of<br />
background investigations--one with a subject interview and one without. Conducted by the<br />
Defense Investigative Service (Office of Personnel Investigations, 1986), the study involved a<br />
random sample of 471 military members, contractor employees, and DOD civilian personnel.<br />
For the 186 cases in which significant adverse information was identified, the background<br />
investigation which included the investigative interview developed significant information in<br />
164 of these cases. Furthermore, the procedure which included the investigative interview<br />
yielded 72 cases not identified by the traditional procedure. Based on these results, the research<br />
staff concluded that inclusion of the subject interview resulted in a significant improvement in<br />
the background investigation procedure.<br />
A survey by the Director of Central Intelligence (Office of Personnel Investigations,<br />
1986) of 12 government agencies examined the productivity of various sources for the purposes<br />
of applicant screening and security clearances. Background investigation sources included in<br />
this study were subject interviews, neighbor interviews, education and employment record<br />
checks, national agency checks, and the polygraph. The results of the study suggested that the<br />
subject interview was the second most productive source for identifying serious adverse<br />
information.<br />
Flyer (1986) summarized much of the early personnel security screening literature<br />
conducted in the military. Although no data were presented, he noted that the most important<br />
finding of Air Force research on personnel security screening was “the unique and considerable<br />
value of the subject interview.”<br />
In summary, the limited research on investigative interviews suggests that they may be<br />
useful personnel security screening devices.<br />
Related Research<br />
Although the research on investigative interviews is scarce, there is a wealth of research<br />
on interviewing in other contexts (e.g., employment, survey research). This research is useful to<br />
the extent that it suggests additional techniques to apply in the investigative interview setting or<br />
provides a theoretical model that explains interviewing behavior.<br />
For example, with respect to question characteristics, research examining eyewitness<br />
testimony (Lipton, 1977) compared the relative effectiveness of open-ended vs. close-ended<br />
questions. The results indicated that narrative, open-ended formats tend to produce very<br />
accurate, but incomplete information. Close-ended, interrogatory formats, on the other hand,<br />
tend to produce more complete, but less accurate information. This led one researcher (Loftus,<br />
1982) to suggest that open-ended questions be used first, followed by specific (close-ended)<br />
questions to ensure that complete information is obtained.<br />
The decades of research on employment interviews and the more recent research on the<br />
detection of deception provide a rich source of ideas for improving investigative interviewing<br />
procedures. Many of these ideas have been recently summarized in a review of investigative<br />
interviewing and related research (Bosshardt, DuBois, Carter, & Paullin, 1989).<br />
While these large scientific literatures on related interviewing techniques can provide<br />
many ideas, there is a strong need to thoroughly investigate the utility of these ideas in the<br />
investigative interview setting before adopting them in practice. A careful consideration of the<br />
very different contexts that exist between the investigative interview and other interview settings<br />
suggests that results may not generalize, or that the effects may not be the same.<br />
For example, the purpose of the investigative interview is to screen out people, while the<br />
purpose of the employment interview is to select in personnel. The rejection rate for<br />
investigative interviews is about 1% to 5%, while the selection ratio for employment interviews<br />
is typically about 20% to 60%. The focus of investigative interviews is on behavioral constructs<br />
such as behavioral unreliability and unsuitability, while employment interviews focus on cognitive<br />
ability, motivation, and communication skills. Perhaps most importantly, the motivational<br />
approach used is very different in these two settings. The consequence of providing complete<br />
information in an investigative interview is the avoidance of punishment, whereas for the<br />
employment interviewee it is the reward of a job.<br />
Needed Research<br />
Although the interview has been extensively studied as a method of gathering<br />
information, little research is available regarding its use in the investigative interview setting. A<br />
variety of investigative interviewing procedures are currently in use and the large literature on<br />
other interview settings suggests additional procedures to consider. Research is needed to<br />
systematically evaluate the effectiveness of these various investigative interview methods.<br />
One major finding from employment interview research can probably be generalized to<br />
the investigative interview--the most impressive gains in interview validity result from the<br />
systematic study of the performance criteria that are to be predicted. Research that defines the<br />
psychological dimensions and behavioral detail of security-relevant performance can contribute<br />
to significant improvements in interviewer training, the assessment of personnel security risks,<br />
and the prediction of unreliable behavior.<br />
REFERENCES<br />
Bosshardt, M.J., DuBois, D.A., Carter, G.W., & Paullin, C. (1989). The investigative interview:<br />
A review of practice and related research (Technical report No. 160). Minneapolis, MN:<br />
Personnel Decisions Research Institute.<br />
Flyer, E.S. (1986). Personnel security research: Prescreening and background investigations<br />
(Report No. HumRRO-FR-86-01). Alexandria, VA: HumRRO International, Inc.<br />
Lipton, J.P. (1977). On the psychology of eyewitness testimony. Journal of Applied<br />
Psychology, 62(1), 90-95.<br />
Loftus, E. (1982). Interrogating witnesses--good questions and bad. In R. M. Hogarth (Ed.),<br />
Question framing and response consistency. San Francisco: Jossey-Bass.<br />
Office of Personnel Investigations. (1986). Subject interview study: Phase I report.<br />
Washington, D.C.: U.S. Office of Personnel Management.<br />
UTILITY OF A SCREENING QUESTIONNAIRE FOR<br />
SENSITIVE MILITARY OCCUPATIONS<br />
Ray A. Zimmerman<br />
Martin F. Wiskoff<br />
BDM International, Inc.<br />
Background<br />
Each of the military services prescreens enlisted<br />
applicants for sensitive occupations, i.e., those that<br />
require a Top Secret clearance, access to Sensitive<br />
Compartmented Information, or are included in the<br />
Nuclear Weapons Personnel Reliability Program. The<br />
prescreening occurs prior to the initiation of the<br />
Personnel Security Investigation (PSI) and is designed<br />
to: (a) reduce the probability of assigning unreliable<br />
individuals to sensitive positions and (b) cull out<br />
individuals who are likely to be denied a security<br />
clearance. Crawford and Wiskoff (1988), in their review<br />
of the prescreening procedures used by the military<br />
services, found that they had been developed without<br />
empirical assessment of their validity and utility. As an<br />
example, they pointed out that despite intensive<br />
prescreening, the discharge rate from military service<br />
for reasons of unsuitability was not much lower for<br />
high-security occupations than that for other military<br />
jobs.<br />
The security interview at the Military Entrance<br />
Processing Station (MEPS) is the first step in the<br />
prescreening process for enlisted Army applicants to a<br />
sensitive job. Prior to the interview, applicants<br />
complete the Army Security Screening Questionnaire<br />
(DAPC-EPMD FORM 169-R). Responses to the<br />
questionnaire are examined by a security interviewer<br />
and explored further during an interview with the<br />
applicant. For those applicants who are accepted into<br />
a sensitive job and placed into the Delayed Entry<br />
Program (DEP), a second 169-R is completed and an<br />
interview conducted upon completion of the DEP.<br />
Purpose<br />
The purpose of this investigation was to explore<br />
the effectiveness of the 169-R as a security<br />
prescreening instrument, in terms of: (a) the degree to<br />
which it is able to predict two operational screening<br />
decisions and a measure of personnel reliability and<br />
(b) the utility or impact of using the information it<br />
provides, along with other applicant data. The study<br />
was preliminary in that only a small sample was<br />
analyzed to determine whether it would be fruitful to<br />
conduct a large scale study. A more complete<br />
discussion of the study and the results is available in<br />
Zimmerman, Fitz, Wiskoff, and Parker (in press).<br />
Sample<br />
Army Security Screening Questionnaires filled out<br />
by applicants from 1981 through 1986 were collected<br />
from MEPS throughout the country. Only the<br />
questionnaires completed during 1984 were used<br />
because: (a) the questionnaire had been revised<br />
several times during the years prior to 1984 and<br />
(b) individuals completing questionnaires after 1984<br />
would not have had the opportunity to finish their first<br />
term of service. Questionnaires were available for<br />
2,870 applicants. From these a random sample of 281<br />
non-prior service males was drawn. Analyses<br />
indicated that the sample appears to match the<br />
population of 1984 applicants to high security<br />
occupations fairly well in terms of Armed Forces<br />
Qualification Test (AFQT) scores and demographic<br />
variables such as race, age at service entry and level<br />
of education.<br />
Predictor Measures<br />
The Army 169-R administered in 1984 consists of<br />
a series of 45 questions which can be answered “yes”<br />
or “no,” relating to: (a) Prior Military and Federal<br />
Service, (b) Foreign Connections, (c) Drug Use,<br />
(d) Alcohol Use, (e) Emotional Stability, (f) Sexual<br />
Misconduct, (g) Financial Problems, (h) Employment<br />
Problems, (i) Delinquency, and (j) Legal Offenses. For<br />
each affirmative response, the applicant must provide<br />
details of the specific incidents or experiences. In<br />
addition, applicants must supply detailed information<br />
about current financial obligations and any previous<br />
arrests, citations, or other types of contact with the<br />
legal system. Most applicants can complete the 169-R<br />
in approximately one-half hour.<br />
For this study, two classes of predictors were<br />
taken from the 169-R: (a) yes/no items and<br />
(b) detailed information that was transformed into<br />
coded items. There were 50 coded items analyzed as<br />
predictors.<br />
Other applicant data that are available at the time<br />
of the security interview were examined in conjunction<br />
with 169-R responses. These additional predictors<br />
included AFQT category, age at entry into the Army<br />
and level of education. The data were obtained from<br />
personnel records available at the Defense Manpower<br />
Data Center (DMDC).
Criterion Measures<br />
Crawford and Trent (1987) note that in personnel<br />
security research, the focus is on whether an individual<br />
demonstrates reliability, trustworthiness, good judgment<br />
and loyalty in the actual handling and use of classified<br />
information. Failure of the individual could be<br />
manifested at one level in excessive security violations<br />
and at the extreme in the deliberate compromise of<br />
classified information, including espionage.<br />
Fortunately, compromise and espionage exhibit<br />
a very low base rate. Security violations, while more<br />
frequent, also show a low base rate, and in addition<br />
information on commission of violations is not available<br />
in centralized data bases.<br />
Three alternative criteria were used in this study:<br />
1. Prescreening adjudication decision. This<br />
decision is made at the MEPS after the applicant has<br />
completed the 169-R and the security interview. The<br />
security interviewer, after consultation with security<br />
personnel within his/her chain of command, determines<br />
whether the applicant should be allowed to continue<br />
processing for a sensitive occupation. Many of the<br />
rejected applicants enter the Army in non-sensitive<br />
occupations and receive lower level security<br />
clearances. Historically, approximately 33 to 47% of<br />
applicants are rejected at this stage of processing.<br />
2. Issue Case status. If derogatory information<br />
is discovered during the course of the PSI, the<br />
investigation is expanded and designated as an “issue<br />
case.” This designation indicates, in most instances,<br />
that there is some evidence of a blemish in an<br />
individual’s behavior, associations, etc. that may be a<br />
cause to question his/her qualifications to handle<br />
classified material. Issue case status has been<br />
employed as an operational criterion in previous<br />
studies (Crawford and Trent, 1987; Wiskoff and<br />
Dunipace, 1988). Data concerning issue case status<br />
were obtained from the Defense Central Index of<br />
Investigations (DCII), a copy of which is maintained for<br />
research purposes at DMDC.<br />
3. Type of discharge. This variable refers to<br />
whether or not the individual was discharged from the<br />
Army for reasons of unsuitability. Unsuitability attrition<br />
is operationally defined as those accessions listed on<br />
the DMDC Cohort File having inter-service separation<br />
codes 60-87 for failure to meet minimum behavioral or<br />
performance standards. Type of discharge has been<br />
used in many studies of military service attrition.<br />
Analyses<br />
Only the data from the second administration of<br />
the 169-R were used for individuals who had<br />
completed the form twice, i.e. entering and leaving<br />
DEP. This was necessary, because for these<br />
individuals, the final prescreening adjudication measure<br />
represents a decision that is based on information from<br />
the second set of responses.<br />
The first set of analyses focused on the validity of<br />
the instrument. First, a series of correlational analyses<br />
was conducted to examine the relationship between<br />
each of the yes/no and coded items and the criterion<br />
measures. Next, empirical scoring keys for each of the<br />
criteria were developed using the horizontal percent<br />
method (Guion, 1965). The total score for each key<br />
was subsequently correlated with each criterion<br />
measure. In addition, AFQT category and age at entry<br />
into the Army were examined for their incremental<br />
validity in predicting issue case status and type of<br />
discharge. Level of education could not be used<br />
because there were too few individuals who did not<br />
have a high school diploma.<br />
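The empirical keying step can be sketched in code. The following is a minimal illustration of the horizontal percent method (Guion, 1965); the item data, criterion values, and function names are invented for demonstration and are not the 169-R items or the study's actual weights.

```python
# Illustrative sketch of the horizontal percent method (Guion, 1965) for
# building an empirical scoring key from yes/no items. All data below are
# toy values, not taken from the 169-R study.

def horizontal_percent_key(responses, criterion):
    """Weight each response option of each item by the percentage of the
    respondents choosing that option who are criterion-positive (the
    percent is taken 'horizontally' across the criterion split)."""
    n_items = len(responses[0])
    key = []
    for i in range(n_items):
        weights = {}
        for option in ("yes", "no"):
            group = [c for r, c in zip(responses, criterion) if r[i] == option]
            weights[option] = 100.0 * sum(group) / len(group) if group else 0.0
        key.append(weights)
    return key

def total_score(key, response):
    """Sum the option weights across items for one applicant."""
    return sum(key[i][opt] for i, opt in enumerate(response))

# Toy data: three yes/no items; criterion 1 = later became an issue case.
responses = [("yes", "no", "no"),
             ("no", "no", "yes"),
             ("yes", "yes", "no"),
             ("no", "no", "no")]
criterion = [1, 0, 1, 0]

key = horizontal_percent_key(responses, criterion)
```

Note that with this keying a higher total score indicates higher criterion risk; an operational key could equally rescale or reverse the weights so that low scores flag risk, as in the cutoff analysis reported below.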
The second set of analyses examined the utility of<br />
decisions based on cutoff scores for the empirical<br />
scoring keys. Utility was assessed by examining the<br />
percentage of individuals that can be identified and<br />
screened out using the empirical scoring keys and their<br />
associated cutoff scores, for different combinations of<br />
AFQT and age at entry categories.<br />
Results<br />
It is important, in examining the findings of this<br />
study, to note that the data for the three criterion<br />
measures do not represent the progression of a single<br />
cohort through the screening process. That is, each<br />
applicant’s predictor data were matched to his/her<br />
criterion data without regard for how the person fared<br />
on the other criteria. For instance, it is possible for an<br />
applicant to have been screened out of a sensitive job<br />
during the prescreening adjudication and still have<br />
criterion data on type of discharge, as long as the<br />
person did enlist in the Army in a non-sensitive<br />
occupation.<br />
In reviewing the relationships of the individual<br />
items to the criteria, it should be remembered that<br />
some types of negative behavior are relatively rare or<br />
are not often admitted. This low base rate for an item<br />
serves to restrict the variance of the variable and<br />
attenuate its correlation with the criterion. Overall, 11<br />
items showed statistically significant relationships with<br />
prescreening adjudication, three with issue case status,<br />
and only one with type of discharge. Drug use and<br />
financial problems were the two content areas with the<br />
most significant relationships.<br />
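The variance-restriction point can be made concrete with a standard psychometric bound that the paper does not state explicitly: the maximum attainable phi correlation between two dichotomous variables is limited by their marginal proportions. A brief sketch, with illustrative endorsement and base rates:

```python
import math

# Standard psychometric bound (not from the paper): for two dichotomous
# variables with marginal proportions p and q (p <= q), the maximum
# attainable phi coefficient is sqrt(p * (1 - q) / (q * (1 - p))).
# Rarely endorsed items therefore cannot correlate highly with a
# low-base-rate criterion, no matter how diagnostic they are.

def phi_max(p_item, p_crit):
    """Upper bound on phi given an item endorsement rate and a criterion
    base rate (both strictly between 0 and 1)."""
    p_small, p_large = sorted((p_item, p_crit))
    return math.sqrt(p_small * (1 - p_large) / (p_large * (1 - p_small)))

# Equal 50/50 marginals permit a perfect correlation...
print(phi_max(0.50, 0.50))  # 1.0
# ...but a 2%-endorsed item against an 8% issue-case base rate is capped
# well below 1 (roughly .48 here), which attenuates observed item validities.
print(phi_max(0.02, 0.08))
```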
The validity coefficients for the empirical scoring<br />
keys and the regression models (including the<br />
empirical keys and additional applicant data) are<br />
displayed in Table 1. Each key shows a significant<br />
correlation with the criterion it was designed to predict.<br />
Both the prescreening adjudication key and the issue<br />
case status key had fairly strong correlations with<br />
prescreening adjudication and issue case status. Only<br />
the type of discharge scoring key was significantly<br />
correlated with all three criteria, although the r’s only<br />
ranged from .12 to .15.
Table 1<br />
Validity Coefficients for Empirical Scoring Keys and Regression Models<br />
<br />
                                           Prescreening   Issue Case   Type of<br />
                                           Adjudication   Status       Discharge<br />
Empirical Scoring Key<br />
  Prescreening Adjudication                    a**           .21**       -.02<br />
  Issue Case Status                            .25**         .27**       -.02<br />
  Type of Discharge                            .12*          .15*        .15*<br />
Regression Model<br />
  Issue Case Status key, AFQT, and Age                       .27**<br />
  Type of Discharge key, AFQT, and Age                                   .22**<br />
* p < .05   ** p < .01<br />
Regression analyses were performed to examine<br />
the incremental validity of the additional applicant data.<br />
AFQT was collapsed into high (I-IIIA) and low (IIIB or<br />
below) categories. Age was collapsed into three<br />
categories: (a) 17 year olds, (b) 18-20 year olds, and<br />
(c) 21 year olds or older. In Table 1 it is seen that<br />
there is no evidence of incremental validity in predicting<br />
issue case status by including AFQT category or age<br />
at entry. However, for type of discharge, the validity<br />
coefficient increases from .15 to .22 with the addition of<br />
these variables.<br />
Figure 1 displays the 169-R items that are<br />
included within the scoring keys for each of the criteria.<br />
Four of the items, i.e. times marijuana use, times<br />
intoxicated, visits for nervous, emotional, mental<br />
counseling and suspended/expelled from school<br />
appear in all three scoring keys. Three other items are<br />
in two of the keys, while the remaining nine are only<br />
found in one of the keys.<br />
                                                     Prescreening    Issue Case    Type of<br />
169-R Item                                           Adjudication    Status        Discharge<br />
Times marijuana use                                  ✓ ✓ ✓<br />
Frequency of marijuana use                           ✓ ✓<br />
Used hard drugs                                      ✓<br />
Possessed, transported, grown, produced, etc., drugs ✓ ✓<br />
Transported, sold, etc., alcohol                     ✓<br />
Times intoxicated                                    ✓ ✓ ✓<br />
Frequency of alcohol usage                           ✓<br />
Visits for nervous, emotional, mental counseling     ✓ ✓ ✓<br />
Pregnant or caused pregnancy                         ✓<br />
Written bad checks                                   ✓<br />
Made delinquent payments                             ✓<br />
Experienced financial problems                       ✓<br />
Left job under less than favorable conditions        ✓ ✓<br />
Suspended/expelled from school                       ✓ ✓ ✓<br />
Unsafe vehicle/licensing violations                  ✓<br />
Ran away or considered running from home             ✓<br />
Figure 1. Form 169-R items included in empirical scoring keys<br />
The final analysis looked at the utility of the<br />
scoring keys, defined as a reduced risk of:<br />
(a) having a security clearance denied to an individual<br />
who has been assigned to a sensitive duty position<br />
and (b) assigning unreliable individuals to sensitive<br />
duty positions. The utility was evaluated by first<br />
establishing cutoff scores and then determining what<br />
the impact would have been if the empirical keys and<br />
cutoff scores had been used in prescreening.<br />
The goal, in setting the cutoff scores, was to<br />
screen out individuals with low scores on the empirical<br />
keys and yet fulfill existing manpower requirements. In<br />
this sample, 19% of the non-prior service male<br />
applicants were rejected in the prescreening<br />
adjudication phase. Thus, cutoff scores were<br />
established for the three scoring keys at the point<br />
closest to the 19%/81% split.<br />
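The cutoff-setting step can be sketched as follows. The scores below are invented and the tie-handling is an assumption, since the paper states only that the cutoff closest to the 19%/81% split was chosen.

```python
# Illustrative sketch of choosing a cutoff score at the point closest to an
# operational rejection rate (the study's 19%/81% split). The score values
# are invented; the paper does not publish key-score distributions.

def choose_cutoff(scores, target_reject=0.19):
    """Return the cutoff whose below-cutoff fraction is closest to the
    target rejection rate (scores strictly below the cutoff are rejected;
    the first candidate is kept on ties)."""
    best_cut, best_gap = None, 2.0
    for cut in sorted(set(scores)):
        reject_rate = sum(s < cut for s in scores) / len(scores)
        gap = abs(reject_rate - target_reject)
        if gap < best_gap:
            best_cut, best_gap = cut, gap
    return best_cut

# With 100 evenly spread scores, the chosen cutoff rejects exactly 19%.
scores = list(range(100))
cut = choose_cutoff(scores)
rejected = sum(s < cut for s in scores)
```

Once the cutoff is fixed, utility can be read off directly: compare the criterion-positive rate among above-cutoff cases with the overall base rate, as Table 2 does.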
Table 2<br />
Impact of Using Cutoff Scores on the Issue Case and Unsuitability Discharge Rates<br />
<br />
                                            Issue Case Status       Type of Discharge<br />
Empirical                                   Percent   Percent      Percent      Percent<br />
Scoring Key                   Score         Issue     No Issue     Unsuitable   Normal<br />
Prescreening Adjudication     Below cutoff   25.9      74.1         11.5         88.5<br />
                              Above cutoff    5.6      94.4         14.0         86.0<br />
Issue Case Status             Below cutoff   22.2      77.8          9.4         90.6<br />
                              Above cutoff    5.3      94.7         14.5         85.5<br />
Type of Discharge             Below cutoff   16.1      83.9         24.4         75.6<br />
                              Above cutoff    6.7      93.3         11.7         88.3<br />
Regression model with         Below cutoff                          28.0         72.0<br />
  Type of Discharge           Above cutoff                          10.4         89.6<br />
Base Rate                                     8.0      92.0         13.5         86.5<br />
Table 2 shows the impact of using the three keys<br />
in terms of reducing the issue case and unsuitability<br />
discharge rates. The base rate for issue cases in this<br />
sample was 8.0%. The percentages of issue cases<br />
above the cutoff were lower than the base rate for all<br />
three keys, with the issue case status key showing the<br />
lowest percentage (5.3%). Thus, the issue case rate<br />
could be reduced by approximately three percentage<br />
points by using this key. Analysis of DCII data<br />
revealed that 289 of the non-prior service males who<br />
entered high security occupations in the Army in 1984<br />
became classified as issue cases. Thus, approximately<br />
98 of these individuals would not have been allowed<br />
into high security occupations if the issue case status<br />
scoring key and its cutoff had been used for<br />
prescreening.<br />
The base rate for applicants who received<br />
unsuitability discharges was 13.5%. Table 2 shows<br />
that the greatest reduction in this rate occurs with the<br />
use of the Type of Discharge key plus the<br />
supplementary predictors, i.e. AFQT category and age<br />
at service entry. At a cutoff score closest to the<br />
19%/81% split, the percentage of unsuitability<br />
discharges would have been reduced to 10.4%, slightly<br />
more than three percentage points below the base rate.<br />
This translates into 99 unreliable individuals who would<br />
have been screened out.<br />
Conclusions and Recommendations<br />
The major caveat in deriving operational<br />
conclusions from the findings of this study was the<br />
relatively small sample size. Other problems which are<br />
discussed in Zimmerman et al. (in press) are:<br />
(a) criterion issues such as the relevance of the criteria<br />
to personnel security decisions and the existence of<br />
false negatives (e.g., individuals classified as issue<br />
cases who are granted their security clearances) and<br />
false positives (e.g., individuals who are never<br />
classified as issue cases yet are turned down<br />
for a security clearance); and (b) the impact of low base<br />
rates in both the predictors and criteria.<br />
Despite these caveats, further research on the<br />
169-R, using a large data sample, seems to be<br />
warranted for two reasons. First, the findings of this<br />
report clearly indicate the utility or benefit of using<br />
empirical scoring keys to supplement existing<br />
prescreening procedures based on the 169-R.<br />
Second, for many predictor variables from the 169-R,<br />
cell sizes were too small to compute a valid measure<br />
of association. If all available data for an entire year<br />
were analyzed, more definitive results could be<br />
obtained.<br />
In addition to analyzing a larger data sample, a<br />
potentially fruitful avenue is the revision of the 169-R to<br />
increase its validity.<br />
Note: Since the completion of this study, research has<br />
been initiated on a much larger sample of 169-R forms<br />
completed by applicants in 1986. In addition, a<br />
revision of the 169-R has been developed jointly by the<br />
Defense Personnel Security Research and Education<br />
Center and the U. S. Total Army Personnel Command,<br />
and was operationally implemented on 1 October 1990.<br />
References<br />
Crawford, K. S., & Trent, T. (1987). Personnel security<br />
prescreening: An application of the Armed<br />
Services Applicant Profile (ASAP) (PERS-TR-87-<br />
003). Monterey, CA: Defense Personnel Security<br />
Research and Education Center.<br />
Crawford, K. S., & Wiskoff, M. F. (1988). Screening<br />
enlisted accessions for sensitive military jobs<br />
(PERS-TR-89-001). Monterey, CA: Defense<br />
Personnel Security Research and Education Center.<br />
Guion, R. M. (1965). Personnel testing. New York:<br />
McGraw-Hill.<br />
Wiskoff, M. F., & Dunipace, N. E. (1988). Moral waivers<br />
and suitability for high security military jobs (PERS-<br />
TR-88-011). Monterey, CA: Defense Personnel<br />
Security Research and Education Center.<br />
Zimmerman, R. A., Fitz, C. C., Wiskoff, M. F., & Parker,<br />
J. P. (in press). Preliminary analysis of the U. S.<br />
Army Security Screening Questionnaire (PERS-TN-<br />
90-008). Monterey, CA: Defense Personnel<br />
Security Research and Education Center.
Continuing Assessment of Cleared Personnel in the <strong>Military</strong> Services<br />
Michael J. Bosshardt<br />
David A. DuBois<br />
Personnel Decisions Research Institutes, Inc.<br />
Kent S. Crawford<br />
The Defense Personnel Security Research and Education Center<br />
Problem and Background<br />
Examination of recent espionage cases suggests that few spies enter<br />
government service with the intent to commit espionage. Instead, most<br />
individuals become spies as a result of personal and situational factors<br />
that occur after they receive a personnel security clearance. This<br />
suggests that an ongoing program of continuing assessment (CA) for cleared<br />
personnel should be an important component of the personnel security<br />
process.<br />
Two other factors underscore the importance of the CA program. First,<br />
initial clearance screening procedures tend to be costly, involve<br />
conditions of very low base rates, and have unknown validity. Second,<br />
hostile intelligence activities probably focus more effort on currently<br />
cleared personnel than on uncleared individuals.<br />
Despite its importance and the fact that formal CA programs have been in<br />
existence for a number of years, little is known about operational CA<br />
programs (DOD Security Review Commission, 1985). In order to address this<br />
deficiency, a project was initiated to evaluate how well CA programs are<br />
operating in the military services. The principal activities in this<br />
project included a review of regulations and literature related to CA<br />
(DuBois & Bosshardt, 1990), a survey of personnel at 60 Army, Air Force,<br />
Navy, and Marine Corps installations worldwide to obtain detailed<br />
information about CA programs (Bosshardt, DuBois, Crawford, & McGuire,<br />
1990; Bosshardt, DuBois, & Crawford, 1990a), and an analysis of systems<br />
issues related to CA (Bosshardt, DuBois, & Crawford, 1990b).<br />
Objectives<br />
The objectives of this paper are to (1) present some of the key findings of<br />
this survey of CA programs and (2) provide a preliminary assessment of the<br />
effectiveness of these programs.<br />
Approach<br />
The initial step in the study involved a review of regulations and<br />
literature related to CA. We then conducted a series of meetings with<br />
service branch headquarters and adjudication officials to gain a further<br />
understanding of CA policies and programs. Following this, nine military<br />
516
installations were visited to obtain an understanding of operational CA<br />
programs in the military and to gather information necessary for developing<br />
the survey research approach.<br />
These research activities led to the development of three preliminary<br />
survey forms. The principal form was a structured interview protocol for<br />
installation security office representatives. Two shorter survey forms<br />
were also developed for unit security managers and unit commanders.<br />
Preliminary versions of these forms were reviewed by several CA experts and<br />
pilot tested prior to actual survey administration.<br />
The survey forms were administered between September, 1989 and January,<br />
1990. The sample included 60 sites (21 Air Force, 19 Army, 18 Navy, and 2<br />
Marine Corps). Forty-eight were sites where individuals primarily had<br />
collateral access (i.e., top secret, secret, or confidential access) and 12<br />
were sites where individuals primarily had SCI access; ten were overseas<br />
sites. Overall, completed survey forms were received from 60 installation<br />
security managers, 126 unit security managers, and 88 unit commanders.<br />
Results and Discussion<br />
The structured interview protocol for installation security managers<br />
included approximately 60 open-ended questions and numerous rating items.<br />
Two key issues concern the best sources of CA-relevant information and the<br />
most frequently reported types of CA information. Data concerning both<br />
issues are presented below.<br />
Sources of CA Information. Installation security managers were asked to<br />
rate the willingness of various groups to share derogatory information of<br />
security relevance with the security office. The results indicated that<br />
the military police, the clearance adjudication facility, and the<br />
investigations office are among the most willing to share information with<br />
the security office. Several types of installation personnel (e.g.,<br />
installation commanders, unit commanders, unit security managers) received<br />
moderate to high ratings. Most installation departments (e.g., medical,<br />
personnel, legal) and non-installation groups (e.g., local civilian police,<br />
federal agencies) were perceived as only moderately willing to share<br />
derogatory CA information. Employee assistance groups received relatively<br />
low ratings. Not surprisingly, coworkers and subjects were rated as least<br />
willing to share derogatory information.<br />
Types of CA Information Reported. Installation security managers estimated<br />
the number of valid derogatory incidents reported to their security office<br />
during the past year for each of 12 types of information. The mean number<br />
of reported incidents (per 1000 cleared individuals) for various areas is<br />
shown in Table 1.<br />
A complete summary of all results is presented in Bosshardt, DuBois,<br />
Crawford, and McGuire (1990).<br />
Table 1<br />
Mean Estimated Number of Valid Derogatory Incidents Reported to Collateral<br />
and SCI Installation Security Offices During the Past Twelve Months<br />
(Per 1000 Cleared Individuals)<br />
                                                       Collateral      SCI<br />
Type of Reported Incident                                   Sites    Sites<br />
Alcohol abuse                                                12.1<br />
Other incidents (e.g., non-judicial punishments)              9.5<br />
Drug abuse                                                    6.6<br />
Criminal felony acts not covered in other categories          3.4<br />
Financial problems                                            3.1<br />
Court martials/desertions                                     3.1<br />
Falsification of information acts<br />
Emotional/mental/family problems<br />
Security violation incidents                                  2.1<br />
Sexual misconduct                                             1.6<br />
Foreign associations/travel incidents<br />
Disloyalty to the U.S.<br />
Note. The samples include 43 collateral sites and 12 SCI sites.<br />
The results in Table 1 suggest that alcohol abuse and other incidents<br />
(e.g., NJPs) are the most frequently reported areas at both collateral and<br />
SCI sites. Overall, the average number of reported incidents across all<br />
incident categories (per 1000 cleared individuals) is 46.9 for collateral<br />
sites and 42.3 for SCI sites.<br />
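The per-1000 normalization behind Table 1 is easy to state directly. In the sketch below, the example counts (29 reports at a site with 2,400 cleared individuals) are invented for illustration; only the formula itself comes from the text:<br />

```python
# Per-1000 normalization used for the Table 1 means.
# The example counts (29 reports, 2400 cleared individuals) are invented;
# only the formula (reports / cleared personnel x 1000) is from the text.

def incidents_per_1000(reported: int, cleared: int) -> float:
    return reported / cleared * 1000.0

# A site this size with 29 alcohol-abuse reports matches the 12.1 collateral mean.
print(round(incidents_per_1000(29, 2400), 1))  # 12.1
```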
The CA survey yielded a considerable amount of quantitative and qualitative<br />
data. In addition to the interview data provided by installation security<br />
managers, four types of data were gathered: (1) ratings by installation<br />
security managers, unit security managers, and unit commanders of 136<br />
obstacles in maintaining an effective CA program, (2) write-in responses<br />
(n=684) by these three groups regarding the major CA problems, (3) ratings<br />
by installation security managers of 143 suggestions for improving CA, and<br />
(4) write-in suggestions (n = 636) by installation security managers, unit<br />
security managers, and unit commanders for improving CA.<br />
In order to have a common basis for comparing the quantitative and<br />
qualitative data and to facilitate the interpretation of the survey<br />
results, a taxonomy of CA problem/recommendation (or "finding") areas was<br />
developed. This taxonomy included eight general categories: (1) security<br />
education for cleared personnel; (2) training for security personnel; (3)<br />
derogatory information indicators, sources, and methods; (4) clearance<br />
adjudication procedures; (5) accountability for CA; (6) CA regulations; (7)<br />
CA emphasis; and (8) CA system considerations (e.g., legal issues, number<br />
of cleared personnel).<br />
Obstacles in Maintaining an Effective CA Program. In general, analyses of<br />
the quantitative and qualitative survey data indicated that security<br />
education for cleared personnel, training of security personnel, and<br />
derogatory indicators, sources, and methods are the biggest obstacles in<br />
maintaining an effective CA program across the eight taxonomy areas. The<br />
clearance adjudication process, the emphasis on CA, and CA system<br />
considerations received moderately high rankings across the "CA obstacles"<br />
data sets. CA regulations and accountability for CA received the lowest<br />
overall rankings.<br />
Ratings of 136 specific obstacles to maintaining an effective CA program<br />
were provided by all survey respondents. The six most highly rated items<br />
by collateral site respondents (N=224) are listed below:<br />
- Reluctance of individuals to self-report derogatory information.<br />
- Too much time is taken by central adjudication facility to make<br />
clearance suspension/revocation decisions.<br />
- Lack of/inadequacy of training modules to instruct commanders and<br />
supervisors on how to spot, interpret, and manage the<br />
early warning indicators of personnel security risks.<br />
- Reluctance of coworkers to report derogatory information.<br />
- Lack of standard training modules for unit commanders, supervisors,<br />
and cleared individuals which describe their continuing assessment<br />
responsibilities.<br />
- Delays in obtaining replacement personnel for individuals who lose<br />
security clearances.<br />
Recommendations for Improving CA. Installation security managers rated 143<br />
suggestions for improving CA using a 10-point rating scale. Those items<br />
receiving the highest mean ratings are listed below:<br />
- Develop training modules to instruct commanders and supervisors on<br />
how to spot and manage the early warning indicators of personnel<br />
security risks and personnel problems.<br />
- Modify the regulations to direct other installation groups to<br />
provide more information to the security office.<br />
- Create a separate, full-time position for personnel security<br />
officers.<br />
- Improve continuing assessment training for supervisors.<br />
- Develop formal reporting procedures and written standards for the<br />
personnel, medical, legal, and other departments which define the<br />
types of information to be shared with the security office.<br />
- Increase/improve continuing assessment training for security<br />
managers.<br />
519
Effectiveness of CA. There is limited data for assessing the effectiveness<br />
of current CA programs. Findings from the survey indicated that (1)<br />
approximately 80 percent of the installations surveyed maintain some<br />
statistics relevant to CA (e.g., numbers and types of clearances, numbers<br />
of clearance suspensions and revocations, numbers of security violations,<br />
or numbers of reported derogatory incidents), (2) relatively few derogatory<br />
incidents are reported to the security office (see Table 1), and (3) the<br />
number of clearance suspensions and revocations is very small. Table 2<br />
shows the number of clearance suspensions and revocations during the past<br />
12 months for sites in the survey sample.<br />
Table 2<br />
Approximate Types and Numbers of Clearances, Numbers of Clearance<br />
Suspensions, and Numbers of Clearance Revocations for Survey Sites<br />
                     Mean Estimated   Mean Estimated Number     Mean Estimated Number<br />
                     Total Number     of Clearances Suspended   of Clearances Revoked<br />
                     (per site)       Per 1000 During Past      Per 1000 During Past<br />
                                      12 Months (per site)      12 Months (per site)<br />
Confidential<br />
Clearances                295                  0.8                       0.3<br />
Secret<br />
Clearances               2847                  4.2                       0.5<br />
Top Secret<br />
Clearances                583                  1.2                       0.1<br />
Top Secret<br />
Clearances with<br />
SCI Access                678                  2.4                       1.8<br />
Notes. Estimates are based on information provided by installation<br />
security officers. The sample sizes for these analyses ranged from 48 to 54.<br />
Ratings of overall program effectiveness by installation security managers<br />
indicated that CA programs are moderately effective. The mean<br />
effectiveness ratings were quite similar across service branches, with the<br />
Air Force receiving the highest effectiveness rating among collateral sites and<br />
the Navy receiving the highest effectiveness rating among SCI sites. The<br />
mean effectiveness ratings of SCI and collateral programs were nearly<br />
identical within the Army and within the Air Force, but within the Navy SCI<br />
sites received much higher ratings than collateral sites.<br />
520
Installation security managers also rated the effectiveness of several<br />
aspects of the CA program. The results indicated that the clearance<br />
suspension/revocation process, sources of derogatory information, service<br />
branch regulations, indicators of security risk, reporting procedures, and<br />
security education are considered most effective. In contrast, the two<br />
lowest rated program aspects were performance appraisal information and<br />
incentives for reporting. The mean ratings were generally similar across<br />
service branches and for collateral and SCI sites.<br />
In summary, little is known about the effectiveness of existing CA programs<br />
in the military services. The limited data suggest that these programs are<br />
moderately effective, although they could be improved.<br />
Future Research<br />
Overall, the project resulted in 52 recommendations for improving CA<br />
programs (see Bosshardt, DuBois, Crawford, & McGuire, 1990; Bosshardt,<br />
DuBois, & Crawford, 1990b). The next step in this research program is to<br />
have personnel security experts from DOD, service branch headquarters,<br />
field installations, and the adjudication facilities prioritize these<br />
recommendations. Future research will focus on the highest priority items.<br />
References<br />
Bosshardt, M. J., DuBois, D., & Crawford, K. (1990a). Survey of<br />
continuing assessment programs in the military services:<br />
Recommendations. Monterey, CA: Defense Personnel Security Research and<br />
Education Center.<br />
Bosshardt, M. J., DuBois, D., & Crawford, K. (1990b). Survey of<br />
continuing assessment programs in the military services: Systems issues,<br />
recommendations, and program effectiveness. Monterey, CA: Defense<br />
Personnel Security Research and Education Center.<br />
Bosshardt, M. J., DuBois, D., Crawford, K., & McGuire, D. (1990). Survey<br />
of continuing assessment programs in the military services: Methodology,<br />
analyses, and results. Monterey, CA: Defense Personnel Security<br />
Research and Education Center.<br />
DOD Security Review Commission, General Richard Stilwell (Chairman).<br />
(1985). Keeping the nation's secrets: A report to the Secretary of<br />
Defense by the Commission to Review DOD Security Policies and Practices.<br />
Washington, D.C.: Office of the Secretary of Defense.<br />
DuBois, D., Bosshardt, M. J., & Crawford, K. (1990). Continuing<br />
assessment of cleared personnel in the military services: A conceptual<br />
analysis and literature review. Monterey, CA: Defense Personnel<br />
Security Research and Education Center.<br />
521
A MEASURE OF BEHAVIORAL RELIABILITY<br />
FOR MARINE SECURITY GUARDS<br />
Janis S. Houston<br />
Personnel Decisions Research Institutes<br />
Martin F. Wiskoff<br />
BDM <strong>International</strong>, Inc.<br />
and<br />
Forrest Sherman<br />
Marine Security Guard Battalion<br />
Problem and Background<br />
The United States Marine Corps provides security guard services to meet the Department<br />
of State requirements at Foreign Service posts throughout the world. This use of Marines<br />
as security guards at Embassies, Legations, and Consulates was initiated in 1948 by a<br />
formal Memorandum of Understanding between the Department of State and the Secretary<br />
of the Navy. The primary mission of the Marine Security Guards is to protect the<br />
personnel, property, and classified and administratively controlled material and equipment<br />
within these premises.<br />
There are approximately 1300 Marine Security Guards (MSGs) currently serving at 140<br />
foreign posts in over 100 countries. These detachments range in size from five to thirty-eight<br />
Marines, and each is commanded by a senior non-commissioned officer, referred to<br />
as the “Detachment Commander”.<br />
The work described here was the fourth phase of a research effort undertaken jointly by<br />
the Marine Security Guard Battalion and the Defense Personnel Security Research and<br />
Education Center. Prior phases of this effort focused on improving the procedures used<br />
for pre-screening and selecting Marines for MSG duty, and are described in Parker,<br />
Wiskoff, McDaniel, Zimmerman, and Sherman (1989) and in Wiskoff, Parker, Zimmerman,<br />
and Sherman (1989).<br />
Objective<br />
The primary objective of this work was to develop a system for the continuing evaluation<br />
(CVAL) of MSG performance and behavioral reliability. As has been pointed out<br />
(DuBois, Bosshardt, and Crawford, 1990), recent espionage cases suggest that individuals<br />
become spies as a result of personal and situational factors that occur after they receive<br />
personnel security clearances and are performing in sensitive or high security risk<br />
jobs. The importance of having a continuing assessment program for MSGs, in addition<br />
to the very careful selection procedures, was highlighted in December 1986, when Sgt.<br />
Lonetree admitted to providing information to the Soviet Union while serving in<br />
Moscow as an MSG.<br />
522
The goal for the CVAL system was to reduce the risk of personnel security incidents and<br />
improve the ability of Detachment Commanders to anticipate personnel problems before<br />
they became major disruptions. Thus, there was some emphasis on being able to use<br />
CVAL as a kind of warning system, one which would indicate when there was a need to<br />
intervene, either with informal counseling or disciplinary action short of judicial punishment.<br />
In this context, then, there were several ancillary objectives for the development of<br />
CVAL: (1) to provide an early warning indicator, with suggestions for intervention; (2)<br />
to provide a leadership, counseling, and training tool for Detachment Commanders; and<br />
(3) to minimize personnel turbulence and facilitate/document personnel decisions made<br />
concerning the reliability of MSGs.<br />
Method of Development<br />
General Orientation. It was felt from the outset that some kind of behavioral checklist<br />
would be an appropriate format for the cornerstone of CVAL. In a recent review of<br />
personnel reliability programs (Bosshardt, DuBois, and Crawford, 1990), the need was<br />
pointed out for more careful definition of the factors that may indicate an individual has<br />
become a security risk. In the current project, we wanted to produce a checklist of<br />
observable behaviors that could indicate when an MSG’s performance was beginning to<br />
exhibit signs of unreliability. This checklist could then be completed by the Detachment<br />
Commander on a regular basis for each MSG, and appropriate action taken.<br />
Sources of Information. The primary source of information for the development of a<br />
CVAL checklist was the huge collection of written examples of MSG<br />
performance/behavior generated in a prior phase of this research effort. These performance<br />
examples were used in the prior research to develop behaviorally-anchored rating<br />
scales that could serve as criteria for validity investigations of the screening procedures<br />
(Houston, 1989).<br />
To obtain the performance examples, workshops were conducted with MSGs, Detachment<br />
Commanders, and the Instructors/Advisors at MSG School, all of whom had prior<br />
experience as MSGs and/or Detachment Commanders. Participants in the workshops<br />
were asked to write (in a structured format) examples of MSG behaviors that were indicative<br />
of extremely effective, average, and extremely ineffective performance. This technique<br />
yielded over 300 examples of behavior that realistically portrayed both highly<br />
effective and highly ineffective MSG performance. The examples were then sorted into<br />
categories that represented important dimensions of the MSG job. The set of dimensions,<br />
and the list of behaviors in each dimension, was the starting point in the development<br />
of a CVAL measure.<br />
Other sources of information included: evaluation forms that had been developed for use<br />
at MSG School, e.g., Peer Evaluation Forms and Screening Board Evaluation Forms;<br />
checklists developed for use as indicators of chemical dependency and emotional instability;<br />
and reports of existing personnel reliability programs, e.g., the Air Force’s Nuclear<br />
Weapons Personnel Reliability Program (PRP), the Department of Energy’s Human<br />
Reliability Program (HRP), and the Navy’s Security Access Eligibility Report (SAER).<br />
Another helpful source of information was the record of MSG Non Judicial Punishments<br />
and Reliefs For Cause kept at MSG Battalion Headquarters. A content analysis of these<br />
records was performed, to determine what types of behavior problems seemed to be the<br />
523
most common. Finally, MSG Battalion personnel were extensively interviewed, to solicit<br />
their ideas on what behaviors represented potential reliability problems.<br />
Preparation of Behavior Indicators Checklist. All of the information described above<br />
was converted to a list of discrete behaviors that could indicate a potential personnel<br />
security risk. These behaviors were sorted, where possible, into the categories used for<br />
the MSG performance rating scales developed in the prior phase of this research. New<br />
categories were formed where the pre-existing system did not seem to cover clusters of<br />
behaviors, and a number of the pre-existing categories were combined and/or renamed,<br />
as appropriate.<br />
The first draft of the Behavior Indicators Checklist contained 61 behaviors, grouped into<br />
10 clusters or behavior categories. Each of these behaviors was considered to be an<br />
indication that an MSG might be headed for, if not already in, some kind of trouble,<br />
ranging from emotional instability to drinking problems, or simply not realizing the<br />
dangers of becoming too friendly with Foreign Service Nationals about whom little was<br />
known.<br />
Examples of checklist behaviors are: “MSG often becomes disorderly or violent when<br />
drinking” and “A Foreign Service National shows a sudden increase of favors towards<br />
this MSG.” There were a number of behaviors that, while not particularly desirable, may<br />
not indicate a real problem if the behavior is relatively short in duration, for example,<br />
“MSG frequently asks to get off duty early or switch duty assignments.” There might be<br />
an acceptable reason for the latter example, e.g., visiting relatives or a special, detachment-related<br />
project. The important point here is that the Detachment Commander<br />
should be aware of the reason for these behaviors, and, if appropriate, take action to<br />
decrease undesirable or dangerous behaviors.<br />
Field Review: An Iterative Process. There were two rounds of field review of the<br />
Behavior Indicators Checklist. In both cases, the checklist was taken out to MSG detachments<br />
and feedback was obtained in small group (or one-on-one) structured interviews<br />
with incumbent MSGs, Detachment Commanders, and a number of the Department<br />
of State officials who work with MSGs in the detachments. Sites were selected<br />
with the following criteria in mind: (1) detachments with Commanders who had a fair<br />
amount of experience in the MSG program; (2) as much geographical dispersion as<br />
possible, within the constraints of our budget; (3) sites that varied in terms of their perceived<br />
desirability (a function of potential threat and of general desirability and hospitality<br />
of the location); (4) detachments that varied in terms of their size, i.e., number of<br />
MSGs; and (5) at least some detachments where there was an obviously high threat of<br />
counter intelligence activity (e.g., Eastern Bloc countries).<br />
The first round of site visits included Vienna, Prague, Belgrade, and Athens. In the<br />
interviews at each detachment, the draft checklist was discussed, item by item, to address<br />
the following issues:<br />
(1) the appropriateness and clarity of the wording;<br />
(2) the extent to which each behavior did, in fact, indicate a potential personnel problem;<br />
(3) the comprehensiveness of the list of behaviors, i.e., whether there were any behavior<br />
indicators that we had overlooked; and<br />
(4) the response format that should be used for the checklist.<br />
524
Based on the feedback received from the first round of site visits, a specific response<br />
format was selected, and a number of revisions were made to the checklist, including<br />
specific wording changes to increase clarity or applicability, the addition of several<br />
behaviors and the deletion of a few, and the combining of two categories that were seen<br />
as overlapping. This draft was reviewed by MSG Battalion personnel, including the<br />
MSG School Instructors/Advisors, and the second round of site visits was scheduled.<br />
There were six detachments visited in the second field review, one in the Middle East,<br />
four in SubSaharan Africa, and one in Western Europe. The same format was followed<br />
for these site visits in terms of the individuals interviewed and the topics covered. There<br />
were several more suggestions for additions and deletions, and a number of further<br />
wording changes recommended. The checklist was again revised, based on these<br />
recommendations, and was again reviewed by MSG Battalion personnel. The final set of<br />
categories were entitled:<br />
A. Job Performance E. Social Behavior<br />
B. Liberty Behavior            F. Emotional Behavior<br />
C. Drinking Behavior G. Money-Related Behavior<br />
D. Personal Relations/<strong>Association</strong>s H. Physical Health and Appearance<br />
Each category had a list of relevant behaviors, an Overall Rating Scale, and a space to<br />
write comments related to that category of behavior. There were four response options<br />
for each behavior: “Definitely Yes”, “Yes Somewhat”, “Definitely No”, and “Not Relevant”.<br />
Every “Yes” response required a written comment in the space provided for that<br />
category. The Overall Rating Scale for each category was a seven-point scale, where the<br />
lowest rating indicated that there were “Definite Problems”, and the highest rating indicated<br />
that the MSG’s “Behavior [was] Always Exemplary”.<br />
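The checklist structure described above (four response options per behavior, a required written comment for every "Yes" response, and a 7-point overall rating per category) can be sketched as a simple data model. This is our own hypothetical illustration; the class and field names are not the Battalion's:<br />

```python
# Hypothetical sketch of the Behavior Indicators Checklist structure:
# four response options per behavior, a comment required for every "Yes",
# and a 7-point overall rating per category. Names are ours, not the source's.

from dataclasses import dataclass

RESPONSES = {"Definitely Yes", "Yes Somewhat", "Definitely No", "Not Relevant"}
YES_RESPONSES = {"Definitely Yes", "Yes Somewhat"}

@dataclass
class BehaviorItem:
    text: str
    response: str
    comment: str = ""
    high_threat_only: bool = False  # items flagged for high counter-intelligence-threat posts

@dataclass
class Category:
    name: str                 # e.g., "Drinking Behavior"
    items: list
    overall_rating: int       # 1 = "Definite Problems" ... 7 = "Behavior Always Exemplary"

    def validation_errors(self) -> list:
        errors = []
        for item in self.items:
            if item.response not in RESPONSES:
                errors.append(f"invalid response: {item.response!r}")
            elif item.response in YES_RESPONSES and not item.comment:
                errors.append(f"'Yes' response without comment: {item.text!r}")
        if not 1 <= self.overall_rating <= 7:
            errors.append("overall rating must be 1-7")
        return errors
```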
Since a number of the behaviors in the checklist were most appropriate or most critical<br />
for countries with a high threat of counter intelligence activity (e.g., Eastern Bloc countries),<br />
these behaviors were identified as such. Examples are: behaviors related to “fraternization”<br />
and behaviors related to using the “buddy system” whenever leaving the<br />
Embassy compound.<br />
Trial Usage and Evaluation. As an additional check on the readiness of the Behavior<br />
Indicators Checklist, Detachment Commanders were asked to use it for several months<br />
on a “For Research Only” basis. Commanders were briefed on the purpose of the checklist<br />
and were instructed to fill one out for each MSG in their detachment, after the MSG<br />
had been with the detachment for 90 days. They were further instructed to mail completed<br />
checklists directly to the researchers.<br />
A total of 792 completed checklists were received. These were reviewed for response<br />
errors, e.g., checking two response options for one behavior; and for illogical patterns of<br />
responding, e.g., many negative behaviors checked in a category, with a very high overall<br />
rating for that category. Additionally, all written comments were reviewed. Based on<br />
these investigations, there did not appear to be any problems with the response format or<br />
with the overall clarity and understandability of the checklist.<br />
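The two screening checks described above can be sketched as follows. The numeric thresholds are our own illustration; the report does not state formal rules for flagging an illogical pattern:<br />

```python
# Sketch of the two response-screening checks described above.
# Thresholds are invented for illustration; the report gives no numeric rules.

def response_error(selected_options: set) -> bool:
    # e.g., two response options checked for one behavior, or none at all
    return len(selected_options) != 1

def illogical_pattern(yes_count: int, overall_rating: int,
                      yes_threshold: int = 3, rating_threshold: int = 6) -> bool:
    # many negative behaviors endorsed, yet a very high overall category rating
    return yes_count >= yes_threshold and overall_rating >= rating_threshold

print(response_error({"Definitely Yes", "Definitely No"}))  # True: double-marked item
print(illogical_pattern(yes_count=4, overall_rating=7))     # True: inconsistent form
```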
There was also an attempt made to gather some other criterion data for the MSGs for<br />
whom we had completed checklists, to see if the patterns of response on the checklist<br />
made at least intuitive sense when compared to another performance/reliability measure.<br />
525
_II_ -..-.- -_- -..__.. -. ..~ ..---- _ . ..-- --. .~<br />
The most logical criterion data for this purpose were the available records of Non Judicial<br />
Punishments (NJP) and Reliefs for Cause (RFC). The base rates for these criteria,<br />
however, are so low that there were very few cases (N=40) where we had both a completed<br />
checklist and a record of either NJP or RFC. All 40 “matches” were with NJPs;<br />
there were no matches with RFC. In over half of these 40 cases, the NJP predated the<br />
completion of the checklist, so they were of no use in determining if the checklist could<br />
predict personnel problems. Checklists for the remaining few matches were examined<br />
and the response patterns did indeed seem to indicate that behavior problems were detected<br />
prior to the incident that incurred the Non Judicial Punishment.<br />
Near the end of the “For Research Only” usage period, a questionnaire was sent to all<br />
140 detachments, asking for an evaluation of the checklist and of the User’s Guide that<br />
accompanied it. Subjects covered by this questionnaire included:<br />
(1) Clarity of User’s Guide and checklist content;<br />
(2) Clarity of format (“user friendliness”);<br />
(3) Ease/Difficulty of making accurate ratings;<br />
(4) Time to complete the checklist/extent of administrative burden;<br />
(5) Completeness of checklist;<br />
(6) Usefulness of checklist; and<br />
(7) Recommendation for continued use.<br />
There were 106 questionnaires returned; a 76% return rate. The results can be summarized<br />
as follows. Both the User’s Guide and the checklist itself were reported to be clear,<br />
understandable, and “user friendly”. It was “fairly” to “very” easy to make accurate<br />
ratings. It took an average of 28 minutes to complete the checklist, and was considered<br />
to be a “reasonable” to “minimal” administrative burden (versus “excessive”). The list of<br />
behavior indicators on the checklist was considered to be “very complete”, and it was<br />
reported to be “pretty useful” (this was the second-highest usefulness response option;<br />
the highest was “extremely useful”). Recommendations regarding continued use of the<br />
checklist were:<br />
Yes, as it stands      76<br />
Yes, with revisions    20<br />
No                      7<br />
No response             3<br />
Total                 106<br />
Of the twenty Detachment Commanders who indicated “Yes, with revisions”, most did<br />
not make specific recommendations for revision. Those who did comment referred to<br />
procedural revisions rather than revisions to the<br />
checklist items (e.g., use the checklist as a formal Counseling Sheet).<br />
Final Implementation of Checklist<br />
The final draft of the CVAL Behavior Indicators Checklist is now ready for implementation.<br />
An outline of the guidelines recommended for its use follows:<br />
(1) The keynote in interpreting the CVAL checklist is to look for behavioral change over<br />
time; to look for patterns that are out of character for that individual. For example, if<br />
a Marine is typically fairly quiet, then it should be of little concern that he doesn’t<br />
engage in a lot of casual conversation with his fellow Marines. If, on the other hand,<br />
a Marine is usually very outgoing and talkative, and he suddenly “goes quiet”, there<br />
may be a problem.<br />
(2) Not all behaviors on the checklist are particularly damning in and of themselves.<br />
Although there are no items on the checklist that represent perfectly healthy behavior<br />
for someone in the MSG position, there may be a reasonable explanation for an MSG<br />
exhibiting a particular behavior. Virtually every behavior on the checklist, however,<br />
should motivate the Detachment Commander to ask “Why?”. If there is no apparent<br />
reason for the behavior, attempts should be made to find out what the trouble is, for<br />
example, by observing the MSG more closely, or talking with him about the behavior.<br />
(3) The severity of some of the checklist behaviors depends significantly upon detachment<br />
location. For example, there are obvious differences in the implications some<br />
behaviors have for Eastern Bloc countries versus other countries.<br />
References<br />
Bosshardt, M. J., DuBois, D. A., & Crawford, K. (1990). Continuing assessment of<br />
cleared personnel in the military services: Findings and recommendations (Institute<br />
Report No. 193). Minneapolis, MN: Personnel Decisions Research Institutes.<br />
DuBois, D. A., Bosshardt, M. J., & Crawford, K. (1990). Continuing assessment of<br />
cleared personnel in the military services: A conceptual analysis and literature<br />
review (Institute Report No. 190). Minneapolis, MN: Personnel Decisions Research<br />
Institutes.<br />
Houston, J. S. (1989). Development of measures of Marine Security Guard performance<br />
and behavioral reliability (Institute Report No. 171). Minneapolis, MN: Personnel<br />
Decisions Research Institutes.<br />
Houston, J. S., Wiskoff, M. F., & Sherman, F. (In press). A measure of behavioral reliability<br />
for Marine Security Guards: A final report (PERSEREC-SR-90-m).<br />
Monterey, CA: Defense Personnel Security Research and Education Center.<br />
Parker, J. P., Wiskoff, M. F., McDaniel, M. A., Zimmerman, R. A., & Sherman, F.<br />
(1989). Development of the Marine Security Guard Life Experiences Questionnaire<br />
(PERSEREC-SR-89408). Monterey, CA: Defense Personnel Security Research<br />
and Education Center.<br />
Wiskoff, M. F., Parker, J. P., Zimmerman, R. A., & Sherman, F. (1989). Predicting<br />
school and job performance of Marine Security Guards (PERSEREC-SR-89-013).<br />
Monterey, CA: Defense Personnel Security Research and Education Center.<br />
SYMPOSIUM: JOB PERFORMANCE TESTING FOR ENLISTED PERSONNEL<br />
J. H. Harris (Chair), Charlotte H. Campbell,<br />
and Roy C. Campbell<br />
NO ABSTRACT RECEIVED<br />
NAVY: HANDS-ON AND KNOWLEDGE TESTS FOR THE NAVY RADIOMAN<br />
Earl L. Doyle and Roy C. Campbell<br />
Human Resources Research Organization<br />
Introduction<br />
The Navy approach to the Job Performance Measurement Project focused on<br />
the development of benchmark hands-on job proficiency tests which would, in<br />
turn, guide the development of written task-specific tests and written general<br />
knowledge tests that could be used as substitute measures of job performance.<br />
One of the jobs selected for this effort was the entry level Radioman (RM).<br />
These individuals qualify for their rating by graduating from the Navy Class A<br />
Radioman school at San Diego, California. After qualification they typically<br />
serve in one of two types of facilities--either a shore-based installation or<br />
on board ship.<br />
This paper will review the major steps in development, the highlights of<br />
field test administration, and the principal findings of this research.<br />
Hands-On Tests<br />
Test Development<br />
Tasks to be tested were selected by a panel of experts consisting<br />
primarily of Senior Radiomen from the Navy Class A Radioman School (Lammlein,<br />
1987). Twenty-two tasks were initially identified. Project test developers,<br />
working with Radioman School instructional staff, integrated those tasks that<br />
are normally closely associated when performed on the job. This resulted in<br />
the development of the 14 tests shown in Table 1.<br />
Table 1<br />
Radioman Tasks for Hands-On Tests<br />
*Act as a Broadcast Operator<br />
*File Messages<br />
Change Paper/Ribbons on Teletype<br />
Establish System - November<br />
Perform Maintenance on Receiver<br />
*Prepare Message - DD173<br />
*Type/Format/Edit Message<br />
*Log Incoming Messages<br />
*Manually Route Messages<br />
Establish System - Golf<br />
*Inventory Classified Documents<br />
Perform Maintenance on Transmitter<br />
*Verify Outgoing Message<br />
*Prioritize Outgoing Messages by Precedence and Time<br />
*Indicates product scored test.<br />
The developed tests were based on an analysis of the individual and<br />
component tasks and consisted of dichotomously scored (GO/NO-GO) performance<br />
measures corresponding to steps done or characteristics of products produced.<br />
The performance tests utilized product scoring wherever a product was produced<br />
as a complete or partial result of the performance. Where feasible, product<br />
scoring is desirable because, correctly administered, it can enhance<br />
reliability. The nine tests that utilized at least partial product scoring<br />
are identified as such in Table 1.<br />
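Whether a step is scored from observed performance or from a product, a dichotomously scored test reduces to the proportion of GO steps; a minimal sketch (the data layout is assumed for illustration, not the project's scoring system):<br />

```python
# Minimal sketch of dichotomous GO/NO-GO scoring; the data layout is
# an assumption for illustration, not the project's scoring system.

def percent_go(step_scores):
    """Score one hands-on test: percentage of steps scored GO.

    step_scores -- list of booleans, True for GO, False for NO-GO,
    covering both performance-scored and product-scored steps.
    """
    if not step_scores:
        raise ValueError("no scored steps")
    return 100.0 * sum(step_scores) / len(step_scores)
```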
In addition to the scoresheets, developers prepared equipment setup<br />
instructions, instructions to the examinees, and scoring instructions. The<br />
entire test was designed to be administered at a single station using either<br />
an actual or a simulated ship's radio shack. Although the 14 tests were<br />
independent, they were operationally interconnected so they fit logically and<br />
sequentially into the test situation and location.<br />
Written Tests<br />
Written tests were developed that corresponded to the 22 tasks covered<br />
in the hands-on tests. Three features characterized these tests:<br />
• The tests were performance based. Items were based<br />
on either performing the same steps required in the<br />
hands-on test or on answering a question of how a step is done.<br />
• The tests were founded on performance errors. To ensure items<br />
were performance oriented, the causes of error in performance<br />
were identified. Error was identified as having four origins:<br />
the Radioman did not know where to perform (location), did not<br />
know when to perform (sequence), did not know what the product<br />
of correct performance was (recognition), or did not know how<br />
to perform (technique).<br />
• The tests provided likely behavioral alternatives. Incorrect<br />
alternatives were based on likely errors that were possible and<br />
do occur on the job. Incorrect alternatives also had to be<br />
wrong, not merely less desirable than the correct alternative.<br />
The development result was an 87 item test in a multiple choice format<br />
that was organized into 11 topical, functional task areas that generally<br />
corresponded to the 14 hands-on test areas. (Several of the hands-on test<br />
areas that needed to be treated separately for administrative and equipment<br />
set-up requirements were combined for the written test, and one written test<br />
task area did not survive validation.) These 11 written test areas were<br />
organized so they could be administered and analyzed independently.<br />
General Knowledge Test<br />
The third area of RM testing was a written general knowledge test. Like<br />
the written performance test, this was a multiple choice test and was based on<br />
the same tasks that generated the hands-on tests. The difference between the<br />
two written tests was that the written performance test was specifically<br />
designed to measure performance while the general knowledge test measured the<br />
application of knowledge to the task subject--which may not necessarily<br />
reflect performance. For example, the written performance test might describe<br />
a situation and ask what EMCON condition should be imposed under those<br />
circumstances; the general knowledge test might ask what EMCON is.<br />
The general knowledge test consisted of 98 items. It was not separated<br />
by task or functional area, and in administration and analysis was treated as<br />
a single test.<br />
Test Administration<br />
The field tests were administered to 61 Radiomen, all of whom were<br />
graduates of the Class A Radioman School, were in paygrades E-2, E-3, and E-4,<br />
and had graduated from the School between 1 and 59 months prior to testing.<br />
(Of the tested population, 79% were in paygrade E-3 and 60% were in the 12<br />
months to 35 months experience window.) Twenty-eight of the 61 sailors tested<br />
were assigned to shore installations at the time of testing and 33 were aboard<br />
ships.<br />
Testing was conducted at two locations, about a month apart. Testing<br />
lasted for 8 hours for each examinee and the three components of the test were<br />
sequentially counterbalanced. Five hands-on scorers were used. All scorers<br />
were project staff and had received extensive task/test training and<br />
calibration. Each Radioman was scored independently by at least two scorers<br />
for each hands-on test.<br />
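Sequential counterbalancing of the three components can be sketched as follows; this is a hypothetical scheduling scheme, not the project's documented assignment procedure:<br />

```python
from itertools import permutations

# Hypothetical sketch of sequentially counterbalancing the three test
# components: successive examinees cycle through the six possible
# component orders so that order effects average out across the sample.
COMPONENTS = ("hands-on", "written performance", "general knowledge")
ORDERS = list(permutations(COMPONENTS))  # 6 possible orders

def order_for(examinee_index):
    """Component order for the examinee at this position in the schedule."""
    return ORDERS[examinee_index % len(ORDERS)]
```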
Field Test Results<br />
Although a wide variety of analyses were conducted (Ford, Doyle,<br />
Schultz, & Hoffman, 1987), this paper will focus on four main areas of<br />
interest. Specifically:<br />
• Interrater reliability of the hands-on tests.<br />
• Internal consistency within test methods.<br />
• Intercorrelations among test methods.<br />
• Assignment effect (ship vs. shore).<br />
Interrater Reliability of the Hands-On Tests<br />
Interrater reliability estimates were computed from a generalizability<br />
theory analysis in which absolute generalizability coefficients were produced (SAS,<br />
1982; Brennan, Jarjoura, & Deaton, 1980). Generalizability estimates were<br />
obtained as if only one rater score were produced and for an average of the<br />
two raters, as shown in Table 2.<br />
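For a fully crossed persons x raters design, the absolute (phi) coefficients for one rater and for the mean of two raters can be estimated from the usual two-way ANOVA variance components. The sketch below is a generic illustration of that computation, not the SAS-based analysis the project actually ran:<br />

```python
# Generic sketch of absolute (phi) generalizability coefficients for a
# fully crossed persons x raters design; an illustration of the method,
# not the project's SAS analysis.

def phi_coefficients(scores):
    """`scores` is a list of [rater1, rater2, ...] rows, one per examinee.
    Returns (phi for one rater, phi for the mean of all raters)."""
    n_p = len(scores)
    n_r = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]
    # Mean squares from the two-way ANOVA without replication
    ms_p = n_r * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in r_means) / (n_r - 1)
    ss_e = sum((scores[i][j] - p_means[i] - r_means[j] + grand) ** 2
               for i in range(n_p) for j in range(n_r))
    ms_e = ss_e / ((n_p - 1) * (n_r - 1))
    # Variance component estimates (negative estimates set to zero)
    var_e = ms_e
    var_p = max((ms_p - ms_e) / n_r, 0.0)
    var_r = max((ms_r - ms_e) / n_p, 0.0)
    def phi(n):  # absolute coefficient for the mean of n raters
        return var_p / (var_p + (var_r + var_e) / n)
    return phi(1), phi(n_r)
```

Averaging over two raters always raises the coefficient, which is why the two-rater column in Table 2 dominates the one-rater column.<br />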
The reliabilities are exceptionally high. This is attributed, first, to the<br />
firm control over the scorers that was possible because they<br />
were members of the project staff and, second, to the high incidence of<br />
product scoring among the tested tasks.<br />
Internal Consistency Within Test Methods<br />
Intertask correlations were computed for the hands-on, written, and<br />
general knowledge tests (the general knowledge test was analyzed for interitem<br />
correlations since it was treated as a single test) and are presented in<br />
Table 3. The obtained coefficients demonstrate acceptable levels of internal<br />
consistency.<br />
Table 2<br />
Generalizability Coefficients for Hands-On Tests<br />
Task One Rater Two Raters<br />
*Broadcast Operator 0.96 0.98<br />
*Log Messages 0.96 0.98<br />
*File Messages 0.95 0.98<br />
*Manually Route Messages 0.95 0.98<br />
Change Paper/Ribbons 0.60 0.75<br />
Establish System - Golf 0.91 0.95<br />
Establish System - November 0.94 0.97<br />
*Inventory Classified Documents 0.69 0.82<br />
Preventive Maintenance - Receiver 0.93 0.96<br />
Preventive Maintenance - Transmitter 0.95 0.98<br />
*Prepare Message DD173 0.98 0.99<br />
*Prioritize Outgoing Messages 0.90 0.95<br />
*Type/Format/Edit 0.96 0.98<br />
*Verify Outgoing Messages 0.97 0.98<br />
*Indicates primarily product scored tests.<br />
Table 3<br />
Intertask/Item Correlations by Test Component<br />
Component Correlation<br />
Hands-On 0.89<br />
Written Performance 0.74<br />
General Knowledge 0.71<br />
Intercorrelations Among Test Methods<br />
Correlations, particularly between hands-on and written performance<br />
tests, are important because of the possibility of substituting written tests<br />
for resource-demanding hands-on tests. The correlations between written and<br />
hands-on task tests are shown in Table 4, and the overall correlations between<br />
test methods are shown in Table 5.<br />
Table 4<br />
Correlations Between Written Tests and Hands-On Tests<br />
Written Tests                           Correlation<br />
Broadcast Operator                      .228<br />
Maintain Comm Center File               .370** & .282*<br />
Manually Route Messages                 .422**<br />
Establish Systems - Golf/November       .596** & .101<br />
Inventory Classified Documents          .557**<br />
Preventive Maintenance - Receiver       .523**<br />
Preventive Maintenance - Transmitter    .234*<br />
Verify Outgoing Message                 .299*<br />
Prioritize Messages                     .375**<br />
Type/Format/Edit                        .093<br />
Prepare Message DD173                   .447**<br />
Note. Double correlation figures indicate a single written test covered two<br />
hands-on tests.<br />
*Significance: p&lt;.05. **Significance: p&lt;.01.<br />
Table 5<br />
Correlations Among Scores by Test Method<br />
Test Method            Hands-On    Written Performance    General Knowledge<br />
Hands-On                  --             .71*                   .61*<br />
Written Performance      .71*             --                    .68*<br />
General Knowledge        .61*            .68*                    --<br />
*Significance: p&lt;.01.<br />
These correlations are very high. In a previous study (Rumsey, Osborn,<br />
& Ford, 1985), the authors looked at correlations between hands-on and written<br />
tests for 28 occupations; the overall correlation for the 28 jobs was<br />
.41. For the eight occupations similar to the Radioman, the<br />
hands-on/written correlation was .45, and for the military job most like the<br />
Radioman's--the Army Radio-Teletype Operator--the correlation was .37. Again,<br />
much of the notable result for the Radioman in this area is believed to be<br />
directly a result of the high rater reliability.<br />
Assignment Effect (Ship vs. Shore)<br />
A comparison of the performance of Radiomen on all test methods revealed<br />
marked differences depending on whether the sailors were shore-based or ship-based,<br />
with the ship-based examinees consistently scoring higher. On the<br />
hands-on tests, this difference was significant (at p<br />
Interrater Reliability as an Indicator of<br />
HOPT Quality Control Effectiveness<br />
Major P. J. Exner, USMC<br />
HQ USMC<br />
Jennifer L. Crafts<br />
Daniel B. Felker<br />
Edmund C. Bowler<br />
American Institutes for Research<br />
Paul W. Mayberry<br />
Center for Naval Analyses<br />
The United States Marine Corps Job Performance Measurement<br />
Project is attempting to validate enlistment quality requirements<br />
against actual on-the-job requirements. Since there are nearly<br />
500 Military Occupational Specialties (MOSs), developing hands-on<br />
performance tests (HOPTs) for each MOS is impractical. Therefore<br />
the Marine Corps has elected to test relatively large numbers of<br />
Marines in a few critical MOSs in each of the four Armed Services<br />
Vocational Aptitude Battery composites used for classification.<br />
Testing began with the General Technical (GT) composite in<br />
1986-87 for the infantry occupational field. In 1989-90 tests for<br />
Mechanical Maintenance (MM) composite MOSs were developed and<br />
administered. In August, 1990, hands-on testing was completed on<br />
approximately 1900 Marine automotive and helicopter mechanics.<br />
Because of the many possible sources of error in the<br />
development and administration of HOPTs, quality control is<br />
critical at every step. Poor test design or execution can<br />
significantly reduce validities and diminish the value of the<br />
results. In a preliminary Marine Corps study, Maier (1988)<br />
reported a large reduction in validities due to various errors.<br />
Such errors can include content, test design, test administrator<br />
(TA) training, environmental, temporal, and other effects. One<br />
indicator of possible problems is interrater reliability, or TA<br />
agreement.<br />
In this paper, we will review the quality control<br />
measures used in MM testing and examine preliminary reliability<br />
results across task, test site, MOS, and time.<br />
A series of quality control measures were used to ensure the<br />
quality of hands-on performance data. They include: recruitment<br />
of former or retired Marines to serve as TAs; selection of TA<br />
applicants based on scores on structured interviews; standardized<br />
test site setup; extensive and ongoing training of TAs; rotation<br />
This research was funded by Contract No. N00014-87-C-0001 and by<br />
subcontract CNA 4-89. All statements expressed in this paper are<br />
those of the authors and do not necessarily reflect the official<br />
views or policies of the Department of the Navy or the U.S. Marine<br />
Corps.<br />
of TAs across tasks; shadow scoring; on-site data entry; and<br />
ongoing counselling of TAs.<br />
Recruiting of Former/Retired Marines as TAs<br />
We sought former or retired Marines to serve as TAs,<br />
preferably those with experience in the MOSs which were tested.<br />
This offered several advantages over using civilians or active<br />
duty Marines. Their Marine Corps background enabled them to<br />
relate better to the examinees and promoted a more realistic<br />
testing atmosphere. Also, using former rather than active duty<br />
Marines eliminated a possible bias of Staff Non-Commissioned<br />
Officers toward their troops. Former Marines would have no<br />
vested interest in seeing that "their" mechanics performed well.<br />
Selection of TAs Based on Structured Interviews<br />
All TA candidates were screened using a structured interview<br />
which evaluated their suitability in several categories.<br />
Applicants were questioned concerning their previous experience<br />
in six areas: performance of mechanical tasks; test<br />
administration; administrative duties; planning and organization;<br />
public speaking; and vehicle maintenance. For each dimension,<br />
applicants were evaluated using a three-point scale indicating<br />
no, moderate, or high familiarity. There were more applicants<br />
at the East Coast test sites, but overall TA quality was high at<br />
all locations. West Coast TAs for helicopter testing tended to<br />
be less experienced former Marines than at all other locations.<br />
Standardized Test Site Setup<br />
Testing was conducted at five test sites. There was one site<br />
for automotive testing on each coast, and a single test site for<br />
helicopter testing on the East Coast. Due to the wide separation<br />
of helicopter assets on the West Coast, it was necessary to set<br />
up two test sites there. To reduce site differences, the same<br />
people were involved in establishing the site requirements and<br />
setup procedures at all test sites for air or ground. Where more<br />
than one test site was set up simultaneously, individuals<br />
directing the set up had previous experience at another test<br />
site. Site directors at all sites were involved in the site<br />
requirements determination from beginning to end. Standardized<br />
aircraft/vehicle, test equipment, parts, tools, publications, and<br />
other requirements lists were prepared for all sites. Local<br />
variations in equipment brands, procedures, and facilities were<br />
carefully analyzed for their possible impact and eliminated or<br />
minimized across all sites.<br />
Extensive and Ongoing TA Training<br />
TAs underwent a thorough week-long training program. Most<br />
had served in the Marine Corps where training had been an<br />
integral part of their responsibilities for years. We stressed<br />
the requirement to avoid giving feedback to the examinee which<br />
might influence task performance. TAs were trained on how to<br />
perform each task they were to evaluate and practiced them under<br />
the supervision of active duty subject matter experts. This<br />
included role playing and deliberate errors on the part of the<br />
"examinee" to check TA consistency and develop standardized<br />
scoring of irregular responses. Once test administration was<br />
begun, there were periodic reviews of steps with low interrater<br />
reliabilities, with retraining where necessary.<br />
Rotation of TAs Across Tasks<br />
TAs were trained in multiple tasks to allow them to rotate<br />
among test stations. This lessened the effect of boredom,<br />
provided a cross check on the standardization of scoring in each<br />
task, and reduced the impact of TA differences on scoring.<br />
Shadow Scoring<br />
Perhaps the most important quality control procedure, shadow<br />
scoring involved independent evaluation of an individual's task<br />
performance by two TAs simultaneously. Shadow scorers were used<br />
to monitor TA performance and test reliability, and were<br />
systematically scheduled to capture interactions among testing<br />
order and individual TA characteristics.<br />
On-Site Data Entry Trend Analysis<br />
A Hands-On Score Entry System (HOSES) was developed to enter,<br />
verify, and report analyses of collected data. Daily on-site<br />
data entry enhanced completeness of data and allowed for early<br />
identification of problems with the tests, TA consistency, and<br />
score drift over time. HOSES generated three reports which were<br />
used by site hands-on managers to improve scoring reliability.<br />
1. Data Entry Report. All data were entered twice. This report<br />
verified that there were no discrepancies between the two<br />
entries. It also reported any missing data so the information<br />
could be tracked down on the day of original testing. This<br />
greatly reduced the amount of missing data.<br />
2. The Detailed Discrepancy Report listed all steps where<br />
primary and shadow scorers disagreed. It also gave percent<br />
disagreement for each task, and overall daily total by TA.<br />
3. The Summary Report presented cumulative historical summaries<br />
by TA and task. TA summaries showed leniency and reliability<br />
information for each task administered by the TA. Leniency was<br />
measured as a deviation from the mean percentage of "GO" scores for<br />
all TAs on each task. Reliability indicated disagreement with<br />
all other TAs on each task. These were valuable in identifying<br />
individual TA problems. Since this report could be broken out by<br />
time, it also provided trend information. Task summaries showed<br />
percent "Go" and disagreement for each step. This helped focus<br />
on test effect problems, i.e. those common across all TAs.<br />
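The leniency index in the Summary Report is simply each TA's percent-GO expressed as a deviation from the all-TA mean for the task; a sketch (the data layout is assumed for illustration, not the HOSES implementation):<br />

```python
# Sketch of the Summary Report's leniency index; the data layout is an
# assumption, not the HOSES implementation. Input maps each TA to the
# percentage of "GO" scores that TA awarded on one task.

def leniency(percent_go_by_ta):
    """TA id -> deviation from the all-TA mean percent-GO on a task.
    Positive values mark TAs more lenient than average."""
    mean = sum(percent_go_by_ta.values()) / len(percent_go_by_ta)
    return {ta: pct - mean for ta, pct in percent_go_by_ta.items()}
```

Broken out by time period, the same computation yields the trend information mentioned above.<br />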
Hands-on managers used these reports extensively. Differences<br />
among TAs were discussed, and ambiguities in interpretation of<br />
scoring rules were resolved through discussion and, if required,<br />
additional training. Individual and group trends could be<br />
detected. Individual TA counselling focused on adherence to the<br />
original training standards and the definition and interpretation<br />
of scoreable steps. Hands-on managers avoided overemphasis<br />
on consistency to prevent artificially high levels of agreement.<br />
Interrater Reliability Results<br />
Interrater reliability, or TA agreement, can indicate the<br />
presence of several possible error sources: test design, time,<br />
environmental, or other effects. Interrater reliability is the<br />
percentage agreement between primary and shadow scorers on<br />
individual task steps. It is computed by dividing the number of<br />
steps on which the primary and shadow scorer agreed by the total<br />
number they both graded, summed across all examinees and all<br />
tasks. It was calculated using all observations where both<br />
primary and shadow step scores were available.<br />
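The computation just described can be sketched directly (the data layout is assumed for illustration, not the project's HOSES code):<br />

```python
# Sketch of the interrater (percent) agreement computation described
# above; the data layout is an assumption, not the HOSES code.

def percent_agreement(primary, shadow):
    """Agreement between primary and shadow scorers over the steps
    both graded. Scores are "GO"/"NO-GO" strings, None if missing."""
    graded = [(p, s) for p, s in zip(primary, shadow)
              if p is not None and s is not None]
    if not graded:
        raise ValueError("no steps scored by both raters")
    agreed = sum(1 for p, s in graded if p == s)
    return agreed / len(graded)
```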
Fig. 1: Agreement by Task<br />
Fig. 2: Agreement by Time Period (agreement between primary and shadow test administrators, by time interval)<br />
Figure 1 shows scorer agreement across tasks for automotive<br />
mechanics. Agreement ranged from .873 to .971 indicating that<br />
TAs could reliably differentiate "Go" and "No Go" performance.<br />
The lowest reliabilities at both sites were on troubleshooting<br />
tasks, indicating some ambiguity in scoring the steps on those<br />
tasks. Three of the lowest four reliabilities occurred on tasks<br />
which were hard to observe because of confined spaces. The fact<br />
that the relative reliabilities among tasks were the same between<br />
sites also indicates a good training program and suggests that<br />
reliability differences were due to test effects.<br />
Figure 2 shows temporal effects at auto mechanic test sites.<br />
Site A experienced a slight drop in agreement in the middle time<br />
period. The decline in reliability was noted at the time, so<br />
counselling and retraining were conducted, resulting in an<br />
increase during the final period. Site B agreement increased<br />
during all three periods. The overall increase in agreement at<br />
both sites is natural given increasing familiarity of the TAs<br />
with the scoring standards over time. The fact that the increase<br />
is relatively small indicates that the initial training program<br />
prepared the TAs very well. Again, this points to test effects<br />
as a likely cause of the differences in reliabilities.<br />
This same trend carries over into the helicopter mechanic<br />
reliabilities despite some differences in their HOPTs. Whereas<br />
all automotive mechanics were given the same HOPT, each<br />
helicopter mechanic MOS had its own test. Helicopter MOSs were<br />
tested sequentially, so temporal effects are evident across<br />
aircraft type, as shown in Figure 3. Since test order varied<br />
across site, increased reliability is not indicated left to right<br />
for both sites. Test order for Site C was CH-53A/D, CH-46,<br />
CH-53E, and UH/AH-1. Site D order was UH/AH-1, CH-46, and CH-<br />
53E. No CH-53A/D were tested at Site D. Taking this test order<br />
into account we see that agreement increased over time at both<br />
sites, except for the CH-46 at Site C.<br />
Fig. 3: Agreement by Aircraft (agreement between primary and shadow test administrators, by aircraft)<br />
Fig. 4: Agreement by Interview Ratings<br />
The drop in CH-46 agreement is explainable in terms of<br />
variation in test conditions. At Site D, all examinees in a<br />
particular MOS were tested on the same aircraft. At Site C, each<br />
unit set up its own aircraft, resulting in changing conditions<br />
mechanic among the TAs (the rest were from the other three<br />
aircraft). The reduced agreement may partly reflect this<br />
diminished commonality of experience among the TAs. Even so,<br />
over time, the continuing training program and skill transfer<br />
across aircraft resulted in overall increased reliability.<br />
Figure 4 plots agreement versus the initial TA interview<br />
ratings. The strongest correlation with agreement was for TA<br />
applicants who rated high in test administration, public<br />
speaking, and administrative experience. The negative effect of<br />
maintenance and mechanical familiarity may indicate a bias<br />
resulting from experience. Yet in all cases, reliabilities were<br />
acceptable. Interestingly, among all MM TAs, there was no<br />
significant difference between TAs who had also served on the<br />
infantry project several years earlier and the mechanics hired<br />
for this project.<br />
Conclusion<br />
The high reliabilities found in the preliminary analysis are<br />
encouraging. They indicate that the TA training program was<br />
sound, scoring was well standardized across sites, and that the<br />
HOPT steps were discrete and consistently measurable. There<br />
were also no indicators of any significant test effects or other<br />
systematic problems with the test that would preclude achieving<br />
the high validities obtained in the infantry study. Finally, the<br />
results have implications for HOPT Test Administrator selection.<br />
This analysis seems to indicate that such qualities as previous<br />
test administration experience and public speaking are more<br />
important than experience in the particular field being tested.<br />
Reference<br />
Maier, M. H. (1988). On the Need for Quality Control in<br />
Validation Research. Personnel Psychology, 41, 497-502.<br />
ARMY: JOB PERFORMANCE MEASURES FOR NON-COMMISSIONED OFFICERS<br />
Charlotte H. Campbell and Roy C. Campbell<br />
Human Resources Research Organization<br />
The Army approach to criterion measurement for the JPM project focuses<br />
on two stages in the enlisted person's service time: after about two years in<br />
service, and after three to five years, as a non-commissioned officer (NCO,<br />
corporal E4 or sergeant E5). In this presentation, we report on the job<br />
analysis, development of written test, job sample test, and rating scale<br />
instruments, and testing results for NCOs. The analysis and testing were<br />
conducted on nine jobs, or <strong>Military</strong> Occupational Specialties (MOS), listed in<br />
Table 1.<br />
Table 1<br />
Army <strong>Military</strong> Occupational Specialties (MOS)<br />
11B  Infantryman<br />
13B  Cannon Crewmember<br />
19E  Armor Crewman<br />
31C  Single Channel Radio Operator<br />
63B  Light Wheel Vehicle Mechanic<br />
71L  Administrative Specialist<br />
88M  Motor Transport Operator<br />
91B  Medical NCO<br />
95B  <strong>Military</strong> Police<br />
Job Analysis<br />
For each MOS, a job analysis was performed by aggregating all available<br />
information to define a population of tasks.¹ Sources of job- and task-<br />
analytic information included Soldier's Manuals (both MOS-specific and Common<br />
Task), Army Occupational Survey Program data on performance frequency, data on<br />
¹Job analysis details may be found in J. P. Campbell (Ed.), Improving the<br />
Selection, Classification, and Utilization of Army Enlisted Personnel: Annual<br />
Report, 1987 Fiscal Year (HumRRO Report IR-PRD-88-18), October 1987.<br />
This research was funded by the Army Research Institute on two projects: Improving the Selection,<br />
Classification, and Utilization of Army Enlisted Personnel (Project A) (Project No. MDA903-82-C-0531), and<br />
Building the Career Force (Project No. MDA903-89-C-0202). Project Director is J. H. Harris, and Principal<br />
Scientist is J. P. Campbell, both of Human Resources Research Organization. Contracting Officer's Technical<br />
Representative is Dr. M. G. Rumsey, who is the Chief of the Selection and Classification Technical Area of<br />
the Army Research Institute for the Behavioral and Social Sciences. The views expressed herein are those of<br />
the authors and do not necessarily represent the official position of the Army Research Institute or the<br />
Department of the Army.<br />
541
frequency and importance of supervisory tasks from a special administration of<br />
the Leader Requirements Survey, collection and content analysis of critical<br />
incidents, and interviews with MOS incumbents.<br />
The resulting job domain included supervisory, common, and MOS-specific<br />
tasks and behaviors. Army policy designates certain tasks as being part of<br />
the job for corporals and sergeants; tasks at lower skill levels were included<br />
in the domain because of the Army's policy that soldiers are responsible for<br />
such tasks, and tasks at higher skill levels were included if there was<br />
evidence that soldiers in fact performed such tasks.<br />
Instrument Development²<br />
Information collected using the critical incident methodology was used<br />
to construct a series of rating scales for each MOS, as well as scales that<br />
were not specific to any one MOS but rather reflected Army-wide behaviors.<br />
These scales were used to measure behaviors on all three components of the job<br />
domain -- supervisory, common, and MOS-specific -- by means of ratings<br />
collected from soldiers' supervisors. The 7-point rating scales were<br />
behaviorally-anchored, that is, short descriptions of behaviors that<br />
characterize the low, middle, and high points of each of the scales were<br />
provided. Army-wide supervisory behaviors (e.g., Monitoring, Organizing<br />
Missions and Operations) were addressed by 12 of the scales, 9 scales were<br />
Army-wide and non-supervisory (or common, e.g., Following Regulations and<br />
Orders, Physical Fitness), and for each MOS there were between 7 and 14<br />
MOS-specific dimensions.<br />
For the task-based information, judgments were obtained from subject<br />
matter experts (SMEs) on several task parameters, including performance<br />
difficulty, performance variability, and criticality. The task list for each<br />
MOS was clustered into functional areas, and a second panel of SMEs selected<br />
proportional systematic samples from the task population. These task samples<br />
were subjected to formal reviews by the proponent.<br />
At this point, the task-based instrument development process diverged<br />
into four separate approaches: Job knowledge (written) tests, hands-on job<br />
sample tests, role-play simulations, and written situational judgment tests.<br />
Multiple-choice job knowledge test items were constructed for all of the<br />
MOS-specific and common tasks selected for each MOS. These tests are<br />
characterized by their orientation on task performance and by the extensive<br />
use of graphics and job-relevant contextual information. For each MOS, a<br />
one-hour test of both common and MOS-specific tasks was prepared, comprising<br />
approximately 120 items. Two scores were constructed, for common tasks and<br />
²Details of instrument development are presented in J. P. Campbell (Ed.),<br />
Building the Career Force, First Year Report (in preparation). Rating scales<br />
development and Situational Judgment Test development were directed by W. C.<br />
Borman and M. Hanson of Personnel Designs Research Institute, Inc. Role-play<br />
development was directed by E. D. Pulakos of Human Resources Research<br />
Organization and D. Whetzel of the American Institutes for Research.<br />
Development of hands-on and job knowledge tests was directed by C. H. Campbell<br />
and R. C. Campbell of Human Resources Research Organization, and D. C. Felker<br />
of the American Institutes for Research.<br />
542
for MOS-specific tasks, as the percent of items answered correctly on tasks in<br />
each area.<br />
Hands-on job sample tests were developed to test performance on 8-14 of<br />
the tasks selected for each MOS. The tasks allocated to the hands-on<br />
component included, by design, both common and MOS-specific tasks, at the<br />
target skill level as well as lower and higher skill levels, and from as many<br />
functional areas as was feasible for testing. Scores were constructed as the<br />
percent of steps performed correctly for a given task, averaged across the<br />
common or MOS-specific tasks.<br />
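The hands-on scoring arithmetic described above can be sketched in a few lines. This is an illustration only; the function and variable names are ours, not part of the original scoring procedure.<br />

```python
def task_score(steps_passed, steps_total):
    # Score one hands-on task as the percent of steps performed correctly.
    return 100.0 * steps_passed / steps_total

def hands_on_component_score(task_results):
    # Average the per-task percent scores across the common or
    # MOS-specific tasks allocated to the hands-on component.
    scores = [task_score(passed, total) for passed, total in task_results]
    return sum(scores) / len(scores)

# Hypothetical example: three tasks, each as (steps passed, total steps).
component = hands_on_component_score([(8, 10), (12, 15), (5, 5)])
```

The same percent-correct logic, applied to items rather than task steps, yields the job knowledge test scores.<br />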
Examination of the supervisory tasks selected for each MOS revealed a<br />
common structure of three areas of supervisory behaviors across the nine MOS:<br />
Personal Counseling, Disciplinary Counseling, and Training. To measure these<br />
three aspects of the job, simulation exercises (role-plays) were developed.<br />
The role of a private was played by a trained civilian test scorer (three<br />
different scorers performed the three roles for a given soldier). At the<br />
conclusion of a role-play, the actor/scorer rated the soldier on 12-18 aspects<br />
of behavior during the exercise. Each aspect was rated by means of a 3-point<br />
behaviorally-anchored rating scale, and an overall score was computed as the<br />
average across the three role-plays of the mean rating on items within the<br />
role-play.<br />
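The two-level averaging rule for the role-play score can be sketched as follows; the example ratings are hypothetical (actual scorers rated 12-18 aspects per exercise, not three).<br />

```python
def role_play_overall(exercise_ratings):
    # exercise_ratings holds one list of 3-point aspect ratings per
    # role-play exercise (Personal Counseling, Disciplinary Counseling,
    # Training). The overall score is the mean across exercises of the
    # mean rating on the aspects within each exercise.
    per_exercise = [sum(r) / len(r) for r in exercise_ratings]
    return sum(per_exercise) / len(per_exercise)

# Hypothetical 3-point ratings on three aspects per exercise.
overall = role_play_overall([[3, 2, 1], [2, 2, 2], [1, 3, 2]])
```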
The written situational judgment tests were designed to tap those areas<br />
of supervisory behaviors that could not be included in the role-plays. They<br />
were intended to evaluate the effectiveness of the NCO's judgments about what<br />
to do in difficult supervisory situations, and were meant to tap the cognitive<br />
aspects of first-line supervisory practice in the Army. The test contained 35<br />
items, consisting of a situation and 3-5 alternative courses of action;<br />
soldiers indicated which response alternatives they believed to be the most<br />
and the least effective. Effectiveness weights were assigned to each response<br />
of each item with the assistance of the Sergeants Major Academy, and item<br />
scores were computed as the weight of the soldier's "Most Effective" response<br />
minus the weight of the soldier's "Least Effective" response. The total score<br />
was the mean of the item scores.<br />
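The situational judgment scoring rule (Most-minus-Least weights, averaged over items) can be sketched as below; the weights and response labels are invented for illustration.<br />

```python
def sjt_item_score(weights, most, least):
    # Effectiveness weight of the response picked as "Most Effective"
    # minus the weight of the response picked as "Least Effective".
    return weights[most] - weights[least]

def sjt_total_score(items):
    # Total score is the mean of the item scores; each item is
    # (effectiveness weights, "Most" choice, "Least" choice).
    return sum(sjt_item_score(w, m, l) for w, m, l in items) / len(items)

# Hypothetical weights for a two-item test with responses "a"-"c".
items = [
    ({"a": 1.5, "b": 0.5, "c": -1.0}, "a", "c"),   # item score: 2.5
    ({"a": 2.0, "b": 0.0, "c": -0.5}, "b", "a"),   # item score: -2.0
]
total = sjt_total_score(items)
```

Note that an examinee who picks a weakly effective response as "Most" and a strongly effective one as "Least" earns a negative item score, which is why the observed totals below range below zero.<br />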
Figure 1 portrays the test mode (written, job sample, and ratings) by<br />
job component (supervisory, common task, and MOS-specific) coverage among the<br />
testing instruments.<br />
Test Administration and Results<br />
Data were collected from 1009 soldiers and their supervisors (rating<br />
scales only) in the nine MOS at 13 Army posts in CONUS and in Germany. The<br />
hands-on tests were administered by NCO scorers under the supervision of<br />
trained civilian staff; all other instruments were administered by trained<br />
members of the project staff.<br />
Table 2 gives the basic statistical characteristics for each instrument,<br />
across the nine MOS. For every instrument, the mean scores are above the<br />
midpoint. However, there is no strong evidence of skew in the data, and the<br />
reliability estimates are satisfactory.<br />
543
                                              Job Components<br />
Test Mode         Supervisory                  Common                       MOS-Specific<br />
WRITTEN TESTS     Situational Judgment Test    Job Knowledge Tests          Job Knowledge Tests<br />
                  (Mean of effectiveness       of Common Tasks              of MOS-Specific Tasks<br />
                  weight for "M" responses     (Percent items correct)      (Percent items correct)<br />
                  minus effectiveness weight<br />
                  for "L" responses)<br />
JOB SAMPLE TESTS  Supervisory Role-Plays       Hands-On Tests               Hands-On Tests<br />
                  (Mean across role-plays      of Common Tasks              of MOS-Specific Tasks<br />
                  of ratings on 3-point        (Mean across tasks of        (Mean across tasks of<br />
                  effective behavior scales)   percent steps passed)        percent steps passed)<br />
RATINGS           Rating Scales - Army-Wide    Rating Scales - Army-Wide    Rating Scales -<br />
                  Supervisory Dimensions       Non-Supervisory Dimensions   MOS-Specific Scales<br />
                  (Mean across dimensions      (Mean across dimensions      (Mean across dimensions<br />
                  of supervisor ratings on     of supervisor ratings on     of supervisor ratings on<br />
                  7-point rating scales)       7-point rating scales)       7-point rating scales)<br />
Figure 1. <strong>Testing</strong> instruments providing coverage of each job component, by<br />
test mode.<br />
Table 2<br />
Statistical Characteristics of Test Instruments Across Nine MOS<br />
                               Supervisory          Common               MOS-Specific<br />
                              Mean   SD   Rel.     Mean   SD   Rel.     Mean   SD   Rel.<br />
Situational Judgment Tests    1.37   .60  .75<br />
Job Knowledge Tests                                65.4  12.5  .79      64.9  13.5  .73<br />
Supervisory Role-Plays        2.26   .42  .71<br />
Hands-On Tests                                     72.6  15.4  .46      69.4  19.5  .44<br />
Rating Scales - Army-Wide     4.49  1.06  .50      5.13  1.13  .48<br />
Rating Scales - MOS-Specific                                            5.19  0.97  .43<br />
Note. Situational judgment test scores ranged from -.77 to 2.57 (thus the mean score of 1.37 is roughly<br />
equivalent to a score of 4.46 on a 7-point scale, with a standard deviation of 1.26); the reliability estimate<br />
is split-half on items, corrected to test length.<br />
Job knowledge test and hands-on test scores are proportions correct; the reliability estimate for job knowledge<br />
tests is the median across MOS of a split-half on odd-even items, corrected to test length; the reliability<br />
estimate for hands-on tests is the median across MOS of the split-half on task scores, corrected to number<br />
of tasks.<br />
Ratings were made on a 7-point scale, where a 1 represents poor performance; reliability estimates are one-<br />
rater reliabilities across dimensions, using the median across MOS for MOS-specific ratings.<br />
Role-play ratings were made on a 3-point scale, where a 1 represents less effective supervision; reliability<br />
estimates are the median one-rater reliability across items, averaged across the three role-plays.<br />
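The odd-even split-half estimate used for the written tests, "corrected to test length," is the classical Spearman-Brown step-up. A minimal sketch of that computation (function names and the toy item matrix are ours):<br />

```python
def pearson_r(x, y):
    # Plain Pearson correlation, with no external libraries.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def split_half_reliability(item_matrix):
    # item_matrix: one row of 0/1 item scores per examinee.
    # Correlate odd-item and even-item half-test scores, then step the
    # half-test correlation up to full test length with Spearman-Brown:
    # r_full = 2 * r_half / (1 + r_half).
    odd = [sum(row[0::2]) for row in item_matrix]
    even = [sum(row[1::2]) for row in item_matrix]
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)
```

For the hands-on tests the same idea applies with task scores in place of items, corrected to the number of tasks.<br />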
544
Table 3 shows the intercorrelations among the instruments across the<br />
nine MOS. For the rating scales and, to a lesser degree, for the written<br />
tests, there are high correlations across the different job components. This<br />
may indicate that the test mode itself is responsible for much of the observed<br />
variance. Because the raters for a soldier were the same individuals for all<br />
three sets of scales, we would expect the results to be correlated; likewise,<br />
we expect scores on written multiple-choice tests to be correlated simply<br />
because of the cognitive processing burden imposed by the written material.<br />
The job sample tests, on the other hand, are less affected by the similarity<br />
of method, not surprising in view of the fact that nearly every job sample<br />
exercise (hands-on task test or role-play situation) is conducted and scored<br />
by a different administrator.<br />
Table 3<br />
Intercorrelations (Uncorrected) Among Test Modes (Written, Job Sample, and<br />
Rating Scales) and Job Components (Supervisory, Common Task, and MOS-Specific)<br />
                                        Written Mode       Job Sample Mode     Ratings Mode<br />
                                        Sup.  Com.  MOS    Sup.  Com.  MOS    Sup.  Com.  MOS<br />
WRITTEN MODE<br />
Supervisory (Situational Test)          1.00<br />
Common Task (Job Knowledge Test)         .40  1.00<br />
MOS-Specific (Job Knowledge Test)        .34   .48  1.00<br />
JOB SAMPLE MODE<br />
Supervisory (Role-Plays)                 .12   .19   .13   1.00<br />
Common Task (Hands-On Test)              .09   .30   .20    .10  1.00<br />
MOS-Specific (Hands-On Test)             .11   .23   .42    .06   .17  1.00<br />
RATINGS MODE<br />
Supervisory (Army-Wide Ratings)          .17   .13   .13    .10   .08   .08   1.00<br />
Common Task (Army-Wide Ratings)          .13   .12   .09    .07   .06   .09    .71  1.00<br />
MOS-Specific (MOS Ratings)               .11   .15   .12    .05   .07   .09    .74   .64  1.00<br />
The correlations between different test modes measuring the same job<br />
components are highlighted in the table. The correlations between the two<br />
task-based instruments (job knowledge tests and hands-on tests) are relatively<br />
high even across the job components of common tasks and MOS-specific tasks.<br />
At the same time, the cognitive aspects of supervisory activities seem to be<br />
related to observed supervisory skill (ratings) to a greater degree than to<br />
job samples of supervisory behaviors. It appears that, for common and MOS<br />
tasks, knowing how to perform and being able to perform are more highly<br />
related than either of those is to actually performing on the job. However,<br />
545
for the less easily defined and analyzed supervisory performance, knowing<br />
effective ways to supervise and being rated as a good supervisor are more<br />
highly related than either of those is to demonstrating supervisory skills on<br />
a role-play.<br />
Discussion<br />
Hands-on job sample tests and written job knowledge tests are frequently<br />
used in military performance measurement situations. Whenever we have well-<br />
defined tasks, with unequivocal task analyses that include the initiating cues<br />
and performance standards and that permit the identification of correct and<br />
incorrect actions, we can construct job knowledge tests or job sample tests.<br />
(Whether or not the tests are administrable within available or reasonable<br />
resources is another issue.) These types of tests are widely used because<br />
what they measure -- declarative and procedural knowledge, ability to<br />
perform -- is fairly well-understood. However, the assessment of "typical"<br />
performance (as opposed to ability or knowledge) is more difficult, and the<br />
use of anchored rating scales provides us a method that is arguably less<br />
precise -- but so is the target behavior less precise. Measurement of<br />
supervisory skills has long been regarded as difficult at best. Like<br />
"leadership," these skills are often referred to as "intangible," as though we<br />
are unsure of their existence. The situational judgment tests and the role-<br />
plays are, however, measuring something, and with a respectable degree of<br />
reliability. Continued attention to the development of these instruments, and<br />
to ways of assessing their dimensionality, should yield useful information to<br />
the military testing community.<br />
References<br />
Campbell, J. P., Ed. (October 1987). Improving the Selection, Classification,<br />
and Utilization of Army Enlisted Personnel: Annual Report, 1987 Fiscal<br />
Year (HumRRO Report IR-PRD-88-18). Alexandria, VA: Human Resources<br />
Research Organization.<br />
Campbell, J. P., Ed. (in preparation). Building the Career Force: First<br />
Year Report. Alexandria, VA: Human Resources Research Organization.<br />
546
The USAF Occupational Measurement Squadron:<br />
Its Organization, Products, and Impact<br />
Joan T. Brooks<br />
William J. Carle<br />
Johnnie C. Harris<br />
Paul P. Stanley II<br />
Joseph S. Tartell<br />
USAF Occupational Measurement Squadron<br />
The USAF Occupational Measurement Squadron (USAFOMS) represents the operational<br />
application of two major thrusts in industrial psychology in the Air<br />
Force: personnel testing and occupational analysis. Each of the USAFOMS's<br />
four major programs reflects in its own way how these important technologies,<br />
which began as research efforts, have been applied to real-world problems to<br />
support Air Force mission accomplishment. Out of personnel testing grew the<br />
USAFOMS’s Occupational Test Development Program and the Professional Development<br />
Program. Out of occupational analysis grew the Occupational Analysis<br />
Program and the Training Development Services Program.<br />
A Brief History of the Squadron<br />
In 1970, the implementation of the Weighted Airman Promotion System (WAPS)<br />
triggered the establishment of a new organization within the headquarters of<br />
the Air Training Command (ATC), with the cryptic title of “Detachment 17.”<br />
Detachment 17 consisted of two branches, one responsible for test development,<br />
the other for occupational analysis. In 1974, the Air Force-wide<br />
impact of this organization's missions was recognized when it became the USAF<br />
Occupational Measurement Center. In October 1990, the unit, which is located<br />
at Randolph Air Force Base, Texas, was renamed the USAF Occupational Measurement<br />
Squadron. The USAFOMS Commander also sits on the staff of the Deputy<br />
Chief of Staff for Technical Training as the Director of Occupational Mea-<br />
surement.<br />
The Occupational Test Development Proqram<br />
In the 1950s and 1960s, pencil-and-paper tests were mainly used in training<br />
programs, to assess trainee progress. The implementation of WAPS, however,<br />
made tests a critical factor in enlisted career progression.<br />
The idea of WAPS was to take the mystery out of the promotion system by<br />
making every aspect visible to those competing for promotion. Under WAPS,<br />
airmen compete for promotion to the ranks of staff sergeant (E-5) through<br />
master sergeant (E-7) with other airmen in the same Air Force specialty (AFS)<br />
on the basis of a single score. This single WAPS score is the sum of six<br />
component measures (see Table 1), with USAFOMS tests accounting for up to 44%<br />
of the total. Most airmen take two tests: the Specialty Knowledge Test<br />
(SKT) measures knowledge of the Air Force specialty and the Promotion Fitness<br />
Examination (PFE) tests knowledge of general military subjects. Because the<br />
other, non-test factors typically do little to disperse promotion competi-<br />
tors, the SKT and PFE are often the deciding factor in determining who gets<br />
promoted.<br />
547
Table 1. Weighted Airman Promotion System Factors<br />
FACTOR                          MAXIMUM POINTS    PERCENTAGE VALUE<br />
SKT Score                            100                22%<br />
PFE Score                            100                22%<br />
Time in Service                       40                 9%<br />
Time in Grade                         60                13%<br />
Enlisted Performance Ratings         135                29%<br />
Awards and Decorations                25                 5%<br />
TOTAL                                460               100%<br />
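Since the WAPS score is simply the sum of the six factor scores, the composite can be sketched directly from Table 1. The dictionary keys and function below are our own illustration, not an official computation.<br />

```python
# Maximum point values for the six WAPS factors, from Table 1.
WAPS_MAX = {
    "SKT Score": 100,
    "PFE Score": 100,
    "Time in Service": 40,
    "Time in Grade": 60,
    "Enlisted Performance Ratings": 135,
    "Awards and Decorations": 25,
}

def waps_score(points):
    # The single WAPS score is the sum of the six factor scores;
    # reject any factor value that exceeds its published maximum.
    for factor, value in points.items():
        if value > WAPS_MAX[factor]:
            raise ValueError(f"{factor} exceeds its {WAPS_MAX[factor]}-point maximum")
    return sum(points.values())

# The two USAFOMS tests account for up to 200 of 460 points (about 44%).
test_share = (WAPS_MAX["SKT Score"] + WAPS_MAX["PFE Score"]) / sum(WAPS_MAX.values())
```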
Each promotion test is revised annually in order to prevent compromise and<br />
keep abreast of technological or procedural changes. The tests are con-<br />
structed using the content validity strategy of test development. Three<br />
sources of information are the foundation of content validity for the tests:<br />
the training standard, which lists the specialty's common duties and tasks;<br />
the occupational analysis data provided by USAFOMS’s own Occupational Analysis<br />
Program, which show the relative importance of tasks performed by job<br />
incumbents; and, most important, the experience and knowledge of the sub-<br />
ject-matter experts (SMEs) brought in to write the tests.<br />
The SMEs are senior NCOs selected from throughout the Air Force on the basis<br />
of their job experience in their respective career fields. "Tests Written by<br />
Airmen for Airmen" is a slogan which accurately sums up the USAFOMS test<br />
development philosophy, because these senior NCOs are the heart of the test-<br />
writing process. They provide the technical expertise and USAFOMS psychologists<br />
provide the psychometric expertise to produce job-relevant and statis-<br />
tically sound tests.<br />
While at USAFOMS, each group of SMEs is assigned a test psychologist to lead<br />
them through the test development process. A quality control psychologist<br />
acts as an additional set of eyes, performing exhaustive and minute scrutiny<br />
of all team output. A group of eight test management psychologists oversees<br />
the test development effort as a whole: identifying testing requirements for<br />
their assigned career fields, closely monitoring all events which may affect<br />
testing, ensuring that qualified SMEs are selected, providing guidance to<br />
test writers, and ultimately assuming overall responsibility for the tests<br />
developed.<br />
SMEs spend from 2 to 6 weeks at USAFOMS, depending on the type of test and<br />
the extent of test revision involved. During this time, their questions are<br />
thoroughly researched and reviewed. Each team member has veto power over<br />
each test item. After the SMEs leave, each test is subjected to an additional<br />
20 steps of quality control. The final product is a camera-ready test<br />
manuscript prepared with computerized photocomposition equipment. The manuscript<br />
is forwarded through the Air Force publications distribution system to<br />
be printed and disseminated worldwide through the network of Air Force test<br />
control officers.<br />
548
In addition to SKTs and PFEs, USAFOMS produces USAF Supervisory Examinations<br />
(USAFSEs) and Apprentice Knowledge Tests (AKTs). USAFSEs assess general<br />
supervisory and managerial knowledge and are used in the Senior NCO Promotion<br />
Program, a board-based system used to make selections for promotion to<br />
the ranks of senior master sergeant (E-8) and chief master sergeant (E-9).<br />
AKTs measure the knowledge required for possession of the 3-skill level (also<br />
called the apprentice level) of training. An airman with documented civilian<br />
experience in a specialty may be allowed to bypass resident technical training<br />
with a passing score on the AKT, thus saving the Air Force valuable<br />
training dollars.<br />
In 1989, 700 SMEs were sent TDY to USAFOMS to develop a total of 418 tests.<br />
The Professional Development Proqram<br />
This program, though not strictly an outgrowth of the field of industrial<br />
psychology like the other USAFOMS programs, has had an important positive effect<br />
on acceptance of USAFOMS’s promotion tests. It is Air Force policy that<br />
promotion tests be developed entirely from references that will be available<br />
to all examinees for study. Before 1980, this was a problem with USAFOMS's<br />
most highly visible tests, the Promotion Fitness Examinations and USAF Supervisory<br />
Examinations. These tests were written from a variety of references<br />
which varied in quality and availability. The Professional Development<br />
Program was established to develop a single, high-quality reference upon<br />
which these critical promotion tests could be based.<br />
The reference which evolved was Air Force Pamphlet 50-34. Volume I of the<br />
pamphlet is now the sole source reference for airmen taking the Promotion<br />
Fitness Exam to compete for promotion to staff sergeant, technical sergeant,<br />
and master sergeant. Airmen competing for promotion to senior master ser-<br />
geant and chief master sergeant study both Volume I and Volume II in preparing<br />
to take the USAF Supervisory Exam.<br />
The Occupational Analysis Proqram<br />
In the early 1960s, research performed by the Air Force was to influence<br />
profoundly the field of industrial psychology. Occupational analysis had<br />
been around for many years in various forms, but it was the Comprehensive<br />
Occupational Analysis Data Analysis Programs (collectively called CODAP)<br />
developed by the Personnel Research Laboratory which made possible the study<br />
of jobs on the scale necessary to work with career fields the scope of those<br />
in the Air Force. In 1967, the Job Specialty Survey Division was formed to<br />
apply this technology in the operational setting. It was part of what was<br />
then called Lackland <strong>Military</strong> Training Squadron until Detachment 17 was<br />
formed in 1970.<br />
People in the Occupational Analysis Program conduct surveys of AF personnel,<br />
both military and civilian, to learn what tasks they do regularly on the job.<br />
The Air Force uses the survey results for refining and maintaining occupational<br />
structures within a classification system, for constructing enlisted<br />
promotion tests, for adjusting or establishing training programs, and for<br />
sustaining or modifying other Air Force personnel and research programs. The<br />
occupational survey process consists of six distinct phases, beginning with<br />
the receipt of a request for an occupational survey. Requests for surveys<br />
549
are reviewed by the Priorities Working Group (PWG). In addition to USAFOMS<br />
personnel, the PWG consists of representatives from the Air Force Deputy<br />
Chief of Staff for Personnel, the Air Force Human Resources Laboratory<br />
(AFHRL), the Air Force <strong>Military</strong> Personnel Center (AFMPC), and the ATC technical<br />
and medical training staffs. The PWG selects those specialties which<br />
will be surveyed and assigns relative priorities.<br />
The next step is the development of a job inventory. The job inventory<br />
consists of a comprehensive listing of tasks which may be performed in a<br />
particular occupational field. Inventory developers travel to operational<br />
bases as well as ATC technical training centers for exhaustive interviews<br />
with subject-matter experts. From these interviews, they compile the task<br />
listing and publish it along with background questions as the USAF Job Inventory<br />
for the occupational field under study.<br />
The job inventory is then administered to job incumbents, usually through the<br />
personnel office at each installation. The returned job inventory booklets<br />
undergo a quality control review to correct or eliminate those which have<br />
been improperly completed. Each booklet is reviewed for accuracy and com-<br />
pleteness. This careful quality control of the returned booklets ensures<br />
that the data received are accurate.<br />
Once the booklets are quality controlled, data processing personnel use an<br />
optical scanner to input task responses and background data from returned<br />
inventories into the computer. Computer programming personnel then apply<br />
CODAP programs to create job descriptions and other related products to aid<br />
in data analysis.<br />
Occupational analysts then spend considerable time analyzing the data and<br />
reporting significant trends and implications. USAFOMS publishes the find-<br />
ings and results of the analysis in the form of an Occupational Survey Report<br />
(OSR). The OSR and related data packages are made available to Air Staff,<br />
major commands (MAJCOMs), classification and training personnel, and other<br />
interested Air Force agencies.<br />
The critical final step in the occupational survey process involves working<br />
with the users to apply the data to their particular situation. During this<br />
step, the analyst introduces the user to the data products and gives specific<br />
guidance on how to use the data printouts in making decisions. Once the data<br />
have been analyzed and the OSR has been written and released, the data are<br />
used in a variety of ways. Classification personnel look at career field<br />
structuring, to validate the present structure or recommend restructuring.<br />
USAFOMS psychologists rely heavily on the data to establish the content<br />
validity of enlisted promotion tests. USAFOMS training analysts also use the<br />
data for systems analyses, task analyses, and assessment of education and<br />
training requirements. But perhaps the most visible use of the OSR data to<br />
date is in determining training requirements. In today’s environment, where<br />
the training dollar is tight, training must be geared only to what the person<br />
will need to do the job effectively. In this regard, the emphasis today is<br />
placed on determining how job incumbents will be used in the first job assignment,<br />
identifying those tasks for which the probability of performance by<br />
airmen in their first assignment is high, and providing initial training on<br />
these tasks. OSR data are the key to designing initial courses that train<br />
550
only for the first job, as well as providing valuable information for what to<br />
include in follow-on training.<br />
The Traininq Development Services Proaram<br />
The Training Development Services Program was established in 1982 to improve<br />
Air Force training by using a systematic approach to training development.<br />
The program goal is to enable customers to provide "Quality Training for a<br />
Quality Force." Training analysts are located at Randolph AFB and at each of<br />
the six technical training centers. Their primary function is to provide<br />
front-end task and training analysis to support Air Force instructional<br />
system development (ISD) requirements. The analysis focuses mainly on the<br />
second step of ISD, "Define Education and Training Requirements." The end<br />
result is an analysis of the training requirements of an Air Force specialty<br />
and a plan for structuring and integrating all training within that<br />
specialty.<br />
The primary product of the Training Development Services Program is the<br />
Training Requirements Analysis (TRA). This document consists of three sec-<br />
tions:<br />
1) Systems Overview. This section provides the user with background<br />
information on the specialty with special emphasis on training needs and<br />
issues. This section lists all training presently available and points out<br />
anticipated changes within the career field such as the acquisition of new<br />
equipment. Data for this section comes from the Air Force <strong>Military</strong> Personnel<br />
Center, functional managers, training managers, and other staff-level organizations.<br />
2) Comprehensive Task Analysis. Each important task of the specialty is<br />
broken down into the skills and knowledge required to do the task. Also<br />
included are the tools, equipment, references, conditions, and performance<br />
standards for each task. This information is obtained through extensive<br />
one-on-one interviews with individuals who are fully qualified in the specialty.<br />
3) General and Specific Training Recommendations. The general recommen-<br />
dations relate to broad training issues such as the development of a new<br />
course or the merger of two or more specialties. On the other hand, specific<br />
recommendations are given task by task and describe where and when a task<br />
should be trained based on field data, task analysis data, and occupational<br />
survey data. The “where” is typically either at a technical training center<br />
or through on-the-job training. The “when” indicates whether a task should<br />
be taught during entry-level training or at a later time in a person’s ca-<br />
reer. Specific training recommendations are often produced in the form of a<br />
proposed Specialty Training Standard; however, the format is varied to meet<br />
the needs of the user.<br />
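The task-by-task recommendations described above can be pictured as simple records. The following is a hypothetical sketch only; the field names and sample values are illustrative and do not reflect an actual TRA format:

```python
from dataclasses import dataclass

@dataclass
class TaskRecommendation:
    """One task-by-task training recommendation, as described above."""
    task: str      # the task under consideration
    where: str     # "technical training center" or "on-the-job training"
    when: str      # "entry-level" or "later in career"
    basis: tuple   # data sources supporting the recommendation

# Hypothetical example record
rec = TaskRecommendation(
    task="Inspect hydraulic lines",
    where="on-the-job training",
    when="entry-level",
    basis=("field data", "task analysis data", "occupational survey data"),
)
print(rec.where, "/", rec.when)
```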
A TRA begins with a request from a specialty representative, usually at the<br />
Air Staff or MAJCOM level. TRAs may be requested in conjunction with an<br />
occupational survey (a product of the Occupational Analysis Program) or as a<br />
follow-on to a previous survey in order to address specific training issues<br />
and concerns. Approved TRAs are listed in the USAF Program Technical Training<br />
document. After approval, a team of training analysts is assigned the<br />
project. Usually analysts from more than one location will work on a<br />
project in order to reduce travel costs. Data gathering involves extensive<br />
interviews and observation of skilled specialty technicians. Analysts will<br />
draw from the experience of specialty instructors at technical training<br />
centers and will also travel to various MAJCOMs and bases that employ personnel<br />
in the specialty under study. Specific locations are determined through<br />
meetings with functional managers and will include enough bases to ensure a<br />
thorough sampling. Travel is confined to the continental United States<br />
unless unique specialty units are located overseas. The detailed task infor-<br />
mation gathered during these trips is collected with laptop computers and<br />
entered into automated files. The data then become the basis for the train-<br />
ing requirements summarized in the TRA.<br />
Analysts track the results of each TRA through an extensive external evalua-<br />
tion program that routinely surveys product recipients. User information<br />
received over the past two years indicates TRAs are extremely useful and<br />
serve many purposes. Enlisted personnel account for 80% of the users, which<br />
is understandable since most analyses are conducted on enlisted specialties.<br />
Civilians account for 15% while officers account for the remaining 5%. TRAs<br />
are typically used to develop or revise OJT programs, produce specialty<br />
training standards and criterion objectives, justify the procurement of<br />
training resources, and standardize training programs. TRAs have also been<br />
used to support career field mergers, determine cross utilization of training<br />
programs, and to support Utilization and Training Workshops (U&TWs).<br />
Conclusion<br />
USAFOMS programs impact virtually every aspect of today's Air Force:<br />
determining service entry criteria, setting aptitude requirements for occupational<br />
specialties, establishing criteria for job-specific training programs,<br />
and providing the foundation for a fair and objective promotion system. The<br />
future holds new challenges as well, including the possibility of an on-line<br />
occupational information system which permits analysis across weapon systems,<br />
training information which supports both large- and small-scale programs<br />
which are job-specific and cost-effective, and continued improvement of the<br />
promotion system. Key to all future developments is the recognition that<br />
success has come from the operational application of research in industrial<br />
psychology and improvements must follow a similar track: research and validation<br />
prior to implementation.<br />
The Examiner is a sophisticated computer-based system used in the development of both paper and pencil<br />
and computer-delivered examinations. Over 200 installations world-wide make use of the system in<br />
applications ranging from traditional classroom tests to the evaluation of sailors in submarines at sea.<br />
This paper will give a brief history of The Examiner, describe the structure of the system, and give some<br />
suggested implementations.<br />
The Evolution of The Examiner<br />
A number of expensive and complex mainframe testing systems existed in 1984 when Dr. Stanley Trollip and<br />
I decided to put our development and programming experience to work to create a microcomputer-based<br />
testing system. We developed a small prototype system and showed it to a number of prospective customers<br />
in hope of receiving funding to develop it into a full-fledged program.<br />
Through some business acquaintances in England, we learned that the London Stock Exchange was<br />
overhauling their centuries-old brokerage system and were moving towards a certified representative system<br />
similar to what we have here in the United States. As part of this process the Exchange decided that they<br />
wanted a comprehensive computer-based testing system developed to meet their certification needs. The<br />
design criteria they specified were:<br />
The system had to be secure. Item bank encryption and database password access had to ensure<br />
that the test items did not "escape".<br />
The system had to be reliable. Accuracy in testing was important for its own sake. Further, regulations<br />
in the UK made it essential that there be no errors in recording answers or reporting grades of<br />
examinees.<br />
The system had to be easy to use. From a development standpoint, the system should be easily<br />
operated by clerical staff. From the examinee standpoint, computer neophytes must not find the<br />
software impairing their test-taking ability in any way.<br />
Dr. Trollip and I convinced the Exchange that we could develop a software product for them that could meet<br />
their needs, and that we could develop it for them on budget and in time for their "Big Bang" deregulation in<br />
the Fall of 1985. We succeeded, and the software product The Examiner was first used in October of<br />
1985. Since then, every stockbroker in the United Kingdom and Ireland has been certified using our system.<br />
Over 1,000 tests a year have been given.<br />
The Examiner is designed to:<br />
Produce both computer-delivered and traditional paper and pencil tests from a single database.<br />
Provide a framework that can produce simple spot quizzes or complex qualification tests.<br />
Provide item and test feedback allowing the integration of training into the examination process.<br />
Track item statistics for the improvement of item bank quality.<br />
Track examinee statistics for individual and class reporting purposes.<br />
Item Editor The item bank is created in this part of the system. The structure of the database is<br />
developed to allow for accurate testing of different subject areas.<br />
Exam Editor Once the item bank has been created, the examination editor is used to create "profiles" that<br />
will be used as templates to create tests.<br />
Exam Delivery Tests are delivered in a secure environment with a user interface designed to allow the<br />
assessment of examinee knowledge rather than test-taking ability. Paper and pencil tests are<br />
cleanly printed for maximum clarity and legibility.<br />
Statistics Completed examination records can be viewed for both examinee results and item analysis.<br />
The Item Bank Structure<br />
One of the great powers of The Examiner is that the software naturally leads the developer into organizing<br />
items into a logically-structured item bank. This rationally structured item bank allows the construction of<br />
tests that can evaluate specific learning objectives or broad knowledge areas.<br />
Multiple Choice Up to 10 alternatives are available.<br />
Multiple Correct Each alternative has its own grading weight, with item mastery based on<br />
achieving a set sum of correct alternatives.<br />
Dynamic Multiple Choice Up to 10 alternatives are available. Items are constructed dynamically at<br />
examination generation time by selecting a preselected number of correct and<br />
incorrect alternatives.<br />
Dual A special form of multiple choice item created at examination generation time<br />
from a set of four alternatives, two correct and two incorrect.<br />
Short Alpha Answer Up to ten words can be judged at one time. Misspellings, extra words, incorrect<br />
word order, and capitalization errors can be allowed or disallowed at will.<br />
Short Numeric Answer A floating point number can be judged. Exact matching or plus and minus an<br />
absolute number or percentage of error can be allowed.<br />
Linked Up to 99 items can be linked together into a "scenario" type of item. Mastery<br />
of the linked item can be based on mastery of all or part of the included items.<br />
Parallel Up to 99 items can be grouped together as a parallel item. At examination<br />
generation time, the system will randomly select one of the parallel items for<br />
inclusion in the examination.<br />
The means of examination delivery will often determine the type of questions that are to be used.<br />
If computer-delivered or manually-graded paper and pencil examinations are to be used, then any of these<br />
item types can be used. If machine-graded paper and pencil examinations are anticipated, then multiple<br />
choice is the usual choice.<br />
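The judging rules for the short-answer item types above can be sketched in a few lines. This is a hypothetical illustration of the described behavior, not The Examiner's actual code; the function names and parameters are invented for the sketch:

```python
def judge_numeric(response, key, abs_tol=None, pct_tol=None):
    """Judge a floating-point answer: exact match, or within an
    absolute or percentage tolerance, as the item author allows."""
    if abs_tol is not None:
        return abs(response - key) <= abs_tol
    if pct_tol is not None:
        return abs(response - key) <= abs(key) * pct_tol / 100.0
    return response == key

def judge_alpha(response, key, allow_case_errors=True, allow_word_order=False):
    """Judge a short word answer; capitalization and word-order
    errors can be allowed or disallowed at will."""
    r, k = response.split(), key.split()
    if allow_case_errors:
        r, k = [w.lower() for w in r], [w.lower() for w in k]
    if allow_word_order:
        return sorted(r) == sorted(k)
    return r == k

print(judge_numeric(10.4, 10.0, abs_tol=0.5))   # within absolute tolerance
print(judge_alpha("orange BEACH", "Orange Beach"))  # case errors allowed
```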
Item Entry<br />
An integrated word-processor allows clerical staff to enter items into The Examiner. A judicious use of<br />
preformatting enables The Examiner editor to produce correctly entered items every time.<br />
If an item bank is already in existence, a Text Import Utility can be used to load items into an Examiner<br />
database. Mainframe-based item banks can be successfully migrated into the Examiner's environment with a<br />
minimum of expensive "hands-on" intervention.<br />
Examination Development<br />
Once an item bank has been created, the developer can create examinations from all or part of the bank.<br />
Item selection can range from the selection of specific items to random selection.<br />
The Examiner is unique in its use of profiles. A profile is a set of directives that tells the Examiner<br />
test-generating software how to extract items out of the database and create an examination. The Examiner<br />
doesn't store tests. Rather, it stores profiles that allow tests to be created on-demand by accessing the<br />
profile. This makes Examiner databases totally self-contained, allowing the creation of unique, yet<br />
equivalent, examinations at any time.<br />
Profiles contain two main sets of specifications:<br />
Global Specifications These are sets of parameters that affect things such as the number of items to be<br />
shown in the examination, the pass mark for the examination, the difficulty of the<br />
examination, and the way that the examination is to appear to the examinee.<br />
Item Specifications These are the criteria that determine which items are to be extracted from<br />
the item bank for the examination. At the simplest, a profile might call for random item<br />
selection and random presentation of multiple-choice alternatives. At the other end, the profile for a<br />
complex certification examination can be created that will yield a test of a given difficulty level to test very<br />
specific areas of knowledge.<br />
[Figure: a sample item bank organized as a tree. "All Questions in Database" branches into History (1.0.0)<br />
and Geography (2.0.0). History contains People (1.1.0) and States (1.2.0); Geography contains Cities (2.1.0),<br />
Seas (2.2.0), and Islands (2.3.0). Individual questions (1.1.1, 1.1.2, 1.2.1, 1.2.2, 2.1.1, 2.1.2, 2.1.3, 2.2.1,<br />
2.2.2, 2.3.1) hang from these lowest-level categories.]<br />
The above illustration gives an example of the type of sophisticated item selection criteria that can be used in<br />
Examiner profiles. In this example, the profile has been designed so that a six-item test will be generated.<br />
The item selection criteria have been set so that:<br />
1) All examinees will get item 1.1.1.<br />
2) Two items will be selected from area 2.1.0. In this case, the random selection process selected 2.1.2<br />
and 2.1.3. It could just as well have been any two items from 2.1.0.<br />
3) Everyone will get item 2.3.1.<br />
4) The rest of the test will be completed with items from 1.2.0. In this case, items 1.2.1 and 1.2.2 were<br />
selected.<br />
In addition to specifying item selection by objective classification, profiles can specify selection by difficulty,<br />
item type, and item characteristic. The Examiner will attempt to produce a test that matches the<br />
requested characteristics as closely as possible.<br />
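The six-item example above can be sketched as a profile of directives over a small item bank. This is an illustrative sketch only; the bank and profile data structures are invented here and do not show The Examiner's actual profile format:

```python
import random

# Hypothetical item bank keyed by objective code, mirroring the figure.
bank = {
    "1.1.0": ["1.1.1", "1.1.2"],
    "1.2.0": ["1.2.1", "1.2.2"],
    "2.1.0": ["2.1.1", "2.1.2", "2.1.3"],
    "2.2.0": ["2.2.1", "2.2.2"],
    "2.3.0": ["2.3.1"],
}

# A profile is a set of directives, not a stored test.
profile = [
    ("fixed", "1.1.1", 1),    # all examinees get item 1.1.1
    ("random", "2.1.0", 2),   # two items drawn at random from area 2.1.0
    ("fixed", "2.3.1", 1),    # everyone gets item 2.3.1
    ("random", "1.2.0", 2),   # the rest of the six-item test from 1.2.0
]

def generate_test(bank, profile, rng=random):
    """Create a unique yet equivalent test on demand from the profile."""
    items = []
    for kind, ref, count in profile:
        if kind == "fixed":
            items.append(ref)
        else:
            items.extend(rng.sample(bank[ref], count))
    return items

print(generate_test(bank, profile))
```

Because only the profile is stored, each call produces a fresh, statistically equivalent test from the same self-contained database.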
Examination Delivery<br />
The Examiner is unique in its ability to produce both computer-delivered and paper and pencil examinations<br />
from the same item bank. The developer is given the ability to select the delivery mode that is the most<br />
appropriate for their testing needs.<br />
Paper and Pencil Testing<br />
The Examiner can easily produce multiple forms of paper and pencil tests. By the introduction of random<br />
selection options into the profile used to generate the test, a user can produce two statistically equivalent<br />
tests on the same subject area. If desired, a unique test could be generated for each examinee. Paper and<br />
pencil tests can be scored in three fashions:<br />
Manually Student and instructor's answer keys can be printed with each test. In many<br />
instances, hand grading using these forms is acceptable.<br />
Stand-Alone Machine Graded A "pre-slugged" answer sheet compatible with the stand-alone Scantron 888<br />
optical scanner can be produced on an HP LaserJet II printer. This allows the<br />
easy scoring of multiple examinations on readily available hardware.<br />
Data-Terminal Scanning An answer key is stored internally in The Examiner's database and can be used<br />
to grade examinee answer sheets. Complete examination and item statistics are<br />
stored when data terminal scanners are used. At present, the Scantron<br />
1300/1400 series scanners are supported. In 1st Quarter 1991, support for the<br />
Scantron 8000/8200 series scanners will be added.<br />
Print options of The Examiner are currently being enhanced, and these new features will be released in the<br />
first quarter of 1991. Supported features will be:<br />
Printers Initial support for 20 of the most common printers. On customer request, the printer<br />
support files will be expanded to include additional printers.<br />
Fonts Depending on the printer, font control will be added. With the HP LaserJet printer,<br />
numerous font cartridges will be supported in addition to the default internal fonts.<br />
Highlighting Bold, italic, underlining, superscripts, and subscripts will be available on printers that<br />
support those features.<br />
Graphics Printed items will include PC-Paintbrush(TM) images that are currently only available with<br />
computer-delivered examinations. Printing will be limited to those printers that support<br />
graphic printing.<br />
Suggested Implementations<br />
Paper and pencil tests can be made available to examinees under a number of different delivery environments.<br />
The "unbundling" of parts of The Examiner makes it possible to have non-technical clerical staff<br />
produce on-demand tests at remote sites. Three possible test creation/delivery scenarios are:<br />
Local Control Tests are created using the main Examiner system and are graded at the development site.<br />
Copies of the item bank remain secure in one place and test creation access is tightly<br />
limited. Tests can be mailed to remote sites and then returned to the central site for<br />
grading.<br />
Networked Using the network version of the stand-alone examination generator, remote sites can locally<br />
generate tests and score them. Without access to the editing programs, the security of the<br />
database is maintained while providing the convenience of simultaneous multiple access to<br />
the items.<br />
Remote Copies of the database are distributed to the remote sites, and the stand-alone examination<br />
generator is used to produce the tests. Grading is done locally.<br />
Of course, numerous variations on these basic themes are possible to offer a delivery environment<br />
appropriate for the unique delivery requirements.<br />
Computer-Delivered Testing<br />
Examinations can be delivered via computer using The Examiner's administration software. Sample<br />
examinations allow the examinee to become familiar with the testing software so that the administration<br />
system tests knowledge rather than computer test-taking ability.<br />
Basic options available within the administration system are:<br />
Sequencing Examinees can be required to answer each item before they see the next item, and are not<br />
allowed to review and change their items. Or, examinees can move within the examination<br />
at will, changing their answers until complete examination scoring is requested.<br />
Feedback Real-time student mastery feedback can range from none at all to detailed feedback at the<br />
multiple-choice alternative level. Full-test mastery criteria and results can be activated when<br />
appropriate.<br />
Randomization Item presentation order can be random or fixed. Within items, multiple choice alternative<br />
order can be randomized.<br />
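Randomizing alternative order while keeping grading accurate can be sketched as follows; this is a hypothetical illustration (the function and the sample item are invented), not The Examiner's actual administration code:

```python
import random

def randomize_item(alternatives, correct_index, rng=random):
    """Shuffle multiple-choice alternative order and return the new
    position of the correct answer so grading stays accurate."""
    order = list(range(len(alternatives)))
    rng.shuffle(order)
    shuffled = [alternatives[i] for i in order]
    return shuffled, order.index(correct_index)

# Hypothetical item: first alternative is correct.
alts = ["Montgomery", "Mobile", "Huntsville", "Birmingham"]
shuffled, key = randomize_item(alts, 0)
print(shuffled[key])  # always the original correct alternative
```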
Examiner tests are DOS files and can be moved from one machine to another by any number of methods.<br />
Floppy disks, local area networks, and distributed communication networks are all possible.<br />
The Examiner is a sophisticated computer-based examination system that can meet most testing needs of<br />
both large and small organizations. Its ability to deliver both paper and pencil and computer-based<br />
examinations from the same database gives it a unique power in an area where testing needs can sometimes<br />
change with great rapidity. Evolving to meet users' needs, The Examiner is a cost-effective off-the-shelf<br />
solution for the evaluation of examinees and students.<br />
For information contact:<br />
Media Computer Enterprises, Ltd.<br />
880 Sibley Memorial Highway, Suite 102<br />
Mendota Heights, MN 55118-1708 USA<br />
Phone: 612-451-7360<br />
FAX: 612-451-6563<br />
32ND ANNUAL CONFERENCE OF THE MILITARY TESTING ASSOCIATION<br />
ORANGE BEACH, ALABAMA<br />
5-9 NOVEMBER 1990<br />
Minutes of the Steering Committee Meeting<br />
5 November 1990<br />
The meeting of the Steering Committee for the 32nd Annual<br />
Conference of the Military Testing Association was held in the<br />
Sand Castle (1) Room of the Perdido Hilton Hotel, Orange Beach,<br />
Alabama.<br />
MEMBERS AND ATTENDEES. See the List of Steering Committee<br />
Meeting Attendees, which follows these minutes as Attachment 1.<br />
1. The meeting was called to order at 0930 hours by CDR M. R.<br />
Adams, 1990 Chairperson.<br />
2. The financial report reflected a sharp impact from the recent<br />
economic budget problems: last year over 350 people were<br />
registered and for 1990, 300 were expected. Because of the<br />
sizeable funds passed to NETPMSA from the 1989 hosts and the<br />
expected number of attendees, the registration costs were<br />
substantially reduced. There have been many cancellations and<br />
the current estimate of registered attendees is 140. (As of 9<br />
November, there were 165 registered attendees.)<br />
3. Future conference locations were discussed:<br />
(a) The Federal Republic of Germany received a letter signed<br />
by OASD (FM&P) stating that support for U.S. participation at the<br />
'91 MTA Conference would be given. However, NETPMSA, as the '90<br />
MTA host, recommended review of the next site due to the budget<br />
difficulties encountered and the financial planning problems for<br />
the host. Several discussions followed regarding funding<br />
cutbacks expected and difficulties in promising good attendance<br />
numbers in Germany. Until the testing/research budgets finalize,<br />
it was agreed to defer Germany as the guest host. A Spring<br />
versus Fall time period was also discussed, but the group decided<br />
to leave the conference as an expected Fall budget item, even if<br />
attendance went down. The USAF Occupational Measurement Squadron<br />
tentatively agreed to host the 1991 MTA Conference in San<br />
Antonio, Texas, site of the 1989 conference. (That 1991 site was<br />
confirmed on 6 November 1990).<br />
(b) The Navy Personnel Research and Development Center will<br />
host the 1992 MTA Conference in San Diego, California.<br />
(c) The Coast Guard will host the 1993 MTA Conference in<br />
Williamsburg, Virginia.<br />
(d) The Federal Republic of Germany will host the 1994 MTA<br />
Conference in Germany in conjunction with Naval Research in<br />
London, England.<br />
(e) Canada will host the 1995 MTA Conference.<br />
4. There was general discussion on the submission of abstracts<br />
and the difficulty In getting them in a timely way. Many members<br />
felt the Steering Committee members should be more forceful in<br />
the association and possibly require a committee member screen on<br />
presentations. This would assist with quality and timeliness.<br />
Some members felt there should be greater recruitment for topics<br />
from the production/development areas since research is already<br />
so well represented.<br />
5. Regarding the Harry H. Greer Award, there was discussion<br />
about an Awards Committee being established, as mentioned in the<br />
charter, to provide more structure and coverage in getting more<br />
good nominations. The general opinion was that the current<br />
method of presenting nominations to the current chairman for<br />
further opinion is sufficient. However, the committee members<br />
all agreed that nominations should be specific in detail<br />
regarding the currency and degree of the nominee's involvement<br />
with the Military Testing Association: professional contributions<br />
in research/production; published material, etc.<br />
6. There was general agreement that the 1989 carry-over topic of<br />
"MTA name change" should be dropped from future MTA Steering<br />
Committee meeting agendas. This has been a repeated item and the<br />
historic continuity value of the current title is most important.<br />
M. R. ADAMS, CDR, USN<br />
1990 Chairperson<br />
Canadian Forces Personnel Applied<br />
Research Unit<br />
National Defence Headquarters<br />
Canadian Forces Directorate of<br />
Military Occupational Structures<br />
Federal Ministry of Defense<br />
Federal Republic of Germany<br />
MOD Science 3 (AIR)<br />
Ministry of Defence<br />
United Kingdom<br />
Royal Australian Air Force<br />
Royal Netherlands Army<br />
SEC PSY OND/CRS<br />
Belgian Armed Forces<br />
Naval Education and Training Program<br />
Management Support Activity (NETPMSA)<br />
Naval <strong>Military</strong> Personnel Command<br />
Navy Occupational Development and<br />
Analysis Center (NODAC)<br />
Navy Personnel Research and<br />
Development Center (NPRDC)<br />
Defense Activity for Non-Traditional<br />
Education Support (DANTES)<br />
U.S. Air Force Human Resources Laboratory<br />
U.S. Air Force Occupational Measurement<br />
Squadron<br />
U,S. Army Research Institute (PERI-RG)<br />
U.S. Coast Guard Headquarters (G-PWP-2)<br />
OBSERVERS:<br />
Air Traffic Services Transport Canada<br />
Chief of Naval Operations<br />
1990 MTA STEERING COMMITTEE MEETING ATTENDEES<br />
CDR Frederick F.P. Wilson<br />
COL James C. Fleming<br />
Mr. G. J. (Jeff) Higgs<br />
COL Terry J. Prociuk<br />
Martin L. Rauch<br />
(Represented by<br />
LTCOL John Birkbeck,<br />
MOD A ED 4)<br />
Squadron Leader John S. Price<br />
COL Dr. Ger J.C. Roozendaal<br />
CAPT Francois J.M.E. Lescreve<br />
CDR Mary R. Adams<br />
CAPT Edward L. Naro<br />
Dr. Alain Hunter<br />
Mr. William A. Sands<br />
Roger G. Goldberg<br />
Dr. Lloyd D. Burtch<br />
J. S. Tartell<br />
Dr. Timothy W. Elig<br />
Richard S. Lanterman<br />
J. R. Dick Campbell<br />
Mr. Charles R. Hoshaw<br />
Attachment 1
ORGANIZATION<br />
ROYAL AUSTRALIAN AIR FORCE:<br />
Royal Australian Air Force<br />
U.S. Air Force Human Resources<br />
Laboratory (AFHRL/MOD)<br />
Brooks AFB, TX 78235-5000<br />
U S A<br />
A/V 240-3640 COM: (512) 536-3648<br />
BELGIAN ARMED FORCES,<br />
SEC PSY OND/CRS:<br />
SEC PSY OND/CRS<br />
Bruynstraat<br />
B-1120 Brussels<br />
Belgium<br />
2 2680050, Ext. 3279<br />
CANADIAN FORCES DIRECTORATE OF<br />
MILITARY OCCUPATIONAL STRUCTURES:<br />
Canadian Forces Directorate of<br />
Military Occupational Structures<br />
National Defence Headquarters<br />
101 Colonel By Drive<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
MILITARY TESTING ASSOCIATION<br />
STEERING COMMITTEE MEMBERS<br />
AUSTRALIA<br />
BELGIUM<br />
CANADA<br />
1990 REPRESENTATIVE<br />
Squadron Leader John S. Price<br />
(Squadron Leader Kerry J. McDonald will<br />
be 1991 representative)<br />
CAPT Francois J.M.E. Lescreve<br />
COL James C. Fleming<br />
(A/V 642-3507 COM: (613) 992-3507)<br />
Mr. G. J. (Jeff) Higgs<br />
(A/V 842-7069 COM: (613) 922-7069)
ORGANIZATION 1990 REPRESENTATIVE<br />
DIRECTOR OF PERSONNEL PSYCHOLOGY AND SOCIOLOGY:<br />
Director of Personnel Psychology and<br />
Sociology<br />
National Defense Headquarters<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
Canadian Forces<br />
A/V 842-0244 COM: (613) 992-0244<br />
CANADIAN FORCES PERSONNEL APPLIED RESEARCH UNIT:<br />
Canadian Forces Personnel Applied<br />
Research Unit<br />
4900 Yonge St., Suite 600<br />
Willowdale, Ontario M2N2Z4<br />
Canada<br />
Canadian Forces<br />
COM: (416) 224-4964<br />
FEDERAL MINISTRY OF DEFENSE:<br />
COL Terry J. Prociuk<br />
CDR Frederick F.P. Wilson<br />
FEDERAL REPUBLIC OF GERMANY<br />
Federal Ministry of Defense, P II 4<br />
Postfach 1328<br />
5300 Bonn 1<br />
Federal Republic of Germany<br />
Federal Ministry of Defense<br />
COM: 49-228-128543<br />
FEDERAL REPUBLIC OF GERMANY AIR FORCE:<br />
Federal Republic of Germany Air Force<br />
Wehrbereichsverwaltung II<br />
V-4-Psychology Angelegenheiten<br />
Hans-Böckler-Allee 16, 3000 Hannover<br />
05 11-531-26 08126 03<br />
Martin L. Rauch<br />
Wolfgang Weber<br />
ORGANIZATION 1990 REPRESENTATIVE<br />
ROYAL NETHERLANDS ARMY:<br />
Royal Netherlands Army<br />
DPKL/AFD GW<br />
Postbus 90701<br />
2509 LS The Hague<br />
The Netherlands<br />
COM: 31-71-6135450<br />
MOD SCIENCE 3 (AIR):<br />
MOD Science 3 (AIR)<br />
Lacon House<br />
Theobalds Road<br />
London, WC1X 8RY<br />
England<br />
U.S. AIR FORCE<br />
THE NETHERLANDS<br />
THE UNITED KINGDOM<br />
COL Dr. Ger J.C. Roozendaal<br />
Eugene F. Burke<br />
(Represented in 1990 by<br />
COL John Birkbeck<br />
MOD A ED 4<br />
Court Road<br />
Eltham<br />
London SE9 5NR<br />
United Kingdom)<br />
UNITED STATES OF AMERICA<br />
U.S. AIR FORCE HUMAN RESOURCES LABORATORY<br />
(AFHRL):<br />
U.S. Air Force Human Resources Laboratory<br />
(AFHRL/PR)<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
(A/V 240-3011 COM: (512) 536-3611)<br />
U.S. Air Force Human Resources Laboratory<br />
(AFHRL/CC)<br />
Brooks AFB, TX 78235-5000<br />
USA<br />
Dr. Lloyd D. Burtch<br />
COL Harold G. Jensen
ORGANIZATION 1990 REPRESENTATIVE<br />
U.S. AIR FORCE OCCUPATIONAL MEASUREMENT SQUADRON<br />
(OMS):<br />
U.S. Air Force Occupational<br />
Measurement Squadron (OMS)<br />
Randolph AFB, TX 78150-5000<br />
USA<br />
DSN 487-6623 COM: (512) 652-6623<br />
U.S. ARMY RESEARCH INSTITUTE (PERI-RG):<br />
U.S. Army Research Institute (PERI-RG)<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 354-5786 COM: (703) 274-5610<br />
U.S. COAST GUARD<br />
U.S. COAST GUARD HEADQUARTERS:<br />
U.S. Coast Guard Headquarters<br />
Chief, Occupational Standards<br />
(G-PWP-2), Room 4111<br />
2100 Second St., S.W.<br />
Washington, DC 20593-0001<br />
USA<br />
COM: (202) 267-2986<br />
U.S. NAVY<br />
NAVAL EDUCATION AND TRAINING PROGRAM<br />
MANAGEMENT SUPPORT ACTIVITY (NETPMSA):<br />
Naval Education and Training Program<br />
Management Support Activity (NETPMSA)<br />
(Code 03)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1685 COM: (904) 452-1685<br />
J. S. Tartell<br />
Dr. Timothy W. Elig<br />
Richard S. Lanterman<br />
CDR Mary Adams<br />
Dr. James M. Lentz<br />
ORGANIZATION<br />
NAVAL MILITARY PERSONNEL COMMAND NAVY OCCUPATIONAL<br />
DEVELOPMENT AND ANALYSIS CENTER (NODAC):<br />
Naval <strong>Military</strong> Personnel Command<br />
Navy Occupational Development and<br />
Analysis Center (NODAC)<br />
Bldg. 150, WNY (Anacostia)<br />
Washington, DC 20374-1501<br />
USA<br />
NAVY PERSONNEL RESEARCH AND DEVELOPMENT CENTER<br />
(NPRDC):<br />
Navy Personnel Research and<br />
Development Center (NPRDC)<br />
<strong>Testing</strong> Systems Department (Code 13)<br />
San Diego, CA 92152-6800<br />
USA<br />
A/V 553-9266 COM: (619) 553-9266<br />
DEFENSE ACTIVITY FOR NON-TRADITIONAL EDUCATION<br />
SUPPORT (DANTES):<br />
Defense Activity for Non-Traditional<br />
Education Support (DANTES)<br />
Pensacola, FL 32509-7400<br />
USA<br />
A/V 922-1064/1745 COM: 904-452-1063<br />
OFFICE OF ASSISTANT SECRETARY OF DEFENSE FORCE<br />
MANAGEMENT AND PERSONNEL (FM&P):<br />
Office of Assistant Secretary of Defense<br />
Force Management and Personnel (FM&P)<br />
Washington, DC 20301<br />
USA<br />
A/V 227-4166 COM: (202) 697-4166<br />
1990 REPRESENTATIVE<br />
CAPT Edward L. Naro<br />
(A/V 288-5488 COM: (202) 433-5488)<br />
Dr. Alain Hunter<br />
(A/V 288-4620 COM: (202) 433-4620)<br />
Mr. William A. Sands<br />
Roger G. Goldberg<br />
Dr. W. S. Sellman
BY-LAWS OF THE MILITARY TESTING ASSOCIATION<br />
Article I - Name<br />
The name of this organization shall be the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>.<br />
Article II - Purpose<br />
The purpose of this <strong>Association</strong> shall be to:<br />
A. Assemble representatives of the various armed services of the United States<br />
and such other nations as might request to discuss and exchange ideas concerning<br />
assessment of military personnel.<br />
B. Review, study, and discuss the mission, organization, operations, and research<br />
activities of the various associated organizations engaged in military personnel<br />
assessment.<br />
C. Foster improved personnel assessment through exploration and presentation<br />
of new techniques and procedures for behavioral measurement, occupational<br />
analysis, manpower analysis, simulation models, training programs, selection<br />
methodology, survey and feedback systems.<br />
D. Promote cooperation in the exchange of assessment procedures, techniques<br />
and instruments.<br />
E. Promote the assessment of military personnel as a scientific adjunct to<br />
modern military personnel management within the military and professional<br />
community.<br />
Article III - Participation<br />
The following categories shall constitute membership within the MTA:<br />
A. Primary Membership.<br />
1. All active duty military and civilian personnel permanently assigned to an<br />
agency of the associated armed services having primary responsibility for assessment<br />
for personnel systems.<br />
2. All civilian and active duty military personnel permanently assigned to an<br />
organization exercising direct command over an agency of the associated armed<br />
services holding primary responsibility for assessment of military personnel.<br />
B. Associate Membership.<br />
1. Membership in this category will be extended to permanent personnel of<br />
governmental organizations engaged in activities that parallel those of the primary<br />
membership. Associate members shall be entitled to all privileges of primary<br />
members with the exception of membership on the Steering Committee. This<br />
restriction may be waived by the majority vote of the Steering Committee.<br />
C. Non-Member Participants.<br />
1. Non-members may participate in the annual conference, present papers<br />
and participate in symposium/panel sessions. Non-members will not attend the<br />
meeting of the Steering Committee nor have a vote in association affairs.<br />
Article IV - Dues<br />
No annual dues shall be levied against the participants.<br />
Article V - Steering Committee<br />
A. The governing body of the <strong>Association</strong> shall be the Steering Committee. The<br />
Steering Committee shall consist of voting and non-voting members. Voting<br />
members are primary members of the Steering Committee. Primary membership<br />
shall include:<br />
1. The Commanding Officers of the respective agencies of the armed services<br />
exercising responsibility for personnel assessment programs.<br />
2. The ranking civilian professional employees of the respective agencies of<br />
the armed services exercising primary responsibility for the conduct of personnel<br />
assessment systems.<br />
3. Each agency shall have no more than two (2) representatives.<br />
B. Associate membership of the Steering Committee shall be extended by<br />
majority vote of the committee to representatives of various governmental organizations<br />
whose purposes parallel those of the <strong>Association</strong>.<br />
C. The Chairman of the Steering Committee shall be appointed by the President<br />
of the <strong>Association</strong>. The term of office shall be one year and shall begin the last day of<br />
the annual conference.<br />
D. The Steering Committee shall have general supervision over the affairs of the<br />
<strong>Association</strong> and shall have the responsibility for all activities of the <strong>Association</strong>.<br />
The Steering Committee shall conduct the business of the <strong>Association</strong> in the interim<br />
between annual conferences of the <strong>Association</strong> by such means of communication as<br />
deemed appropriate by the President or Chairman.<br />
E. Meeting of the Steering Committee shall be held during the annual<br />
conferences of the <strong>Association</strong> and at such times as requested by the President of the<br />
<strong>Association</strong> or the Chairman of the Steering Committee. Representation from the<br />
majority of the organizations of the Steering Committee shall constitute a quorum.<br />
Article VI - Officers<br />
A. The officers of the <strong>Association</strong> shall consist of a President, Chairman of the<br />
Steering Committee and a Secretary.<br />
B. The President of the <strong>Association</strong> shall be the Commanding Officer of the<br />
armed services agency coordinating the annual conference of the <strong>Association</strong>. The<br />
term of the President shall begin at the close of the annual conference of the<br />
<strong>Association</strong> and shall expire at the close of the next annual conference.<br />
C. It shall be the duty of the President to organize and coordinate the annual<br />
conference of the <strong>Association</strong> held during his term of office, and to perform the<br />
customary duties of a president.<br />
D. The Secretary of the <strong>Association</strong> shall be filled through appointment by the<br />
President of the <strong>Association</strong>. The term of office of the Secretary shall be the same as<br />
that of the President.<br />
E. It shall be the duty of the Secretary of the <strong>Association</strong> to keep the records of<br />
the association, and the Steering Committee, and to conduct official correspondence<br />
of the association, and to insure notices for conferences. The Secretary shall solicit<br />
nominations for the Harry Greer award prior to the annual conference. The<br />
Secretary shall also perform such additional duties and take such additional<br />
responsibilities as the President may delegate to him.<br />
Article VII - Meetings<br />
A. The <strong>Association</strong> shall hold a conference annually.<br />
B. The annual conference of the <strong>Association</strong> shall be coordinated by the agencies<br />
of the associated armed services exercising primary responsibility for military<br />
personnel assessment. The coordinating agencies and the order of rotation will be<br />
determined annually by the Steering Committee. The coordinating agencies for at<br />
least the following three years will be announced at the annual meeting.<br />
C. The annual conference of the <strong>Association</strong> shall be held at a time and place<br />
determined by the coordinating agency. The membership of the <strong>Association</strong> shall be<br />
informed at the annual conference of the place at which the following annual<br />
conference will be held. The coordinating agency shall inform the Steering<br />
Committee of the time of the annual conference not less than six (6) months prior to<br />
the conference.<br />
D. The coordinating agency shall exercise planning and supervision over the<br />
program of the annual conference. Final selection of program content shall be the<br />
responsibility of the coordinating organization.<br />
E. Any other organization desiring to coordinate the conference may submit a<br />
formal request to the Chairman of the Steering Committee, no later than 18 months<br />
prior to the date they wish to serve as host.<br />
Article VIII - Committee<br />
A. Standing committees may be named from time to time, as required, by vote of<br />
the Steering Committee. The chairman of each standing committee shall be<br />
appointed by the Chairman of the Steering Committee. Members of standing<br />
committees shall be appointed by the Chairman of the Steering Committee in<br />
consultation with the Chairman of the committee in question. Chairmen and<br />
committee members shall serve in their appointed capacities at the discretion of the<br />
Chairman of the Steering Committee. The Chairman of the Steering Committee<br />
shall be an ex officio member of all standing committees.<br />
B. The President, with the counsel and approval of the Steering Committee, may<br />
appoint such ad hoc committees as are needed from time to time. An ad hoc<br />
committee shall serve until its assigned task is completed or for the length of time<br />
specified by the President in consultation with the Steering Committee.<br />
C. All standing committees shall clear their general plans of action and new<br />
policies through the Steering Committee, and no committee or committee chairman<br />
shall enter into relationships or activities with persons or groups outside of the<br />
<strong>Association</strong> that extend beyond the approved general plan of work without the<br />
specific authorization of the Steering Committee.<br />
D. In the interest of continuity, if any officer or member has any duty elected or<br />
appointed placed on him, and is unable to perform the designated duty, he should<br />
decline and notify at once the officers of the <strong>Association</strong> that he cannot accept or<br />
continue said duty.<br />
Article IX - Amendments<br />
A. Amendments of these By-Laws may be made at any annual conference of the<br />
<strong>Association</strong>.<br />
B. Amendments of the By-Laws may be made by majority vote of the assembled<br />
membership of the <strong>Association</strong> provided that the proposed amendments shall have<br />
been approved by a majority vote of the Steering Committee.<br />
C. Proposed amendments not approved by a majority vote of the Steering<br />
Committee shall require a two-thirds vote of the assembled membership of the<br />
<strong>Association</strong>.<br />
Article X - Voting<br />
All members in attendance shall be voting members.<br />
Article XI - Harry H. Greer Award<br />
A. Selection Procedures:<br />
1. Recipients of the Harry H. Greer Award will be selected by a committee<br />
drawn from the agencies represented on the MTA Steering Committee. The CO of<br />
each agency will designate one person from that agency to serve on the Awards<br />
Committee. Each committee member will have attended at least three previous<br />
MTA meetings. The member from the coordinating agency will serve as chairman of<br />
the committee.<br />
2. Nominations for the award in a given year will be submitted in writing to<br />
the Awards Committee Chairman by 1 July of that year.<br />
3. The Chairman of the committee is responsible for canvassing the other<br />
committee members to arrive at consensus on the selection of a recipient of the<br />
award.<br />
4. No more than one person is to receive the award each year, but the award<br />
need not be made each year. The Awards Committee may decide not to select a<br />
recipient in any given year.<br />
5. The annual selection of the person to receive the award, or the decision not<br />
to make an award that year, is to be made at least six weeks prior to the date of the<br />
annual MTA Conference.<br />
B. Selection Criteria:<br />
The recipients of the Harry H. Greer Award are to be selected on the basis of<br />
outstanding work contributing significantly to the MTA.<br />
C. The Award:<br />
The Harry H. Greer Award is to be a certificate normally presented to the<br />
recipient during the Annual MTA Conference. The awards committee is responsible<br />
for preparing the text of the certificate. The coordinating agency is responsible for<br />
printing and awarding the certificate.<br />
Article XII - Enactment<br />
These By-Laws shall be in force immediately upon acceptance by a majority of the<br />
assembled membership of the <strong>Association</strong> and/or amended (in force 5 November<br />
1990).<br />
CDR Mary Adams<br />
Naval Education and Training Program<br />
Management Support Activity (Code 03)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1685 COM: (904) 452-1685<br />
Walter G. Albert<br />
Air Force Human Resources Laboratory/MOD<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
COM: (512) 240-3677<br />
Dr. Cathie E. Alderks<br />
Army Research Institute<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8293 COM: (703) 274-8293<br />
LTCOL Drs Pieter S. Andriesse<br />
Ministry of Defence<br />
Directorate of RNLAF/Personnel<br />
P.O. Box 20703<br />
2500 ES The Hague<br />
The Netherlands<br />
Jane M. Arabian, Ph.D.<br />
Commander, U.S. Army Research Institute<br />
Attn: PERI-RR<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8275 COM: (703) 274-8275<br />
Klaus Arndt<br />
German Federal Armed Forces Admin Office<br />
Bonner Talweg 177<br />
D-5300 Bonn<br />
Federal Republic of Germany<br />
PH: (Germany) 228-122076<br />
MAJ Robert L. Ashworth, Jr.<br />
U.S. Army Research Institute<br />
Boise Element<br />
1910 University Drive<br />
Boise, ID 83725-1140<br />
USA<br />
COM: (208) 334-9390<br />
LIST OF CONFERENCE REGISTRANTS<br />
Annette G. Baisden<br />
Naval Aerospace Medical Inst. (Code 4<br />
Naval Air Station<br />
Pensacola, FL 32508-5600<br />
USA<br />
A/V 922-2516 COM: (904) 452-2516<br />
Louis E. Banderet, Ph.D.<br />
U.S. Army Institute of Environmental<br />
Medicine<br />
Health and Performance Division<br />
Natick, MA 01760-5007<br />
USA<br />
A/V 256-4858 COM: (508) 651-4858<br />
Dr. David W. Bessemer<br />
US Army Research Institute<br />
Field Unit - Ft. Knox<br />
Ft. Knox, KY 40121-5620<br />
USA<br />
A/V 464-4932 COM: (502) 624-4932<br />
LTCOL John Birkbeck<br />
MOD A Ed 4<br />
Court Road<br />
Eltham<br />
London SE9 5HR<br />
United Kingdom<br />
Dr. Walter C. Borman<br />
Department of Psychology<br />
University of South Florida<br />
Tampa, FL 33620-8200<br />
USA<br />
Michael J. Bosshardt<br />
Personnel Decisions Research Institute<br />
43 Main St., S.E.<br />
Suite #505<br />
Minneapolis, MN 55414<br />
USA<br />
COM: (612) 331-3680<br />
CAPT J. Peter Bradley<br />
Canadian Forces<br />
Personnel Applied Research Unit<br />
4900 Yonge St., Suite 600<br />
Willowdale, Ontario, M2N 6B7<br />
Canada<br />
A/V 827-4239 COM: (416) 224-4972<br />
Dr. Elizabeth J. Brady<br />
U.S. Army Research Institute<br />
Attn: PERI-RS, 5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8275 COM: (703) 274-8275<br />
David E. Brown, Jr.<br />
Metrica, Inc.<br />
8301 Broadway, Suite 215<br />
San Antonio, TX 78209<br />
USA<br />
LTCOL David E. Brown, Sr.<br />
AFHRL/MOM<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
A/V 240-3942 COM: (512) 536-3942<br />
Lawrence S. Buck<br />
Planning Research Corporation<br />
1440 Air Rail Avenue<br />
Virginia Beach, VA 23455<br />
USA<br />
COM: (804) 460-2276<br />
Dr. Lloyd D. Burtch<br />
Air Force Human Resources Laboratory/PR<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
A/V 240-3011 COM: (512) 536-3611<br />
Dr. Henry H. Busciglio<br />
U.S. Army Research Institute<br />
Attn: PERI-RS, 5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8275 COM: (703) 274-8275<br />
Charlotte H. Campbell<br />
Human Resources Research Organization<br />
295 W. Lincoln Trail Boulevard<br />
Radcliff, KY 40160<br />
USA<br />
J. R. Dick Campbell<br />
Air Traffic Services Transport Canada<br />
1574 Champneuf Dr.<br />
Orleans, Ontario KlC 6B5<br />
Canada<br />
COM: W(613) 998-6617 H(613) 837-0440<br />
574<br />
Roy C. Campbell<br />
Human Resources Research Organization<br />
295 W. Lincoln Trail Boulevard<br />
Radcliff, KY 40160-2042<br />
USA<br />
CAPT William J. Carle<br />
6435 Crestway Dr., #174<br />
San Antonio, TX 78239<br />
USA<br />
A/V 487-3694 COM: (512) 652-3694<br />
Norman A. Champagne<br />
Naval Education and Training Program<br />
Management Support Activity (Code 03172)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1355 COM: (904) 452-1355<br />
Dr. Herbert J. (Jim) Clark<br />
3410 Prince George<br />
San Antonio, TX 78233<br />
USA<br />
A/V 240-3169 COM: (512) 536-3611<br />
Harry A. Clark III<br />
8265 Campobello<br />
San Antonio, TX 78218<br />
USA<br />
A/V 487-5234 COM: (512) 652-5234<br />
Dennis D. Collins<br />
HQDA, DAPE-MR, Rm. 2C733<br />
The Pentagon<br />
Washington, DC 20310-0300<br />
USA<br />
A/V 225-9213 COM: (202) 695-9213<br />
Dr. Harry B. Conner<br />
Navy Research and Development Center<br />
Code 142<br />
San Diego, CA 92123-6800<br />
USA<br />
A/V 553-6675 COM: (619) 553-6675<br />
MAJ Anthony J. Cotton<br />
1 Psych Research Unit<br />
P.O. Box E33<br />
Queen Victoria Tce<br />
Barton ACT 2600<br />
Australia
Jack R. Dempsey<br />
Human Resources Research Organization<br />
1100 South Washington Street<br />
Alexandria, VA 22314<br />
USA<br />
COM: (703) 549-3611<br />
Dr. Grover E. Diehl<br />
Eval. & Research Branch<br />
USAF Extension Course Inst.<br />
Gunter AFB, AL 36118-5643<br />
USA<br />
A/V 446-3641 COM: (205) 279-3641<br />
CAPT Joseph M. Donnelly<br />
46 Walcheren Loop<br />
Borden, Ontario LOM ICO<br />
Canada<br />
A/V 270-3917 COM: (705) 423-3917<br />
David A. DuBois<br />
Personnel Decisions Research Institute<br />
43 Main St., S.E.<br />
Suite #405, Riverplace<br />
Minneapolis, MN 55414<br />
USA<br />
COM: (612) 331-3680<br />
MAJ R. Eric Duncan<br />
10109 Trapper's Ridge<br />
Converse, TX 78109<br />
USA<br />
Dale R. Eckard<br />
Naval Education and Training Program<br />
Management Support Activity (Code 3163)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1792 COM: (904) 452-1792<br />
Jack E. Edwards<br />
Navy Personnel R&D Center (Code 121)<br />
San Diego, CA 92111-6800<br />
USA<br />
A/V 553-7630 COM: (619) 553-7630<br />
Dr. Timothy W. Elis<br />
U.S. Army Research Institute (PERI-RG)<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 354-5786 COM: (703) 274-5610<br />
MAJ Philip J. Exner<br />
Manpower Analysis, Eval & Coordination<br />
Headquarters, U.S. Marine Corps<br />
Washington, DC 20380-0001<br />
USA<br />
A/V 224-4165 COM: (703) 614-4165<br />
Frank Fehler<br />
Flugplatz<br />
3062 Bueckeburg<br />
Federal Republic of Germany<br />
PH: (Germany) 05722-4001, Ext 307<br />
Dr. Daniel B. Felker<br />
3333 K Street, NW<br />
Washington, DC 20007<br />
USA<br />
COM: (202) 342-5000<br />
Dr. Fred E. Fiedler<br />
Department of Psychology<br />
University of Washington<br />
Seattle, WA 98195<br />
USA<br />
A/V 88-473-2032 COM: (512) 671-2032<br />
Dorothy L. Finley<br />
Army Research Institute<br />
Bldg. 41203 Attn: PERI-IG (Finley)<br />
Ft. Gordon, GA 30905-5230<br />
USA<br />
A/V 780-5523 COM: (404) 791-5523<br />
CAPT David C. Fischer<br />
HQ AFOTEC/OAH2<br />
Kirtland AFB, NM 87117-7001<br />
USA<br />
A/V 244-4201 COM: (505) 846-4201<br />
Dr. John C. Eggenberger<br />
Director, Pers Applied Research & Tng<br />
SNC Defence Products Ltd.<br />
Heritage Place, 155 Queens Street, #132<br />
Ottawa, Ontario K1P 6L1<br />
Canada<br />
COM: (613) 238-7216<br />
Dr. Max H. Flach<br />
Bundesminister der Verteidigung<br />
-FuSI8-<br />
Postfach 1328<br />
D-5300 Bonn 1<br />
Federal Republic of Germany<br />
COL James C. Fleming<br />
National Defence Headquarters<br />
101 Colonel By Drive<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
A/V 642-3507 COM: (613) 992-3507<br />
Mr. John W.K. Fugill<br />
49 Dalton Rd.<br />
St. Ives, NSW 2075<br />
Australia<br />
(02) 4009243<br />
LTCOL Frank C. Gentner<br />
ASD/ALHA<br />
MPT Analysis & Info System Division<br />
Wright-Patterson AFB, OH 45431<br />
USA<br />
Alice Gerb<br />
Director, <strong>Military</strong> Programs Office<br />
Educational <strong>Testing</strong> Service<br />
Rosedale Road<br />
Princeton, NJ 08541<br />
USA<br />
COM: (609) 921-9600<br />
Constance A. Gillan<br />
550 West Pennsylvania Avenue #8<br />
San Diego, CA 92103<br />
USA<br />
A/V 735-7195 COM: (619) 545-7195<br />
Christa A. Grier<br />
Naval Education and Training Program<br />
Management Support Activity (Code 3124)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1765 COM: (904) 452-1765<br />
Wulf Gronwald<br />
D23<br />
Infanteriestr. 17<br />
8000 Munchen 40<br />
Federal Republic of Germany<br />
(089) 3069-2417<br />
2LT Jody A. Guthals<br />
Air Force Human Resources Lab (MOD)<br />
MPT Technology Branch<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
A/V 240-3677 COM: (512) 536-3677<br />
Dr. Michael W. Habon<br />
Post Fach 1420<br />
Dornier GmbH, Dept. E&WI<br />
D-7990 Friedrichshafen<br />
Federal Republic of Germany<br />
MAJ Martin P. Hankes-Drielsma<br />
National Defense Headquarters<br />
Ottawa, Ontario K1S 3A8<br />
Canada<br />
Dr. Dieter H.D. Hansen<br />
MOD (Armed Forces Staff)<br />
Postbox 1328<br />
D-5300 Bonn 1<br />
Federal Republic of Germany<br />
Telefax (0228) 12 9059<br />
Mary Ann Hanson<br />
Personnel Decisions Research Institute<br />
43 Main St., SE<br />
Suite #405<br />
Minneapolis, MN 55414<br />
USA<br />
COM: (612) 331-3680<br />
CAPT Johnnie C. Harris<br />
USAF Occupational Measurement Center<br />
Attn: OMVD<br />
Randolph AFB, TX 78150-5000<br />
USA<br />
Mary Ellen Hartmann<br />
Questar Data Systems, Inc.<br />
2905 West Service Road<br />
Eagan, MN 55121-2199<br />
USA<br />
CDR Robert B. Hawkins<br />
Chief of Naval Education and Training<br />
Code N31T, Naval Air Station<br />
Pensacola, FL 32508-5100<br />
USA<br />
Dr. Charles W. Hesse<br />
Naval Education and Training Program<br />
Management Support Activity (Code 313)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1579 COM: (904) 452-1579
Mr. G. J. (Jeff) Higgs<br />
National Defence Headquarters<br />
101 Colonel By Drive (Attn: DMOS 3)<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
A/V 842-7069 COM: (613) 992-7069<br />
CAPT D. Wayne Hintze<br />
Manpower Analysis, Eval & Coordination<br />
HQ, U.S. Marine Corps (Code MA)<br />
Washington, DC 20380-0001<br />
USA<br />
A/V 224-4165 COM: (703) 614-4165<br />
Mr. Charles R. Hoshaw<br />
5920 Brookview Drive<br />
Alexandria, VA 22310<br />
USA<br />
COM: (703) 694-5511<br />
Janis S. Houston<br />
Personnel Decisions Research Institute<br />
43 Main St., S.E.<br />
Riverplace Suite #405<br />
Minneapolis, MN 55414<br />
USA<br />
COM: (612) 331-3680<br />
Dr. DeLayne R. Hudspeth<br />
College of Education (EDB406)<br />
The University of Texas<br />
Austin, TX 78712<br />
USA<br />
COM: (512) 471-5211<br />
Dr. Alain Hunter<br />
Technical Director<br />
NMPC DET NODAC<br />
Bldg. 150 WNY (Anacostia)<br />
Washington, DC 20374-1501<br />
USA<br />
A/V 288-4620 COM: (202) 433-4620<br />
Barbara A. Jezior<br />
U.S. Army Natick RD&E Ctr - STRNC-YB<br />
Kansas Street<br />
Natick, MA 01760-5020<br />
USA<br />
A/V 256-5523 COM: (508) 651-5523<br />
Wayne E. Keates<br />
Personnel Applied Research & Training<br />
Division - SNC Defense Products, Ltd.<br />
155 Queen St., Suite 1302<br />
Ottawa, Ontario K1P 6L1<br />
Canada<br />
COM: (613) 238-7216<br />
Dr. Robert S. Kennedy<br />
Vice President, Essex Corporation<br />
1040 Woodcock Road, #227<br />
Orlando, FL 32803<br />
USA<br />
COM: (407) 894-5090<br />
CDR Robert H. Kerr<br />
Canadian Forces Fleet School<br />
FMO Halifax<br />
Nova Scotia B3K 2X0<br />
Canada<br />
A/V 447-8054 COM: (902) 427-8054<br />
Rex G. Kinder<br />
Rexton Consulting Services, Pty Ltd.<br />
P.O. Box 382, Manly<br />
NSW 2095<br />
Australia<br />
Robert W. King<br />
4055 Bedevere Dr.<br />
Pensacola, FL 32514<br />
USA<br />
A/V 922-1663 COM: (904) 452-1663<br />
Thomas P. Kirchenkamp<br />
Dornier GmbH<br />
P.O. Box 1420, Dept. WTWI<br />
D-7990 Friedrichshafen 1<br />
Federal Republic of Germany<br />
49-7545-5775<br />
Dr. Paul Klein<br />
Sozialwissenschaftliches Inst.<br />
der Bundeswehr, Winzererstr. 52<br />
D-8000 Munchen 40<br />
089 12003233<br />
Wolf Knacke<br />
Streitkrafteamt (Armed Forces Office)<br />
- I 7 / Militarpsychologie -<br />
Postfach 20 50 03<br />
D- 5300 Bonn - 2<br />
Federal Republic of Germany
Dr. John L. Kobrick<br />
US Army Research Institute<br />
of Environmental Medicine<br />
Kansas Street<br />
Natick, MA 01760<br />
USA<br />
A/V 256-4885 COM: (508) 651-4885<br />
Fay J. Landrum<br />
Naval Education and Training Program<br />
Management Support Activity (Code 3161)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1736 COM: (904) 452-1736<br />
Richard S. Lanterman<br />
U.S. Coast Guard HQ (G-PWP-2), Room 4111<br />
2100 Second St., S.W.<br />
Washington, DC 20593-0001<br />
USA<br />
COM: (202) 267-2986<br />
Dr. James M. Lentz<br />
Naval Education and Training Program<br />
Management Support Activity (Code 301)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1685 COM: (904) 452-1685<br />
Dr. Carl W. Lickteig<br />
U.S. Army Research Institute<br />
Field Unit Ft Knox (Attn: PERI-IK)<br />
2423 Morande Street<br />
Fort Knox, KY 40121-5620<br />
USA<br />
A/V 464-7046 COM: (502) 624-7046<br />
COL Michael Lindquist<br />
<strong>Military</strong> Education Division J-7<br />
Pentagon, Room lA724<br />
Washington, DC 20318-7000<br />
USA<br />
Dr. Suzanne Lipscomb<br />
AFHRL/PRP<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
Richard M. Lopez<br />
Naval Education and Training Program<br />
Management Support Activity (Code 3171)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1357 COM: (904) 452-1357<br />
Donald F. Lupone<br />
Naval Education and Training Program<br />
Management Support Activity (Code 315)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1777 COM: (904) 452-1777<br />
Dr. Fred A. Mael<br />
U.S. Army Research Institute<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8275 COM: (703) 274-8275<br />
Dr. Rolland R. Mallette, Major (Ret)<br />
Industrial Psychologist<br />
Ontario Hydro - 700 University Ave (H3-G27)<br />
Toronto, Ontario M5G 1X6<br />
Canada<br />
COM: (416) 592-7038<br />
LTC Ken A. Martell<br />
6308 Falling Brook Drive<br />
Burke, VA 22015<br />
USA<br />
A/V 225-4560/2225 COM: (202) 695-4560<br />
Ms. Nora E. Matos<br />
Naval Education and Training Program<br />
Management Support Activity (Code 312)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1388 COM: (904) 452-1388<br />
Dr. James R. McBride<br />
6430 Elmhurst Dr.<br />
San Diego, CA 92120<br />
USA<br />
COM: (619) 582-0200<br />
Dean C. McCallum<br />
Naval Education and Training Program<br />
Management Support Activity (Code 313)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1648 COM: (904) 452-1648<br />
Donald E. McCauley, Jr.<br />
Office of Rsch & Development<br />
Room 6451<br />
Office of Personnel Management<br />
Washington, DC 20415<br />
USA<br />
COM: (202) 606-0880
Deborah L. McCormick<br />
Chief of Naval Technical Training<br />
Attn: N6211<br />
NAS Memphis<br />
Millington, TN 38054-5056<br />
USA<br />
A/V 966-5865 COM: (901) 873-5865<br />
Harold M. McCurry<br />
1919 Baldwin Brook Dr.<br />
Montgomery, AL 36116<br />
USA<br />
COM: (205) 279-5382<br />
Edward McFadden<br />
Atlanta <strong>Military</strong> Entrance Processing Station<br />
M.L. King Federal Annex; Ground Floor<br />
77 Forsyth Street SW<br />
Atlanta, GA 30303-3427<br />
USA<br />
Dr. Albert H. Melter<br />
Personalstammamt der Bundeswehr<br />
Koelner Str. 262<br />
D-5000 Koeln 90<br />
Federal Republic of Germany<br />
Central Personnel Office<br />
German Federal Armed Forces<br />
PH: (Germany) (02203) 12021 472<br />
MAJ Harold C. Mendes<br />
520 Larochelle<br />
Saint-Jean<br />
Quebec J3B 1J5<br />
Canada<br />
LT Mark R. Miller<br />
12474 Starcrest #210<br />
San Antonio, TX 78216<br />
USA<br />
A/V 240-3222 COM: (512) 536-3222<br />
William M. Minter<br />
.ECI/EDC<br />
U.S. Air Force<br />
Gunter AFB, AL 36118<br />
USA.<br />
A/V 446-4151 COM: (205) 279-4151<br />
Dr. Angelo Mirabella<br />
U.S. Army Research Institute<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8827 COM: (703) 274-8827<br />
Dr. Jimmy L. Mitchell<br />
McDonnell Douglas Missile Systems Co.<br />
8301 North Broadway, Suite 211<br />
San Antonio, TX 78109<br />
USA<br />
COM: (512) 826-8664<br />
William E. Montague<br />
Training Technology<br />
Navy Personnel R&D Center (Code 15A)<br />
San Diego, CA 92152-6800<br />
USA<br />
A/V 553-7849 COM: (619) 553-7849<br />
LCDR Tom Morrison<br />
Naval Aerospace Medical Institute<br />
Code 412<br />
Naval Air Station<br />
Pensacola, FL 32508-5600<br />
USA<br />
A/V 922-2615 COM: (904) 452-2615<br />
Dr. C. Jill Mullins<br />
Chief of Naval Education & Training<br />
N-11, Bldg. 628<br />
Naval Air Station<br />
Pensacola, FL 32508-5100<br />
USA<br />
A/V 922-4207 COM: (904) 452-4207<br />
James Gerald Murphy<br />
Naval Education and Training Program<br />
Management Support Activity (Code 03171)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1414 COM: (904) 452-1414<br />
CAPT Edward L. Naro<br />
Naval <strong>Military</strong> Personnel Command<br />
Navy Occupational Dev. & Analysis Center<br />
Bldg. 150, WNY (Anacostia)<br />
Washington, DC 20374-1501<br />
USA<br />
A/V 288-5488 COM: (202) 433-5488
Joe H. Neidig<br />
Naval Education and Training Program<br />
Management Support Activity (Code 3111)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1729 COM: (904) 452-1729<br />
Ms. Mary L. Norwood<br />
U.S. Coast Guard HQ (G-PWP-2), Room 4111<br />
2100 Second St., SW<br />
Washington, DC 20593-0001<br />
USA<br />
Dr. Lawrence H. O'Brien<br />
Dynamics Research Corporation<br />
60 Concord Street<br />
Wilmington, MA 02174<br />
USA<br />
COM: (508) 658-6100<br />
Brian S. O'Leary<br />
US Office of Personnel Management<br />
Room 6451<br />
1900 E Street, N.W.<br />
Washington, DC 20415<br />
USA<br />
COM: (202) 606-0880<br />
Robert C. Pallme<br />
Naval Education and Training Program<br />
Management Support Activity (Code 311)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1728 COM: (904) 452-1728<br />
Dr. Dale R. Palmer<br />
U.S. Army Research Institute<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8275 COM: (703) 284-8275<br />
Stephen W. Parchman<br />
NPRDC<br />
Code 15<br />
San Diego, CA 92152-6800<br />
USA<br />
A/V 553-7794 COM: (619) 553-7794<br />
Randolph Park<br />
American Institutes for Research<br />
3333 K St., NW<br />
Washington, DC<br />
USA<br />
COM: (202) 342-5000<br />
Dr. John J. Pass<br />
927 Nautilus Isle<br />
Dania, FL 33004<br />
USA<br />
Robert H. Pennington<br />
Naval Education and Training Program<br />
Management Support Activity (Code 314)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1547 COM: (904) 452-1547<br />
Carlene M. Perry<br />
United States Air Force Academy<br />
P.O. Box 4269<br />
US Air Force Academy, CO 80841<br />
USA<br />
COM: (719) 472-4551<br />
Dr. Mark G. Pfeiffer<br />
NAVTRASYSCEN Training Analysis & Evaluation<br />
Code 121<br />
12350 Research Parkway<br />
Orlando, FL 32826-3224<br />
USA<br />
A/V 960-4132 COM: (407) 380-4132<br />
William J. Phalen<br />
AFHRL/MOD<br />
Brooks AFB, TX 78235-5600<br />
USA<br />
A/V 240-3677 COM: (512) 536-3677<br />
Squadron Leader John S. Price<br />
Royal Australian Air Force<br />
U.S. Air Force Human Resources<br />
Laboratory (AFHRL/MOD)<br />
Brooks AFB, TX 78235-5000<br />
USA<br />
A/V 240-3648 COM: (512) 536-3648
COL Terry J. Prociuk<br />
Director of Personnel Psychology and<br />
Sociology<br />
National Defense Headquarters<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
A/V 042-0244 COM: (613) 992-0244<br />
Dr. Wiebke Putz-Osterloh<br />
Lehrstuhl Psychologie<br />
Universität Bayreuth<br />
Postfach 10151, D-8580 Bayreuth<br />
Federal Republic of Germany<br />
COM: (0921) 55700<br />
Martin L. Rauch<br />
Federal Ministry of Defense, P II 4<br />
Postfach 1328<br />
5300 Bonn 1<br />
Federal Republic of Germany<br />
COM: 49-228-128543<br />
LT Daniel T. Reeves<br />
Canadian Forces<br />
Personnel Applied Research Unit<br />
4900 Yonge St., Suite 600<br />
Willowdale, Ontario, M2N 6B7<br />
Canada<br />
A/V 027-4239 COM: (416) 224-4968<br />
Beatrice Julie Rheinstein<br />
Office of Personnel Management<br />
1900 E Street, Room 6451<br />
Washington, DC 20415-5000<br />
USA<br />
COM: (202) 606-2694<br />
William M. Ritchie<br />
National Defense Headquarters<br />
101 Colonel By Drive<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
Dr. Gwyn Robson<br />
Marine Corps Institute<br />
P. 0. Box 1775<br />
Arlington, VA 22222-0001<br />
USA<br />
A/V 288-4109 COM: (202) 433-4109<br />
Dr. Gerd Rodel<br />
Freiwilligenannahmezentrale der Marine<br />
Ebkeriege 35191<br />
D-2940 Wilhelmshaven<br />
Federal Republic of Germany<br />
COM: (04421) 792124<br />
Earl F. Roe<br />
Naval Education and Training Program<br />
Management Support Activity (Code 0317)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1335 COM: (904) 452-1335<br />
C1C Diane L. Romaglia<br />
United States Air Force Academy<br />
P.O. Box 4405<br />
U.S. Air Force Academy, CO 80841<br />
USA<br />
A/V 259-4537 COM: (719) 472-4533<br />
Kendall L. Roose<br />
Training Department<br />
Training Air Wing Five<br />
NAS Whiting Field<br />
Milton, FL 32570-5100<br />
USA<br />
A/V 868-7266 COM: (904) 623-7266<br />
COL Dr. Ger J.C. Roozendaal<br />
DPKL/afd GW<br />
Postbus 90701<br />
2509 LS The Hague<br />
The Netherlands<br />
COM: 31-71-6135450<br />
Sandra A. Rudolph<br />
Chief of Naval Technical Training<br />
Bldg C-l, Code N632<br />
NAS Memphis<br />
Millington, TN 38054-5056<br />
USA<br />
A/V 966-5591 COM: (901) 873-5591<br />
Roberto B. Salinas<br />
USAFOMSQ/OMYO<br />
Randolph AFB, TX 78150<br />
A/V 487-6811 COM: (512) 652-6811
MAJ Charles A. Salter<br />
Natick Research, Dev. & Eng. Center<br />
10 East Militia Hts.<br />
Needham, MA 02192<br />
USA<br />
A/V 256-4901 COM: (508) 651-4901<br />
Mr. William A. Sands<br />
Navy Personnel Research and<br />
Development Center (NPRDC)<br />
<strong>Testing</strong> Systems Department (Code 13)<br />
San Diego, CA 92152-6800<br />
USA<br />
A/V 553-9266 COM: (619) 553-9266<br />
Jerry Scarpate<br />
DEOMI/DR<br />
Patrick AFB, FL 32925-6685<br />
USA<br />
Sibylle B. Schambach<br />
c/o Federal Armed Forces Admin Office<br />
Bonner Talweg 177<br />
D-5300 Bonn<br />
Federal Republic of Germany<br />
PH: (Germany) 228-122099<br />
Dr. Amy C. Schwartz<br />
U.S. Army Research Institute<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333<br />
USA<br />
A/V 284-0275 COM: (703) 274-0275<br />
LCDR James W. Shafovaloff<br />
Commandant (G-PWP-2)<br />
U.S. Coast Guard Headquarters<br />
2100 2nd St., S.W., Room 4111<br />
Washington, DC 20593-0001<br />
USA<br />
FTS 8-267-1954 COM: (202) 267-1954<br />
Dr. Joyce Shettel-Neuber<br />
NPRDC<br />
San Diego, CA 92152-6800<br />
USA<br />
A/V 553-7940<br />
Dr. Guy L. Siebold<br />
US Army Research Institute, (PERI-IL)<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22330-5600<br />
USA<br />
A/V 204-8293 COM: (703) 274-0293<br />
Brian W. D. Slack<br />
Ontario-Hydro<br />
P.O. Box 338<br />
Orangeville,<br />
Ontario L9W 2Z7<br />
Canada<br />
COM: (519) 941-4620<br />
LT Wilfried A. Slowack<br />
CRS Sel Psy Ond<br />
Bruynstraat<br />
B-1120 Brussels<br />
Belgium<br />
PH: (Belgium) 02-2680050, Ext. 3279<br />
Dr. Robert M. Smith<br />
Naval Aviation Schools Command<br />
Code 12, Bldg. 633, Room 137<br />
Naval Air Station<br />
Pensacola, FL 32500-5400<br />
USA<br />
A/V 922-4120 COM: (904) 452-4120<br />
Dr. J. Michael Spector<br />
AFHRL/IDC<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
A/V 240-3036 COM: (512) 536-3036<br />
Yvonne W. Squires<br />
10968 Portobelo Dr.<br />
San Diego, CA 92124<br />
USA<br />
A/V 553-8264 COM: (619) 553-0264<br />
Herb C. Stacy<br />
Chief of Naval Technical Training<br />
Bldg C-l, Code N5A<br />
NAS Memphis<br />
Millington, TN 38054-5056<br />
USA<br />
A/V 966-5984 COM: (901) 873-5984<br />
Michael R. Staley<br />
7521 126th Avenue<br />
Kirkland, WA 98033<br />
USA<br />
COM: (206) 869-5501
Paul P. Stanley II<br />
USAFOMC/OMD<br />
Randolph AFB, TX 78150-5000<br />
USA<br />
A/V 481-5234 COM: (512) 652-5234<br />
Dr. Alma G. Steinberg<br />
U.S. Army Research Institute<br />
Attn: PERI-IL<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8293 COM: (703) 274-8293<br />
Stanley D. Stephenson<br />
Dept of Computer Information<br />
System & Admin Science<br />
Southwest Texas State Univ.<br />
San Marcos, TX 78666-4616<br />
USA<br />
COM: (512) 245-2291<br />
Dr. Lawrence J. Stricker<br />
Educational <strong>Testing</strong> Service<br />
Princeton, NJ 08541-0001<br />
USA<br />
COM: (609) 734-5551<br />
J. S. Tartell<br />
USAF OMSQ/OMY<br />
Randolph AFB, TX 78150-5000<br />
USA<br />
DSN 481-6623 COM: (512) 652-6623<br />
John W. Thain<br />
583 Cypress<br />
Monterey, CA 93940<br />
USA<br />
A/V 818-5164 COM: (408) 641-5764<br />
William J. Tharion<br />
Health & Performance Division<br />
USA Research Inst of Env Med<br />
Natick, MA 01760-5001<br />
USA<br />
A/V 256-4115 COM: (508) 651-4115<br />
Philip A. Thornton<br />
U.S. Coast Guard HQ (G-PWP-2), Room 4111<br />
2100 Second Street S.W.<br />
Washington, DC 20593-0001<br />
USA<br />
COM: (202) 267-1954<br />
LCDR Barbara T. Transki<br />
Navy Occupational Development & Analysis Center<br />
Bldg 150, WNY Anacostia<br />
Washington, DC 20374-1501<br />
USA<br />
A/V 288-4633 COM: (202) 433-4633<br />
Thomas Trent<br />
<strong>Testing</strong> System Dept., Code 132<br />
Navy Personnel R&D Center<br />
San Diego, CA 92152-6800<br />
USA<br />
A/V 553-7637 COM: (619) 553-7637<br />
Ms. Susan Truscott<br />
Dir of Social & Economic Analysis<br />
Operational Research/Analysis Establishment<br />
101 Colonel By Drive<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
Dr. James W. Tweeddale<br />
Chief of Naval Education and Training<br />
NROTC Division<br />
NAS Pensacola<br />
Pensacola, FL 32508<br />
USA<br />
A/V 922-4983 COM: (904) 452-4903<br />
Dr. Lloyd W. Wade<br />
Special Programs Department<br />
Marine Corps Institute<br />
Arlington, VA 22222-0001<br />
USA<br />
A/V 288-2612 COM: (202) 415-9229<br />
Dr. Raymond O. Waldkoetter<br />
U.S. Army Soldier Spt Center<br />
Attn: ATSG-DDN (Bldg 401)<br />
Fort Harrison, IN 46216-5700<br />
USA<br />
A/V 699-3819 COM: (317) 542-3879<br />
Aubrey E. Walker<br />
U.S. Army Infantry School<br />
Attn: ATSH-TDT-I<br />
Fort Benning, GA 31905-5593<br />
USA<br />
Clarence L. Walker<br />
Rt. 1, Box 593<br />
Purcellville, VA 22132<br />
USA<br />
COM: (703) 669-6427<br />
Dr. Brian K. Waters<br />
HumRRO<br />
1100 South Washington Street<br />
Alexandria, VA 22314-4499<br />
USA<br />
COM: (703) 706-5647<br />
Johnny J. Weissmuller<br />
Metrica, Inc.<br />
8301 Broadway, Suite 215<br />
San Antonio, TX 78217<br />
USA<br />
COM: (512) 822-6600<br />
LTCOL Karol W.J. Wenek<br />
<strong>Military</strong> Leadership & Management Dept<br />
Royal <strong>Military</strong> College of Canada<br />
Kingston, Ontario K7K 5L0<br />
Canada<br />
COM: (613) 541-6304<br />
James D. Wiggins<br />
Naval Education and Training Program<br />
Management Support Activity (Code 3104)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1323 COM: (904) 452-1323<br />
CDR Frederick F.P. Wilson<br />
Canadian Forces Personnel Applied<br />
Research Unit<br />
4900 Yonge St., Suite 600<br />
Willowdale, Ontario M2N 6B7<br />
Canada<br />
COM: (416) 224-4964<br />
Dr. Lauress L. Wise<br />
DMDC<br />
99 Pacific St., #155A<br />
Monterey, CA 93940-2453<br />
USA<br />
COM: (408) 655-4000<br />
Dr. Martin F. Wiskoff<br />
307A Mar Vista Drive<br />
Monterey, CA 93940<br />
USA<br />
COM: (408) 373-3073<br />
Darrell A. Worstine<br />
Commander, USAPIC<br />
Attn: ATNC-MO<br />
200 Stovall Street<br />
Alexandria, VA 22332-1330<br />
USA<br />
A/V 221-3250 COM: (703) 325-3250<br />
Timothy C. Zello<br />
U.S. Army Ordnance Center<br />
Attn: ATSL-MD<br />
Aberdeen Proving Ground, MD 21005-5201<br />
USA<br />
A/V 298-4115 COM: (301) 278-4115
Abrahams, N. M., 486<br />
Albert, W. G., 310, 316<br />
Alderks, C. E., 432<br />
Alley, F., 292<br />
Arabian, J. M., 226<br />
Arndt, K., 104<br />
Ashworth, MAJ R. L., Jr., 199<br />
Baker, H., 292, 298, 304<br />
Banderet, L. E., 334, 339<br />
Bayes, A. H., 425<br />
Bennett, W. R., 116<br />
Bergquist, Maj T. M., 156<br />
Bessemer, D. W., 150<br />
Borman, W. C., 268, 492, 498, 504<br />
Bosshardt, M., 504, 505, 516<br />
Bowler, E. C., 535<br />
Bradley, Capt. J. P., 262<br />
Brady, E. J., 322<br />
Brooks, J. T., 541<br />
Brown, G. C., 553<br />
Buck, L. S., 274<br />
Buckenmyer, D. V., 116<br />
Burch, R. L., 486<br />
Busciglio, Henry H., 380<br />
Campbell, C. H., 528, 541<br />
Campbell, R. C., 528, 529, 541<br />
Carle, W. J., 541<br />
Clark, H. J., 460<br />
Collins, D. D., 414<br />
Conner, Dr. H. B., 312<br />
Crafts, J. L., 535<br />
Crawford, K., 504, 516<br />
Crawford, R. L., PhD, 167<br />
Cymerman, A., 408<br />
Dart, 1Lt T. S., 156<br />
Dauphinee, SSG D. T., 339<br />
Dempsey, J. R., 25<br />
Dhammanungune, S., 304<br />
Diehl, G. E., 128<br />
Dittmar, M. J., 316<br />
Doyle, E. L., 529<br />
Dubois, D., 504, 505, 516<br />
Dunlap, W. P., 220<br />
Edwards, J. E., 31, 486<br />
Eggenberger, J. C., PhD, 167<br />
Elig, T. W., 19<br />
Ellis, J. A., 132<br />
Evans, R. M., 191<br />
Exner, Maj P. J., 535<br />
Fayfich, P. R., 70<br />
Fehler, F., 180<br />
Felker, D. B., 535<br />
INDEX OF AUTHORS<br />
Fiedler, E., 392<br />
Finley, D. L., 94, 99<br />
Fowlkes, J. E., 220<br />
Goldberg, E. L., 474<br />
Greene, C. A., 241<br />
Guthals, 2Lt J. A., 76, 156<br />
Hand, D. K., 82, 316<br />
Hansen, H. D., 351<br />
Hanson, M. A., 268, 498<br />
Harris, D. A., 25<br />
Harris, J. C., 547<br />
Harris, J. H., 528<br />
Hawkins, R. B.<br />
Heslin, Captain, 174<br />
Houston, J., 504, 522<br />
Hoyt, R., 408<br />
Hudspeth, Dr. D. R., 70<br />
Ince, V,, 241<br />
Jezior, B. A., 241<br />
Johnson, R. F., 210<br />
Jones, M. B., 419<br />
Jones, P. L., 122<br />
Kennedy, R. S., 220, 419<br />
Kittredge, R., 408<br />
Klein, P., 88<br />
Knight, J. R., 116<br />
Kobrick, J. L., 210<br />
Koger, Major M. E., 174<br />
Laabs, G. J., 398<br />
Leaman, J. A., 455<br />
Lescreve, F., 216<br />
Lesher, L. L., 241<br />
Lester, L. S., 280<br />
Lickteig, C. W., 174<br />
Lieberman, H. R., 334<br />
Lindsay, T. J., 438<br />
Luisi, T. A., 280<br />
Luther, S. M., 280<br />
Mael, F. A., 286<br />
Marlowe, B. E., 408<br />
Martell, LTC K. A., 6<br />
Mayberry, P. W., 535<br />
McCauley, Jr., D. E., 51, 58, 64<br />
McCormick, D. L., 122<br />
McGee, S. D., 404<br />
McMenemy, D. J., 210<br />
Melter, A. H., 357<br />
Menchaca, Capt J., Jr., 76<br />
Mentges, W., 357<br />
Mirabella, A., 162<br />
Mitchell,
Muraida, D. J. 185<br />
O’Brien, L. H. 251<br />
O’Leary, B. S. 51, 58, 64<br />
O’Mara, M. 339<br />
Olivier, L. 76<br />
Owens-Kurtz, C, K. 492<br />
Palmer, D. R. 328<br />
Parchman, S. W. 132<br />
Paullin, C. 498<br />
Perez, CPT P. J. 334<br />
Perry, C. M. 235<br />
Pfeiffer, G. 76<br />
Pfeiffer, M. G. 191<br />
Phalen, W. J. 82, 310, 316<br />
Phelps, Dr. R. H. 199<br />
Pimental, N. A. 339<br />
Popper, R. 241<br />
Price, J. S., Squadron Leader 70<br />
Putz-Osterloh, W. 362<br />
Quenette, M. A. 398<br />
Reeves, Lt(N) D. T. 12<br />
Rheinstein, J. 51, 58, 64<br />
Riley, SGT R. H. 339<br />
Rodel, G. 368<br />
Romaglia, C1C D. L. 345<br />
Roozendaal, Col. G. J. C. 466<br />
Rosenfeld, P. 31<br />
Rudolph, S. A. 204<br />
Rumsey, M. G. 322<br />
Rushano, T. M. 386<br />
Russell, T. L. 492<br />
Salter, MAJ C. A. 280<br />
Sands, M. 298<br />
Sands, W. A. 245<br />
Schambach, S. B. 110<br />
Schwartz, A. C. 226, 256<br />
Sheposh, J. P. 474<br />
Sherman, F. 504, 522<br />
Shettel-Neuber, J. 474<br />
Shukitt-Hale, B. L. 334<br />
Siebold, G. L. 438, 444<br />
Silva, J. M. 256<br />
Simpson, LTC R. L. 314<br />
Skinner, J. 345<br />
Slowack, W. 216<br />
Spector, J. M. 185<br />
Spier, M. 304<br />
Spokane, A. 298<br />
Stanley II, P. P. 235, 547<br />
Steinberg, A. G. 455<br />
Stephenson, J. A. 138<br />
Stephenson, S. D. 136, 144<br />
Swirski, L. 292, 304<br />
Tartell, J. S. 547<br />
Thain, J. W. 231<br />
Tharion, W. J. 408<br />
Thomas, P. J. 31<br />
Toyota, SGT R. M. 339<br />
Trent, T. 398<br />
Truscott, S.<br />
Turnage, J. J. 229, 413<br />
Tweeddale, J. W. 480<br />
Vandivier, P. L. 453<br />
Van Hemel, S. 252<br />
Vaughan, D. S. 116<br />
Waldkoetter, R. 0. 450<br />
Walker, C. L. 37<br />
Waters, B. K. 25<br />
White, L. A. 328<br />
White, W. R., Sr. 450<br />
Williams, J. E. 235<br />
Winn, LTC D. H. 6<br />
Wiskoff, M. 504, 505, 516, 522<br />
Witt, SSG C. E. 433<br />
York, W. J., Jr., 94, 99<br />
Young, M. C. 328<br />
Zimmerman, R. A. 504, 516<br />